I'm saving a text-based document to PDF using Aspose.Words. When I later extract the text from the PDF for re-indexing (a feature of my product), the positioning of values differs from the original text file: spaces are missing from some lines and extra spaces are added to others. This makes it impossible for me to re-index the document.
This is the code I'm using to extract the text from the PDF document:
PdfExtractor extractor = new PdfExtractor();
//Set Password for input PDF file
extractor.Password = "";
//Set the text extraction mode
extractor.ExtractTextMode = 1;
//Bind the input PDF document to the extractor
extractor.BindPdf(DocName);
//Extract text from the input PDF document
extractor.ExtractText();
//Write the extracted text to a stream
extractor.GetText(stream);
stream.Seek(0, SeekOrigin.Begin);
StreamReader streamr = new StreamReader(stream, true);
string NewIndex = getSearchCriteria(SearchFor, streamr.ReadToEnd());
I'm using Aspose.Pdf.Kit (.NET) version 4.1 and Aspose.Words version 8.0.
Attached:
- Original text document (OrdersPerCustomer.txt)
- PDF (634024311133303646OrdersPerCustomer1.Pdf)
- Text after extraction (PDF_Extract.txt)
Any help on this issue is appreciated.
Ron