Extract skipping spaces or adding spaces

ronvanleeuwen · February 22, 2010, 3:30am

I'm saving a text-based document to PDF using Aspose.Words. When I later extract the text from the PDF for re-indexing purposes (functionality in my product), the positioning of values differs from the original text file: spaces are missing from some lines and extra spaces are added to others. This makes it impossible for me to re-index the document.

This is the code I'm using to extract the text from the PDF document.

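// Requires: using System.IO; using Aspose.Pdf.Kit;
PdfExtractor extractor = new PdfExtractor();
// Set the password for the input PDF file (empty: no password)
extractor.Password = "";
// Set the text extraction mode
extractor.ExtractTextMode = 1;
// Bind the input PDF document to the extractor
extractor.BindPdf(DocName);
// Extract text from the input PDF document
extractor.ExtractText();
// Write the extracted text to a stream
// (assumed to be a MemoryStream; the declaration was not shown in the post)
MemoryStream stream = new MemoryStream();
extractor.GetText(stream);
// Rewind the stream and read the extracted text back as a string
stream.Seek(0, SeekOrigin.Begin);
StreamReader streamr = new StreamReader(stream, true);
string NewIndex = getSearchCriteria(SearchFor, streamr.ReadToEnd());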

I'm using Aspose.Pdf.Kit (.NET) version 4.1 and Aspose.Words version 8.0.

Attached:

Original text document (OrdersPerCustomer.txt)
Pdf (634024311133303646OrdersPerCustomer1.Pdf)
Text after extract (PDF_Extract.txt)
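
As a rough way to pinpoint where the extracted text diverges from the original, a small helper along these lines could compare the two attached text files line by line and report the first column at which each line differs. This is only a sketch: WhitespaceDiff and Compare are hypothetical names used for illustration and are not part of any Aspose API.

```csharp
using System;
using System.Collections.Generic;

static class WhitespaceDiff
{
    // Returns one entry per differing line, giving the 1-based line number
    // and the 0-based column of the first character that does not match.
    public static List<string> Compare(string original, string extracted)
    {
        string[] a = SplitLines(original);
        string[] b = SplitLines(extracted);
        var report = new List<string>();
        int count = Math.Max(a.Length, b.Length);

        for (int i = 0; i < count; i++)
        {
            // Treat a line missing from either text as an empty line.
            string left = i < a.Length ? a[i] : "";
            string right = i < b.Length ? b[i] : "";
            if (left == right) continue;

            // Walk forward to the first mismatching column.
            int col = 0;
            int limit = Math.Min(left.Length, right.Length);
            while (col < limit && left[col] == right[col]) col++;

            report.Add("line " + (i + 1) + ", column " + col);
        }
        return report;
    }

    // Normalizes Windows line endings before splitting into lines.
    static string[] SplitLines(string text)
    {
        return text.Replace("\r\n", "\n").Split('\n');
    }
}
```

It could be called with the contents of the two attachments, for example WhitespaceDiff.Compare(File.ReadAllText("OrdersPerCustomer.txt"), File.ReadAllText("PDF_Extract.txt")), which needs using System.IO.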

Any help on this issue is appreciated.

Ron

shahzadlatif · February 22, 2010, 6:48am

Hi Ron,

I have noticed the same problem at my end and logged it as PDFKITNET-14550 in our issue tracking system. Our team will look into this issue and you’ll be updated via this forum thread once the issue is resolved.

We’re sorry for the inconvenience.
Regards,

aspose.notifier · May 5, 2010, 7:24pm

The issues you have found earlier (filed as 14550) have been fixed in this update.

ronvanleeuwen · May 6, 2010, 8:37am

Works nicely now.

Please close the ticket.

Thanks,

Ron