Free Support Forum - aspose.com

OCR text issues in converted docs

We are using the very latest eval version of Aspose.Words for .NET. We ran many tests side by side, comparing Word 2013 printing to PDF versus Aspose, and then compared the 2 resulting PDFs. In many cases we found that the PDF’s background text (the non-visible but searchable text) was wrong in the Aspose.Words version: extra characters were added or characters were dropped. Any suggestions on solutions to this?
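
A comparison like this can be made repeatable with a few lines of script. The following is only a minimal sketch in Python, assuming the text layer of each PDF has already been extracted to a plain string by some other tool (the extraction step is not shown). The function name `text_layer_diffs` and the sample strings are hypothetical and are not taken from the attached files.

```python
import difflib


def text_layer_diffs(reference: str, candidate: str):
    """List every place where the candidate text layer differs from the reference.

    Each entry is (operation, reference_fragment, candidate_fragment), where
    operation is 'delete' (dropped from candidate), 'insert' (added in
    candidate) or 'replace'.
    """
    # autojunk=False keeps the matcher from ignoring frequent characters
    # such as spaces, which are exactly what we want to track here.
    matcher = difflib.SequenceMatcher(None, reference, candidate, autojunk=False)
    diffs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            diffs.append((op, reference[i1:i2], candidate[j1:j2]))
    return diffs


if __name__ == "__main__":
    # Hypothetical sample strings standing in for the two extracted text layers.
    word_text = "total step height of the SPP"
    other_text = "total stepheight of thee SPP"
    for op, ref, cand in text_layer_diffs(word_text, other_text):
        print(op, repr(ref), repr(cand))
```

Each reported entry points at one dropped or added run of characters, which makes it easier to attach concrete examples to a bug report.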

Thanks!

Hi Andrew,


Thanks for your inquiry. Could you please attach your input Word document and output PDF file showing the undesired behavior here for testing? We will investigate the issue on our end and provide you more information.

Best regards,

Hi - attached are the files, to see if you have any ideas on the problems.

Attached are:
1. the source doc file
2. the Aspose-converted file
3. the file that Word converted
4. a comparison (using Acrobat Pro’s built-in comparison) of the 2 converted files.

Problems:
1. Aspose adds a strange “* MERGEFORMAT” after some MathType equations (note that I guess you guys might not have the MathType fonts installed on your test machine … not sure).
2. Equations built using Word’s built-in equation editor often don’t render properly. See the equation near the end, in the sentence “Therefore the total step height of the SPP …”: it is clearly messed up.
3. Not sure if this is an issue or not, but looking at the comparison of the Word-converted file versus the other one, the hidden OCR text in the Aspose.Words version runs words together. The selectable text in the document is good, though. My publication team is worried that they will need to use the OCR text at some point. Again, I am not sure if this is a real issue or not.
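
The run-together words described in point 3 can be checked for mechanically. Below is a rough sketch in Python, assuming both text layers are already available as plain strings (how they are extracted is left open). The name `find_run_together_words`, the `max_run` parameter, and the sample sentences are hypothetical illustrations, not output from the attached files.

```python
def find_run_together_words(reference: str, candidate: str, max_run: int = 3):
    """Find candidate tokens that are several reference words glued together.

    Returns a list of (merged_token, [original words]) pairs. Only runs of
    2..max_run consecutive reference words are considered.
    """
    ref_words = reference.split()

    # Map every concatenation of consecutive reference words to its parts.
    joined = {}
    for n in range(2, max_run + 1):
        for i in range(len(ref_words) - n + 1):
            run = ref_words[i:i + n]
            joined.setdefault("".join(run), run)

    known = set(ref_words)
    # A token counts as "run together" if it is not a reference word itself
    # but equals one of the concatenations built above.
    return [
        (word, joined[word])
        for word in candidate.split()
        if word not in known and word in joined
    ]


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the attached documents.
    good = "the hidden text layer runs words together"
    bad = "the hiddentext layer runswordstogether"
    for merged, parts in find_run_together_words(good, bad):
        print(merged, "<-", parts)
```

A short list of merged tokens produced this way would show whether the missing spaces are occasional or systematic.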

Thanks for your help!!

Hi Andrew,

Thanks for the details. We are working on your query and will get back to you soon.

Best regards,

Hi Andrew,

Thanks for your inquiry.

While using the latest version of Aspose.Words, i.e. 15.3.0, I managed to reproduce the first two issues on my side. I have logged these issues in our bug tracking system. The IDs of these issues are WORDSNET-11718 and WORDSNET-11719. Your thread has also been linked to these issues, and you will be notified as soon as they are resolved. Sorry for the inconvenience.

Regarding the third issue, could you please elaborate on your inquiry and illustrate the problem with screenshots? This will help us understand your scenario and put us in a better position to address your concerns.

Best regards,

The issues you have found earlier (filed as WORDSNET-11718;WORDSNET-11719) have been fixed in this .NET update and this Java update.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.

Hi guys - thanks again for doing the above bug fixes. We did find a few remaining issues with these.

Attached is a sample Word file that causes a few remaining problems I’ve found with Aspose.Words. All of the problems I found have to do with the equations. Alongside the Word doc, I have also attached the PDF converted by Word 2013 as well as the Aspose.Words PDF.

The first error, in line 1, is the same equation we sent previously: compare the little equation after the letters “exp” on line 1 in the Word 2013 conversion versus the Aspose.Words conversion.

The second error is in Eq. (2). There is a box above the Dq.

The third error is in Eq. (27) on page two. For some reason the equation does not break and wrap to a second line. I saw this problem several times while QC’ing the PDF files.


thanks again guys

Hi Andrew,

Thanks for your inquiry.

While using the latest version of Aspose.Words, i.e. 15.5.0, I managed to reproduce these issues on my side. I have logged the following issues in our bug tracking system.

WORDSNET-11969: Unwanted symbols fx() inside dotted box appears in equations in PDF
WORDSNET-11970: A square box appears in equation in PDF
WORDSNET-11971: An equation does not break and wrap to a second line in PDF

Your thread has also been linked to these issues and you will be notified as soon as they are resolved. Sorry for the inconvenience.

Best regards,

The issues you have found earlier (filed as WORDSNET-11969;WORDSNET-11970;WORDSNET-11971) have been fixed in this .NET update and this Java update.

