Bug Report: PDF text extraction takes several minutes with 100% CPU

lingway · October 13, 2015, 11:13am

We use Aspose for text extraction purposes only, on Java.

On some of our machines, text extraction for a small document takes several minutes, with 100% CPU and other threads locked, whereas it is very fast on others. The reason is simple: Aspose.Pdf looks for font directories in a fixed list. The list is the following:

“%WINDIR%/Fonts/”,

“/usr/openwin/lib/X11/fonts/TrueType/”,

“/usr/local/share/fonts/”,

“$home/.fonts/”,

“/usr/share/fonts/truetype/”,

“/usr/X11R6/lib/X11/fonts/ttfonts/”,

“/Library/Fonts/”,

“~/Library/Fonts/”,

“/Network/Library/Fonts/”,

“/System/Library/Fonts/”,

“~/.fonts/”,

“/usr/share/fonts/”,

“/usr/share/X11/fonts/TTF/”,

“/system/fonts/”

But if none of these directories exists (this is distribution-dependent), the fallback becomes “/”! As a result, one thread scans the entire hard drive, locking all the other…

The result is several minutes of 100% CPU activity, during which everything is locked.
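To tell whether a given machine is affected, one can check which of the candidate directories listed above actually exist. The following is only a diagnostic sketch and not Aspose code: the class and method names are made up for illustration, and it assumes that “~” and “$home” both stand for the user's home directory and that “%WINDIR%” comes from the environment variable of the same name. How the library really expands these placeholders is not known to me.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reports which candidate font directories exist.
class FontDirCheck {
    // Candidate list copied from the post above.
    static final List<String> CANDIDATES = Arrays.asList(
        "%WINDIR%/Fonts/", "/usr/openwin/lib/X11/fonts/TrueType/",
        "/usr/local/share/fonts/", "$home/.fonts/",
        "/usr/share/fonts/truetype/", "/usr/X11R6/lib/X11/fonts/ttfonts/",
        "/Library/Fonts/", "~/Library/Fonts/", "/Network/Library/Fonts/",
        "/System/Library/Fonts/", "~/.fonts/", "/usr/share/fonts/",
        "/usr/share/X11/fonts/TTF/", "/system/fonts/");

    // Assumption: "~" and "$home" mean the home directory, and "%WINDIR%"
    // is the WINDIR environment variable (null when it is not set).
    static String expand(String candidate, String home, String winDir) {
        if (candidate.startsWith("~")) {
            return home + candidate.substring(1);
        }
        if (candidate.startsWith("$home")) {
            return home + candidate.substring("$home".length());
        }
        if (candidate.startsWith("%WINDIR%")) {
            return winDir == null
                ? null
                : winDir + candidate.substring("%WINDIR%".length());
        }
        return candidate;
    }

    // Returns the expanded candidates that exist as directories.
    static List<String> existing(List<String> candidates, String home, String winDir) {
        List<String> found = new ArrayList<>();
        for (String candidate : candidates) {
            String path = expand(candidate, home, winDir);
            if (path != null && Files.isDirectory(Paths.get(path))) {
                found.add(path);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> found = existing(
            CANDIDATES, System.getProperty("user.home"), System.getenv("WINDIR"));
        if (found.isEmpty()) {
            System.out.println("No candidate font directory exists on this machine.");
        } else {
            found.forEach(dir -> System.out.println("Found: " + dir));
        }
    }
}
```

If this prints that no candidate exists, the machine is in the situation described above, where the fallback to “/” would apply.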

The workaround is simple: create an empty “.fonts” directory in the home directory of the user running the application. Still, I think this should clearly be considered a bug!
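For anyone who would rather apply this workaround from the application itself than by hand, a minimal sketch follows. It uses only standard JDK calls and no Aspose API; the class and method names are hypothetical, and it assumes the process is allowed to write to the user's home directory and that the directory is created before the first text extraction runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: makes sure an (empty) ".fonts" directory exists.
class FontsDirWorkaround {
    // Creates <home>/.fonts if it is missing; does nothing if it is
    // already there. Returns the path of the directory.
    static Path ensureFontsDir(Path home) throws IOException {
        return Files.createDirectories(home.resolve(".fonts"));
    }

    public static void main(String[] args) throws IOException {
        Path home = Paths.get(System.getProperty("user.home"));
        Path fonts = ensureFontsDir(home);
        System.out.println("Font directory present: " + fonts);
        // Run the text extraction only after this point.
    }
}
```

Calling `ensureFontsDir` once at startup should be enough, since the call is harmless when the directory already exists.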

tilal.ahmad · October 14, 2015, 2:49am

Hi Claire,

Thanks for sharing your findings. We have logged a ticket PDFNEWJAVA-35202 in our issue tracking system for investigation and resolution. We will keep you updated about the issue resolution progress in this forum thread.

We are sorry for the inconvenience caused.

Best Regards,

aspose.notifier · April 17, 2016, 2:20am

The issues you have found earlier (filed as PDFNEWJAVA-35202) have been fixed in Aspose.Pdf for Java 11.4.0.
