Hi, back on 28-05-2013 (PDFNEWNET-35382) I posted an issue where extracting text from PDFs with Aspose.Pdf.Text.TextAbsorber returns garbage.
Attached are the example PDF and output.
We are using Aspose.PDF 9.5.0.0 with an Aspose.Total licence.
Test environments have been on Windows 8.1 and Server 2012.
We have had a client waiting on a solution for far too long now. Can you please provide an update on a resolution?
Regards,
Bryant.
Hi Bryant,
I was able to resolve the issue today after much investigation. Please pass this on to your developers.
The PDFs being scanned had encoded font types that were evidently being decoded as glyphs rather than text.
I'm not sure whether you would be able to translate the glyphs without using OCR.
My solution: since I was generating the initial PDF from a PostScript file produced by a Windows printer driver, I changed the driver's “PostScript Output Option” setting to “Optimise for Portability” rather than the default “Optimise for Speed”.
Changing this setting ensures TrueType fonts are used and Aspose can decode the text.
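In case it helps anyone checking their own files, here is a rough Python sketch (not part of Aspose; the function name count_font_subtypes is made up for illustration) that counts the font subtypes declared in a PDF's raw bytes. It assumes the font dictionaries are stored uncompressed. Fonts declared inside compressed object streams will be missed, so treat an empty result as inconclusive.

```python
import re
from collections import Counter

# Font subtype names defined by the PDF specification.
# The trailing \b keeps font-program subtypes such as /Type1C
# from being counted as /Type1.
FONT_SUBTYPE = re.compile(
    rb"/Subtype\s*/(Type0|Type1|MMType1|Type3|TrueType|CIDFontType0|CIDFontType2)\b"
)


def count_font_subtypes(pdf_bytes: bytes) -> Counter:
    """Count font subtype declarations found in uncompressed PDF data.

    Hypothetical helper: a plain byte scan, not a real PDF parser.
    """
    return Counter(
        match.group(1).decode("ascii")
        for match in FONT_SUBTYPE.finditer(pdf_bytes)
    )
```

Running it on the PDF generated before and after changing the driver setting should show whether the declared font types actually changed, for example count_font_subtypes(open("example.pdf", "rb").read()), where example.pdf is a placeholder file name.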
Cheers.
Hi Bryant,