Help using com.aspose.pdf.kit.PdfExtractor

yanovskishai · September 8, 2009, 11:25am

Hello!

I'm interested in using Aspose.Pdf.Kit to convert PDF articles into plain text.

I don't mind giving up the layout, but it's very important for me to keep the content, including the Greek letters (mu, alpha, ...) that appear in scientific articles.

I've tried Aspose's sample code, but the result was very distorted: many words disappeared or lost letters.

Original (test1.pdf) and Result (test2.txt) files attached.

Please help me find out whether I'm doing something wrong.

Thanks in advance,

Shai Yanovski

shahzadlatif · September 9, 2009, 12:41pm

Hi Shai,

Thank you very much for considering Aspose.

First of all, please note that the evaluation version adds some random text and replaces some letters when it extracts text. To extract text properly you’ll need a license file. I have tested your scenario at my end and the results were better than the output you shared.

Secondly, when you extract text using Aspose.Pdf.Kit, you lose not only the formatting: the text is also returned in raw form, i.e. it might not be laid out as it is shown in the PDF, for example in columns.

If this satisfies your requirement, we would encourage you to consider Aspose.Pdf.Kit. To evaluate the component without limitations, you can get a temporary license.

Regarding your concern for the Greek letters, please share a particular case with us and we’ll test it at our end to make sure that you get the required results.
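
One way to narrow down a particular case is to check which Greek letters actually survive in the extracted text. The following is only a minimal sketch in plain Java; it does not use any Aspose API. The class name GreekLetterCheck and the method greekLetters are hypothetical, and the sample string stands in for the contents of an extracted file such as test2.txt.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GreekLetterCheck {

    // Counts every character from the Unicode "Greek and Coptic" block
    // (U+0370 to U+03FF) that occurs in the given text.
    // Note: the micro sign U+00B5 is NOT in this block, so a PDF that
    // encodes "mu" as the micro sign will not be counted here.
    static Map<Character, Integer> greekLetters(String text) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.GREEK) {
                counts.merge(c, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample; in practice this would be the extracted text.
        String sample = "5 \u03bcm beads and an \u03b1-helix, 10 \u03bcm apart";
        System.out.println(greekLetters(sample));
    }
}
```

Comparing these counts against the letters visible in the original PDF shows which ones were dropped or replaced, which makes it easier to share a specific failing case.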

If you have any further questions please do let us know.
Regards,