Arabic text is not recognized

mahmoudsaad · March 8, 2011, 2:22am

Hello,

I need to extract Arabic text from a PDF, but the output text is unreadable (corrupted).

Also, the PDF file contains 3 pages, but only one output file is created, named 1.txt.

code:

Aspose.Pdf.Kit.PdfExtractor extractor = new Aspose.Pdf.Kit.PdfExtractor();
extractor.Password = "";
extractor.BindPdf("test.pdf");
extractor.ExtractText();

int pageCounter = 1;
while (extractor.HasNextPageText())
{
    extractor.GetNextPageText(pageCounter.ToString() + ".txt");
    pageCounter += 1;
}

shahzadlatif · March 8, 2011, 12:09pm

Hi Mahmoud,

It looks like you are extracting text without a license. In evaluation mode, the text is not extracted completely. If you have already purchased a license, please set it before extracting the text. If you are still evaluating, you can get a temporary 30-day license from this link to test complete text extraction.
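As a rough sketch of what "set it before extracting" could look like: the snippet below assumes the Aspose.Pdf.Kit for .NET assembly is referenced and that its License class is used in the usual Aspose way. The license file name Aspose.Pdf.Kit.lic is a placeholder, not something taken from this thread; the extraction calls are the ones from the original post.

```csharp
class LicenseThenExtract
{
    static void Main()
    {
        // Assumption: placeholder file name; point this at your own license file.
        Aspose.Pdf.Kit.License license = new Aspose.Pdf.Kit.License();
        license.SetLicense("Aspose.Pdf.Kit.lic");

        // Extraction code from the original post, run only after the license is applied.
        Aspose.Pdf.Kit.PdfExtractor extractor = new Aspose.Pdf.Kit.PdfExtractor();
        extractor.BindPdf("test.pdf");
        extractor.ExtractText();

        int pageCounter = 1;
        while (extractor.HasNextPageText())
        {
            // Writes one text file per page: 1.txt, 2.txt, ...
            extractor.GetNextPageText(pageCounter.ToString() + ".txt");
            pageCounter += 1;
        }
    }
}
```

The license only needs to be set once per application run, before the first extraction call.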

I hope this helps. If you have any further questions, please do let us know.
Regards,

codewarior · May 28, 2011, 4:27pm

Hi,

Thanks for using our products. Could you please share the source PDF document and the code snippet you are using, so that we can test the scenario at our end? We apologize for the inconvenience.

mahmoudsaad · May 29, 2011, 2:46am

Hello,

Code:

Aspose.Pdf.Kit.PdfExtractor extractor = new Aspose.Pdf.Kit.PdfExtractor();
extractor.Password = "";
extractor.BindPdf("book.pdf");
extractor.ExtractText();

int pageCounter = 1;
while (extractor.HasNextPageText())
{
    extractor.GetNextPageText(pageCounter.ToString() + ".txt");
    pageCounter += 1;
}

codewarior · May 29, 2011, 4:28am

Hello Mahmoud,

Thanks for sharing the resource files.

I have tested the scenario and I am able to reproduce the same problem. I have logged it in our issue tracking system as PDFKITNET-27861. We will investigate this issue in detail and will keep you updated on the status of a fix.

We apologize for the inconvenience.