PdfExtractor encoding issue

nikitaus · August 4, 2010, 11:39am

Hello,

I have a problem extracting text from a PDF file. I write the text to a file with this call:

pdfExtractor.GetText(%PathToFile%)

In the output file, all Russian characters are corrupted. I tried the approach described in this post and got the same result.

Thanks,
Nikita.

shahzadlatif · August 5, 2010, 2:26am

Hi Nikita,

Please share the input PDF file with us so that we can test the issue at our end. We’ll update you with the results.

We’re sorry for the inconvenience.
Regards,

nikitaus · August 5, 2010, 4:01am

Hi,

A sample PDF file is attached. I have a lot of files that must be processed this week, so I need to find a solution ASAP. All the files were created by our customer with the PDF-XChange program, and I can’t get the sources of these files.
I searched Google for this problem, and I think the issue is caused by the Identity-H encoding.
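For anyone who wants to check a batch of files for the same suspicion, here is a minimal sketch in Python. It only looks for the `/Identity-H` (or `/Identity-V`) name in the raw bytes of the file, so it assumes the font dictionaries are stored uncompressed; fonts kept inside compressed object streams would be missed. The function name `uses_identity_encoding` is made up for this example and is not part of any library.

```python
import re

# Matches "/Encoding /Identity-H" or "/Encoding /Identity-V" as it would
# appear in an uncompressed font dictionary. Assumption: the dictionary is
# not hidden inside a compressed object stream.
IDENTITY_CMAP = re.compile(rb"/Encoding\s*/Identity-[HV]\b")


def uses_identity_encoding(pdf_bytes: bytes) -> bool:
    """Return True if the raw PDF bytes mention an Identity-H/V encoding."""
    return IDENTITY_CMAP.search(pdf_bytes) is not None


def file_uses_identity_encoding(path: str) -> bool:
    """Convenience wrapper that reads the file from disk first."""
    with open(path, "rb") as handle:
        return uses_identity_encoding(handle.read())
```

A hit does not prove that extraction will fail. Identity-H only says that character codes map directly to glyph IDs, so the extractor needs some other source (for example a ToUnicode map) to recover the real characters.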

Thanks,
Nikita.

shahzadlatif · August 5, 2010, 5:26pm

Hi Nikita,

I have reproduced this problem at my end and logged it as PDFKITNET-19058 in our issue tracking system. Our team will look into this issue and you’ll be updated via this forum thread once it is resolved.

We’re sorry for the inconvenience.
Regards,

nikitaus · August 9, 2010, 4:46am

Hi,

Can you give an approximate time frame for resolving this issue?

shahzadlatif · August 9, 2010, 6:35am

Hi Nikita,

As this issue was logged recently, our team still needs to investigate it in detail. I’m afraid we’re unable to share an ETA at the moment. However, I have asked our development team for an estimate, and you’ll be updated via this forum thread once we have one.

We’re sorry for the inconvenience.
Regards,

nikitaus · October 18, 2010, 8:56am

Do you have any news about this issue?

shahzadlatif · October 27, 2010, 9:35am

Hi Nikita,

Our team has looked into this issue, and I would like to share with you that the software used to create the sample PDF files uses the PDFXC30 character collection. This character collection is not standard, and we don’t have any information about this encoding, which makes correct text extraction impossible at the moment. You might try some other font for the Russian characters to avoid this problem.
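To see which character collections a file declares, a rough Python sketch like the one below may help. It is not how our component works internally. It assumes the `CIDSystemInfo` dictionaries are stored uncompressed with `/Registry` directly followed by `/Ordering`, and it treats only the well-known Adobe collections as standard. The name `nonstandard_collections` is hypothetical.

```python
import re

# Orderings of the commonly used Adobe character collections. Anything else,
# or any registry other than "Adobe", is reported as non-standard.
STANDARD_ORDERINGS = {b"Identity", b"Japan1", b"GB1", b"CNS1", b"Korea1"}

# Assumption: /Registry and /Ordering appear in this order as literal
# strings, e.g. "/Registry (Adobe) /Ordering (Identity)".
CID_SYSTEM_INFO = re.compile(
    rb"/Registry\s*\(([^)]*)\)\s*/Ordering\s*\(([^)]*)\)"
)


def nonstandard_collections(pdf_bytes: bytes) -> list:
    """Return sorted (registry, ordering) pairs that are not standard Adobe ones."""
    found = set()
    for registry, ordering in CID_SYSTEM_INFO.findall(pdf_bytes):
        if registry != b"Adobe" or ordering not in STANDARD_ORDERINGS:
            found.add((registry.decode("latin-1"), ordering.decode("latin-1")))
    return sorted(found)
```

An empty list means no unknown collection was found in the uncompressed part of the file. A non-empty list points at fonts whose character codes an extractor may not be able to map back to Unicode.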

I hope this helps. If you have any further questions, please do let us know.
Regards,