Hello,
I have a problem getting text from a PDF file. I write the extracted text to a file using this function:
pdfExtractor.GetText(%PathToFile%)
In the output file, all Russian characters are corrupted. I tried the approach described in this post and got the same result.
Thanks,
Nikita.
Hi Nikita,
Please share the input PDF file with us so we can test the issue at our end. You’ll be updated with the results accordingly.
We’re sorry for the inconvenience.
Regards,
Hi,
I’ve attached a sample PDF file. I have a lot of files that must be processed this week, and I need to find a solution ASAP. All of the files were created by our customer with PDF-XChange, and I can’t get the source files.
I searched Google for this problem, and I think the issue is caused by the Identity-H encoding.
Thanks,
Nikita.
Hi Nikita,
I have reproduced this problem at my end and logged it as PDFKITNET-19058 in our issue tracking system. Our team will look into this issue and you’ll be updated via this forum thread once it is resolved.
We’re sorry for the inconvenience.
Regards,
Hi,
Can you give an approximate time frame for a fix?
Hi Nikita,
As this issue was logged recently, our team still needs to investigate it in detail. I’m afraid we’re unable to share an ETA at the moment. However, I have asked our development team for an ETA, and you’ll be updated via this forum thread once we have an estimate.
We’re sorry for the inconvenience.
Regards,
Do you have any news about this issue?
Hi Nikita,
Our team has looked into this issue, and I would like to share that the software used to create the sample PDF files uses the PDFXC30 character collection. This character collection is not standard, and we don’t have any information about its encoding, which makes correct text extraction impossible at the moment. You might try using a different font for the Russian characters to avoid this problem.
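To check which character collection a file declares, you can look at the `/Ordering` entry of each CIDFont's `CIDSystemInfo` dictionary. The Python sketch below is a hypothetical diagnostic helper (`find_cid_orderings` is an illustrative name, not part of the Aspose API). It scans the raw bytes and any Flate-compressed streams with regular expressions, so it assumes the file is not encrypted and silently skips streams that use other filters. It is a rough heuristic, not a real PDF parser.

```python
import re
import zlib

# Matches "/Ordering (name)" as it appears inside a CIDSystemInfo dictionary.
ORDERING_RE = re.compile(rb"/Ordering\s*\(([^)]*)\)")
# Matches stream bodies; the lookbehind skips the "stream" inside "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.S)


def find_cid_orderings(pdf_bytes: bytes) -> set[str]:
    """Return the /Ordering names found in plain or Flate-compressed PDF data.

    Heuristic only (hypothetical helper): no real object parsing, no
    decryption, and only FlateDecode streams are decompressed.
    """
    chunks = [pdf_bytes]
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing bytes such as the EOL
            # that precedes the "endstream" keyword.
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate-encoded (or uses another filter): skip it
    found = set()
    for chunk in chunks:
        for m in ORDERING_RE.finditer(chunk):
            found.add(m.group(1).decode("latin-1"))
    return found
```

For example, `print(sorted(find_cid_orderings(open("sample.pdf", "rb").read())))` lists the orderings declared in `sample.pdf` (a placeholder file name). A non-Adobe ordering such as `PDFXC30`, in fonts that also lack a `/ToUnicode` CMap, would be consistent with the extraction problem described above; the sketch itself does not check for `/ToUnicode`.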
I hope this helps. If you have any further questions, please do let us know.
Regards,