Hi, we are using Aspose to extract Arabic text from PDF files.
The problem is that the extracted text looks encrypted. Our code:
public String getString() throws Exception {
    com.aspose.pdf.Document pdfDocument = null;
    String extractedText = "";
    try {
        // Load from the file path unless an input stream was supplied.
        if (inputStream == null) {
            pdfDocument = new com.aspose.pdf.Document(this.path);
        } else {
            pdfDocument = new com.aspose.pdf.Document(this.inputStream);
        }
        // Collect the text of every page.
        com.aspose.pdf.TextAbsorber textAbsorber = new com.aspose.pdf.TextAbsorber();
        pdfDocument.getPages().accept(textAbsorber);
        extractedText = textAbsorber.getText();
    } finally {
        // Guard against a failed load, which leaves pdfDocument null.
        if (pdfDocument != null) {
            pdfDocument.freeMemory();
            pdfDocument.dispose();
            pdfDocument.close();
        }
    }
    return extractedText;
}
Attached are the text extraction result and the sample PDF file.
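To put a number on how garbled an extraction result is, one option is to measure what share of its letters are actually in the Arabic script. The following is only a rough sketch using the Java standard library; the class ArabicTextCheck and its methods arabicRatio and foldPresentationForms are hypothetical helpers, not part of Aspose.Pdf, and they assume the extracted text is already available as a Java String.

```java
import java.text.Normalizer;

// Hypothetical diagnostic helper; not part of Aspose.Pdf.
final class ArabicTextCheck {

    // Returns the fraction of letters in the text that belong to the Arabic script.
    // Returns 0.0 when the text contains no letters at all.
    static double arabicRatio(String text) {
        int letters = 0;
        int arabic = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue; // skip digits, spaces, punctuation, private-use code points
            }
            letters++;
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.ARABIC) {
                arabic++;
            }
        }
        return letters == 0 ? 0.0 : (double) arabic / letters;
    }

    // Folds Arabic presentation forms (for example U+FEE1) back to base letters
    // (U+0645) using Unicode compatibility normalization.
    static String foldPresentationForms(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by getString(); "marhaba" in Arabic letters.
        String extracted = "\u0645\u0631\u062D\u0628\u0627";
        System.out.println("Arabic letter ratio: " + arabicRatio(extracted));
    }
}
```

A ratio near zero for a document known to be Arabic suggests the extractor is returning raw glyph codes or wrongly mapped characters rather than Unicode text, which is useful detail to include in a report.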
Could you please help us solve this issue?
Thanks in advance.
Hi Feras,
Thanks for your inquiry. I have tested your scenario with your shared document using Aspose.Pdf for .NET 10.2.0 and managed to observe the reported issue. For further investigation, I have logged an issue in our issue tracking system as PDFNEWNET-38416 and also linked your request to it. We will keep you updated via this thread regarding the issue status.
Please feel free to contact us for any further assistance.
Best Regards,
Please note that we are using the Java platform, as shown by the source code example we submitted.
Thanks.
Hi Feras,
Thanks for the acknowledgement.
I have tested the scenario with Aspose.Pdf for Java 10.1.0 and have managed to reproduce the same issue: Arabic text is not being extracted properly from the PDF file. I have logged it in our issue tracking system as PDFNEWJAVA-34769 so that it can be corrected. We will investigate this issue in detail and will keep you updated on the status of the correction.
We apologize for the inconvenience.
PS: Aspose.Pdf for Java is an auto-ported version of Aspose.Pdf for .NET, so the fix will first be made in Aspose.Pdf for .NET and then ported to the Java version.
Hi Feras,
Thanks for your inquiry. I am afraid your reported issue is still not resolved. We noticed it only recently, and it is pending investigation because other issues are already under investigation and resolution. We will notify you as soon as we make significant progress toward resolving it.
We are sorry for the inconvenience caused.
Best Regards,
Dears,
The issue PDFNEWJAVA-34769 is very important to us.
Based on the priority support we have, can you please share a delivery date?
Regards,
Dear,
The issue is reproduced with EVER TEAM. I believe we have purchased priority support.
Can you tell me where I can post the problem in order to give it high priority?
Regards,
We will post in the Priority Support forum.
Meanwhile, please inform us if you have any new updates.
Thanks.
Hi Alain,
Thanks for your patience.
Both issues are pending review, and I am afraid they are not yet resolved. However, once you have raised the issue in the Priority Support forum, the investigation will be expedited and we will be able to share any news regarding their resolution.
The issues you found earlier (filed as PDFNEWJAVA-34769) have been fixed in Aspose.Pdf for Java 10.6.0.