We are facing an issue while converting a PDF containing Arabic text to Word format. The document is converted and the formatting is maintained, but some text is broken and some is overlapping.
Could you please help with this?
We are using Aspose.PDF for Java.
Can you please share your sample PDF document for our reference so that we can test the scenario in our environment and address it accordingly?
@asad.ali Thanks for the quick response.
Attached is the PDF file that we are converting. For the Word file, screenshots are attached since I am not able to upload the doc file here.
Screenshot (86).png (37.8 KB)
Screenshot (87).png (64.3 KB)
135a.pdf (847.1 KB)
Would you please try and open the attached DOCX file generated in our environment using Aspose.PDF for .NET 22.3 and the below code snippet:
Document pdfDocument = new Document(dataDir + @"135a.pdf");
DocSaveOptions saveOptions = new DocSaveOptions();
saveOptions.Format = DocSaveOptions.DocFormat.DocX;
saveOptions.Mode = DocSaveOptions.RecognitionMode.Flow;
pdfDocument.Save(dataDir + @"output_flow.docx", saveOptions);
output_flow.docx (244.2 KB)
Please share screenshots of any anomalies you notice in the shared file.
@asad.ali It looks better than the previous one, but there is still a lot of distortion.
Also, the English dates have swapped places within a sentence. When I produced the doc file using your code (the corresponding Java code, actually), the date issue was resolved, but the distortion is still there.
Please let us know if we could connect over a phone call in order to resolve it quickly.
Screenshot (89).png (16.6 KB)
We have logged an issue as PDFJAVA-41511 in our issue tracking system for further investigation. We will look into its details and let you know as soon as the issue is fixed.
Please note that issues are resolved on a first come, first served basis in the free support model, unlike paid support, where issues have higher priority and are resolved on an urgent basis. You can create a post in the paid support forum, if you have a subscription, in order to expedite the resolution process. Furthermore, we encourage providing support via our dedicated forums, and you can freely share your concerns here.
We apologize for the inconvenience.
@asad.ali Is there any update on the above issue? There is also one more issue: the converted Word file has text boxes around the text. When more text is added later, it stays in the same box, and once the box size is exceeded, the text overflows and we lose it.
Please let us know if there is a workaround so that we don't get those boxes around the text.
In order to prevent the text box layout, please use the RecognitionMode.Flow or EnhancedFlow during conversion as shown in the above-shared code snippet. In case you still notice any issues, please share the respective files for our reference so that we can test the scenario in our environment and address it accordingly.
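For Aspose.PDF for Java, the equivalent of the snippet shared above might look like the following. This is only a sketch: it assumes the Java API exposes the same options through setters (setFormat, setMode), that the Aspose.PDF for Java library is on the classpath, and that the input file sits in a directory passed as the first argument. The class name and output file name are placeholders.

```java
import com.aspose.pdf.DocSaveOptions;
import com.aspose.pdf.Document;

public class PdfToDocxFlow {
    public static void main(String[] args) {
        // Assumption: directory containing 135a.pdf, with a trailing separator.
        String dataDir = args.length > 0 ? args[0] : "";

        Document pdfDocument = new Document(dataDir + "135a.pdf");
        try {
            DocSaveOptions saveOptions = new DocSaveOptions();
            // Produce DOCX output.
            saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
            // Flow mode aims for editable flowing text instead of text boxes.
            // RecognitionMode.EnhancedFlow can be tried here instead,
            // if your library version provides it.
            saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);

            pdfDocument.save(dataDir + "output_flow.docx", saveOptions);
        } finally {
            // Release resources held by the document.
            pdfDocument.close();
        }
    }
}
```

If text boxes still appear with Flow, trying EnhancedFlow on the same file and comparing the two outputs may help narrow down the problem.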