When I use Aspose.PDF to convert a PDF to DOC, the text in the output is garbled.
Document pdfDocument = new Document(_dataDir + "PDFToDOC.pdf");
// Save the file into MS document format
pdfDocument.save(_dataDir + "PDFToDOC_out.doc", SaveFormat.Doc);
Please check the attached output DOCX, which was generated in our environment using version 23.7 of the API and a valid license. Below is the code snippet that we used:
Document doc = new Document(dataDir + "涨乐财富通密码重置业务操作指引(1).pdf");
DocSaveOptions saveOption = new DocSaveOptions();
saveOption.setMode(DocSaveOptions.RecognitionMode.Flow);
saveOption.setFormat(DocSaveOptions.DocFormat.DocX);
saveOption.setAddReturnToLineEnd(false);
saveOption.setCloseResponse(false);
doc.save(dataDir + "涨乐财富通密码重置业务操作指引(1).docx", saveOption);
We did not notice any garbled text in the output. However, there were some formatting issues in the table at the end of the document. Can you please try the latest version and let us know if you still notice any issues?
I downloaded the DOCX file you uploaded, and the text in it is clearly wrong and garbled. Isn’t that what you are seeing? I have taken a screenshot of it for you: Dingtalk_20230803135510.jpg (52.7 KB)
I also tried extracting the text with the code below and found that I couldn’t get the content at all.
com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document("ExtractBoldText.pdf");
com.aspose.pdf.TextFragmentAbsorber textAbsorber = new com.aspose.pdf.TextFragmentAbsorber();
// Collect the text fragments from every page
pdfDocument.getPages().accept(textAbsorber);
for (com.aspose.pdf.TextFragment textFragment : textAbsorber.getTextFragments())
{
    System.out.println(textFragment.getText());
}
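One rough way to confirm that the extracted text really is unusable, rather than just displayed with the wrong font, is to scan it for characters that usually mean a Unicode mapping is missing. The sketch below is only a heuristic that uses the plain JDK. The class `GarbledTextCheck`, the method `looksGarbled`, and the 30% threshold are hypothetical and are not part of Aspose.PDF.

```java
public class GarbledTextCheck {

    // Heuristic: treat text as garbled when it is empty, or when a large share
    // of its code points are replacement, private-use, control, or unassigned
    // characters. The 0.3 threshold is an arbitrary assumption.
    public static boolean looksGarbled(String text) {
        if (text == null || text.trim().isEmpty()) {
            return true;
        }
        int total = 0;
        int suspicious = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                continue; // spaces and line breaks are not evidence either way
            }
            total++;
            int type = Character.getType(cp);
            if (cp == 0xFFFD
                    || type == Character.PRIVATE_USE
                    || type == Character.CONTROL
                    || type == Character.UNASSIGNED) {
                suspicious++;
            }
        }
        return total == 0 || (double) suspicious / total > 0.3;
    }

    public static void main(String[] args) {
        // Two ordinary CJK characters, written as escapes
        System.out.println(looksGarbled("\u5BC6\u7801"));             // prints false
        // Replacement and private-use characters only
        System.out.println(looksGarbled("\uFFFD\uFFFD\uE000\uE001")); // prints true
    }
}
```

Each string returned by `textFragment.getText()` in the loop above could be passed to `looksGarbled` to see whether the problem is already present at extraction time.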
I don’t know what’s so special about this document. Help me.
The issue appears to be related to missing fonts. It seems that Windows fonts are not installed on your system. Please install all fonts that support the characters of this language. If the issue still persists, please let us know.
I looked at the output document you uploaded, and it seems to have the same problem, so perhaps you are not getting normal output either. Can you open your DOCX document and take a screenshot for me?
I put the TTF file in the fonts directory, but it still doesn’t work. The screenshot you posted looks normal. I don’t know what to do now. Are there any other ideas? fangsong.zip (5.7 MB)
I opened the XML file of this document and found that it expects a font named RHMBTW+FangSong. What kind of font is this, and how do I install it? font.jpg (227.4 KB)
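For context, a name such as RHMBTW+FangSong follows the PDF convention for embedded font subsets: a tag of six uppercase letters, a plus sign, and then the name of the original font. The font to look for is therefore most likely FangSong, not a font literally called RHMBTW+FangSong. Below is a minimal sketch of splitting such a name using only the JDK. The class `SubsetFontName` and its methods are hypothetical helpers, not Aspose.PDF API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubsetFontName {

    // Six uppercase letters, a plus sign, then the original font name.
    private static final Pattern SUBSET_TAG = Pattern.compile("^([A-Z]{6})\\+(.+)$");

    // True when the name carries a subset tag such as "RHMBTW+".
    public static boolean isSubset(String fontName) {
        return fontName != null && SUBSET_TAG.matcher(fontName).matches();
    }

    // Returns the name without the subset tag, or the input unchanged
    // when no tag is present.
    public static String baseName(String fontName) {
        Matcher m = SUBSET_TAG.matcher(fontName);
        return m.matches() ? m.group(2) : fontName;
    }

    public static void main(String[] args) {
        System.out.println(baseName("RHMBTW+FangSong")); // prints FangSong
        System.out.println(isSubset("FangSong"));        // prints false
    }
}
```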
It is nice to know that your issue has been resolved. Please keep using our API and feel free to let us know by creating a new topic in case you need further assistance.