When I convert DOCX to PDF, the Chinese text is garbled but the English text is fine.
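Code:
from pathlib import Path
import aspose.words as aw
# download_file, filepath and base_dir are defined elsewhere
file_name = download_file(filepath)
file_name_without_ext = Path(file_name).stem
file_path = base_dir / "file" / file_name
new_file_name = file_name_without_ext + '.pdf'
new_file_path = base_dir / "file" / new_file_name
# Load the DOCX and save it as PDF (the format is inferred from the .pdf extension)
doc = aw.Document(str(file_path))
doc.save(str(new_file_path))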
@David_Matin
Cause
The issue of Chinese characters appearing garbled when converting a DOCX file to PDF is often related to font embedding. If the fonts used in the DOCX file are not embedded or are not available on the system where the conversion is taking place, it can lead to incorrect rendering of non-Latin characters, such as Chinese.
Solution
To resolve this issue, ensure that the fonts used in your DOCX document are embedded during the conversion process. You can achieve this by setting the appropriate options in the PdfSaveOptions when saving the document.
Code
Here’s how you can modify your existing code to embed fonts:
import aspose.words as aw
# Load the Word document
doc = aw.Document("input.docx")
# Create PdfSaveOptions and set the font embedding
save_options = aw.saving.PdfSaveOptions()
save_options.embed_full_fonts = True # Embed complete fonts instead of subsets
# Save the document as PDF
doc.save("output.pdf", save_options)
Next Steps
- Ensure that the fonts used in your DOCX document are compatible with the PDF format.
- Test the conversion again after applying the font embedding option.
- If the issue persists, check if the specific fonts used in the document are installed on your system.
Please implement these changes and let me know if it resolves the issue!
@David_Matin Could you please attach the problematic input and output documents here for testing? We will check the issue and provide you with more information. Most likely the problem occurs because the fonts required to render the document are not available in the environment where the document is converted to PDF.
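One way to check whether fonts are missing is to collect the font substitution warnings that Aspose.Words raises while saving. The following is a minimal sketch, not a definitive implementation: `FontWarningLog` and `convert_and_list_font_warnings` are hypothetical names introduced for illustration, and the sketch assumes the `IWarningCallback`, `WarningType.FONT_SUBSTITUTION` and `Document.warning_callback` members of Aspose.Words for Python behave as documented.

```python
class FontWarningLog:
    """Collects the descriptions of font substitution warnings."""

    def __init__(self, font_substitution_type):
        # The warning type value that marks a font substitution.
        self.font_substitution_type = font_substitution_type
        self.messages = []

    def record(self, warning_type, description):
        # Keep only warnings about one font being replaced by another.
        if warning_type == self.font_substitution_type:
            self.messages.append(description)


def convert_and_list_font_warnings(docx_path, pdf_path):
    """Convert DOCX to PDF and return the font substitution messages."""
    # Imported here so FontWarningLog is usable without Aspose.Words installed.
    import aspose.words as aw

    log = FontWarningLog(aw.WarningType.FONT_SUBSTITUTION)

    class WarningCallback(aw.IWarningCallback):
        def warning(self, info):
            # Forward every warning to the log, which filters by type.
            log.record(info.warning_type, info.description)

    doc = aw.Document(str(docx_path))
    callback = WarningCallback()
    doc.warning_callback = callback
    doc.save(str(pdf_path))
    return log.messages
```

If the returned list is not empty, each message should name a font that was not found in the conversion environment and the font used in its place.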
@alexey.noskov It looks like a font problem. How do I install the required fonts?
联行号.docx (35.8 KB)
联行号.pdf (967.3 KB)
@David_Matin According to IWarningCallback, the font “宋体-简” is missing, and I don’t have it either. You can try to find it, or you can use the SimSun font, which can stand in for the missing font. On Windows, you can install all fonts using the “Download fonts for all languages” command; on Linux, you can install fonts using, for example, `sudo apt-get install ttf-mscorefonts-installer`.
simsun.zip (9.3 MB)
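If installing fonts system-wide is not an option, Aspose.Words can also be pointed at a folder of font files. Below is a rough sketch under stated assumptions: `fonts_folder` is assumed to be the folder where the attached font archive was extracted, `find_font_files` and `convert_with_extra_fonts` are hypothetical helper names, and the sketch relies on `FontSettings.set_fonts_sources`, `SystemFontSource` and `FolderFontSource` from Aspose.Words for Python.

```python
from pathlib import Path

FONT_EXTENSIONS = {".ttf", ".ttc", ".otf"}


def find_font_files(folder):
    """Return the font files found under `folder`, searched recursively."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in FONT_EXTENSIONS
    )


def convert_with_extra_fonts(docx_path, pdf_path, fonts_folder):
    """Convert DOCX to PDF, also looking for fonts in `fonts_folder`."""
    if not find_font_files(fonts_folder):
        raise FileNotFoundError(f"No font files found in {fonts_folder}")

    # Imported here so find_font_files works without Aspose.Words installed.
    import aspose.words as aw

    # Search the system fonts first, then the extra folder and its subfolders.
    font_settings = aw.fonts.FontSettings()
    font_settings.set_fonts_sources([
        aw.fonts.SystemFontSource(),
        aw.fonts.FolderFontSource(str(fonts_folder), True),
    ])

    doc = aw.Document(str(docx_path))
    doc.font_settings = font_settings
    doc.save(str(pdf_path))
```

Checking the folder first makes a wrong path fail loudly instead of silently producing another PDF with substituted fonts.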