Converting PDF to image: Chinese text is garbled

When I convert a PDF to JPEG, the Chinese text is garbled; everything else renders normally.

code:

    from io import FileIO
    from pathlib import Path

    import aspose.pdf as ap

    file_name_without_ext = Path(file_name).stem
    document = ap.Document(str(file_path))
    resolution = ap.devices.Resolution(200)
    device = ap.devices.JpegDevice(resolution)
    page_count = 1  # Aspose page collections are 1-based
    dmc_preview_urls = []
    dmc_util = DmcUtil(dmc_username, dmc_password)
    while page_count <= max_pages and page_count <= len(document.pages):
        new_file_name = f"{file_name_without_ext}_{page_count}_out.jpeg"
        new_file_path = base_dir / "file" / new_file_name
        # FileIO(..., "w") opens the file for raw binary writing;
        # the with-block closes it even if rendering raises
        with FileIO(str(new_file_path), "w") as image_stream:
            device.process(document.pages[page_count], image_stream)
        page_count += 1  # advance to the next page

@David_Matin

Cause

Garbled Chinese characters after converting a PDF to JPEG are often a font problem, and less often a text-encoding problem in the PDF itself. If the Chinese fonts used in the PDF are not embedded, the renderer has to find them on the machine doing the conversion. When it cannot (for example, on a Linux server or container without Chinese fonts installed), it substitutes a font that lacks the needed glyphs, and the text comes out garbled in the output images.
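One way to confirm this cause is to list the fonts in the PDF along with whether each one is embedded. The sketch below assumes poppler-utils' `pdffonts` command is installed and on the PATH (it is a separate tool, not part of Aspose); the function names are hypothetical. It parses the table using the column boundaries from the dashed separator line, so font types containing spaces (such as "CID TrueType") are handled.

```python
import re
import subprocess
from typing import List, Tuple


def parse_pdffonts(output: str) -> List[Tuple[str, str, bool]]:
    """Parse `pdffonts` text output into (name, type, embedded) tuples."""
    lines = output.splitlines()
    # The separator row of dash runs marks where each column starts and ends.
    sep_idx = next(i for i, line in enumerate(lines) if line.startswith("-"))
    spans = [(m.start(), m.end()) for m in re.finditer(r"-+", lines[sep_idx])]
    # Read the column titles from the header row just above the separator.
    titles = [lines[sep_idx - 1][a:b].strip() for a, b in spans]
    name_col, type_col, emb_col = [titles.index(t) for t in ("name", "type", "emb")]
    fonts = []
    for line in lines[sep_idx + 1:]:
        if not line.strip():
            continue  # skip blank trailing lines
        cells = [line[a:b].strip() for a, b in spans]
        fonts.append((cells[name_col], cells[type_col], cells[emb_col] == "yes"))
    return fonts


def list_pdf_fonts(pdf_path: str) -> List[Tuple[str, str, bool]]:
    """Run `pdffonts` (assumed to be installed) and parse its output."""
    result = subprocess.run(
        ["pdffonts", pdf_path], capture_output=True, text=True, check=True
    )
    return parse_pdffonts(result.stdout)
```

Any entry with `embedded == False` whose name is a Chinese font (SimSun, SimHei and the like) has to be resolved from the fonts installed on the machine running the conversion.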

Solution

To resolve this issue, you can try the following steps:

  1. Ensure Font Embedding: Make sure that the fonts used in the PDF are embedded. You can check this by opening the PDF in a viewer that shows font information.

  2. Use PdfImageToJpegOptions: Instead of using the default JpegDevice, consider using PdfImageToJpegOptions, which allows you to specify options for the conversion, including the resolution and page list.

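  3. Set Output Resolution: A higher output resolution makes small glyphs sharper, but it does not fix characters that render as the wrong glyphs because their font is missing; for that, the font has to be embedded or available on the machine doing the conversion.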
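To see what the resolution setting actually changes: PDF page sizes are measured in points (1/72 inch), so the rendered image is the page size times DPI / 72 pixels on each side. The helper below is a plain-Python sketch of that arithmetic; the A4 size of roughly 595 × 842 points is used only as an illustration.

```python
from typing import Tuple


def page_pixel_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Return the (width, height) in pixels of a page rendered at `dpi`.

    One PDF point is 1/72 inch, so pixels = points * dpi / 72.
    """
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


# A4 is roughly 595 x 842 points
print(page_pixel_size(595, 842, 200))  # (1653, 2339)
print(page_pixel_size(595, 842, 300))  # (2479, 3508)
```

Going from 200 to 300 DPI gives every glyph 1.5 times as many pixels in each direction, which helps legibility, but a glyph drawn from a substituted font is still the wrong glyph at any resolution.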

Here's an example that renders each page to its own JPEG at a higher resolution (300 DPI), using the same JpegDevice API as in the question:

Code

from io import FileIO

import aspose.pdf as ap

# Load the PDF document
document = ap.Document("path/to/your/document.pdf")

# Render at 300 DPI
resolution = ap.devices.Resolution(300)
device = ap.devices.JpegDevice(resolution)

# Pages are 1-based: render pages 1..len(document.pages), one JPEG per page
for page_number in range(1, len(document.pages) + 1):
    output_file = f"output_page_{page_number}.jpeg"
    with FileIO(output_file, "w") as image_stream:
        device.process(document.pages[page_number], image_stream)

Next Steps

  1. Test the above code to see if it resolves the garbled text issue.
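  2. If the problem persists, check whether the Chinese fonts in the PDF are embedded and, if they are not, whether they are installed on the machine running the conversion. You can also try converting the PDF with different settings.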
  3. Investigate whether the PDF contains any non-standard fonts that may not be supported.
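For step 3, one practical reading of "non-standard" is: any font that is neither embedded nor one of the 14 standard Type 1 fonts that PDF viewers have traditionally been expected to supply. Chinese fonts are never among those 14, so a non-embedded Chinese font must come from the machine doing the conversion. The sketch below assumes you already have (name, embedded) pairs, for example from a font-listing tool or a viewer's font panel; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# The 14 standard Type 1 fonts defined by the PDF specification.
STANDARD_14 = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
}


def strip_subset_prefix(name: str) -> str:
    """Drop a six-letter subset tag such as 'ABCDEF+' from a font name."""
    tag, plus, rest = name.partition("+")
    if plus and len(tag) == 6 and tag.isalpha() and tag.isupper():
        return rest
    return name


def fonts_needing_system_copy(fonts: Iterable[Tuple[str, bool]]) -> List[str]:
    """Return non-embedded fonts that are not among the standard 14.

    The renderer has to find these on the machine running the conversion;
    if it cannot, it substitutes another font, which can garble CJK text.
    """
    return [
        strip_subset_prefix(name)
        for name, embedded in fonts
        if not embedded and strip_subset_prefix(name) not in STANDARD_14
    ]
```

For example, `fonts_needing_system_copy([("ABCDEF+SimSun", True), ("SimHei", False), ("Helvetica", False)])` returns `["SimHei"]`: the embedded subset and the standard Helvetica are fine, while SimHei has to be installed (or substituted) on the conversion machine.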

Please implement these suggestions and let me know if you need further assistance!