Character corruption when converting PDF to PNG

arithmer · September 14, 2020, 9:14am

Hello Aspose,

I am trying to convert PDF files into PNG images in a Docker + Linux environment (Ubuntu 18.04.5 LTS).
However, characters from the original PDF files are corrupted in the converted PNG images.
It seems that this is caused by missing or ineffective font settings in my script.
Could you tell me how to resolve this issue?

The script I wrote is below.
I cannot share the input file right away; for certain reasons it may take some time to prepare.

String filepath = args[0];
String filename = Path.GetFileName(args[0]);

// Make directories
Directory.CreateDirectory(args[1] + "/" + filename);
String txt_outpath = args[1] + "/" + filename + "/" + "txt";
String png_outpath = args[1] + "/" + filename + "/" + "png";
Directory.CreateDirectory(txt_outpath);
Directory.CreateDirectory(png_outpath);

// Open document
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(filepath);
Aspose.Pdf.Text.FolderFontSource source = new Aspose.Pdf.Text.FolderFontSource("../data/Fonts_windows");
pdfDocument.Save(args[1] + "/" + filename + "/" + "result.pdf");

// Create Aspose.Pdf.RenderingOptions to enable font hinting
RenderingOptions opts = new RenderingOptions();
opts.UseFontHinting = true;

// Get a PNG image and the extracted text from each page of the original PDF file
for (int pageCount = 1; pageCount <= pdfDocument.Pages.Count; pageCount++)
{
    using (FileStream imageStream = new FileStream(png_outpath + "/" + $"SplitDocumentPageByPageOut_{pageCount}.png", FileMode.Create))
    {
        // Create PNG device with a resolution of 300 DPI
        Resolution resolution = new Resolution(300);
        PngDevice pngDevice = new PngDevice(resolution);

        // Set predefined rendering options
        pngDevice.RenderingOptions = opts;

        // Convert the current page and save the image to the stream
        pngDevice.Process(pdfDocument.Pages[pageCount], imageStream);

        // The stream is closed when the using block ends
    }

    // Create TextAbsorber object to extract text
    TextAbsorber absorber = new TextAbsorber();

    // Accept the absorber for the current page
    pdfDocument.Pages[pageCount].Accept(absorber);

    // Get the extracted text
    string extractedText = absorber.Text;

    // Create a writer, write the extracted text, and close the file
    using (TextWriter tw = new StreamWriter(txt_outpath + "/" + $"SplitDocumentPageByPageOut_{pageCount}.txt"))
    {
        tw.WriteLine(extractedText);
    }
}

asad.ali · September 14, 2020, 7:40pm

@shun1985

Please try installing all Windows fonts in your Docker environment. You can also copy those fonts directly onto the Linux system. Please also make sure that the libgdiplus package is installed and up to date. If the issue still persists, please share your sample input PDF file and the generated PNG. We will test the scenario in our environment and address it accordingly.
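
For reference, the script in the question constructs a FolderFontSource but never adds it to a font repository, so the copied fonts may not be searched at all. The following is a minimal sketch of registering a font folder before the document is opened. It assumes that FontRepository.Sources in Aspose.Pdf.Text is the appropriate registration point in the installed version; the default folder path and the class name are hypothetical placeholders. It is not confirmed to resolve the corruption described in this thread.

```csharp
using System;
using System.IO;
using Aspose.Pdf.Text;

class FontSetupSketch
{
    static void Main(string[] args)
    {
        // Hypothetical location of the copied Windows fonts; pass your own path as the first argument.
        string fontFolder = args.Length > 0 ? args[0] : "/usr/share/fonts/windows";

        // Fail early if the folder is missing, for example because it was not copied into the image.
        if (!Directory.Exists(fontFolder))
        {
            Console.Error.WriteLine($"Font folder not found: {fontFolder}");
            return;
        }

        // Assumption: adding the folder to FontRepository.Sources makes Aspose.PDF
        // search it when resolving fonts. Do this before creating the Document.
        FontRepository.Sources.Add(new FolderFontSource(fontFolder));

        Console.WriteLine($"Registered font folder: {fontFolder}");
    }
}
```

In the script above, the equivalent change would be to add the existing source object to the repository before the Aspose.Pdf.Document is constructed.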

arithmer · September 15, 2020, 8:25am

2-8-82-2-120615.pdf (577.6 KB)
@asad.ali
Thank you for your reply.
I copied all the Windows fonts and made sure that the libgdiplus package was installed and up to date. However, the problem still occurs.
So I have attached the PDF file and the generated PNG files.

arithmer · September 15, 2020, 8:27am

@asad.ali
Here are the generated PNG files.
Because of the upload size limit, I included only three of them.
generated_png.zip (69.8 KB)

asad.ali · September 16, 2020, 3:27pm

@shun1985

We tested the scenario on CentOS 7 with Aspose.PDF for .NET 20.9 and were able to reproduce the issue. We have therefore logged it as PDFNET-48776 in our issue tracking system. We will look further into the cause and keep you informed of the status of the fix. Please allow us some time.

We are sorry for the inconvenience.

arithmer · September 17, 2020, 2:42am

@asad.ali

Thank you for investigating this further and working on a fix.
I look forward to hearing from you.