PDF to HTML image quality and rendering issues

Hi,
I am new to Aspose and am evaluating Aspose.PDF for .NET to convert PDF files into HTML5.

I am using the following code:

        // Assumes: using Aspose.Pdf;
        Document doc = new Document(@"c:\Headlines Week 45 2018.pdf");
        HtmlSaveOptions saveOptions = new HtmlSaveOptions();
        saveOptions.FixedLayout = true;
        // Produce a single HTML file with all resources embedded in it.
        saveOptions.SplitIntoPages = false;
        saveOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
        // Raster images are saved as embedded parts of a PNG page background.
        saveOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
        doc.Save(@"c:\Headlines Week 45.html", saveOptions);
        return;

I have the following questions:

  1. The rendered output is smaller than the original. Is there a scaling option?
  2. The converted images seem to be of lower quality than the originals. The originals were JPG, and I see that they are converted to PNG. Can the quality be improved?
  3. Some of the rendering alignment is out. I’d post a copy here, but I don’t know how.

Any help or guidance would be much appreciated.

@AndrewN

Thanks for your inquiry.

Would you please share your sample PDF document with us? We will test the scenario in our environment and address it accordingly. You can upload your document using the upload button in the post editor. Upload_Files.png (8.9 KB)

I have attached the following files, which show my issues when converted using the sample code I posted earlier:

Sample1 - this clearly shows the size difference in the rendered output. With the PDF viewer scaled to 100%, the HTML document is smaller than the PDF. This sample also illustrates the image quality difference.

Sample2 - this shows the alignment issues I am experiencing.

Thank you.

Sample1.pdf (82.5 KB)
Sample2.pdf (68.8 KB)

@AndrewN

Thanks for sharing sample PDF documents.

We have tested the scenario in our environment using Aspose.PDF for .NET 18.8 and would like to share the following findings:

The difference in content size depends on the application you use to view the PDF. Different applications display PDF documents at different scales, and this is not related to the quality of fonts or images. If you open the PDF document (i.e. Sample1.pdf) in Chrome, you will notice that the content renders at the same size.

Please check the attached output, which was generated on our side using the following modified code snippet. The image quality was fine in the output HTML.

// dataDir is the folder holding the sample files; set it for your environment.
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(dataDir + "Sample1.pdf");
HtmlSaveOptions htmlOptions = new HtmlSaveOptions();
htmlOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsTTF;
htmlOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UsePixelUnitsInCssLetterSpacingForIE;
htmlOptions.SplitIntoPages = false;
pdfDocument.Save(dataDir + "Sample1.html", htmlOptions);

Sample1.zip (403.4 KB)

We were unable to notice any alignment issue in the output generated on our side using the same code snippet. Would you please share a screenshot showing the issue you are facing? We will test the scenario again in our environment and address it accordingly.

Sample2.zip (295.3 KB)

Hi,

Thank you for taking the time to look at this for me.

The amendments to the code for Sample1 have solved my issues, thank you.

However, the sample output you have posted here for Sample2 does illustrate my alignment issues. Have a look at my attached image, which shows a small extract from the original PDF (left) and from the HTML output (right).
You should see that the text alignment within the gridlines is out.

I hope you can see my issue?

mis-alignment.png (20.0 KB)

@AndrewN

Thanks for your feedback.

We have checked the HTML (i.e. Sample2.html) again in our environment and were unable to notice the alignment issue. Please check the attached screenshot. Alignment.png (127.8 KB) Please note that the Sample2.zip shared earlier was generated using the same code snippet shared in our reply. Would you please confirm whether you have checked the same file in your environment?

In case things do not work as expected in your environment, please share a sample console application that demonstrates the issue. We will test the scenario again in our environment and address it accordingly.

Hi,

I can see from your screenshot that the alignment looks fine on your side.

I think I have found the problem: I am using IE11 rather than Chrome to view the HTML.

As the end users of this converted document will also be using IE, I need it to be cross-browser compatible.

Could you please try opening the documents in IE11?

Many thanks.

@AndrewN

We were able to notice the alignment issue with IE11 and have logged an investigation ticket as PDFNET-45276 in our issue tracking system. We will look into this issue further and keep you posted on the status of its correction. Please be patient and allow us a little time.

We are sorry for the inconvenience.

Hi,

Could I please get a progress update on issue PDFNET-45276? I don’t have access to view the ticket.

I have approval to purchase the licence, but without this fix, we cannot proceed.

Many thanks.

@AndrewN

Thanks for writing back.

We have recorded your concerns and will consider them while investigating the logged issue. As soon as we have a definite update, we will let you know. Please allow us a little time.

We apologize for the inconvenience.

Hi, I can see that the status of this issue is marked as “Resolved”. However, I cannot see that it has been included in release 18.12.

Can you please let me know when this will be included?

@AndrewN

Thanks for your patience.

Please use the following code snippet to get the desired output:

// Dir is the folder holding the sample files; set it for your environment.
string inFile = Dir + "Sample2.pdf";
string outHtmlFile = Dir + "PDFNET_45276_out_19.1.html";

Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(inFile);
HtmlSaveOptions htmlOptions = new HtmlSaveOptions();
htmlOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsTTF;
htmlOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UsePixelUnitsInCssLetterSpacingForIE;
htmlOptions.SplitIntoPages = false;
pdfDocument.Save(outHtmlFile, htmlOptions);

For your reference, the output HTML is attached, along with a screenshot of how it was displayed in IE.

PDFNET_45276_out.zip (295.6 KB)
PDFNET_45276_IE11_screenshot.png (143.4 KB)

Please use the above code snippet with Aspose.PDF for .NET 18.12, and if you need any further assistance, please feel free to let us know.

Thank you for your update. However, using 18.12 and the code snippet you provided with Sample2.pdf, the HTML output viewed in IE11 is still misaligned.

Could you please verify your findings?

@AndrewN

Thanks for writing back.

We apologize for the confusion and misinformation. Your issue has already been resolved; however, the fix will be included in Aspose.PDF for .NET 19.1, which will be released next month. As soon as that version is available for download, you will be able to use the code snippet shared earlier and obtain the correct output.

The issue you reported earlier (filed as PDFNET-45276) has been fixed in Aspose.PDF for .NET 19.1.