HTML to PDF conversion fails when document is about a page long

wost · November 16, 2023, 8:51am

Hi! When converting the HTML file I attached, using the .NET Aspose.HTML package, the memory just keeps going higher until everything eventually crashes. I tried it in your conversion tool and it keeps converting for a few minutes and then it fails there too.
image.png (3.6 KB)

I noticed the problem occurs when the document is around a page long: if you remove or add a line, it goes through immediately. I have the same problem when converting HTML documents that are about a page long using the .NET Aspose.PDF package. Replacing <p> </p> with an empty <p></p> eventually produces output, but the conversion takes a long time and all the newlines are lost. I hope you can look into it quickly, as I’m having a lot of conversion errors in production caused by documents that are about a page long.
TEST_HTML.zip (915 Bytes)
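
For anyone who wants to try the paragraph substitution described above, here is a minimal sketch of it as a preprocessing step. The class and method names are hypothetical, and it assumes the blank paragraphs are bare <p> tags (no attributes) containing only whitespace or a literal &nbsp; entity. As noted above, this only partially helps: the conversion is still slow and the blank lines are lost.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any Aspose package.
public static class EmptyParagraphWorkaround
{
    // Matches <p> ... </p> whose body is only whitespace and/or &nbsp; entities.
    // In .NET, \s also covers the U+00A0 no-break space character.
    private static readonly Regex BlankParagraph =
        new Regex(@"<p>(?:\s|&nbsp;)+</p>", RegexOptions.IgnoreCase);

    // Returns the HTML with every blank paragraph collapsed to <p></p>.
    public static string Collapse(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));
        return BlankParagraph.Replace(html, "<p></p>");
    }
}
```

The cleaned string can then be written back to a file and passed to the converter as before. Paragraphs that contain any visible text are left untouched.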

asad.ali · November 16, 2023, 7:07pm

@wost

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): HTMLNET-5125

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

wost · November 17, 2023, 10:38am

Thank you for the information. I assume that if it’s an “internal tracking system” then there’s no way I can follow progress on this one? All I can do is go through the release notes and see if my ID is there?

Additionally, as I mentioned before, a similar issue occurs when I use the Aspose.PDF package. Does the PDF package use the HTML package under the hood for HTML document conversion, so that fixing it in HTML would also fix it in PDF?

asad.ali · November 17, 2023, 11:01pm

@wost

You may not be able to track the progress yourself, but we will keep you updated in this forum thread on the progress of the issue resolution. You can also check the issue status at the bottom of this thread. As soon as the issue is resolved, we will post a notification here as well.

For Aspose.PDF, can you please share the code snippet you are using? It is a separate API and we will investigate the issue from its perspective as well.

aspose.notifier · December 5, 2023, 9:04pm

The issues you have found earlier (filed as HTMLNET-5125) have been fixed in this update. This message was posted using Bugs notification tool by avpavlysh

nordev · December 11, 2023, 11:38am

Hi! We tested the latest release of Aspose.HTML (23.12.0) to convert an HTML file to PDF using the

Converter.ConvertHTML(document, saveOptions, streamProvider);

function, where document is an Aspose.Html.HTMLDocument, saveOptions is a new Aspose.Html.Saving.PdfSaveOptions, and streamProvider is a basic interface implementation.

What we observe is that memory usage skyrockets in a very short time. We also tested the online converter tool provided on the Aspose website, but it breaks too and does not produce any output.
Below you can find the HTML used to reproduce this case. Hopefully you can find the cause of this issue.

Best regards!

Memory usage: 6.6 GB after 30 seconds
image.png (9.0 KB)

From what I can see in memory, a ton of List objects are allocated. Hope that helps narrow it down.
image.png (17.4 KB)

Code to reproduce

using Aspose.Html;
using Aspose.Html.Converters;
using Aspose.Html.Saving;

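// Load the HTML shown below from disk.
using var document = new HTMLDocument("<Path to file with HTML below>");
// Default PDF save options; no custom page setup.
var options = new PdfSaveOptions();
// Memory usage climbs rapidly during this call.
Converter.ConvertHTML(document, options, "<Path to output file>");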

HTML code used to reproduce issue

<html>
  <head>
    <style>
      .container {
        display: flex;
      }
    </style>
  </head>
  <body>    
      <div class="container">
        <div class="slip-rule">
          <div class="slip-rule--content">
            <p>(DATE)</p>
          </div>
        </div>
        <div class="content">
          <div class="content-wrapper">
            <p>TEST</p>            
            <p>SAMPLESAMPLESAMPLESAMPLESAMPLESAMPLESAMPLESAMPLESAMPLE</p>
            <div style="page-break-after: always">
              <span style="display: none">&nbsp;</span>
            </div>
            <p>SAMPLESAMPLESAMPLESAMPLESAMPLESAMPLESAMPLESAMPLESAMPLE</p>
          </div>
        </div>
      </div>    
  </body>
</html>

asad.ali · December 11, 2023, 7:08pm

@nordev

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): HTMLNET-5190

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

aspose.notifier · January 31, 2025, 6:14pm

The issues you have found earlier (filed as HTMLNET-5190) have been fixed in this update. This message was posted using Bugs notification tool by avpavlysh