HTML to PDF conversion runaway memory usage

riftsoft · May 1, 2025, 4:51pm

Using the latest version (25.4) of Aspose.HTML for .NET, some HTML documents that we convert cause runaway memory usage. We have isolated the code in a console app and still see the same issue: all available memory (many GB) is consumed and the conversion never completes.

Professionalize.Discourse · May 1, 2025, 4:52pm

@riftsoft

It seems you are experiencing runaway memory usage during HTML to PDF conversion using Aspose.HTML for .NET. This issue has been noted by other users as well, particularly when processing multiple files in a single application instance.

Here are some key points to consider:

Memory Management: It has been observed that memory consumption can increase significantly when processing multiple files, and even after calling the Dispose() method, some objects may remain in memory. This can lead to high memory usage until the application is closed.
Garbage Collection: The garbage collector in .NET may not immediately reclaim memory, especially if objects are still in scope. To help with memory cleanup, it can be beneficial to set objects to null after use, which can assist the garbage collector in identifying them for cleanup.
Application Lifecycle: If you are running this conversion in a Windows service, be aware that the memory may not be released until the service is restarted. This can be problematic if you are processing files continuously.
Issue Tracking: The problem of memory consumption during HTML to PDF conversion has been logged in the issue tracking system, and updates will be provided once a resolution is available.

If you continue to face issues, consider isolating the HTML documents that cause the problem and sharing them for further investigation. This can help the support team to replicate the issue and provide a more targeted solution.

For more detailed information, you may want to check the updates related to this issue in the Aspose documentation or forums.

Sources:
[1]: HTML to PDF using Aspose.HTML - taking lot of memory

riftsoft · May 1, 2025, 5:00pm

The issue provided hasn’t had an update since 2022!

I can reproduce the issue with a single file in a console app.

Attached is the source HTML file and the program.cs used to run within a console app.
MemoryIssueExample.zip (7.7 KB)

asad.ali · May 2, 2025, 6:47pm

@riftsoft

Can you please tell us at which line the code freezes? Also, please share some information about your system, such as the amount of RAM.

riftsoft · May 2, 2025, 8:30pm

Hi,
The line that it's failing on is:

renderer.Render(pdfDevice, htmlDocument);

The full system that this code snippet was lifted from has 2 GB of RAM, and converting this document triggers an out-of-memory error there. Normally that system can convert 10 documents at a time without any memory issues.

The console app where I have been able to replicate the issue runs on a machine with 16 GB of RAM. I haven’t let it run to the point where it triggers the error, but it reached 4 GB of RAM usage and was still climbing.
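
For anyone following along without the attachment, here is a minimal sketch of the kind of console code that reaches that call. It is not the attached program.cs. It assumes the usual Aspose.HTML types (HTMLDocument, HtmlRenderer, PdfDevice), and the file names are placeholders.

```csharp
using System;
using System.Diagnostics;
using Aspose.Html;
using Aspose.Html.Rendering;
using Aspose.Html.Rendering.Pdf;

// Placeholder paths (assumption); point these at the problem HTML file.
using var htmlDocument = new HTMLDocument("input.html");
using var renderer = new HtmlRenderer();
using var pdfDevice = new PdfDevice("output.pdf");

// For the problem document this call does not return and memory keeps growing.
renderer.Render(pdfDevice, htmlDocument);

// Only reached when rendering completes; reports the peak working set in MB.
long peakMb = Process.GetCurrentProcess().PeakWorkingSet64 / (1024 * 1024);
Console.WriteLine($"Render finished, peak working set: {peakMb} MB");
```

With a document that converts normally, the last line gives a baseline figure to compare against the multi-GB growth described above.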

asad.ali · May 3, 2025, 8:08pm

@riftsoft

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): HTMLNET-6352

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

riftsoft · May 19, 2025, 2:25pm

Hi,
We have been digging into the exact cause of this issue to see if we can find a workaround while it is being fixed. We have found that the cause is related to forced page breaks. The following HTML will cause the memory issue:

<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta charset="utf-8" />
    <style>
        tr {
            page-break-inside: avoid;
        }
    </style>
</head>
<body>
    <table>
        <tbody>
            <tr>
                <td>Page 1</td>
            </tr>
            <tr>
                <td><div style="page-break-after: always"></div>This text causes runaway memory usage!</td>
            </tr>
            <tr>
                <td>Page 2</td>
            </tr>
        </tbody>
    </table>
</body>
</html>
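
Until there is a fix, one option is to flag affected documents before they reach the renderer. The sketch below is only an illustration under some assumptions: it uses the DOM method QuerySelectorAll on HTMLDocument, assumes the attribute-substring selector is supported, uses a placeholder file name, and only catches page breaks set through inline style attributes (as in the sample above), not rules in style blocks or external CSS.

```csharp
using System;
using Aspose.Html;

// Placeholder path (assumption); loading the document is not where the problem occurs.
using var document = new HTMLDocument("input.html");

// Look for elements inside a table row that carry an inline forced page break.
var suspects = document.QuerySelectorAll(
    "tr [style*='page-break-after'], tr [style*='page-break-before']");

if (suspects.Length > 0)
{
    // Skip or divert this document instead of calling renderer.Render on it.
    Console.WriteLine($"Found {suspects.Length} inline page break(s) inside table rows; skipping render.");
}
else
{
    Console.WriteLine("No inline page breaks inside table rows; safe to try rendering.");
}
```

This does not fix the rendering itself. It only keeps a known-bad document from taking down a process that is converting other documents at the same time.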

riftsoft · May 19, 2025, 2:39pm

A follow-on issue: if the text after the page-break DIV is removed, the PDF is generated, but the page break is ignored and the whole table renders on a single page.
The same result occurs if the page-break style is moved to the TD.

This used to work in the earlier version 19.12. We also didn’t see the memory issue in that version with the same HTML.

asad.ali · May 19, 2025, 7:44pm

@riftsoft

Thanks for sharing the details of your investigation. We have added the information to the ticket and will include it in our investigation. As soon as the ticket is resolved or we have an update on its resolution, we will inform you. Please bear with us in the meantime.

We are sorry for the inconvenience.