HTML To PDF Conversion Takes Over 4 Minutes For Large Documents

Hello,

We generate PDF reports by converting HTML output to a PDF document using the PdfDocument constructor (the overload that takes an input stream and a PdfHtmlLoadOptions object). For larger documents, the conversion takes over 4 minutes.

Can you please advise on optimizations we could apply to make the process more performant?

The following is sample code from the process:
"
PdfHtmlLoadOptions options = new PdfHtmlLoadOptions();
PageInfo pageInfo = new PageInfo();

MarginInfo marginInfo = new MarginInfo();
ProcessPDFPageAndMarginInfo(htmlOutput, pageInfo, marginInfo);
pageInfo.Margin = marginInfo;
options.PageInfo = pageInfo;
PdfDocument pdfDocument = null;
using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(htmlOutput)))
{
    pdfDocument = new PdfDocument(htmlStream, options);
}
"

We are using Aspose.Total for .NET (C#).

@amos.johnson.dfs

To optimize the performance of HTML to PDF conversion in your application, consider the following strategies:

  1. Use Efficient HTML Structure: Ensure that the HTML content is well-structured and free of unnecessary elements. Complex layouts with excessive nested elements can slow down the conversion process.

  2. Optimize Resource Loading: Utilize the HtmlLoadOptions class to manage how external resources (like images and stylesheets) are loaded. For instance, if your HTML references many images, consider embedding them directly into the HTML or reducing their size to speed up loading times.

  3. Control Page Size and Margins: Adjust the PageInfo and MarginInfo settings to minimize the amount of content that needs to be processed. Smaller pages with less content can lead to faster conversion times.

  4. Disable Unused Features: If your application does not require certain features (like font embedding or media queries), disable them in the HtmlLoadOptions to reduce processing overhead.

  5. Profile and Benchmark: Use profiling tools to identify bottlenecks in your conversion process. This can help you understand which parts of your HTML or conversion settings are causing delays.
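Before changing any settings, it may help to measure where the time actually goes (point 5). The following is only a minimal sketch using the standard System.Diagnostics.Stopwatch class; the ConversionTimer class and its Measure method are illustrative names, not part of any Aspose API.

```csharp
using System;
using System.Diagnostics;

public static class ConversionTimer
{
    // Runs the given step once, prints how long it took, and returns the elapsed time.
    public static TimeSpan Measure(string label, Action step)
    {
        var stopwatch = Stopwatch.StartNew();
        step();
        stopwatch.Stop();
        Console.WriteLine($"{label}: {stopwatch.ElapsedMilliseconds} ms");
        return stopwatch.Elapsed;
    }
}

// Possible usage around the code from the original post (names taken from that post):
//
//   byte[] bytes = null;
//   ConversionTimer.Measure("encode HTML", () => bytes = Encoding.UTF8.GetBytes(htmlOutput));
//   ConversionTimer.Measure("convert to PDF", () =>
//   {
//       using (var htmlStream = new MemoryStream(bytes))
//       {
//           pdfDocument = new PdfDocument(htmlStream, options);
//       }
//   });
```

Timing the same report with and without images, stylesheets, or large tables is one way to narrow down which part of the HTML dominates the conversion time.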

Here’s a refined version of your sample code that incorporates some of these optimizations:

PdfHtmlLoadOptions options = new PdfHtmlLoadOptions
{
    // Disable font embedding if not needed
    IsEmbedFonts = false,
    // Manage external resource loading
    // (CustomResourceLoader is a placeholder; it is not defined in this snippet)
    CustomLoaderOfExternalResources = new CustomResourceLoader()
};

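PageInfo pageInfo = new PageInfo
{
    // Set appropriate page size and margins
    Margin = new MarginInfo { Top = 10, Bottom = 10, Left = 10, Right = 10 }
};
// Apply the page settings; without this assignment pageInfo is never used
options.PageInfo = pageInfo;

// Declare the document outside the using block so it remains accessible afterwards
PdfDocument pdfDocument = null;
using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(htmlOutput)))
{
    pdfDocument = new PdfDocument(htmlStream, options);
}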

By implementing these strategies, you should see an improvement in the conversion time for larger documents. If the performance issues persist, consider breaking down the HTML content into smaller sections and converting them individually, if feasible.
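As an illustration of point 2 (embedding images directly into the HTML), the sketch below rewrites local <img> references as base64 data URIs before the HTML is handed to the converter. It is a hedged example only: ImageInliner and InlineLocalImages are hypothetical names, it handles only double-quoted src attributes that point to files on disk, and whether it shortens the conversion depends on how much time is really spent loading resources.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class ImageInliner
{
    // Captures the part before the src value, the value itself, and the closing quote.
    private static readonly Regex ImgSrc = new Regex(
        @"(<img\b[^>]*?\ssrc\s*=\s*"")([^""]+)("")",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Replaces local image paths with data URIs; remote, already inlined,
    // missing, or unrecognized images are left unchanged.
    public static string InlineLocalImages(string html, string baseDirectory)
    {
        return ImgSrc.Replace(html, match =>
        {
            string src = match.Groups[2].Value;
            if (src.StartsWith("data:", StringComparison.OrdinalIgnoreCase) ||
                src.StartsWith("http:", StringComparison.OrdinalIgnoreCase) ||
                src.StartsWith("https:", StringComparison.OrdinalIgnoreCase))
            {
                return match.Value;
            }

            string path = Path.Combine(baseDirectory, src);
            string mime = GetMimeType(Path.GetExtension(path));
            if (mime == null || !File.Exists(path))
            {
                return match.Value;
            }

            string base64 = Convert.ToBase64String(File.ReadAllBytes(path));
            return match.Groups[1].Value + "data:" + mime + ";base64," + base64 + match.Groups[3].Value;
        });
    }

    // Maps a file extension to a MIME type; returns null for unknown extensions.
    private static string GetMimeType(string extension)
    {
        switch (extension.ToLowerInvariant())
        {
            case ".png": return "image/png";
            case ".jpg":
            case ".jpeg": return "image/jpeg";
            case ".gif": return "image/gif";
            case ".svg": return "image/svg+xml";
            default: return null;
        }
    }
}
```

The result of InlineLocalImages(htmlOutput, reportDirectory) would then be encoded and passed to the constructor as before, where reportDirectory is an assumed name for the folder the image paths are relative to. Note that base64 increases the size of the HTML, so it is worth timing the conversion both ways.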

For more detailed information on optimizing HTML to PDF conversion, you can refer to the Aspose documentation on converting HTML to PDF in .NET [1].

Sources:
[1]: Convert HTML to PDF in .NET - Aspose Documentation

Okay, thank you for the quick response! We will try these suggestions and report back.

@amos.johnson.dfs

Please feel free to share your sample HTML with us in .zip format in case you still face any issues. We will test the scenario in our environment and address it accordingly.

REVENUEDETAILREPORT_2022 (1).zip (1.5 MB)

Attached is a zip file of the sample HTML report. Could you please proceed with testing it?

@amos.johnson.dfs

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-57956

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

@amos.johnson.dfs

Would you please confirm the following?

  1. What namespace is the PdfDocument class from?
  2. What does the ProcessPDFPageAndMarginInfo(htmlOutput, pageInfo, marginInfo) method do?