Issues with HTML to PDF Conversion Using Aspose.PDF

I am facing significant performance issues when converting HTML to PDF using Aspose.PDF. Specifically, it takes almost 2.5 minutes to load the HTML memory stream into the Aspose.Pdf.Document constructor.

  using System;
  using System.IO;
  using Aspose.Pdf;
  
  public class HtmlToPdfConverter
  {
      public void ConvertHtmlToPdf()
      {
          string htmlFileName = @"C:\Users\test\Downloads\proddefect\Newfolder\htmlSample.txt";
          string outputFileName = @"C:\Users\test\Downloads\proddefect\Newfolder\htmlSampleOutput.pdf";
          string content = System.IO.File.ReadAllText(htmlFileName);
  
          using (MemoryStream htmlStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes($"<html><pre>{content}</pre></html>")))
          {
              Document pdfDocument = new Document(htmlStream, new HtmlLoadOptions());
              pdfDocument.Save(outputFileName);
          }
      }
  }

When I enable breaking on Common Language Runtime exceptions in the debugger, I see numerous System.InvalidOperationException errors thrown in Aspose.PDF.dll.

Error Samples:

  • Exception thrown: 'System.InvalidOperationException' in Aspose.PDF.dll
    • Cannot find table 'GPOS' in the font file.
  • Exception thrown: 'System.InvalidOperationException' in Aspose.PDF.dll
    • Cannot find table 'GSUB' in the font file.

Packages Used:

  • Aspose.PDF 24.7 - Aspose.PDF for .NET Framework 4.0 24.8 (DLLs only) (Outdated)
  • Please find the attached packages.config screenshot for the other packages used in this project.
    Screenshot 2024-11-12 160618.png (66.0 KB)

Environment:

  • Target Framework: .NET Framework 4.8

Attached Files:

I have attached the sample HTML file that I used for this conversion.
htmlSample.zip (5.5 KB)

Additional Information:

This was working fine in the previous version of Aspose.PDF. I recently upgraded Aspose.PDF from 21.9 to 24.7, and these errors started appearing. I am not sure if there are any DLL conflicts with the other packages referenced in my project. Any guidance or solutions to improve the performance and resolve the exceptions would be greatly appreciated.

Thank you for your assistance.

@riyasvk

It seems you are experiencing performance issues and exceptions while converting HTML to PDF using Aspose.PDF. Here are some suggestions to help you improve performance and resolve the exceptions you are encountering.

Performance Improvement Tips

  1. Use HtmlLoadOptions: When loading your HTML, consider using HtmlLoadOptions to customize the loading process. This can help optimize how external resources are handled, which may improve performance. For example, you can specify a custom loader for external resources to control how images and other assets are loaded.

    HtmlLoadOptions options = new HtmlLoadOptions();
    options.CustomLoaderOfExternalResources = new LoadOptions.ResourceLoadingStrategy(CustomResourceLoader);
    Document pdfDocument = new Document(htmlStream, options);
    
  2. Reduce HTML Complexity: If possible, simplify the HTML content you are converting. Complex HTML structures with many external resources can slow down the conversion process.

  3. Check for Resource Loading: Ensure that your HTML does not reference too many external resources (like images, CSS, etc.) that need to be loaded during conversion. You can set options to skip loading certain resources if they are not necessary.

  4. Update Aspose.PDF Version: You mentioned using an outdated version of Aspose.PDF (24.7). Consider updating to the latest version, as performance improvements and bug fixes are regularly included in new releases.
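The snippet in tip 1 refers to a CustomResourceLoader method that is not defined anywhere in this thread. The following is a minimal sketch of what such a loader could look like, assuming the delegate takes the resource URI as a string and returns a LoadOptions.ResourceLoadingResult, and assuming the LoadingCancelled flag makes the loader skip the resource. The class and method names are hypothetical; please verify the members against the API reference for your installed version.

```csharp
using System.IO;
using Aspose.Pdf;

public static class ResourceSkippingExample
{
    // Hypothetical loader: returns an empty result flagged as cancelled,
    // so no external resource (image, stylesheet, font) is fetched.
    private static LoadOptions.ResourceLoadingResult SkipExternalResources(string resourceUri)
    {
        var result = new LoadOptions.ResourceLoadingResult(new byte[0]);
        result.LoadingCancelled = true; // assumption: tells Aspose.PDF to skip this resource
        return result;
    }

    // Loads HTML from a stream with the skipping loader attached.
    public static Document LoadWithoutExternalResources(Stream htmlStream)
    {
        var options = new HtmlLoadOptions
        {
            CustomLoaderOfExternalResources = SkipExternalResources
        };
        return new Document(htmlStream, options);
    }
}
```

A loader like this only helps when the HTML actually references external resources; for input without such references it is unlikely to change the load time.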

Handling Exceptions

The exceptions you are encountering, such as System.InvalidOperationException related to missing font tables (like ‘GPOS’ and ‘GSUB’), suggest that there may be issues with the fonts being used in your HTML. Here are some steps to address this:

  1. Font Availability: Ensure that the fonts used in your HTML are available on the system where the conversion is taking place. If the fonts are not installed, Aspose may not be able to render the text correctly.

  2. Fallback Fonts: You can specify fallback fonts in your HTML or CSS to ensure that if a specific font is not found, a default font will be used instead.

  3. Font Embedding: If you are using custom fonts, consider embedding them directly in your HTML or ensuring they are accessible during the conversion process.

  4. Error Handling: Implement error handling in your code to catch exceptions and log them for further analysis. This can help you identify specific issues with the HTML content or resources.
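As a minimal sketch of step 4, the conversion can be wrapped in a try/catch that logs any exception reaching the calling code. The TryConvert helper and its class are hypothetical names used for illustration only.

```csharp
using System;
using System.IO;
using System.Text;
using Aspose.Pdf;

public static class SafeConversionExample
{
    // Returns true on success; logs the exception and returns false otherwise.
    public static bool TryConvert(string html, string outputFileName)
    {
        try
        {
            using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
            using (var pdfDocument = new Document(htmlStream, new HtmlLoadOptions()))
            {
                pdfDocument.Save(outputFileName);
            }
            return true;
        }
        catch (Exception ex)
        {
            // Log the full exception, including the stack trace, for later analysis.
            Console.Error.WriteLine("HTML to PDF conversion failed: " + ex);
            return false;
        }
    }
}
```

Note that "Exception thrown: ..." lines in the Visual Studio Output window are first-chance notifications, which appear even when the exception is caught inside the library. A catch block in the calling code only sees exceptions that propagate out of Aspose.PDF, so the 'GPOS' and 'GSUB' messages may never reach it.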

Example Code

Here’s a modified version of your code that includes HtmlLoadOptions:

    using System;
    using System.IO;
    using System.Text;
    using Aspose.Pdf;

    public class HtmlToPdfConverter
    {
        public void ConvertHtmlToPdf()
        {
            string htmlFileName = @"C:\path\to\your\htmlSample.txt";
            string outputFileName = @"C:\path\to\your\htmlSampleOutput.pdf";
            string content = File.ReadAllText(htmlFileName);

            // Wrap the plain text in <pre>, as in the original code, so line breaks are preserved.
            byte[] htmlBytes = Encoding.UTF8.GetBytes($"<html><pre>{content}</pre></html>");

            using (MemoryStream htmlStream = new MemoryStream(htmlBytes))
            {
                HtmlLoadOptions options = new HtmlLoadOptions();
                // Customize options as needed

                // Document is disposable, so release it once the PDF is saved.
                using (Document pdfDocument = new Document(htmlStream, options))
                {
                    pdfDocument.Save(outputFileName);
                }
            }
        }
    }

By following these suggestions, you should be able to improve the performance of your HTML to PDF conversion and address the exceptions you are encountering. If issues persist, consider reaching out to Aspose support for further assistance.

The shared file does not have any font references or external images; it is purely a text file. This was working fine in the previous version of Aspose.PDF. I recently upgraded Aspose.PDF from 21.9 to 24.7, and these errors started appearing.

@riyasvk

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-58624

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

@asad.ali

After further analysis, I found that the issue exists not only in the standalone package but also in the NuGet version of Aspose.PDF, so the problem is unrelated to my project's dependencies. I created a console application using only the Aspose.PDF NuGet package and its dependencies, and encountered significant performance issues: it consistently takes more than 20 minutes to convert the same file mentioned earlier in this thread. I tested Aspose.PDF versions 24.7 and 24.9, and both gave the same result. However, when I tried an older version, Aspose.PDF 21.9, it took around 30 seconds to process the same file.
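For anyone who wants to reproduce the comparison, here is a minimal sketch of a console application that times the load and save steps separately. The class name and the use of command-line arguments for the input and output paths are assumptions for illustration; the only package referenced is Aspose.PDF.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using Aspose.Pdf;

public static class ConversionTimer
{
    public static void Main(string[] args)
    {
        // args[0]: input text file, args[1]: output PDF (both supplied by the caller).
        string content = File.ReadAllText(args[0]);
        byte[] html = Encoding.UTF8.GetBytes($"<html><pre>{content}</pre></html>");

        var stopwatch = Stopwatch.StartNew();
        using (var htmlStream = new MemoryStream(html))
        using (var pdfDocument = new Document(htmlStream, new HtmlLoadOptions()))
        {
            // Time spent in the Document constructor (HTML parsing and layout).
            Console.WriteLine($"Load: {stopwatch.Elapsed}");

            stopwatch.Restart();
            pdfDocument.Save(args[1]);
            Console.WriteLine($"Save: {stopwatch.Elapsed}");
        }
    }
}
```

Running the same build against each package version (for example 21.9, 24.7, and 24.9) gives directly comparable numbers and shows whether the time is spent in loading or in saving.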

@riyasvk

Thanks for sharing your investigation results and the additional information. We will take it into account during the ticket investigation, and as soon as we have results or news about its resolution, we will inform you via this forum thread. Please be patient and allow us some time.

We are sorry for the inconvenience.