Aspose.PDF for Java - Quick HTML to PDF question

Hi all,

We need to convert HTML files to PDFs using Aspose.PDF for Java v19.6. It’s working all right so far, but I’ve noticed a significant slowdown when loading pages that reference external resources (e.g. images, CSS, JS). Such pages can take minutes to load, while others take seconds (the conversion itself is extremely quick either way). I’ve also determined that our use cases typically won’t require any external resources, so it’s fine not to load them.

My question is: does Aspose.PDF provide a way to disable external resource loading for HTML documents? I tried setting a CustomLoaderOfExternalResources on my HtmlLoadOptions, but didn’t see much difference. I’ll add the code below.

Any other advice on getting Aspose.PDF to load HTML documents more quickly would also be greatly appreciated.

Thanks in advance!

Code:

    // Try to skip external resources by supplying a custom loader.
    HtmlLoadOptions loadOptions = new HtmlLoadOptions();
    loadOptions.CustomLoaderOfExternalResources = new LoadOptions.ResourceLoadingStrategy()
    {
        @Override
        public LoadOptions.ResourceLoadingResult invoke(String s) {
            log.info("No resources for you");
            return null; // hand back nothing for every requested resource
        }
    };
    File input = new File("some_file");
    // Load the HTML file with the options above.
    document = new Document(input.getAbsolutePath(), loadOptions);
    log.info("Loaded");
    // Save the loaded document as a PDF.
    String outputFile = nameAsPdf(input.getName());
    document.save(outputFile);

@BenBruno54

Would you please share the documents you are comparing as ZIP files so that we can investigate the slowdown you are facing and assist you accordingly? Before sharing the requested data, please make sure that you are using Aspose.PDF for Java 19.6.

Hi,

As mentioned in the first sentence of my previous post, we are using Aspose.PDF 19.6. Thanks for the reminder!

I’ve attached the worst offender below. To be clear, the issue is that calling new Document("attached_file", htmlLoadOptions) takes about three and a half minutes to complete. I’d like advice on speeding that process up, even at the risk of not loading resources. Conversion is super fast after the file is loaded!

aspose_html.htm.zip (13.5 KB)

@BenBruno54

Thank you for sharing the requested data.

We have been able to reproduce the problem and have logged a ticket with ID PDFJAVA-38677 in our issue management system for further investigation and resolution. We will let you know as soon as significant updates are available.

Thanks! I’ll wait for any good news.

If I can add something to the investigation: it looks like HTML to PDF generation results aren’t consistent between runs.

I took a shot at using the code on this GitHub page to speed up PDF generation. The example indicates that when a ResourceLoadingStrategy has nothing to return, it should return a ResourceLoadingResult wrapping an empty byte array rather than null.
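For readers following along, the pattern described above might look roughly like the sketch below. It reuses the same field and class names as the code in the first post; the `ResourceLoadingResult(byte[])` constructor and the file names `input.html` and `output.pdf` are assumptions for illustration and should be checked against the Aspose.PDF for Java version in use.

```java
import com.aspose.pdf.Document;
import com.aspose.pdf.HtmlLoadOptions;
import com.aspose.pdf.LoadOptions;

public class SkipExternalResources {
    public static void main(String[] args) {
        HtmlLoadOptions loadOptions = new HtmlLoadOptions();
        loadOptions.CustomLoaderOfExternalResources = new LoadOptions.ResourceLoadingStrategy() {
            @Override
            public LoadOptions.ResourceLoadingResult invoke(String resourceUri) {
                // Assumption: an empty byte array stands in for "no content",
                // instead of returning null as in the original attempt.
                return new LoadOptions.ResourceLoadingResult(new byte[0]);
            }
        };

        // Hypothetical input and output paths.
        Document document = new Document("input.html", loadOptions);
        document.save("output.pdf");
    }
}
```

Whether this actually avoids the slow load is exactly what is under investigation in this thread, so treat it as something to try rather than a confirmed fix.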

After I tried that page’s approach, I saw three (!) different outcomes for my input file:

  1. The HTML document would convert correctly. Nothing went wrong.
  2. Aspose throws an EndOfStreamException with the error message: “Attempted to read past the end of the stream.”
  3. Aspose gives no indication of an error and never completes the conversion.

Admittedly, outcomes 1 and 3 were already happening; I just thought outcome 3 was a problem with my machine. It wasn’t until I saw outcome 2 occasionally popping up that I realized that the exact same code and test files were producing different results.
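One way to tell the three outcomes apart across repeated runs is to execute each attempt on a worker thread with a time limit. The sketch below uses only the JDK; `ConversionWatchdog`, `Outcome`, and `runWithTimeout` are hypothetical names, and the task passed in would be whatever code loads and saves the document. Note that cancelling only interrupts the worker, so a library call that ignores interruption may keep running in the background (the worker is a daemon thread so that it does not block JVM exit).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ConversionWatchdog {

    /** Mirrors the three observed outcomes: success, exception, never finishing. */
    public enum Outcome { COMPLETED, FAILED, TIMED_OUT }

    public static Outcome runWithTimeout(Callable<?> task, long timeout, TimeUnit unit) {
        // Daemon worker so a stuck task cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "conversion-worker");
            t.setDaemon(true);
            return t;
        });
        Future<?> future = executor.submit(task);
        try {
            future.get(timeout, unit);
            return Outcome.COMPLETED;
        } catch (ExecutionException e) {
            // The task threw, e.g. the end-of-stream error from outcome 2.
            return Outcome.FAILED;
        } catch (TimeoutException e) {
            // The task did not finish in time, as in outcome 3.
            future.cancel(true);
            return Outcome.TIMED_OUT;
        } catch (InterruptedException e) {
            // The calling thread was interrupted while waiting.
            future.cancel(true);
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the task", e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Calling `runWithTimeout` in a loop over the same input file and counting each `Outcome` would give a rough picture of how often each case occurs.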

@BenBruno54

Thank you for sharing further findings with us.

We have recorded your comments under the same ticket, and they will certainly be helpful during the investigation. We will let you know once any further update is available.