Loading PDF file gets stuck when HtmlLoadOptions is passed

Hi,
I am having trouble opening some PDF files with the code below. The code hangs when htmlLoadOptions is passed but works fine when only source is passed. The files are small, under 2 MB. Could someone suggest a fix?

HtmlLoadOptions htmlLoadOptions = new HtmlLoadOptions();

htmlLoadOptions.PageInfo.Margin.Top = 10;
htmlLoadOptions.PageInfo.Margin.Bottom = 10;

Document document = new Document(source, htmlLoadOptions);

@pkometineni

Thank you for contacting support.

Please note that an instance of the HtmlLoadOptions class should be passed to the Document constructor only when loading an HTML file, as explained in Convert HTML to PDF. You do not need to pass this argument when loading a PDF document.

We hope this will be helpful. Please feel free to contact us if you need any further assistance.
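
As a rough illustration of the point above, a caller that receives files of unknown type could check for the PDF header before deciding whether HtmlLoadOptions applies. This is only a minimal sketch: the class InputSniffer and the method looksLikePdf are hypothetical names, not part of Aspose.PDF, and the check assumes the header sits at the very first byte of the file.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; it is not part of Aspose.PDF.
final class InputSniffer {
    // A well-formed PDF file begins with the ASCII header "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true when the given leading bytes start with the PDF header. */
    static boolean looksLikePdf(byte[] head) {
        if (head == null || head.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (head[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With the first few bytes of the input read into a buffer, a caller could use new Document(source) when looksLikePdf returns true and reserve new Document(source, htmlLoadOptions) for HTML input.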

Hi,
I have a similar issue using the Java API.

Document doc = new Document(new FileInputStream(new File("foo.html")), new HtmlLoadOptions());

The HTML contains only <h1>Hi</h1> and nothing else, but the call gets stuck in the Document constructor without throwing any exception. It just hangs until an OutOfMemoryError occurs.

I am using aspose-pdf version 20.2.
Please tell me what I am doing wrong. Thanks a lot.

@imdi

Would you kindly try passing only the file path instead of a FileInputStream? If the issue still persists, please share your sample HTML file with us. We will test the scenario in our environment and address it accordingly.

I tried passing only the file path, and it still shows the same behaviour. I tested some different files:
A freshly created .html file containing a single character takes about 1 minute to process and consumes about 40% of my RAM (ca. 4 GB).
File: t.zip

Other HTML files, about 86 KB in size and with a slightly more complex structure (which I am not allowed to share), cause the Java process to consume all of my memory until it runs out. The machine has 12 GB.
This surprises me, because the resource usage seems disproportionate to such small inputs.

@imdi

We tested the scenario in our environment with Aspose.PDF for Java 20.2 and your file. The total time taken by the API was 20 seconds. Would you kindly share your complete environment details, i.e. OS name and version, application type, and JDK version? Also, please try to share a sample HTML file (for example, a complex HTML file with dummy information) so that we can test it in our environment and replicate the performance issue. We will then proceed to assist you accordingly.
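
If it helps, most of the environment details requested above can be read from standard Java system properties. The following is only a minimal sketch: the class name EnvReport is hypothetical, and the application type still has to be described by hand.

```java
// Hypothetical helper for illustration; it uses only standard JDK calls.
final class EnvReport {
    /** Builds a short, human-readable summary of the OS, JVM and heap limit. */
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long maxHeapMb = rt.maxMemory() / (1024L * 1024L);
        return String.format(
            "OS: %s %s (%s)%nJava: %s (%s)%nMax heap: %d MB%nProcessors: %d",
            System.getProperty("os.name"),
            System.getProperty("os.version"),
            System.getProperty("os.arch"),
            System.getProperty("java.version"),
            System.getProperty("java.vendor"),
            maxHeapMb,
            rt.availableProcessors());
    }
}
```

Printing EnvReport.describe() from the same JVM that runs the conversion also reports the maximum heap size, which may be relevant when comparing memory consumption between machines.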

Hi,

I have a similar issue using the .net API.

While trying to convert an HTML file to PDF with HtmlLoadOptions, new Aspose.Pdf.Document(sourceHtmlFullPath, options); gets stuck, and new Aspose threads keep being created until I get an OutOfMemoryException.

This is the code that I am using:

Aspose.Pdf.HtmlLoadOptions options = new Aspose.Pdf.HtmlLoadOptions();
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(sourceHtmlFullPath, options);
pdfDocument.Save(destPdfFullPath);

I am using the latest version of Aspose.PDF, licensed.
Attached the source HTML named “ch15lev1sec5.html” that causes the bug.

Thanks!

ch15lev1sec5.zip (2.4 KB)

@bar678

We were able to reproduce the issue in our environment and logged it as PDFNET-50133 in our issue tracking system. We will further look into its details and let you know as soon as it is resolved. Please be patient and spare us some time.

We are sorry for the inconvenience.

It seems this issue has not been resolved yet.
I can still reproduce it with the latest version (25.8).
Are there any updates on ticket PDFNET-50133?

Thanks in advance

@zoltankrekus

Regretfully, the ticket has not been resolved yet. However, the earlier ticket was specific to one document. Would you please share your sample file as well, so that we can test it and log a dedicated ticket for it in our system?

Here you go :)
mailbody_memoryissue_original.zip (5.8 KB)

@zoltankrekus

We tested version 25.8 in our environment with the code snippet below and were not able to reproduce any issue. A sample output is attached for your reference.

string html = File.ReadAllText(dataDir + "mailbody_memoryissue_original.html");

var byteArray = Encoding.UTF8.GetBytes(html);

// Create HTML load options with appropriate settings
var htmlOptions = new HtmlLoadOptions
{
    IsEmbedFonts = false,
    IsRenderToSinglePage = true
};

using (var htmlStream = new MemoryStream(byteArray))
using (var pdfDocument = new Document(htmlStream, htmlOptions))
    pdfDocument.Save(dataDir + "output.pdf");

output.pdf (1.7 MB)

I’m able to reproduce the issue with the following code snippet:

const string fileName = "mailbody_memoryissue_original.html";
string mbContent = File.ReadAllText($"D:\\data\\testdata\\mailcapture\\holidu\\{fileName}");

Document mailbodyDocument;
HtmlLoadOptions htmlLoadOptions = new HtmlLoadOptions();

htmlLoadOptions.PageInfo.Margin.Top = 20;
// This line causes the memory issue; setting the margin to e.g. 20 fixes it.
htmlLoadOptions.PageInfo.Margin.Bottom = 10;
htmlLoadOptions.PageInfo.Margin.Left = 30;
htmlLoadOptions.PageInfo.Margin.Right = 30;

byte[] contentBytes = Encoding.UTF8.GetBytes(mbContent);
using (MemoryStream ms = new MemoryStream(contentBytes))
{
    mailbodyDocument = new Document(ms, htmlLoadOptions);
}

@pkometineni

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-60559

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.