Add HTML Fragment to PDF Document using Aspose.PDF for .NET - API is taking a long time

HTMLFragmentIssueFiles.zip (13.1 KB)

We are having a serious issue with how long Aspose PDF takes to create a PDF document for a very long form.

Word takes 20-30 seconds; PDF can take 3.5 to 4 minutes, depending on how many HTML fragments are created, have their text state changed, and are ultimately inserted.

I set a breakpoint where the text is converted into a fragment and captured 7 instances. In the attached zip file these are the files named PID95625_1st to 11th (breakpoint hits 1-6 and the 11th). In total there are 42 calls to the method we have created.

The HTML files are for viewing convenience. Context sensitive highlighting might make it easier to view them.

The 7th file, captured from the 11th breakpoint hit, is interesting (PID95625_11thHTMLFragment.txt). Every word is wrapped in span tags with a style attribute that corrects the spacing. This is no doubt taking a lot of time to construct.

The 11th breakpoint hit was captured because it took noticeably longer to reach. That is, hits 7, 8, 9 and 10 each came right after a button click, but between 10 and 11 it took several seconds before the breakpoint in Visual Studio was hit again. This file must have a span around single words at least 200 times.

The C# code to make the fragments is GetHTMLFragmentCSharpCode.txt.

Please see the files in the zip to get an idea of what we’re running into.

I’m hoping there’s an Aspose PDF patch for this. My guess, based on the 7th file, is that correcting word/character spacing (kerning) is what takes all the time.

Thanks!

Mike Durthaler

@Ohio_Mike,
We inserted all your HTML strings into a new PDF with the latest version, 17.9, of the Aspose.Pdf for .NET API, and it takes 11 seconds in our environment. Kindly create a small sample project that takes 3.5 to 4 minutes in your environment, and share a zip of that project along with the source PDF. We will investigate and share our findings with you.

[C#]

using System;
using System.Diagnostics;
using System.IO;
using Aspose.Pdf;

// Time the whole run: reading the HTML, adding the fragments, and saving
Stopwatch watch = Stopwatch.StartNew();
string dataDir = @"C:\Pdf\test375\";
string[] files = { "Test1st.html", "Test2nd.html", "Test3rd.html", "Test4th.html", "Test5th.html",
    "Test6th.html", "Test11th.html" };
// Instantiate Document object
Document document = new Document();
// Add a page to pages collection of PDF file
Aspose.Pdf.Page page = document.Pages.Add();
foreach (string file in files)
{
    string html = File.ReadAllText(dataDir + file);
    // GetHtmlFragment is not shown in this post (presumably the helper from GetHTMLFragmentCSharpCode.txt)
    page.Paragraphs.Add(GetHtmlFragment(html));
}
document.Save(dataDir + "Output.pdf", Aspose.Pdf.SaveFormat.Pdf);
watch.Stop();
// Print the elapsed time in milliseconds
Console.WriteLine(watch.ElapsedMilliseconds);

Time in milliseconds: 11647
This is the output PDF: Output.pdf (102.5 KB)

Imran,

We have to track down which one is taking over 3 minutes to process. Meanwhile I found some other interesting data.

I took your timing code and wrapped it around every call to the GetHTMLFragment method, so I have the text and time for each call. The time taken for each call is not out of the ordinary.

The time is being added in the document.Save() method when HTML fragments are added to paragraphs. If text fragments are added instead, the time is significantly less.
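
One way to confirm where the time goes is to time the two phases separately. The following is only a minimal sketch, not the code from the attached project. It assumes the HTML files are passed as command-line arguments, and the names PhaseTiming, TimePhase, and TimingTest.pdf are made up for illustration. It also builds each HtmlFragment directly instead of going through the GetHTMLFragment helper.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using Aspose.Pdf;

public static class PhaseTiming
{
    // Runs an action and returns the elapsed wall-clock time in milliseconds.
    public static long TimePhase(Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main(string[] args)
    {
        // Assumption: each argument is the path to one HTML file.
        Document document = new Document();
        Page page = document.Pages.Add();

        // Phase 1: create the fragments and add them to the page.
        long addMs = TimePhase(() =>
        {
            foreach (string path in args)
            {
                page.Paragraphs.Add(new HtmlFragment(File.ReadAllText(path)));
            }
        });

        // Phase 2: save the document (hypothetical output file name).
        long saveMs = TimePhase(() => document.Save("TimingTest.pdf"));

        Console.WriteLine("Add fragments: " + addMs + " ms");
        Console.WriteLine("Save: " + saveMs + " ms");
    }
}
```

Replacing HtmlFragment with Aspose.Pdf.Text.TextFragment in the loop should give the text-fragment side of the same comparison.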

Look at the file PID95625_11thHTMLFragment, where several span tags are wrapping individual words. It might be that processing all the inline styles is causing document.Save() to take so long. This is just one such instance out of 42 calls to the GetHTMLFragment method.

Here is just one part of this file:

<span style="letter-spacing: -0.7pt;"> </span>
pollutants and
<span style="letter-spacing: -0.2pt;"> </span>
has
<span style="letter-spacing: -0.2pt;"> </span>not<span style="letter-spacing: -0.2pt;"> </span>
been
<span style="letter-spacing: -0.2pt;"> </span>
linked
<span style="letter-spacing: -0.05pt;"> </span>
with
<span style="letter-spacing: -0.2pt;"> </span>
any
<span style="letter-spacing: -0.4pt;"> </span>
special
<span style="letter-spacing: -0.2pt;"> </span>
mobile
<span style="letter-spacing: -0.2pt;"> </span>
source
<span style="letter-spacing: -0.2pt;"> </span>
air
<span style="letter-spacing: -0.2pt;"> </span>

I’ve looked in just 2 files out of 42 and found what must be 100 of these inline styles. I can’t imagine how many total there are in all 42 calls.

Tweaking all these spacings must take a significant amount of time when the document is being saved.

Is that possible?
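
A quick way to test this theory would be to strip the whitespace-only letter-spacing spans before building the fragment and compare the save time. The sketch below is a diagnostic aid under assumptions, not a fix: the class name SpacingSpanStripper is hypothetical, the pattern only covers spans written exactly like the ones quoted above (a single letter-spacing value in pt), and removing them will change the rendered spacing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpacingSpanStripper
{
    // Matches a span whose only style is a letter-spacing value in pt
    // and whose content is whitespace only; the whitespace is captured.
    private static readonly Regex SpacingSpan = new Regex(
        @"<span\s+style=""letter-spacing:\s*-?\d*\.?\d+pt;?\s*"">(\s*)</span>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Replaces each matching span with the whitespace it contained,
    // so the words on either side stay separated.
    public static string Strip(string html)
    {
        return SpacingSpan.Replace(html, "$1");
    }

    public static void Main()
    {
        string sample =
            "has<span style=\"letter-spacing: -0.2pt;\"> </span>not" +
            "<span style=\"letter-spacing: -0.2pt;\"> </span>been";
        Console.WriteLine(Strip(sample)); // prints "has not been"
    }
}
```

Spans that wrap actual text, or that carry other style properties, are left untouched by this pattern.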

@Ohio_Mike,
Yes, it is possible. But if it is slower in your environment, we recommend that you create a small use case so that we can replicate the same scenario in our environment for investigation. We will take a closer look and let you know about our findings.

I have a solution file all zipped up, with sample inputs. With or without these extra span tags, the HTML Fragment portion takes 24 seconds on my machine, and much longer on the server. The entire file generation takes 1:30 locally and almost 4 minutes on the server.

I don’t see the file upload icon. Let me know how to send the file to you. I’ll try attaching it to the email I got from you. Let me know if it made it.

The notification email just got bounced back, tried attaching the zip to the reply.

Turns out I can drag and drop it here:

AsposeTestSolution.zip (1.8 MB)

You’ll just need to add the Aspose DLL on your end for the most recent Aspose Total.

@Ohio_Mike,
Please download and try the latest version, 17.10, of the Aspose.Pdf for .NET API on both the local and server machines. The performance of version 17.10 is better than that of 17.7. However, if this does not help, kindly share the environment details of the server machine, e.g. operating system name and edition, local language settings, and any other information that could help us replicate the problem in our environment.

Could you send me a link to the DLL for 17.10?

Tried NuGet and downloading a trial copy. NuGet finds 17.10 but nothing changes after installing it. The version remains 17.6.

The free trial version is dated 29 September 2010 and is version 17.9.

Thanks,

Mike
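
When a NuGet update does not seem to take effect, it can help to print the version and location of the assembly that is actually loaded at run time. This is a generic .NET reflection sketch, not an Aspose feature, and the class name AssemblyVersionCheck is hypothetical. To stay free of external references it inspects typeof(string); in the real project, typeof(Aspose.Pdf.Document) would be passed instead.

```csharp
using System;
using System.Reflection;

public static class AssemblyVersionCheck
{
    // Returns "Name Version (Location)" for the assembly that defines the given type.
    public static string Describe(Type type)
    {
        Assembly assembly = type.Assembly;
        AssemblyName name = assembly.GetName();
        return name.Name + " " + name.Version + " (" + assembly.Location + ")";
    }

    public static void Main()
    {
        // typeof(string) keeps this sketch self-contained;
        // in the real project, pass typeof(Aspose.Pdf.Document) instead.
        Console.WriteLine(Describe(typeof(string)));
    }
}
```

If the printed location points to an old copy of the DLL, the project is probably still referencing that file rather than the updated package.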

@Ohio_Mike,

You can download the latest version, 17.10, of the Aspose.Pdf for .NET API here: Download Aspose.Pdf for .NET API 17.10.

Please contact our sales team by posting in the Aspose.Purchase forum; they can guide you better on the matter of license expiry.

Thanks, I tried the latest version, but it runs about 6 seconds longer.

Is there any chance a future patch is in the works to reduce run time?

I just noticed the year 2010 was put in the last post, ‘The free trial version is dated 29 September 2010 and is version 17.9.’ should have read as ‘The free trial version is dated 29 September 2017 and is version 17.9.’ There is no licensing issue, we’re current.

I’m also getting machine specifics for you. Should get back shortly what Server OS, hard drive space, etc. for the environment where the app lives.

Just got back the machine data:

"The site doesn’t have its own server. It’s on 2 servers, which are load balanced behind a VIP. 28 other test sites share this environment. Each server has:

Windows 2008 Standard
One 4 core CPU 2.40 Ghz
4 GB RAM
Plenty of hard drive space

At the moment, one server is 25% cpu utilization and 83% memory and the other is 16% cpu utilization and 75% memory."

@Ohio_Mike,

We have logged an investigation to improve the performance under the ticket ID PDFNET-43589 in our bug tracking system. We have linked your post to this ticket and will keep you informed regarding any available updates.

Can I see this file?
I need this help.

@mrbin3ky,

We do not share clients' private documents. Kindly let us know what kind of help you need, and we will assist you accordingly.