HTML to PDF conversion takes forever and times out in our application

Hi Support Team,


We are using Aspose version 9.1.

We have tried both Aspose.Pdf and Aspose.Words to convert our HTML to PDF.
However, the conversion takes forever and times out in our application. We set the timeout to over 40 minutes for testing, but the conversion was still processing when it timed out.
Our HTML contains tables, images and quite a lot of formatting. I have attached a sample HTML file to give you an idea of the format we are trying to convert to PDF.

Also, are there any known issues with HTML tags that might cause problems when converting to PDF?


Thanks,

Hi Vennila,

I was unable to reproduce this issue on my end using the latest versions. The latest version of Aspose.Words for .NET takes less than a second to convert this HTML to PDF. I used the following code:

Document html = new Document("sample.htm");

html.Save("Sample.pdf");


Can you please try the latest versions and, if you still see the issue, share your application with us so that we can reproduce it?

Best Regards,

Hi Vennila,


Thanks for contacting support.

Adding to Ijaz's comments, I have tested the scenario using Aspose.Pdf for .NET 10.0.0 in a Visual Studio 2010 application targeting .NET Framework 4.0. As per my observations, the conversion completes in 30 seconds. For your reference, I have also attached the resultant file generated on my end.

Can you please try using the latest release and, in case you still face the same issue, share some details regarding your working environment?

[C#]

// open input HTML file
Document document = new Document("c:/pdftest/sample.txt", new HtmlLoadOptions());

// save updated document
document.Save("c:/pdftest/sample_HTML_output.pdf");

Hi Nayyar,


Thanks for the reply.
We updated the PDF component to version 10.0, but the issue persists.

We are basically using the following code. The HTML that I sent you earlier has embedded images.
We create an Aspose.Pdf.HtmlLoadOptions object and pass the path to our embedded images to the PDF document, as shown below. When we run the following code, the export keeps executing; for example, it has been running for 30 minutes and has still not finished.
I am attaching a zip folder with the images; you can use the same HTML sample I sent earlier.

Also, if I use the htmlLoadOptions below, without the embedded image location, the export takes only 5 seconds.

// Aspose.Pdf.HtmlLoadOptions htmlLoadOptions = new Aspose.Pdf.HtmlLoadOptions();

But if I use the following, it takes forever. Our requirement is that images are exported to the PDF wherever they appear in the HTML content.

Aspose.Pdf.HtmlLoadOptions htmlLoadOptions = new Aspose.Pdf.HtmlLoadOptions(embeddedImageLocation);
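To put numbers on a comparison like the one above, each configuration can be timed separately. The following is only a minimal sketch: `ConversionTimer` and `Measure` are hypothetical names that do not come from Aspose or from the original code, and the conversion itself is passed in as a delegate.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of Aspose.Pdf): times an arbitrary action.
public static class ConversionTimer
{
    // Runs the given work once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action work)
    {
        if (work == null) throw new ArgumentNullException("work");
        Stopwatch stopwatch = Stopwatch.StartNew();
        work();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Each of the two HtmlLoadOptions variants could then be wrapped in its own lambda and passed to `ConversionTimer.Measure`, so that the two timings can be reported side by side.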

----

private void button1_Click(Object sender, EventArgs e)
{
    PrepareAsposeLicense();
    int count = 0;
    byte[] data = null;
    // Folder that holds the images referenced by the HTML
    string embeddedImageLocation = @"D:\Development\Temp";
    string[] memoImages = null;
    Dictionary<int, string> embeddedImages = new Dictionary<int, string>();
    embeddedImages.Add(243236, "clip_image002.jpg");
    embeddedImages.Add(243231, "clip_image002.jpg");
    embeddedImages.Add(243232, "clip_image002.jpg");
    embeddedImages.Add(239938, "clip_image001.jpg");
    embeddedImages.Add(239939, "clip_image002.jpg");
    embeddedImages.Add(239940, "clip_image003.jpg");
    embeddedImages.Add(239941, "clip_image002.jpg");
    embeddedImages.Add(239870, "clip_image001.jpg");
    embeddedImages.Add(239871, "clip_image002.jpg");
    embeddedImages.Add(239872, "clip_image002.jpg");
    embeddedImages.Add(239873, "clip_image006.jpg");
    embeddedImages.Add(239874, "clip_image008.jpg");
    embeddedImages.Add(239875, "clip_image010.jpg");
    embeddedImages.Add(239876, "clip_image012.jpg");
    embeddedImages.Add(239877, "clip_image002.jpg");
    embeddedImages.Add(239878, "clip_image004.jpg");
    embeddedImages.Add(239879, "clip_image006.jpg");
    embeddedImages.Add(239880, "clip_image008.jpg");
    embeddedImages.Add(232711, "clip_image001.jpg");
    HtmlAgilityPack.HtmlDocument htmlDocument; // not used in this snippet
    memoImages = embeddedImages.Values.ToArray();

    // Delete any .wmf images from the image folder before converting
    var wmfImages = memoImages.Where(image => image.EndsWith(".wmf", StringComparison.OrdinalIgnoreCase)).ToList();
    if (wmfImages.Count > 0)
    {
        foreach (string imageName in wmfImages)
        {
            try
            {
                File.Delete(Path.Combine(embeddedImageLocation, imageName));
            }
            catch { } // deletion failures are ignored
        }
    }

    // Passing the image folder here is what makes the conversion run forever
    Aspose.Pdf.HtmlLoadOptions htmlLoadOptions = new Aspose.Pdf.HtmlLoadOptions(embeddedImageLocation);
    htmlLoadOptions.UseNewConversionEngine = true;
    string requestHtml = System.IO.File.ReadAllText("c:/pdftest/sample_1.html");
    using (MemoryStream htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(requestHtml)))
    using (Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(htmlStream, htmlLoadOptions))
    using (MemoryStream pdfStream = new MemoryStream())
    {
        pdfDocument.PageInfo.Margin = new Aspose.Pdf.MarginInfo(10, 10, 10, 10);
        // The Pages collection is indexed from 1
        for (int pageIndex = 1; pageIndex <= pdfDocument.Pages.Count; pageIndex++)
        {
            pdfDocument.Pages[pageIndex].SetPageSize(900f, 775f);
        }
        pdfDocument.Save(pdfStream);
        pdfStream.Flush();
        data = pdfStream.ToArray();
        if (pdfDocument.Pages.Count > 0)
            count = pdfDocument.Pages.Count;
    }

    // Write the generated PDF bytes to disk
    using (System.IO.BinaryWriter writer = new System.IO.BinaryWriter(File.Open("c:/pdftest/sample2_HTML_output.pdf", FileMode.Create)))
    {
        writer.Write(data);
        writer.Flush();
    }
    data = null;
}





Hi Vennila,


Thanks for sharing the code snippet.

I have tried executing the code snippet, but I am getting errors on two lines of code. Please take a look at the attached image file.

Can you please double-check the scenario in your environment and share a sample project?

Hi Nayyer,


Please review the attached solution. I am also pasting the full code here:

using Aspose.Pdf;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Windows.Controls;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        // Loads the Aspose license from an embedded resource
        private void PrepareAsposeLicense()
        {
            Aspose.Pdf.License asposePdfLic = new Aspose.Pdf.License();
            asposePdfLic.SetLicense(System.Reflection.Assembly.GetExecutingAssembly().GetManifestResourceStream("WindowsFormsApplication1.Aspose.Total.lic"));
        }

        private void button1_Click(Object sender, EventArgs e)
        {
            PrepareAsposeLicense();
            int count = 0;
            byte[] data = null;
            // Folder that holds the images referenced by the HTML
            string embeddedImageLocation = @"D:\Development\HSSilverlight2012\HelpSTAR.Web\Config\HSTemp";
            string[] memoImages = null;
            Dictionary<int, string> embeddedImages = new Dictionary<int, string>();
            embeddedImages.Add(243236, "clip_image002.jpg");
            embeddedImages.Add(243231, "clip_image002.jpg");
            embeddedImages.Add(243232, "clip_image002.jpg");
            embeddedImages.Add(239938, "clip_image001.jpg");
            embeddedImages.Add(239939, "clip_image002.jpg");
            embeddedImages.Add(239940, "clip_image003.jpg");
            embeddedImages.Add(239941, "clip_image002.jpg");
            embeddedImages.Add(239870, "clip_image001.jpg");
            embeddedImages.Add(239871, "clip_image002.jpg");
            embeddedImages.Add(239872, "clip_image002.jpg");
            embeddedImages.Add(239873, "clip_image006.jpg");
            embeddedImages.Add(239874, "clip_image008.jpg");
            embeddedImages.Add(239875, "clip_image010.jpg");
            embeddedImages.Add(239876, "clip_image012.jpg");
            embeddedImages.Add(239877, "clip_image002.jpg");
            embeddedImages.Add(239878, "clip_image004.jpg");
            embeddedImages.Add(239879, "clip_image006.jpg");
            embeddedImages.Add(239880, "clip_image008.jpg");
            embeddedImages.Add(232711, "clip_image001.jpg");
            HtmlAgilityPack.HtmlDocument htmlDocument; // not used in this sample
            memoImages = embeddedImages.Values.ToArray();

            // Delete any .wmf images from the image folder before converting
            var wmfImages = memoImages.Where(image => image.EndsWith(".wmf", StringComparison.OrdinalIgnoreCase)).ToList();
            if (wmfImages.Count > 0)
            {
                foreach (string imageName in wmfImages)
                {
                    try
                    {
                        File.Delete(Path.Combine(embeddedImageLocation, imageName));
                    }
                    catch { } // deletion failures are ignored
                }
            }

            // Aspose.Pdf.HtmlLoadOptions htmlLoadOptions = new Aspose.Pdf.HtmlLoadOptions();
            Aspose.Pdf.HtmlLoadOptions htmlLoadOptions = new Aspose.Pdf.HtmlLoadOptions(embeddedImageLocation);
            htmlLoadOptions.UseNewConversionEngine = true;
            string requestHtml = System.IO.File.ReadAllText("c:/pdftest/sample_1.html");
            using (MemoryStream htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(requestHtml)))
            using (Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(htmlStream, htmlLoadOptions))
            using (MemoryStream pdfStream = new MemoryStream())
            {
                pdfDocument.PageInfo.Margin = new Aspose.Pdf.MarginInfo(10, 10, 10, 10);
                // The Pages collection is indexed from 1
                for (int pageIndex = 1; pageIndex <= pdfDocument.Pages.Count; pageIndex++)
                {
                    pdfDocument.Pages[pageIndex].SetPageSize(900f, 775f);
                }
                pdfDocument.Save(pdfStream);
                pdfStream.Flush();
                data = pdfStream.ToArray();
                if (pdfDocument.Pages.Count > 0)
                    count = pdfDocument.Pages.Count;
            }

            // Write the generated PDF bytes to disk
            using (System.IO.BinaryWriter writer = new System.IO.BinaryWriter(File.Open("c:/pdftest/sample2_HTML_output.pdf", FileMode.Create)))
            {
                writer.Write(data);
                writer.Flush();
            }
            data = null;
        }
    }
}

Hi Vennila,


Thanks for sharing the details.

I have tested the scenario and I am able to reproduce the same problem. For the sake of correction, I have logged it in our issue tracking system as PDFNEWNET-38088. We will investigate this issue in detail and keep you updated on the status of a correction.

We apologize for the inconvenience.

Hi Nayyer ,


Thanks very much for the update and for logging the ticket.
Just to get an idea: when do you think we can have the fix?

Thanks,

Hi Nayyer,


Thanks for logging the ticket to resolve this.
However, I would like to inform you that our clients are facing this issue in the production version of our application, and it is very critical for them.
So please expedite the fix for this issue and provide us with an updated version as soon as possible. Please also give us an ETA for the fix.

In the meantime, can you please review and advise whether there is a workaround for this issue?

Thanks & Regards,
Vennila
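Until a fix is available, one way to keep the calling application from hanging is to bound how long it waits for the conversion. The following is a minimal sketch and not an Aspose feature: `ConversionGuard` and `TryRunWithTimeout` are hypothetical names, and the conversion is passed in as a delegate. It does not cancel or speed up the conversion; on timeout the work keeps running in the background and only the caller is released.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not part of Aspose.Pdf): waits for work with an upper time limit.
public static class ConversionGuard
{
    // Starts the work on a thread-pool thread and waits at most `timeout`.
    // Returns true and sets `result` if the work finished in time.
    // Returns false on timeout; the work is NOT cancelled and keeps running.
    // If the work throws before the timeout, Wait rethrows it as an AggregateException.
    public static bool TryRunWithTimeout<T>(Func<T> work, TimeSpan timeout, out T result)
    {
        if (work == null) throw new ArgumentNullException("work");

        Task<T> task = Task.Factory.StartNew(work);

        // Observe a failure that happens after the timeout, so that it is not
        // left behind as an unobserved task exception.
        task.ContinueWith(t => { var ignored = t.Exception; },
                          TaskContinuationOptions.OnlyOnFaulted);

        if (task.Wait(timeout))
        {
            result = task.Result;
            return true;
        }

        result = default(T);
        return false;
    }
}
```

For example, the export code above could be moved into a method that returns the PDF bytes (a hypothetical `ExportToPdf`), and the button handler could call `ConversionGuard.TryRunWithTimeout(() => ExportToPdf(), TimeSpan.FromMinutes(2), out data)` and report a failure to the user when it returns false.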

Hi Vennila,

As we have only recently identified this problem, it is still pending review. Until we have investigated it and figured out the actual cause, we will not be able to share a timeline for its resolution.

However, as soon as we have made significant progress towards the resolution of this issue, we will be more than happy to update you with the status of the correction. Please be patient and allow us a little time. Your patience and understanding are greatly appreciated in this regard.

Hi Vennila,

We do understand the criticality and urgency of this problem. As a general rule, issues are resolved on a first-come, first-served basis, but problems reported under the Enterprise or Priority support model have higher precedence in terms of resolution than issues under the normal/free support model.

Nevertheless, we will try our best to get this problem resolved as quickly as possible.

Hi there,

I appreciate your willingness to resolve this problem as quickly as possible.
Please keep it at the same priority, keep this thread open, and keep us updated on the status.

Thanks & Regards,
Vennila

Hi Vennila,


Sure. As soon as the problem is resolved, we will update you within this forum thread.

Hi Nayyer,

We have encountered another issue with exporting HTML to PDF: the export does not include all of the content.

The HTML comes into our application from an external source; in this scenario, it comes from SharePoint.

This HTML contains some smart tags and other tags. I have attached the problem HTML for your review.

In the future, HTML may come in from other external sources as well, so scenarios like this may come up again. What would be the best way to handle them so that the export to PDF works seamlessly?

Thanks,

Mansee
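One possible way to make externally sourced HTML more uniform before conversion is to strip Office-specific markup first. The following is a minimal sketch and not a confirmed fix for the issue described above: `OfficeHtmlCleaner` is a hypothetical name, and the assumption that smart tags look like namespace-prefixed elements (for example `<st1:place>` or `<o:p>`) is not taken from the attached file. The regular expressions are deliberately rough and remove every namespace-prefixed tag, including VML elements such as `<v:imagedata>`, while keeping the text between the tags.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Aspose.Pdf): removes Office-specific markup from HTML.
public static class OfficeHtmlCleaner
{
    // Declarations such as <?xml:namespace prefix = "o" ns = "..." />
    private static readonly Regex XmlDeclaration =
        new Regex(@"<\?xml[^>]*>", RegexOptions.IgnoreCase);

    // Opening, closing or self-closing tags with a namespace prefix,
    // e.g. <o:p>, </o:p>, <st1:place w:st="on">
    private static readonly Regex PrefixedTag =
        new Regex(@"</?[A-Za-z][A-Za-z0-9]*:[A-Za-z][^>]*>");

    // Returns the HTML with the tags above removed; inner text is preserved.
    public static string Clean(string html)
    {
        if (html == null) throw new ArgumentNullException("html");
        string withoutDeclarations = XmlDeclaration.Replace(html, string.Empty);
        return PrefixedTag.Replace(withoutDeclarations, string.Empty);
    }
}
```

The cleaned string could then be converted to a stream and loaded in the same way as `requestHtml` in the code earlier in this thread. For documents where the structure matters, an HTML parser would be a more robust choice than regular expressions.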


Hi Mansee,

I have tested the scenario and I am able to reproduce the same problem. For the sake of correction, I have logged it separately in our issue tracking system as PDFNEWNET-38122. We will investigate this issue in detail and keep you updated on the status of a correction.

We apologize for the inconvenience.

Hi Nayyer,


Thanks for the update and for logging the ticket.

Thanks,

Hi Mansee,


As soon as we have some progress towards the resolution of earlier reported issues, we will update you within this forum thread.

Hi Nayyer,


We saw that there are newer versions of Aspose.Pdf for .NET.
We went through the lists of bugs fixed in versions 10.3, 10.4 and 10.5. It looks like the following two issues have not been fixed yet.

PDFNEWNET-38088
PDFNEWNET-38122.

Can you please update us on the progress towards the resolution of the above issues?

Thanks,
Vennila.


Hi Vennila,


Thanks for your patience.

The issues mentioned above are still pending review and I am afraid they are not yet resolved. However, I have asked the product team to look into these issues and share a possible ETA. As soon as we have further updates, we will let you know.

Hi Nayyer,
Two new versions (10.7.0 and 10.8.0) of Aspose.Pdf for .NET have been released since our last email.

I have gone through the lists of fixed bugs and could not find the issue numbers listed below.

PDFNEWNET-38088
PDFNEWNET-38122.

Can you please update us on the progress?