Formatting to PDF from HTML takes forever and times out in our application

Hi Vennila,


Thanks for your patience.

I am afraid the issues reported earlier are still pending review and are not yet resolved. However, I have asked the team to share an ETA, and as soon as we have any updates, we will let you know.

Hi Vennila,


Thanks for your patience.

We are pleased to share that the issue PDFNEWNET-38122 reported earlier is resolved, and its fix will be included in the upcoming release of Aspose.Pdf for .NET 11.1.0, which is planned for release in a few days. However, the team is still working on the other issue, reported as PDFNEWNET-38088, and as soon as we have further updates, we will let you know.

The issues you have found earlier (filed as PDFNEWNET-38122; PDFNEWNET-38088) have been fixed in Aspose.Pdf for .NET 11.1.0.


This message was posted using Notification2Forum from Downloads module by Aspose Notifier.

Hi Nayyer,

Thanks for providing the fixes. I have used the latest Aspose.Pdf for .NET 11.1.0 and I was still able to reproduce PDFNEWNET-38122. I am attaching the sample text and HTML that we are trying to export. Can you please provide a way to handle it?

I am attaching the plain text in the .txt file and the HTML in the .rtf file.

Thanks,

Hi Vennila,


Thanks for the acknowledgement.

I have tested the scenario using the latest release of Aspose.Pdf for .NET 11.1.0 and I am unable to reproduce any issue. For your reference, I have also attached the output generated on my end.

[C#]

// Load the HTML file using the new DOM approach
Aspose.Pdf.Document doc = new Document("c:/pdftest/HTML+code+of+memos+with+missing+content+when+exported+to+PDF.html", new HtmlLoadOptions());
// Save the result as PDF
doc.Save(@"c:/pdftest/HTML+code+of+memos+with+missing+content+when+exported+to+PDF.pdf");

Hey,
I am attaching the sample project in which I have used the latest Aspose DLL. Can you please review it and suggest a possible fix?

Thanks

Hi Vennila,


Thanks for sharing the resource files.

I have tested the scenario and observed the same missing-content issue when using your shared project. However, I noticed that you are using the legacy Aspose.Pdf.Generator approach to convert HTML documents to PDF format, whereas no such issue occurs with the new Document Object Model of the Aspose.Pdf namespace. Please try the code snippet shared in my earlier post 678014, and in case you still face any issue or have any further query, please feel free to contact us.

Hi Nayyer,
When using the latest Aspose.PDF DLL (11.1.0), I faced the following issues when exporting to PDF.

1. I was able to export the attached HTML to PDF, but some content was missing (I have attached a screenshot, Capture.PNG, and the HTML text).
2. I have an image that needs to be exported to PDF, but it does not export (I have attached the image).

Can you please review and update?

Thanks

Hi Vennila,


Thanks for your inquiry. We are looking into the issue and will update you soon.

Best Regards,

Hi Vennila,


Thanks for your inquiry. I have tested the scenario with Aspose.Pdf for .NET 11.1.0 using the new DOM approach for HTML to PDF conversion and was unable to reproduce the issue. Please use the following code snippet; it should resolve the issue.


Document doc = new Document("712086±+PDF+not+showing+digits.html", new HtmlLoadOptions());
doc.Save("HTMLtoPDf.PDF");


Please feel free to contact us for any further assistance.


Best Regards,

Hi Sana,
I will try this approach as provided. Can you please review the second reported issue about the image (Not able to export.png).

Thanks

Hi Vennila,


Thanks for your feedback. I am afraid I cannot find any PNG image reference in your sample HTML. I would appreciate it if you could share some more details of the issue, so that we can guide you accordingly.

P.S.: In the new DOM approach for HTML to PDF, if the HTML uses external resources (images/fonts/CSS), the resource path must be passed as a parameter to HtmlLoadOptions().

We are sorry for the inconvenience caused.

Best Regards,


Hi Ahmad,
I was referring to the PNG image that was attached on January 14th, 2016 (Not able to export.PNG). I am attaching the same PNG to this reply.
Can you please review? I was not able to export it to PDF.

Thanks

Hi Vennila,


Thanks for your feedback. I have tested the image to PDF conversion using the new DOM approach and was unable to reproduce the issue; please find the output PDF attached.

However, if there is a difference between your requirement and my understanding, then please share some more details and your sample code here. We will look into it and guide you accordingly.

We are sorry for the inconvenience caused.

Best Regards,

Hi Ahmad,
We have tried the approach you provided, and it works only for this case of converting the image to PDF.
We need to convert HTML pages with images into PDF.
We had been using the approach below up to the previous version, and it worked fine for some cases, with some exceptions.

Aspose.Pdf.Generator.Pdf pdfDocument = new Aspose.Pdf.Generator.Pdf();
pdfDocument.HtmlInfo.BadHtmlHandlingStrategy = Aspose.Pdf.Generator.BadHtmlHandlingStrategy.TreatAsPlainText;
pdfDocument.HtmlInfo.ShowUnknownHtmlTagsAsText = true;
pdfDocument.HtmlInfo.CharSet = "UTF-8";
pdfDocument.IsLandscape = true;
pdfDocument.IsPageNumberForDocument = true;
pdfDocument.IsFontNotFoundExceptionThrown = false;
pdfDocument.IsImageNotFoundErrorIgnored = true;
pdfDocument.IsAutoFontAdjusted = true;
pdfDocument.PageNumberFormat = Aspose.Pdf.Generator.PageNumberFormatType.EnglishLower;
pdfDocument.PageSetup.Margin = new Aspose.Pdf.Generator.MarginInfo() { Left = 30, Top = 30, Right = 10, Bottom = 10 };
pdfDocument.Author = "ServicePRO";
pdfDocument.Subject = "Request Details";
pdfDocument.Title = string.Format("{0} - {1}", returnValue.Id, returnValue.Name);

try
{
    pdfDocument.BindHTML(requestHtml.ToString(), embeddedImageLocation); // this is the location for images
    using (MemoryStream pdfStream = new MemoryStream())
    {
        pdfDocument.Save(pdfStream);
        pdfStream.Flush();
        returnValue.Data = pdfStream.ToArray();
        if (pdfDocument.PageCount > 0)
            returnValue.Tag = pdfDocument.PageCount;
    }
}
// (the catch block is not shown in this excerpt)

We have the latest Aspose.PDF.dll and we are trying to export the attached HTML page, which contains an image src, to PDF. It does not work with the code you have provided.


// instantiate Pdf object
Aspose.Pdf.Generator.Pdf pdf = new Pdf();
// specify the character encoding for the HTML file
pdf.HtmlInfo.CharSet = "UTF-8";
pdf.HtmlInfo.CharsetApplyingLevelOfForce = HtmlInfo.CharsetApplyingForceLevel.UseWhenImpossibleDetectFromContent;
// load the HTML file into a Stream object
using (Stream htmlAsStream = System.IO.File.OpenRead(@"C:\Users\bram\Desktop\Waste\HTML.html"))
{
    // bind the source HTML; the second argument is the base path for images
    pdf.BindHTML(htmlAsStream, @"D:\Config\HSTemp");
}
pdf.Save(@"C:\Waste\HTMLhtml.pdf");

I have attached the HTML and the image along with the output.

We need to use code like the above, with BindHTML(), so that we can provide the base path of the image. Please provide a way to convert any HTML page with images to PDF using BindHTML().

Thanks

Hi Vennila,


Thanks for your inquiry. Please note that, as suggested above, it is recommended to use the new generator (Aspose.Pdf) instead of the old generator (Aspose.Pdf.Generator), as the old generator is obsolete. Please use the new DOM approach for HTML to PDF conversion; it is more efficient and improved.

HtmlLoadOptions options = new HtmlLoadOptions("E:/data/");
Document doc = new Document("E:/data/HTML_image.html", options);
doc.Save("E:/data/HtmltoPDFDOM.pdf");
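
When the HTML is held in memory rather than in a file, the same options object can be combined with a stream. The following is only a minimal sketch, assuming that the path passed to HtmlLoadOptions is the folder against which relative image src values are resolved; the class name HtmlToPdfSketch and the parameters html and imageBasePath are hypothetical and not part of the product.

```csharp
using System.IO;
using System.Text;
using Aspose.Pdf;

public static class HtmlToPdfSketch
{
    // html: the markup to convert (hypothetical parameter).
    // imageBasePath: folder assumed to contain the images referenced by relative src values.
    public static byte[] Convert(string html, string imageBasePath)
    {
        // Pass the resource folder to the load options, as in the file-based snippet.
        HtmlLoadOptions options = new HtmlLoadOptions(imageBasePath);

        using (MemoryStream htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        using (MemoryStream pdfStream = new MemoryStream())
        using (Document doc = new Document(htmlStream, options))
        {
            // Render to PDF in memory and return the bytes.
            doc.Save(pdfStream);
            return pdfStream.ToArray();
        }
    }
}
```

A call such as HtmlToPdfSketch.Convert(requestHtml.ToString(), "E:/data/") would then return the PDF bytes, provided the assumptions above hold for your HTML.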

Please feel free to contact us for any further assistance.


Best Regards,

Hi,
I was using Aspose.Pdf.Generator.Pdf to convert HTML to PDF; as you have mentioned, this is no longer supported.

I am now using the code that you provided (see below) for converting HTML with images into PDF.

HtmlLoadOptions options = new HtmlLoadOptions("E:/data/");
Document doc = new Document("E:/data/HTML_image.html", options);
doc.Save("E:/data/HtmltoPDFDOM.pdf");

I need the following properties in the new implementation:

pdfDocument.HtmlInfo.ShowUnknownHtmlTagsAsText = true;
pdfDocument.HtmlInfo.CharSet = "UTF-8";
pdfDocument.IsLandscape = true;
pdfDocument.IsPageNumberForDocument = true;
pdfDocument.IsFontNotFoundExceptionThrown = false;
pdfDocument.IsImageNotFoundErrorIgnored = true;
pdfDocument.IsAutoFontAdjusted = true;
pdfDocument.PageNumberFormat = Aspose.Pdf.Generator.PageNumberFormatType.EnglishLower;

Can you please review and update accordingly?

Thanks

Hi Vennila,

When using the new DOM, the character set, fonts, and related properties are handled automatically. However, for your other requirements, please visit

Hi,
Thanks for the links, but this is what I need.

Can you please provide code snippets for the following:

1. I need a page number on every page that is exported.
2. I need the page to auto-size, or a way to set the page size (for example A4) so that the content wraps inside the page.
3. When I set the Update Page Dimensions option, content gets cut off. Please see the attached PDF, which was exported using the sample you provided.
4. We are using the approach shown below to export to PDF. Can you please review whether it will work in all cases, such as HTML with embedded images and HTML with special tags? I am also using UTF-8 encoding (will this cause any issues?).

byte[] byteArray = Encoding.UTF8.GetBytes(requestHtml.ToString());

MemoryStream stream = new MemoryStream(byteArray);
Document doc = new Document(stream, options);
doc.Info.Title = string.Format("{0} - {1}", returnValue.Id, returnValue.Name);
doc.Info.Subject = "Request Details";
doc.Info.Author = "SP";
doc.PageInfo.Margin = new MarginInfo { Left = 30, Top = 30, Right = 10, Bottom = 10 };
PageNumberStamp pageNumberStamp = new PageNumberStamp();
pageNumberStamp.Format = "Page # of " + doc.Pages.Count;
pageNumberStamp.StartingNumber = 1;
doc.Pages[doc.Pages.Count].AddStamp(pageNumberStamp);

try
{
    using (MemoryStream pdfStream = new MemoryStream())
    {
        doc.Save(pdfStream);
        pdfStream.Flush();
        returnValue.Data = pdfStream.ToArray();
    }
}
// (the catch block is not shown in this excerpt)

Thanks

Hi Vennila,


Thanks for your inquiry. While converting HTML to PDF, you can use the PageInfo property of the HtmlLoadOptions object to set the page margins and dimensions. For page numbers, you can add a footer to the PDF document; please check the following code snippet for details.

Furthermore, please note that the sample code will work for embedded resources. However, if external resources are used in the source HTML, then you need to pass the external resources path to the HtmlLoadOptions object as a parameter.

HtmlLoadOptions options = new HtmlLoadOptions();
options.PageInfo.Margin = new Aspose.Pdf.MarginInfo { Left = 40, Right = 40, Top = 30, Bottom = 20 };
options.PageInfo.Width = Aspose.Pdf.PageSize.A4.Width;
options.PageInfo.Height = Aspose.Pdf.PageSize.A4.Height;

// Instantiate Document object from the HTML file
Document doc = new Document("test.html", options);

// Save to a stream and reload, so that the generated pages are available
MemoryStream ms = new MemoryStream();
doc.Save(ms);
doc = new Document(ms);

Aspose.Pdf.HeaderFooter header = new Aspose.Pdf.HeaderFooter();
Aspose.Pdf.HeaderFooter footer = new Aspose.Pdf.HeaderFooter();

// myDir is the folder that contains logo.png (defined elsewhere)
FileStream fs = new FileStream(myDir + "logo.png", FileMode.Open, FileAccess.Read);
Aspose.Pdf.Image image1 = new Aspose.Pdf.Image();
image1.FixWidth = 50;
image1.FixHeight = 50;
// Add the image to the paragraphs collection of the header
header.Paragraphs.Add(image1);
// Set the ImageStream to the FileStream object
image1.ImageStream = fs;

// Add footer text: $p is the current page number, $P the total page count
Aspose.Pdf.Text.TextFragment fTxt = new Aspose.Pdf.Text.TextFragment("$p / $P ");
fTxt.TextState.Font = FontRepository.FindFont("Arial");
fTxt.TextState.FontSize = 16;
fTxt.HorizontalAlignment = Aspose.Pdf.HorizontalAlignment.Right;
footer.Paragraphs.Add(fTxt);

// Apply the header and footer to every page
foreach (Aspose.Pdf.Page page in doc.Pages)
{
    page.Header = header;
    page.Footer = footer;
}

// Save PDF file
doc.Save("htmltopdf.pdf");
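
As an alternative to the footer, the PageNumberStamp from the earlier post can be added to every page instead of only the last one. This is just a rough sketch, assuming an already converted PDF on disk and that a single stamp object can be reused across pages; the class name PageNumberSketch and the parameters inputPdf and outputPdf are hypothetical.

```csharp
using Aspose.Pdf;

public static class PageNumberSketch
{
    // inputPdf / outputPdf: hypothetical file paths supplied by the caller.
    public static void StampAllPages(string inputPdf, string outputPdf)
    {
        using (Document doc = new Document(inputPdf))
        {
            PageNumberStamp stamp = new PageNumberStamp();
            // '#' is substituted with the number of the page being stamped.
            stamp.Format = "Page # of " + doc.Pages.Count;
            stamp.StartingNumber = 1;
            stamp.BottomMargin = 10;
            stamp.HorizontalAlignment = HorizontalAlignment.Center;

            // Add the stamp to each page rather than only to doc.Pages[doc.Pages.Count].
            foreach (Page page in doc.Pages)
            {
                page.AddStamp(stamp);
            }

            doc.Save(outputPdf);
        }
    }
}
```

The footer approach above keeps the numbering inside the page layout, while a stamp is drawn over the finished page, so the choice mainly depends on whether the page margins leave room for it.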

Please feel free to contact us for any further assistance.


Best Regards,