HTML to PDF issue with bookmarks and style

We have an HTML file with an embedded CSS file that needs to be converted to PDF (see the attached HTML and CSS).


I am using Aspose.PDF version 11.2 with the code below to convert HTML to PDF:

Dim dataDir As String = Path.GetFullPath("…/…/…/Data/")
Dim filename As String = "data.html"
Dim fileOutPDF As String = "data.pdf"

If File.Exists(dataDir & fileOutPDF) Then
    File.Delete(dataDir & fileOutPDF)
End If

Dim Licence As Aspose.Pdf.License = New Aspose.Pdf.License
Licence.SetLicense("Aspose.Total.lic")

Dim basePath As String = dataDir
Dim htmloptions As New HtmlLoadOptions(basePath)

' Load the HTML file
Dim pdfDocument As New Document(dataDir & filename, htmloptions)
pdfDocument.PageInfo.Width = 597.6
pdfDocument.PageInfo.Height = 842.4

' Get the page collection
Dim pageCollection As PageCollection = pdfDocument.Pages
' Set the size of every page
For Each pdfPage As Page In pageCollection
    pdfPage.SetPageSize(597.6, 842.4)
Next

' Optimize the PDF file in order to decrease its size
Dim optimization As Aspose.Pdf.Document.OptimizationOptions = New Aspose.Pdf.Document.OptimizationOptions()
optimization.LinkDuplcateStreams = True
optimization.RemoveUnusedObjects = True
optimization.RemoveUnusedStreams = True
optimization.CompressImages = True
'optimization.ImageQuality = 10
pdfDocument.OptimizeResources(optimization)

' Save the PDF file
pdfDocument.Save(dataDir & fileOutPDF, SaveFormat.Pdf)



The PDF is generated, but with the following issues:
1) It takes about 4 to 10 minutes to convert files larger than 600k (in our case we have 14 files and it took almost 2 hours).
2) We have some bookmarks inside the HTML file which work properly in a browser but do not work in the PDF.


Then I created another function using StreamReader instead of HtmlLoadOptions, which is a lot faster (less than a minute) but has the following issues:
1) Bookmarks are not working.
2) Content wrapped in a div (with table format) is rendered vertically instead of horizontally.

Dim dataDir As String = Path.GetFullPath("…/…/…/Data/")
Dim filename As String = "data.html"
Dim fileOutPDF As String = "data.pdf"

If File.Exists(dataDir & fileOutPDF) Then
    File.Delete(dataDir & fileOutPDF)
End If

Dim Licence As Aspose.Pdf.License = New Aspose.Pdf.License
Licence.SetLicense("Aspose.Total.lic")

Dim startTime As DateTime = DateTime.Now

' Instantiate an object of the Pdf class
Dim pdf As New Aspose.Pdf.Generator.Pdf()

' Add a section to the PDF document's sections collection
Dim section As Aspose.Pdf.Generator.Section = pdf.Sections.Add()

' Read the contents of the HTML file into a StreamReader object
Dim r As StreamReader = File.OpenText(dataDir & filename)

' Create a text paragraph containing the HTML text
Dim html As [String] = r.ReadToEnd()
Dim text2 As New Aspose.Pdf.Generator.Text(section, html)

' Enable the property to display HTML contents within their own formatting
text2.IsHtmlTagSupported = True
text2.IfHtmlTagSupportedOverwriteHtmlFontNames = True
text2.IfHtmlTagSupportedOverwriteHtmlFontSizes = True


' Add the text paragraph containing the HTML text to the section
section.Paragraphs.Add(text2)
section.PageInfo.PageWidth = Aspose.Pdf.Generator.PageSize.A4Width
section.PageInfo.PageHeight = Aspose.Pdf.Generator.PageSize.A4Height


' Specify the base path for external resources such as images
'pdf.HtmlInfo.ImgUrl = dataDir & "Images"
pdf.HtmlInfo.ExternalResourcesBasePath = dataDir

' Save the PDF document
pdf.Save(dataDir & fileOutPDF)
r.Close()

What is the solution? How can an HTML file be converted to PDF without losing any of its style or functionality (like bookmarks)?

Hi David,


Thanks for your inquiry. I have tested the scenario with the following code snippet using Aspose.Pdf for .NET 11.3.0; it takes almost 14 seconds and the bookmark URLs work fine in the resultant PDF. If you want to set the page dimensions in HTML to PDF conversion, you need to use the PageInfo property of the HtmlLoadOptions object.

Furthermore, please note that Aspose.Pdf uses system memory for processing instead of temporary files on disk, so performance depends on the system resources and on the size and contents of the files. Please share your problematic sample source files; we will look into them and guide you accordingly.

Dim options As New HtmlLoadOptions("C:\Files\Files")
options.PageInfo.Width = 597.6
options.PageInfo.Height = 842.4

' Open document
Dim pdfDocument As New Document("C:\Files\Files\Data.html", options)

Dim optimization As New Aspose.Pdf.Document.OptimizationOptions()
optimization.LinkDuplcateStreams = True
optimization.RemoveUnusedObjects = True
optimization.RemoveUnusedStreams = True
optimization.CompressImages = True
'optimization.ImageQuality = 10
pdfDocument.OptimizeResources(optimization)

pdfDocument.Save("C:\Files\Files\HelloWorld1.pdf")
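
To compare conversion times across machines or library versions, the conversion can be wrapped in a stopwatch. The following is only a minimal sketch using the standard System.Diagnostics.Stopwatch class: the ConversionTiming module and the Measure helper are hypothetical names introduced for illustration, and the Thread.Sleep call is merely a stand-in for the load, optimize and save calls shown above.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Threading

Module ConversionTiming

    ' Hypothetical helper: runs the supplied action once and
    ' returns the elapsed wall-clock time.
    Function Measure(work As Action) As TimeSpan
        Dim watch As Stopwatch = Stopwatch.StartNew()
        work()
        watch.Stop()
        Return watch.Elapsed
    End Function

    Sub Main()
        ' Placeholder workload (assumption): replace Thread.Sleep with the
        ' HTML load, OptimizeResources and Save calls being measured.
        Dim elapsed As TimeSpan = Measure(Sub() Thread.Sleep(250))
        Console.WriteLine("Conversion took {0:F1} s", elapsed.TotalSeconds)
    End Sub

End Module
```

Timing the load, optimization and save steps separately with the same helper may also show which step dominates for a given file.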


Best Regards,

Thanks for your reply. I am going to send you the sample file shortly.


Our server environment is set up with ample memory and CPU; therefore, we don't have any resource issue and the conversion should be much faster.

In the meantime, what is the difference between versions 11.2 and 11.3?

Hi


I checked the attached PDF file that you sent and the bookmarks are not working!

I have attached my HTML, CSS and generated PDF file, and you can see the bookmarks are not working.

Please see the attached image, which shows what happens when I click on a bookmark link.


The attached HTML file is not big, but when I tried to convert the real HTML file, which is over 600k in size, it took 4 minutes to convert.


Hi David,


Please share your sample HTML here so that we can test the performance issue and update you accordingly.

Furthermore, regarding the difference between 11.2 and 11.3: the latest version includes some enhancements and fixes for issues reported in earlier versions.

Best Regards,
Hi David,

Thanks for sharing the updated resource files. I have noticed the HTML bookmark rendering issue and logged a ticket, PDFNEWNET-40348, in our issue tracking system for further investigation and resolution.

For the performance issue, we would appreciate it if you could share a problematic sample HTML document here, so that we can look into it and guide you accordingly.

We are sorry for the inconvenience caused.

Best Regards,

When do you think the issue will be resolved? We can't deliver a PDF file with broken bookmarks to our client and need a fix as soon as possible.

Hi David,


As stated above, I have noticed the bookmark issue in HTML to PDF conversion and logged it as PDFNEWNET-40348 for rectification. We will notify you as soon as it is resolved.

Best Regards,

Please see the attached larger HTML file, which is 1.21 MB (not really big). It took 3 minutes 40 seconds to convert to PDF.


You can test it using the code that I sent in my previous post.

Thanks.

Have you tested the performance issue? Why is it that slow?

Hi David,


Thanks for sharing the source HTML. I have tested the HTML to PDF conversion scenario and noticed the performance issue; the conversion takes 3.08 minutes. I have logged a ticket, PDFNEWNET-40353, in our issue tracking system for further investigation and rectification. We will keep you updated on the progress of the issue resolution within this forum thread.

We are sorry for the inconvenience caused.

Best Regards,

Is there any update regarding those two issues? When can we expect a fix?

Hi David,


Thanks for your patience.

As the reported issues were logged only recently, they are still pending review. Nevertheless, we will start investigating them as per our schedule, and as soon as we have some definite updates regarding their resolution, we will let you know.

Please be patient and spare us a little time.