HTML to WORD to PDF

Team,

We have tried a number of approaches using Aspose.Words to convert HTML to PDF, and the result does not look good after conversion. We have a license for Aspose.Total and are using it for HTML-to-PDF conversion. The PDF is not formatted the way the HTML looks. Please find the attached HTML and PDF for your reference.

Issues noticed in a simple HTML-to-PDF conversion:

1] Text and font size don't look exactly the same as in the HTML.
2] Overall formatting is not 100% faithful.
3] The background color of the text is missing in the PDF.
4] Bullets do not look the way they do in the HTML.
5] There is a problem with image rendering using the current approach (.InsertHtml, BindHtml methods).
6] We can get the image into the PDF, but positioning could be a problem.

Note: We tried Aspose.Pdf in the beginning and noticed it does not even render the full HTML, so we moved to Aspose.Words as an alternative. If needed, I can provide examples for Aspose.Pdf too. Can someone please help?


Hi Viswanathan,


Thanks for your inquiry.

Please note that it is not guaranteed that the output PDF document will look exactly the same as the input HTML. This is because Aspose.Words was originally designed to work with Microsoft Word documents and HTML documents are quite different.

Aspose.Words’ HTML engine tries to mimic the way Microsoft Word works. For you, this means that if you convert an HTML file into a Microsoft Word document or PDF using Aspose.Words, the output will appear almost exactly as if the conversion had been done by Microsoft Word. I have attached two PDF documents (one generated by Aspose.Words and the other generated by Microsoft Word) here for your reference.

Moreover, while using the latest version of Aspose.Words i.e. 13.1.0, I managed to reproduce the issue mentioned in your fourth point on my side (see the attached screenshot for details). I have logged this issue in our bug tracking system. The issue ID is WORDSNET-7764. Your request has also been linked to this issue and you will be notified as soon as it is resolved. Sorry for the inconvenience.

Best regards,

Hi Awais,

If we keep Aspose.Words out of scope for the moment, my ask is that the output of Aspose.Pdf should look similar to the HTML as rendered.

Attached is a sample HTML input and the output using Aspose.Pdf. Why is the output using Aspose.pdf different than the formatting shown in HTML?

Hi,


Thanks for your inquiry. As your question is more related to Aspose.Pdf, I will move your request to the Aspose.Pdf forum. My colleagues from the Aspose.Pdf component team will answer you shortly.

Best regards,

Thanks Awais. This is regarding the post Posts 5,449.



Were you not able to replicate points 1, 2, 3 & 5?

If you were, can you please let us know the solution? We are using the same version as you and still have the issues described in the original post.

Please keep us posted if you manage to get exact PDF from the attached HTML.
We are OK to use any Aspose product to get this task done. So please suggest.


Viswanathan.sundaresan@bankofamerica.com:
Attached is a sample HTML input and the output using Aspose.Pdf. Why is the output using Aspose.pdf different than the formatting shown in HTML?
Hi,

Thanks for using our products.

I have tested the scenario and, as per my observations, an exception occurs when converting the HTML file to PDF. As per my findings, you are using Aspose.Pdf for .NET 7.2.0, whereas I tested the scenario with Aspose.Pdf for .NET 7.7.0. I have logged this issue as PDFNEWNET-34874 in our issue tracking system so that it can be corrected. We will look further into the details of this problem and keep you updated on the status of the correction. Please be patient and allow us some time. We are sorry for the inconvenience.

Hi Viswanathan,

Thanks for your inquiry.
Viswanathan:

Were you not able to replicate points 1, 2, 3 & 5?

If you were, can you please let us know the solution? We are using the same version as you and still have the issues described in the original post.


Yes, I managed to reproduce the same issues (1, 2, & 3) on my side; but, as mentioned here, it is not guaranteed that the output PDF document will look exactly the same as the input HTML. This is because Aspose.Words mimics the way Microsoft Word works. I used Aspose.Words 13.1 and the following code snippet to generate ‘out-aspose.words-13.1.pdf’ (please find this attachment here):

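using Aspose.Words;

// Load the HTML file; Aspose.Words detects the input format automatically.
Document doc = new Document(@"C:\Temp\sample1.html");

// Recalculate table and cell widths before rendering.
doc.UpdateTableLayout();

// The .pdf extension selects PDF as the output format.
doc.Save(@"C:\temp\output.pdf");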


If I can help you with anything else, please feel free to ask.

Best regards,

Aspose team,

Let's keep Aspose.Words Document class out of the picture. As I had requested, the HTML output should be 100% similar to the output generated from Aspose.Pdf.

Please let me know if this is a reasonable expectation.

Hi Viswanathan,


We will further investigate why an exception occurs during HTML to PDF conversion with the latest version of Aspose.Pdf for .NET, whereas the same conversion worked with Aspose.Pdf for .NET 7.2.0. Please be patient and allow us some time.

Team,

Any updates on the previous queries?

Can someone answer the following questions?

What HTML standards are supported by Aspose products?
What CSS standards/versions are supported by Aspose products?

Hi Viswanathan,


Thanks for your request.

Regarding WORDSNET-7764, our development team has completed the analysis of this issue and identified the root cause. The problem occurs because, when the ‘margin-left’ CSS property is set on an LI element, its value overrides the indentation value of the list item. The reason for this bug is that Aspose.Words applies the values of CSS properties to list items in the same way as it applies them to ordinary paragraphs. Currently, it does not take into account the indentation of outer list levels when it applies margin-left values to list items. Rest assured, I will inform you via this forum thread as soon as this issue is resolved. I apologize for any inconvenience.
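
As an illustration only, here is a minimal sketch of markup that should exercise the condition described above: a nested list in which one LI carries its own ‘margin-left’. The list text and the 10px value are hypothetical and are not taken from the attached sample.

```html
<!-- Hypothetical repro: values and text are illustrative, not from the attached HTML. -->
<ul>
  <li>Outer item
    <ul>
      <!-- margin-left is set directly on the LI; per the analysis above, this value
           replaces the list indentation instead of being added to the outer level. -->
      <li style="margin-left: 10px">Nested item with its own margin-left</li>
      <!-- No margin on this LI, so it keeps the normal nested indentation. -->
      <li>Nested item without a margin</li>
    </ul>
  </li>
</ul>
```

In a browser, both nested items stay indented under the outer item. With the behaviour described above, the first nested item would be expected to lose its nested indentation in the converted document.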

Moreover, the HTML produced by Aspose.Words conforms to the HTML 4.0 or XHTML 1.0 Transitional specifications, and Aspose.Words supports most CSS 1 and CSS 2 properties that have an equivalent use in Word documents.

Please let me know if I can be of any further assistance.

Best regards,

I am not sure why you are referring to the HTML produced by Aspose.Words. Please don't answer the above 2 questions with reference to the existing HTML. My questions are generic, and I need the answers for both Aspose.Words and Aspose.Pdf. Can you please answer keeping both products in mind?

Can someone answer the following questions?

What HTML standards/versions are supported by Aspose.Pdf/Aspose.Words?
What CSS standards/versions are supported by Aspose.Pdf/Aspose.Words?


Can someone please reply?

Hi Viswanathan,


Sorry for the delayed response.

I am coordinating with the development team to get answers to your queries. You will be updated with the required information soon.

Team,

We would appreciate it if you answered all the questions I raised in my first and previous posts.

There are a total of 5 questions in my first post. You have identified the issue behind question number 4, but what about the remaining 4? So we are waiting on 7 questions in all, including the 2 I asked in the previous post. We have a technology decision pending based on your answers. Please answer all the questions keeping Aspose.Pdf & Aspose.Words in mind.

Note: We tried the same HTML with Expert PDF and it renders very well. That company is very small compared to Aspose, and I am not sure how they manage to produce a perfect PDF.

Please reply.

Hi Viswanathan,


Thanks for your patience.

I am afraid Aspose.Pdf for .NET is currently encountering an exception during HTML to PDF conversion. Once we resolve this problem, we will be able to render the output in PDF format and confirm which of the above points are satisfied by Aspose.Pdf for .NET. Please be patient and allow us some time. We are really sorry for this delay and inconvenience.

Viswanathan.sundaresan@bankofamerica.com:
Can someone answer the following questions?

What HTML standards/versions are supported by Aspose.Pdf/Aspose.Words?
What CSS standards/versions are supported by Aspose.Pdf/Aspose.Words?

Hi Viswanathan,


Thanks for your patience.


Aspose.Pdf for .NET fully supports HTML 4.01 and CSS 2.1. However, HTML5 is only partially supported; for example, the HTML5 'Canvas' object is not yet implemented. Our component does support the CSS3 standard, but that support is still partial.


In case of any further queries, please feel free to contact us.

Thanks for answering the above questions.

Do you have an ETA for the remaining 5 questions? We raised these questions on 02-07-2013, 10:08 AM EST and are still waiting for your solutions.


Hi Viswanathan,

We have good news for you: WORDSNET-7764 has now been resolved, and its fix will be included in the next version of Aspose.Words (13.2), which is planned for release by the end of this month. We will inform you via this forum thread as soon as the new release is published.

Best regards,

Many Thanks Guys.

Does this mean your team has addressed all 5 issues mentioned in the first post?
It's very important for us to know this before we start making changes. Please confirm.