Convert PDF to HTML and HTML to PDF using Aspose.PDF for .NET - formatting is lost

Hi Team,

I was evaluating your PDF library for our requirement: we convert a PDF to HTML, render the HTML in our Kendo RichTextEditor so that it can be edited, and then save it back to PDF or DOCX. The output PDF/DOCX loses all of its formatting.

I have attached the source PDF and the converted HTML in the zip, and below is the code I used to convert the PDF to HTML.

Could you please help us fix the issue so that we can go ahead with the purchase?

Aspose.Pdf.HtmlSaveOptions saveOptions = new Aspose.Pdf.HtmlSaveOptions();
saveOptions.FixedLayout = true;
saveOptions.RasterImagesSavingMode = Aspose.Pdf.HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
saveOptions.PartsEmbeddingMode = Aspose.Pdf.HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
saveOptions.SplitIntoPages = false; // force the HTML of all pages into one output document
wrdf.Save(htmlOutput, saveOptions);

Ravi Paladiya Resume_SeniorMSDynamicsCRMDeveloper_6 Years_aeup.zip (323.7 KB)

@agrawaltejas

Thanks for contacting support.

Would you kindly share a few more details, such as how you are converting the HTML back to PDF and what kind of formatting issues you are facing? It would be helpful if you could share a screenshot of the errors you are noticing in the output. We will test the scenario in our environment and address it accordingly.

Hi Asad,

We are using the Kendo RichTextEditor to export the HTML as PDF/DOC.
You can also paste the HTML into https://htmltidy.net/ to see the same formatting issue.

I have attached the output document generated from the HTML.
EditorContent (18).zip (320.3 KB)

@agrawaltejas

Thanks for sharing more details.

We have observed the issue you mentioned in our environment, and it needs to be investigated further. For that purpose, we have logged an investigation ticket, PDFNET-47639, in our issue tracking system. We will look into the details and keep you posted on the status of its resolution. Please be patient and spare us a little time.

Besides the Kendo RichTextEditor, we also checked the HTML by pasting it into https://htmltidy.net/ but were unable to notice any formatting issue (see the attached screenshot). Would you please point out the issues there in a screenshot so that we can address them as well?

HTMLRendering.png (138.4 KB)

We are sorry for the inconvenience.

Hi Team,

Did you get any resolution for this? We need to purchase the product as soon as possible, and we are stuck on this problem.

@agrawaltejas

Regretfully, the issue is still pending analysis due to low priority. It will be investigated and resolved on a first come, first served basis.

However, would you kindly provide your feedback on our comments above? Your feedback would help us investigate the issue accordingly.

Hi Asad,

For htmltidy.net, paste the HTML and then click the Tidy button; it will then show the unformatted output.

I understand you are investigating on a first come, first served basis. However, we urgently need to go live with your product, and without this fix we are stuck. If your team cannot help us quickly, we will have to evaluate another vendor's tool.

@agrawaltejas

Thanks for the feedback.

We have recorded your concerns and will surely consider them during investigation of the ticket. Please spare us some time. We will surely inform you as soon as we have some additional updates in this regard.

We are sorry for the inconvenience.

Hi Asad,

Could you please escalate this and help us fix the issue as soon as possible?
My temporary license expires in a couple of days.

@agrawaltejas

Thanks for getting back to us.

We definitely understand your concerns and have escalated the issue to the next level of priority. We will look into it with your concerns in mind and let you know about any updates as soon as possible. Please spare us some time.

We are sorry for the inconvenience.

Hi Asad,

We encountered another issue while converting a DOC to HTML: Danish characters were not converted/parsed properly. I have attached a zip file with the source document and the converted HTML.

Let me know if there is anything which can improve this conversion.

CL CV v188.zip (126.7 KB)

@agrawaltejas

Would you please share the code snippet you are using to convert the Word file to HTML? We believe you are using Aspose.Words for this purpose; if so, please post your inquiry in the Aspose.Words forum category so that you can be assisted accordingly.

Hi Asad,

Do we have any updates on the PDF issue? We want to make it work as soon as possible.

@agrawaltejas

We converted the source PDF to HTML and ran the HTML content through htmltidy.net (Tidy button).
The HTML file created by Aspose.PDF has absolute text positioning defined in CSS. Tidy removes the CSS body, or the link to it, from the HTML file, so the absolute positioning of the text disappears and the HTML formatting looks broken.
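
The effect described above can be reproduced without any Aspose code. The following is only a minimal sketch under stated assumptions: the sample markup, the class name `t1`, and the helpers `StripStyleBlocks` and `HasAbsolutePositioning` are hypothetical stand-ins, and the regular expression only approximates what a tool such as Tidy or an HTML editor might do when it discards embedded CSS.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StyleStripDemo
{
    // Hypothetical stand-in for fixed-layout HTML: the text is placed
    // by an absolute-position rule inside an embedded <style> element.
    public const string SampleHtml =
        "<html><head><style>" +
        ".t1 { position: absolute; left: 72px; top: 96px; }" +
        "</style></head><body>" +
        "<div class=\"t1\">Senior Developer</div>" +
        "</body></html>";

    // Removes every <style>...</style> element, mimicking a tool that
    // throws away embedded CSS while keeping the body markup.
    public static string StripStyleBlocks(string html)
    {
        return Regex.Replace(
            html,
            @"<style\b[^>]*>.*?</style>",
            string.Empty,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    // True if the markup still contains an absolute-positioning declaration.
    public static bool HasAbsolutePositioning(string html)
    {
        return Regex.IsMatch(html, @"position\s*:\s*absolute", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string stripped = StripStyleBlocks(SampleHtml);
        Console.WriteLine("Positioning before stripping: " + HasAbsolutePositioning(SampleHtml));
        Console.WriteLine("Positioning after stripping:  " + HasAbsolutePositioning(stripped));
        Console.WriteLine(stripped);
    }
}
```

After stripping, the `div` still carries `class="t1"`, but no rule defines that class any more, so a browser or exporter falls back to normal document flow. Running the same kind of check on HTML that has passed through an editor is a quick way to tell whether the positioning rules survived the round trip.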

Hi Asad,

Yeah, but we are having the same issue in our Kendo RichTextEditor when we export that HTML back to PDF. We just happened to notice the same thing on htmltidy. If you want, we can do a screen share and demonstrate.

@agrawaltejas

As shared earlier, the file created by Aspose.PDF has absolute text positioning defined in CSS, and in your case the HTML editor (Kendo RichTextEditor) is removing it. The issue does not seem to be directly related to the API. However, if you can provide an example of the expected output HTML that your HTML editor supports, we will investigate whether it is feasible for Aspose.PDF to generate such HTML.

Hi Asad,

I have attached an HTML file that was converted from the same PDF using MS Office. This is the kind of HTML we are expecting. Let me know if it is possible to achieve.
Ravi Paladiya Resume_SeniorMSDynamicsCRMDeveloper_6 Years.pdf.zip (15.5 KB)

@agrawaltejas

Sure, we will investigate the feasibility of your requirements and share additional updates with you as soon as we have some. Please spare us some time.

Hi Asad,

It's been more than a month since we reported the issue, and we still have not received any estimated time frame for a fix. This is a critical blocker for purchasing your product license, and we will have to try another tool for the conversion.

Do you want us to keep waiting for an update, or should we drop this?