HTML tags are not formed properly when converting from PDF

Hi,

We are trying to convert a PDF file into HTML using the following code.

using Aspose.Pdf;

Document doc = new Document(@"C:\Test.pdf");
doc.Save(@"c:\test.html", SaveFormat.Html);

The conversion runs and saves an HTML version, but the file contains no structural HTML tags other than “div” elements.

For example, the following two lines would typically be represented as separate rows of a table, but the converted file contains only inline styles that position the text at different places on the page:

<div class="stl_01" style="left:35.2809em;top: 8.7022em; "><span class="stl_07 stl_08 stl_09" style="word-spacing:0.0027em;">Date: 18-Feb-2019  </span></div>

<div class="stl_01" style="left:36.1852em;top: 9.7918em; "><span class="stl_07 stl_08 stl_10" style="word-spacing:0.0025em;">Name: ABCDE FGHIJ  </span></div>

Is there an option to convert the PDF to traditional, structured HTML (for example, using table rows)?

@maheshagouda_policepatil_baml_com

Thank you for contacting support.

Would you please share the source file and the generated output as a ZIP archive so that we can try to reproduce and investigate the issue in our environment? Before sharing the requested data, please make sure you are using Aspose.PDF for .NET 19.3.

Yes, we are using the latest version. I’ve attached three files: the original HTML, a PDF generated from it, and the HTML generated from that PDF.

Code to generate the PDF:

using System.IO;
using Aspose.Pdf;

byte[] content = File.ReadAllBytes(@"c:\TestForAspose.html");
using (Document pdf = new Document(new MemoryStream(content), new HtmlLoadOptions()))
{
    pdf.Save(@"c:\TestForAspose.PDF");
}

Code to convert the PDF back to HTML:

Document doc = new Document(@"c:\TestForAspose.pdf");
doc.Save(@"c:\TestForAsposeConverted.html", SaveFormat.Html);

@maheshagouda_policepatil_baml_com

Thank you for elaborating it further.

We cannot find the attached files. Kindly ZIP them and upload the archive by dragging and dropping it into the post editor. Alternatively, if the files are large, please share them via Google Drive, Dropbox, etc., so that we can proceed to assist you.

TestForAspose.zip (38.4 KB)
I’ve attached the ZIP file as requested.

@maheshagouda_policepatil_baml_com

We have noticed the differences between the source and generated HTML files. However, they result from the internal conversion algorithm of the Aspose.PDF for .NET API, which, we are afraid, cannot be changed.
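As a possible workaround, and not an Aspose.PDF feature, the converted HTML could be post-processed. Each text run in the output is an absolutely positioned div carrying "left" and "top" offsets in em, so runs that share roughly the same "top" value can be grouped into rows. The sketch below is a hypothetical helper: the names PositionedHtmlRows and TextRun and the 0.5 em row tolerance are assumptions, it requires C# 9 or later (for the record type), and its regular expression only matches markup shaped like the fragments quoted in this thread. It is not a general HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical type: one absolutely positioned text run from the converted HTML.
public record TextRun(double Left, double Top, string Text);

// Hypothetical helper: extracts positioned runs and groups them into rows by "top".
public static class PositionedHtmlRows
{
    // Matches markup shaped like:
    // <div ... style="left:Xem;top: Yem; "><span ...>TEXT</span></div>
    private static readonly Regex RunPattern = new Regex(
        @"<div[^>]*style=""left:\s*(?<left>[\d.]+)em;\s*top:\s*(?<top>[\d.]+)em;[^""]*"">\s*<span[^>]*>(?<text>.*?)</span>\s*</div>",
        RegexOptions.Singleline);

    public static List<TextRun> ExtractRuns(string html)
    {
        return RunPattern.Matches(html)
            .Cast<Match>()
            .Select(m => new TextRun(
                double.Parse(m.Groups["left"].Value, CultureInfo.InvariantCulture),
                double.Parse(m.Groups["top"].Value, CultureInfo.InvariantCulture),
                m.Groups["text"].Value.Trim()))
            .ToList();
    }

    // Runs whose "top" differs from the row's first run by less than
    // `tolerance` em are treated as belonging to the same row.
    public static List<List<TextRun>> GroupIntoRows(IEnumerable<TextRun> runs, double tolerance = 0.5)
    {
        var rows = new List<List<TextRun>>();
        foreach (var run in runs.OrderBy(r => r.Top).ThenBy(r => r.Left))
        {
            var last = rows.Count > 0 ? rows[rows.Count - 1] : null;
            if (last != null && Math.Abs(last[0].Top - run.Top) < tolerance)
                last.Add(run);
            else
                rows.Add(new List<TextRun> { run });
        }
        return rows;
    }
}
```

For the two fragments quoted in the question, the "top" values differ by about 1.09 em, so this yields two rows, one for "Date: 18-Feb-2019" and one for "Name: ABCDE FGHIJ", which could then be written out as table rows. For production use, a real HTML parser would be more robust than a regular expression, and the tolerance would need tuning per document.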