Free Support Forum - aspose.com

DOCX to HTML - Too many HTML tags

Hi,
I hope you can help me.

I have some issues converting DOCX to HTML files.

You can see the files I used in the attachments.

I converted "Test_File.docx" to "Test_File.html" with Aspose.Words (17.1.0.0).

The HTML file contains too many tags, and they break up the sentences.

For instance:
Docx file contains:

Normal text, Normal text, Normal text....

Html file contains:

Normal Text,
 
Normal Text,
 
Normal Text,

I think the result is too verbose. What I expect is something like this:
Normal Text, Normal Text, Normal Text,

The code I used to convert DOCX to HTML is simply:
Document document = new Document(dataDir + "Test_File.docx");
document.Save(dataDir + "Test_File.html", SaveFormat.Html);

I also tried many options of the "HtmlSaveOptions" class, but the result remains almost the same.

Is there any way to obtain a file like "Test_File_GOAL.html"?

I need clean HTML files for future custom edits.


Kind regards,

Andrea
Hi Andrea,

Thanks for your inquiry. We tested the scenario and reproduced the problem on our end. We have logged it in our issue tracking system as WORDSNET-14861. Our product team will look into it further, and we will keep you updated on the status of the fix. We apologize for the inconvenience.
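
As a possible interim workaround, assuming the extra tags come from adjacent runs in the DOCX that share the same formatting (an assumption, not confirmed for the attached file), those runs can be merged before saving. The sketch below uses Document.JoinRunsWithSameFormatting() for this; the class name CleanHtmlExport, the helper JoinRunsAndSaveAsHtml, and the value of dataDir are hypothetical and only for illustration. It may not produce exactly "Test_File_GOAL.html".

```csharp
using System;
using Aspose.Words;

public static class CleanHtmlExport
{
    // Hypothetical helper: merges adjacent runs that have identical formatting,
    // then saves the document as HTML. Returns the number of joins performed.
    public static int JoinRunsAndSaveAsHtml(Document document, string htmlPath)
    {
        // Fewer runs in a paragraph generally means fewer inline tags in the HTML output.
        int joins = document.JoinRunsWithSameFormatting();
        document.Save(htmlPath, SaveFormat.Html);
        return joins;
    }

    public static void Main()
    {
        string dataDir = "./"; // assumption: folder that contains Test_File.docx
        Document document = new Document(dataDir + "Test_File.docx");
        int joins = JoinRunsAndSaveAsHtml(document, dataDir + "Test_File.html");
        Console.WriteLine("Runs joined: " + joins);
    }
}
```

If the reported number of joins is zero, the fragmentation likely has another cause, such as runs that differ in some formatting attribute, and this approach will not help.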

Best regards,
