Bullet points in htm file produced by Word

When Microsoft Word saves a document in htm format, it changes various style names and converts the bullet and numbering styles into a literal bullet character or number, followed by spaces that push the text to the right position.

  • One

If you re-open this htm document in Word, this code is transformed back into the bullet and numbering style.

Is there any way of using Aspose to convert the Word html bullets back into proper bulleted lists?

Thanks
Fiona Treveil

Hi Fiona,

Thanks for your inquiry.

If you are using an older version of Aspose.Words, please upgrade to the latest version (v14.5.0) and let us know how it goes on your side. If the problem persists, please attach your input Word document here for testing. I will investigate the issue on my side and provide more information.

Please note that Aspose.Words mimics the behavior of MS Word. Aspose.Words converts MS Word documents to HTML in the same way as the MS Word save option “Web Page, Filtered”. If you convert your document to HTML using MS Word, you will get the same output.

Thanks for the quick response. I attach a simple Word document that I have tried saving as both web page and filtered web page. The htm file contains a bullet point and some spaces. When I reopen the htm file with Word, there are bullet points against the 3 paragraphs. When I re-open the filtered one, I do not see these.

I am using Aspose to copy the contents of a full htm file into a new document. The resulting document has a bullet symbol and some spaces instead of a bulleted list.

I wondered if there was any way to read this as bullets and numbering when I copy the contents of the file.

Thanks
Fiona

Hi Fiona,

Thanks for your inquiry. Yes, you can achieve this using Aspose.Words. The following code example saves the Word document to HTML (with a bullet list) and loads the same HTML into a new Word document.

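// MyDir is assumed to be a string holding the working folder path.
Document doc = new Document(MyDir + "One.doc");
// Save to HTML with Aspose.Words (the bullets are kept as a list).
doc.Save(MyDir + "Out.html");
// Load the generated HTML into a new document.
Document newdoc = new Document(MyDir + "Out.html");
// Print the paragraphs that were imported as list items.
foreach (Paragraph para in newdoc.GetChildNodes(NodeType.Paragraph, true))
{
    if (para.IsListItem)
        Console.WriteLine(para.ToString(SaveFormat.Text));
}
newdoc.Save(MyDir + "Out.docx");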

Some HTML features might be lost when HTML is processed. The limitations of HTML export and import are listed here:
https://docs.aspose.com/words/net/load-in-the-html-html-xhtml-mhtml-format/
https://docs.aspose.com/words/net/save-in-html-xhtml-mhtml-formats/

Hope this answers your query. Please let us know if you have any more queries.

Thanks for your help. Unfortunately, we are editing the files in Word and saving them as html, then using Aspose to combine the Word htm files into a single document.

When Word saves a file as html it does loads of processing to create something very complicated. It is able to undo all of this when re-editing the file in Word, but I cannot find any other way to get back the original bullet point information from the document.

I think that we might need to go back to the original Word document for this to work.

Thanks
Fiona

We originally used the htm version so that we could display the Word document in a .NET forms program window.

Is it possible to display a Word document in a window using Aspose?

Thanks
Fiona

Hi Fiona,

Thanks for your inquiry.

*fiona.treveil:

unfortunately, we are editing the files in Word and saving as html, then using Aspose to combine the Word htm files into a single document.
When Word saves a file as html it does loads of processing to create something very complicated. It is able to undo all of this when re-editing the file in Word, but I cannot find any other way to get back the original bullet point information from the document.*

Currently, most of the special Microsoft “Mso” attributes, which Microsoft Word normally adds to HTML output to make it round-trip capable back to Word formats, are not written during export to HTML or MHTML. This makes the HTML produced by Aspose.Words much cleaner than the output produced by Microsoft Word, which is often bloated with these round-trip attributes.

I have logged a feature request in our issue tracking system as WORDSNET-10328 to import mso-list attributes into the Aspose.Words DOM. You will be notified via this forum thread once this feature is available.
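Until such import support exists, one way to avoid the HTML round trip is the approach Fiona mentions above: combine the original Word documents directly. The following is only a minimal sketch of that idea, not a confirmed recommendation from this thread. The folder path and file names are hypothetical, and it assumes the original .doc files are still available.

```csharp
using System;
using Aspose.Words;

class CombineWordDocuments
{
    static void Main()
    {
        // Hypothetical folder and file names; replace them with your own.
        string myDir = @"C:\Temp\";
        string[] sourceFiles = { "One.doc", "Two.doc", "Three.doc" };

        // Use the first document as the base of the combined document.
        Document combined = new Document(myDir + sourceFiles[0]);

        for (int i = 1; i < sourceFiles.Length; i++)
        {
            Document src = new Document(myDir + sourceFiles[i]);
            // KeepSourceFormatting keeps each source document's own styles and list formatting.
            combined.AppendDocument(src, ImportFormatMode.KeepSourceFormatting);
        }

        // Count the paragraphs that are still real list items after combining.
        int listItems = 0;
        foreach (Paragraph para in combined.GetChildNodes(NodeType.Paragraph, true))
        {
            if (para.IsListItem)
                listItems++;
        }
        Console.WriteLine("List paragraphs in combined document: " + listItems);

        combined.Save(myDir + "Combined.docx");
    }
}
```

Because the documents never pass through Word's htm format, the bullets and numbering stay as list formatting rather than literal symbols and spaces.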

*fiona.treveil:

We originally used the htm version so that we could display the word document in a .NET forms program window.
Is it possible to display a word document in a window using Aspose ?*

Aspose.Words for .NET is just a class library and with it you can programmatically generate, modify, convert, render and print documents without utilizing Microsoft Word®. So, it does not offer any UI or web control for viewing Word documents.
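Since the library can render documents, one possible workaround for a .NET forms program is to render a page to an image and show it in a standard PictureBox. This is a rough sketch under stated assumptions rather than an official viewer: the form class and file path are hypothetical, and it assumes that saving to PNG without further options renders only the first page of the document.

```csharp
using System;
using System.Drawing;
using System.IO;
using System.Windows.Forms;
using Aspose.Words;

// Hypothetical form that previews the first page of a Word document.
class PreviewForm : Form
{
    public PreviewForm(string path)
    {
        Document doc = new Document(path);

        using (MemoryStream stream = new MemoryStream())
        {
            // Render the document to a PNG image held in memory
            // (assumed to contain the first page only).
            doc.Save(stream, SaveFormat.Png);
            stream.Position = 0;

            using (Image rendered = Image.FromStream(stream))
            {
                PictureBox box = new PictureBox();
                box.Dock = DockStyle.Fill;
                box.SizeMode = PictureBoxSizeMode.Zoom;
                // Copy the image so that it no longer depends on the stream.
                box.Image = new Bitmap(rendered);
                Controls.Add(box);
            }
        }
    }

    [STAThread]
    static void Main()
    {
        // Hypothetical path; replace it with your own document.
        Application.Run(new PreviewForm(@"C:\Temp\One.doc"));
    }
}
```

This gives a read-only preview of one page; paging, zooming, and editing would need additional code or a dedicated viewer component.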

Our sister company GroupDocs.com offers a viewer app which you may consider including in your application to view various file formats. GroupDocs apps also provide limited editing functionality. For more information, please contact GroupDocs support through their live chat or support forums.

Please let me know if I can be of any further assistance.