HTML Parser

Hello,

I am evaluating your components to see whether we can use them in our system (a very large system).

We have a problem that I need your help with.

We need to pass in a string that contains HTML-formatted text and get back objects we can work with, such as nodes and attributes.

We thought of using something like an XML reader or DOM to parse the HTML, but we can’t guarantee that the HTML will be written in a standard, well-formed way.

I would appreciate your help. Thank you!

Remon Zakaria

DashSoft

Aspose.Words was designed mostly with Microsoft Word document formats in mind. You can load DOC, RTF, WordML, DOCX and HTML into an Aspose.Words Document object, which represents a Microsoft Word document as a tree of nodes. The model is something of a hybrid of the System.Xml DOM and Microsoft Word Automation.

If your work involves Microsoft Word documents, then you should consider Aspose.Words. But if all you need is to work with plain HTML, then try the open-source HtmlAgilityPack project.
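
To illustrate the general idea of turning non-standard HTML into nodes and attributes, here is a minimal sketch. It does not use Aspose.Words or HtmlAgilityPack; it uses Python's standard-library html.parser as a stand-in, because that parser also tolerates unquoted attributes and missing closing tags. The names Node, TreeBuilder, parse_html and find_all are hypothetical helpers invented for this example, and the recovery rules (ignore stray end tags, close everything up to the matching ancestor) are a simplifying assumption, not how any particular library behaves.

```python
from html.parser import HTMLParser

# Elements that never have children (partial list, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """Hypothetical tree node: a tag name, an attribute dict, text, and children."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree while tolerating unclosed or stray tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Find the nearest open element with this name; if there is none,
        # the end tag is stray and is ignored.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent  # implicitly closes unclosed descendants

    def handle_data(self, data):
        self.current.text += data


def parse_html(html):
    """Parse an HTML string and return the root Node of the resulting tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_all(node, tag):
    """Return every node named `tag` in the subtree, in document order."""
    found = [node] if node.tag == tag else []
    for child in node.children:
        found.extend(find_all(child, tag))
    return found
```

With this sketch, a fragment such as `<div class=box><p>Hello <b>world<a href="x.html">link</div>` (unquoted attribute, three unclosed tags) still yields a tree in which `find_all(root, "a")[0].attrs["href"]` can be read directly. A real HTML library applies far more complete recovery rules than this.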