Is there a way to select a specific paragraph or section by some style class or attribute once the document node has been created from an HTML import? There are specific DIV elements which I’d like to manipulate once imported, but since the XPATH support is limited to just element names and not their attributes, I’m finding this rather difficult.
These elements are quite deep, and the XPATH could theoretically reference paragraphs in other sections. If I were to add some sort of attribute to the HTML DIVs in question, and then use:
NodeList paras = DocNode.SelectNodes("//Body/Paragraph");
How could I loop through this collection and test to see which of these paragraphs are the ones I’m interested in? What identifier could be used?
Thanks in advance!
Hi Sean,
Thanks for your inquiry. Please note that Aspose.Words differs from the Microsoft Word Object Model in that it represents the document as a tree of objects, much like an XML DOM tree. When you load a document into Aspose.Words, it builds this DOM, and all document elements and formatting are loaded into memory. Please read the following articles for more information on the DOM:
https://docs.aspose.com/words/net/aspose-words-document-object-model/
https://docs.aspose.com/words/net/logical-levels-of-nodes-in-a-document/
The CompositeNode.SelectNodes method returns a list of nodes matching the XPath expression; in your case, a list of Paragraph nodes. There is no specific identifier attached to a Paragraph node, so you can only identify one by its text or by its font and formatting properties. Please see the members of the Paragraph class here:
https://reference.aspose.com/words/net/aspose.words/paragraph/
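As a rough illustration of identifying paragraphs by their text or properties, the loop might look like the sketch below. The file name "input.html" and the marker string are hypothetical placeholders, and it assumes you are able to put a recognizable piece of text inside the DIVs of interest. It uses GetChildNodes rather than SelectNodes so that paragraphs at any depth are included.

```csharp
using System;
using Aspose.Words;

class FindParagraphs
{
    static void Main()
    {
        // "input.html" is a placeholder path for the HTML file being imported.
        Document doc = new Document("input.html");

        // Hypothetical marker text assumed to appear inside the DIVs of interest.
        const string marker = "[[SECTION-A]]";

        // Collect every Paragraph node in the document, at any depth.
        NodeCollection paras = doc.GetChildNodes(NodeType.Paragraph, true);

        foreach (Paragraph para in paras)
        {
            // Plain text of the paragraph, without the trailing paragraph break.
            string text = para.ToString(SaveFormat.Text).Trim();

            // Name of the paragraph style that was applied on import.
            string styleName = para.ParagraphFormat.StyleName;

            if (text.Contains(marker))
            {
                Console.WriteLine($"Match (style '{styleName}'): {text}");
            }
        }
    }
}
```

Whether a CSS class from the HTML ends up as a distinct style name after import is not something this sketch relies on; if it does in your documents, the styleName value could be tested instead of the text.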
In your case, I suggest enclosing each Paragraph of interest in a bookmark so that you can retrieve that specific Paragraph later.
Hope this answers your query. Please let us know if you have any more queries.
Thank you, Tahir.
The problem with using bookmarks is that I’d first have to access those specific paragraphs somehow after the HTML import occurs in order to wrap that content in a bookmark, bringing me back to my original issue.
Is there a way to assign bookmarks in the HTML itself that is translated during the HTML import?
Hi Sean,
Thanks for your inquiry. Yes, you can import HTML bookmarks into the Aspose.Words DOM. Please check the following HTML snippet.
<a name=bm>test</a>
I have also attached an image of the Aspose.Words DOM for this HTML. Please let us know if you have any more queries.
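To show how the imported bookmark could then be used to reach the paragraph, here is a minimal sketch. The surrounding HTML (the div and p wrappers) is made up for illustration; only the anchor named "bm" comes from the snippet above. It assumes the anchor is imported as a bookmark with the same name, and checks for null in case it is not.

```csharp
using System;
using System.IO;
using System.Text;
using Aspose.Words;

class FindBookmarkedParagraph
{
    static void Main()
    {
        // Illustrative HTML; only the <a name="bm"> anchor is taken from the thread.
        string html =
            "<html><body><div><p><a name=\"bm\">test</a></p></div></body></html>";

        Document doc;
        using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        {
            // Tell Aspose.Words explicitly that the stream contains HTML.
            LoadOptions options = new LoadOptions { LoadFormat = LoadFormat.Html };
            doc = new Document(stream, options);
        }

        // The indexer returns null when no bookmark with this name exists.
        Bookmark bookmark = doc.Range.Bookmarks["bm"];
        if (bookmark == null)
        {
            Console.WriteLine("Bookmark 'bm' was not found after import.");
            return;
        }

        // Walk up from the bookmark start to the paragraph that contains it.
        Paragraph para = (Paragraph)bookmark.BookmarkStart.GetAncestor(NodeType.Paragraph);

        Console.WriteLine("Bookmark text: " + bookmark.Text);
        if (para != null)
        {
            Console.WriteLine("Paragraph text: " + para.ToString(SaveFormat.Text).Trim());
        }
    }
}
```

With this approach each DIV you want to manipulate would need its own uniquely named anchor in the source HTML, since bookmark names identify a single location in the document.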