Hi,
We have a requirement to read the body content from a Word document. We are using Aspose.Words.Paragraph to read all the paragraphs, but this returns paragraphs from headers, footers, comments, etc. along with the document body content. The extra items can be looped through and filtered out, but that adds too many conditions. Please let me know if there is a way to get only the paragraphs from the main document part (i.e. the document body).
Thanks
@Nizzam2024 You can simply get the nodes from the body of each section:
Document doc = new Document(@"C:\Temp\in.docx");
foreach (Section sect in doc.Sections)
{
    foreach (Node child in sect.Body.GetChildNodes(NodeType.Any, false))
    {
        // Here are nodes from the document's main body.
    }
}
Or you can use code like this:
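// Requires: using System.Collections.Generic; using System.Linq;
Document doc = new Document(@"C:\Temp\in.docx");
// Collect every node in the document that has a Body ancestor.
List<Node> bodyNodes = doc.GetChildNodes(NodeType.Any, true)
    .Where(n => n.GetAncestor(NodeType.Body) != null)
    .ToList();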
Please see our documentation to learn more about Aspose.Words Document Object Model:
https://docs.aspose.com/words/net/aspose-words-document-object-model/
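Since the question asks specifically for paragraphs, here is a minimal sketch that narrows the first approach to paragraphs only. It assumes the Body.Paragraphs collection, which returns only the paragraphs that are immediate children of the body; the class name BodyParagraphs is just a placeholder and the file path is the same sample path as above.

```csharp
using System;
using Aspose.Words;

class BodyParagraphs
{
    static void Main()
    {
        Document doc = new Document(@"C:\Temp\in.docx");

        foreach (Section sect in doc.Sections)
        {
            // Body.Paragraphs holds only the paragraphs that are direct
            // children of the body, so header and footer paragraphs are
            // never visited.
            foreach (Paragraph para in sect.Body.Paragraphs)
            {
                // Export the paragraph as plain text and trim the
                // trailing paragraph break.
                string text = para.ToString(SaveFormat.Text).Trim();
                Console.WriteLine(text);
            }
        }
    }
}
```

Note that paragraphs inside table cells are not direct children of the body, so this loop skips them. If you need them, sect.Body.GetChildNodes(NodeType.Paragraph, true) walks the whole body subtree, but that may also return paragraphs nested in comments, footnotes, or text boxes anchored in the body, so some filtering with GetAncestor may still be needed in that case.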
@alexey.noskov One more thing: we should also support cross-references. If we are reading a specific section, will the cross-references still work?
@Nizzam2024 Cross-references in MS Word documents are implemented using hyperlinks and bookmarks. So if both are present, the cross-reference will work without problems.
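To check this for a particular document, something like the following sketch could be used. It assumes the cross-references are stored either as REF fields or as HYPERLINK fields with a sub-address pointing to a bookmark, which is how Word normally writes them; the class name CrossReferenceCheck and the file path are placeholders. It reports, for each such field, whether the target bookmark exists in the range being inspected.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Fields;

class CrossReferenceCheck
{
    static void Main()
    {
        Document doc = new Document(@"C:\Temp\in.docx");

        // To check a single section only, use that section's Range
        // here instead of the whole document's Range.
        Aspose.Words.Range range = doc.Range;

        foreach (Field field in range.Fields)
        {
            string target = null;

            // REF fields name the bookmark they point to.
            if (field is FieldRef refField)
                target = refField.BookmarkName;
            // Internal hyperlinks carry the bookmark name as a sub-address.
            else if (field is FieldHyperlink link)
                target = link.SubAddress;

            if (string.IsNullOrEmpty(target))
                continue;

            // Look the bookmark up by name in the same range
            // (assumed to yield null when it is not found).
            bool found = range.Bookmarks[target] != null;
            Console.WriteLine(target + ": " + (found ? "found" : "missing"));
        }
    }
}
```

Any target reported as missing means the bookmark lies outside the part you are reading, so that cross-reference would have nothing to resolve against.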