I am extracting text content from a DOCX document.
Is there a way to distinguish a Run that contains real text shown on the page from a Run that contains field information (for example, HYPERLINK)?
Basically, I need to ignore the Runs that contain field-related text.
But I can’t find a clean way to do that.
I could create a set with the names of all possible fields and filter out every Run that starts with one of those names. But what if a Run starts with, for example, HYPERLINK and is not related to a field? It would be ignored as well, which is not the desired behavior.
Field-related text is stored in a w:instrText node, whereas ordinary content is in w:t, but I can’t find anything in the Run’s properties that indicates the Run is field-related.
Could you advise how to resolve this problem?
Hi Andrey,
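You can convert each paragraph to plain text with toString(SaveFormat.TEXT); the result should contain the text as it appears on the page, without field codes such as HYPERLINK:
// doc is an already loaded com.aspose.words.Document
NodeCollection nodeColl = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph para : (Iterable<Paragraph>) nodeColl)
{
    // Plain-text export keeps field results and drops field codes
    String text = para.toString(SaveFormat.TEXT);
    System.out.println(text);
}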
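For reference, the w:t / w:instrText distinction mentioned in the question can also be checked directly at the XML level, without any field-name list. Below is a minimal sketch that uses only the JDK's XML parser, not Aspose.Words. The class name FieldTextFilter, its methods, and the sample paragraph are hypothetical and exist only for illustration; it assumes you already have the WordprocessingML of a paragraph as a string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of Aspose.Words.
class FieldTextFilter {
    // WordprocessingML main namespace (the "w:" prefix).
    static final String W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Made-up sample: one plain run that happens to start with "HYPERLINK",
    // followed by a real HYPERLINK field (begin, code, separate, result, end).
    static final String SAMPLE =
        "<w:p xmlns:w=\"" + W + "\">"
      + "<w:r><w:t xml:space=\"preserve\">HYPERLINK is just a word here. </w:t></w:r>"
      + "<w:r><w:fldChar w:fldCharType=\"begin\"/></w:r>"
      + "<w:r><w:instrText xml:space=\"preserve\"> HYPERLINK \"http://example.com\" </w:instrText></w:r>"
      + "<w:r><w:fldChar w:fldCharType=\"separate\"/></w:r>"
      + "<w:r><w:t>Click here</w:t></w:r>"
      + "<w:r><w:fldChar w:fldCharType=\"end\"/></w:r>"
      + "</w:p>";

    // Concatenates the content of every w:t element in document order.
    // w:instrText elements are never visited, so field codes are skipped.
    static String visibleText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            Document dom = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList texts = dom.getElementsByTagNameNS(W, "t");
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < texts.getLength(); i++) {
                sb.append(texts.item(i).getTextContent());
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // True if the given w:r element contains a w:instrText descendant,
    // i.e. the run carries field code rather than page text.
    static boolean isFieldCodeRun(Element run) {
        return run.getElementsByTagNameNS(W, "instrText").getLength() > 0;
    }

    public static void main(String[] args) {
        System.out.println(visibleText(SAMPLE));
    }
}
```

In this sketch the run that merely starts with the word HYPERLINK is kept, because the decision is based on the element type (w:t versus w:instrText) and not on the text itself.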
I hope this helps.
Best regards,