I tried DocumentVisitor but could not reach my goal. That class works more like a collector, and I need on-the-fly recognition.
My method for searching a document is to go through every node and check its node type.
For example, with NodeType.PARAGRAPH I can cast the node to a Paragraph and read its StyleIdentifier to identify headings.
With NodeType.TABLE I can work with table elements and parse the content into my structure.
There are also NodeType.FOOTNOTE and NodeType.HEADER, but they are not working for me: I try to cast the node, but I can't get the text of a footnote or header.
For me this is still the best approach: run through the document and check every node for its type.
But why do the node types FOOTNOTE and HEADER exist if they do not work?
I prepared a piece of code for you, as simple as possible. I think you will be able to see how I'm going through all the nodes. I tried to recognize footnotes and get their text in two ways: first with the style identifier of a paragraph, and second with the dedicated node type for footnotes. Neither attempt succeeded. I'm also using Run nodes to get text styles like bold and italic, but I couldn't find any method or class related to footnotes there. I hope you can help me now.
    doc = loadDocument();
    // Get the first node in the body of the first section
    Node currentNode = doc.getFirstSection().getBody().getFirstChild();
    // Loop through the top-level nodes of every section body and search for headings
    while (currentNode != null)
    {
        // Check whether the current node is a Paragraph
        if (currentNode.getNodeType() == NodeType.PARAGRAPH)
        {
            // for StyleIdentifier
            Paragraph currPar = (Paragraph) currentNode;
            // for Run objects
            NodeCollection runs = currPar.getRuns();
            // Check whether the paragraph is the title of a section
            // number of the style identifier
            int style = currPar.getParagraphFormat().getStyleIdentifier();
            // for list elements (numStyle is declared outside this snippet)
            if (currPar.isListItem())
            {
                numStyle = currPar.getListFormat().getListLevel().getNumberStyle();
            }
            switch (style)
            {
                // Footnote
                case StyleIdentifier.FOOTNOTE_TEXT:
                    // should print footnotes
                    break;
                // Heading level 1
                case StyleIdentifier.HEADING_1:
                    // print as heading 1
                    break;
                // Heading level 2
                case StyleIdentifier.HEADING_2:
                    // print as heading 2
                    break;
                // Heading level 3
                case StyleIdentifier.HEADING_3:
                    // print as heading 3
                    break;
                default: // regard it as a normal paragraph
                    // print as normal paragraph
                    break;
            }
        }
        // Check whether the current node is a Table
        else if (currentNode.getNodeType() == NodeType.TABLE)
        {
            // work with table elements
        }
        else if (currentNode.getNodeType() == NodeType.FOOTNOTE)
        {
            // should be able to work with footnotes
        }
        if (currentNode.getNextSibling() == null)
        {
            Node currSect = currentNode.getAncestor(NodeType.SECTION);
            // If there is one more section, move to the first node of its body
            if (currSect.getNextSibling() != null)
                currentNode = ((Section) currSect.getNextSibling()).getBody().getFirstChild();
            else
                currentNode = null;
        }
        else
        {
            // Move to the next node
            currentNode = currentNode.getNextSibling();
        }
    }
Thank you for the additional information. It would be great if you could also attach your document here for testing. The problem might be in the document itself rather than in your code.
Best regards,
Hi Adam,
Thanks for posting your template and code here. I have taken a look into it. I believe there is an easier way to achieve what you are looking for. Please see the structure of a footnote in your Document as shown in the Demo project DocumentExplorer below:
[Screenshot: DocumentExplorer view of the document's node tree]
As you can see, the footnote is represented by a Footnote node placed inline, directly after the reference in the main text. This node is an inline story, so it contains the child paragraphs and runs that make up the footnote as seen at the bottom of the page in MS Word.
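To illustrate why a loop over the top-level siblings of the body never meets the footnote, here is a minimal, self-contained sketch. It uses a small stand-in Node class and made-up type strings, not the Aspose.Words API; only the shape of the tree mirrors the structure described above.

```java
import java.util.ArrayList;
import java.util.List;

final class InlineFootnoteSketch {
    // Stand-in node for illustration only; this is NOT an Aspose.Words class.
    static final class Node {
        final String type;   // e.g. "BODY", "PARAGRAPH", "RUN", "FOOTNOTE", "TABLE"
        final String text;   // only runs carry text in this sketch
        final List<Node> children = new ArrayList<>();

        Node(String type, String text) {
            this.type = type;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Builds: Body -> [Paragraph(Run, Footnote(Paragraph(Run)), Run), Table]
    static Node sampleBody() {
        Node footnote = new Node("FOOTNOTE", "")
                .add(new Node("PARAGRAPH", "")
                        .add(new Node("RUN", "Footnote text.")));
        Node paragraph = new Node("PARAGRAPH", "")
                .add(new Node("RUN", "Main text"))
                .add(footnote)
                .add(new Node("RUN", " continues."));
        return new Node("BODY", "").add(paragraph).add(new Node("TABLE", ""));
    }

    // Sibling-only walk: the types a loop over the body's direct children would see.
    static List<String> topLevelTypes(Node body) {
        List<String> types = new ArrayList<>();
        for (Node child : body.children) {
            types.add(child.type);
        }
        return types;
    }

    // Depth-first walk that collects every node of the requested type, at any depth.
    static void collect(Node node, String type, List<Node> out) {
        if (node.type.equals(type)) {
            out.add(node);
        }
        for (Node child : node.children) {
            collect(child, type, out);
        }
    }

    // Concatenates the text of all runs at or below the given node.
    static String textOf(Node node) {
        StringBuilder sb = new StringBuilder(node.text);
        for (Node child : node.children) {
            sb.append(textOf(child));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node body = sampleBody();
        System.out.println(topLevelTypes(body));      // [PARAGRAPH, TABLE]
        List<Node> footnotes = new ArrayList<>();
        collect(body, "FOOTNOTE", footnotes);
        System.out.println(textOf(footnotes.get(0))); // Footnote text.
    }
}
```

The sibling walk only ever reports PARAGRAPH and TABLE, because the footnote is a child of the paragraph rather than of the body. A deep search starting from any ancestor finds it.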
In this case we should be able to simply search for the Footnote nodes in the document and then work with the paragraph and the footnote text easily from there. Please see the sample implementation below which does this.
There should be no problem with footnotes being referenced from text inside the header. The reason this was a problem with your original code is that headers and footers are separate nodes from the main body of text in a Document. You can see this in the screenshot above, where the HeaderFooter node is separate from the Body node.
To work with the headers and footers in the document, call the getHeadersFooters method on each Section. It returns a HeaderFooterCollection whose HeaderFooter nodes you can work with.
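The same point about headers can be sketched with a stand-in tree. The Node class, the type strings, and the sample layout are assumptions for illustration and not Aspose.Words classes. The sketch only shows that a HeaderFooter story sits next to the Body under its Section, so a search has to start at the section (or the document) to see footnotes referenced from a header.

```java
import java.util.ArrayList;
import java.util.List;

final class HeaderStorySketch {
    // Stand-in node for illustration only; this is NOT an Aspose.Words class.
    static final class Node {
        final String type;
        final List<Node> children = new ArrayList<>();

        Node(String type, Node... kids) {
            this.type = type;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    // Builds: Section -> [HeaderFooter(Paragraph(Footnote)),
    //                     Body(Paragraph(Footnote), Paragraph)]
    static Node sampleSection() {
        Node header = new Node("HEADER_FOOTER",
                new Node("PARAGRAPH", new Node("FOOTNOTE")));
        Node body = new Node("BODY",
                new Node("PARAGRAPH", new Node("FOOTNOTE")),
                new Node("PARAGRAPH"));
        return new Node("SECTION", header, body);
    }

    // Returns the first direct child with the given type, or null if there is none.
    static Node firstChildOfType(Node parent, String type) {
        for (Node child : parent.children) {
            if (child.type.equals(type)) {
                return child;
            }
        }
        return null;
    }

    // Counts nodes of the given type at or below the start node.
    static int countDeep(Node start, String type) {
        int count = start.type.equals(type) ? 1 : 0;
        for (Node child : start.children) {
            count += countDeep(child, type);
        }
        return count;
    }

    public static void main(String[] args) {
        Node section = sampleSection();
        Node body = firstChildOfType(section, "BODY");
        System.out.println(countDeep(body, "FOOTNOTE"));    // 1
        System.out.println(countDeep(section, "FOOTNOTE")); // 2
    }
}
```

Starting at the body misses the footnote that hangs off the header paragraph, while starting one level higher picks up both.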
If you have any further inquiries, please feel free to ask.
Thanks,
Sure. To do this we have to avoid including the text of the Footnote nodes when getting the text of the Paragraph. Please see the code below, which achieves this.
    // Collect every Footnote node in the document, at any depth.
    NodeCollection nodes = doc.getChildNodes(NodeType.FOOTNOTE, true);
    // Iterate over a snapshot array so each node can be cast explicitly.
    for (Node node : nodes.toArray())
    {
        Footnote footNote = (Footnote) node;
        Paragraph parentPara = footNote.getParentParagraph();
        StringBuilder paraText = new StringBuilder();
        // Iterate through all children of the paragraph and append the text of a node
        // as long as it's not a footnote.
        Node nextNode = parentPara.getFirstChild();
        while (nextNode != null)
        {
            if (nextNode.getNodeType() != NodeType.FOOTNOTE)
                paraText.append(nextNode.toTxt());
            nextNode = nextNode.getNextSibling();
        }
        String footNoteText = footNote.toTxt().trim();
        // Paragraph text first, then the text of the footnote it references.
        String outputText = paraText + " " + footNoteText;
        System.out.println(outputText);
    }
Thanks,