Free Support Forum - aspose.com

Recognize footnote text

Hello Aspose Team.

I have a small issue with footnotes in Aspose.Words.

When I work with a document that contains footnotes, they are represented as normal text right after the text that carries the footnote index, like:

text [1].


[1] this is a footnote.


gets:


text this is a footnote.


I would now like to represent it similarly to LaTeX:


text


I tried to solve it with StyleIdentifier, but it didn't work.


How can I solve it?


Regards


Adam




Hello

Thanks for your request. I think, in your case, you can try using DocumentVisitor to achieve what you need. Please follow the link to learn more:

http://www.aspose.com/documentation/.net-components/aspose.words-for-.net/howto-extract-content-using-documentvisitor.html

Hope this helps.

Best regards,

Hi Aspose Team.

I tried DocumentVisitor but could not reach my goal. This class looks more like a collector, and I need on-the-fly recognition.

My method for searching a document is to go through every node in the document and check its node type.

For example, with NodeType.PARAGRAPH I can cast the node to a Paragraph and read its StyleIdentifier to identify headings.

With NodeType.TABLE I can work with table elements and parse the content into my structure.

There are also NodeType.FOOTNOTE and NodeType.HEADER, but they are not working for me. I try to cast, but it doesn't work, and I can't get the text of a footnote or header.

For me this is still the best way: running through the document and checking every node for its type.

But why do the node types FOOTNOTE and HEADER exist if they are not working?

Or is my way of thinking wrong?

Thanks for your help.

Regards

Adam

Hi Adam,

Thanks for your inquiry. Could you post your template and implementation here? I will take a look into it for you.

Thanks,

Hi Adam (nice, same name :) )

I prepared a piece of code for you, kept as simple as possible. I think you will be able to see how I'm going through all the nodes. I tried to recognize footnotes and get their text in two ways: first with the style identifier of a paragraph, and second with the dedicated node type for footnotes. Neither attempt was successful. I'm also using Run nodes to get text styles like bold, italic, … but couldn't find any method or class related to footnotes there. I hope you can help me now.

doc = loadDocument();

// Get the first node in the body of the first section
Node currentNode = doc.getFirstSection().getBody().getFirstChild();

// Loop through all body-level nodes in the document and search for headings
while (currentNode != null)
{
    // Check whether the current node is a Paragraph
    if (currentNode.getNodeType() == NodeType.PARAGRAPH)
    {
        // for StyleIdentifier
        Paragraph currPar = (Paragraph) currentNode;

        // for Run objects
        NodeCollection runs = currPar.getRuns();

        // Check whether the paragraph is the title of a section:
        // number of the style identifier
        int style = currPar.getParagraphFormat().getStyleIdentifier();

        // for list elements
        if (currPar.isListItem()) {
            numStyle = currPar.getListFormat().getListLevel().getNumberStyle();
        }

        switch (style)
        {
            // Footnote
            case StyleIdentifier.FOOTNOTE_TEXT:
                // should print footnotes
                break;

            // Heading level 1
            case StyleIdentifier.HEADING_1:
                // print as heading 1
                break;

            // Heading level 2
            case StyleIdentifier.HEADING_2:
                // print as heading 2
                break;

            // Heading level 3
            case StyleIdentifier.HEADING_3:
                // print as heading 3
                break;

            default: // regard it as a normal paragraph
                // print as normal paragraph
                break;
        }
    }
    // Check whether the current node is a Table
    else if (currentNode.getNodeType() == NodeType.TABLE)
    {
        // work with table elements
    }
    else if (currentNode.getNodeType() == NodeType.FOOTNOTE) {
        // should be able to work with footnotes
    }

    if (currentNode.getNextSibling() == null)
    {
        Node currSect = currentNode.getAncestor(NodeType.SECTION);

        // If there is one more section, move to the first node of its body
        if (currSect.getNextSibling() != null)
            currentNode = ((Section) currSect.getNextSibling()).getBody().getFirstChild();
        else
            currentNode = null;
    }
    else
    {
        // Move to the next node
        currentNode = currentNode.getNextSibling();
    }
}

Regards

Adam

Hi Adam,

Thank you for the additional information. It would be great if you could also attach your document here for testing. The problem might be in the document itself rather than in your code.

Best regards,

Oh yes, of course. Sorry, I forgot.

It's a simple document with a footnote and a header. By the way, maybe you can tell me directly how to access the header information?

Hi Adam :)

Thanks for posting your template and code here. I have taken a look into it. I believe there is an easier way to achieve what you are looking for. Please see the structure of a footnote in your Document as shown in the Demo project DocumentExplorer below:

As you can see, the footnote is represented by a Footnote node placed inline, directly after the reference in the main text. This node is an inline story, so it contains the child paragraphs and runs that make up the footnote as seen at the bottom of the page in MS Word.
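To make that structure concrete, here is a minimal sketch in plain Java. The `Node` class, the type strings and the helper names are hypothetical stand-ins for illustration only; none of this is the Aspose.Words API. It only shows why a loop over the direct children of the body never meets a footnote, while a search that descends into every child does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a document tree; NOT the Aspose.Words API.
class FootnoteTreeSketch {

    static class Node {
        final String type;   // e.g. "BODY", "PARAGRAPH", "RUN", "FOOTNOTE"
        final String text;   // only set on RUN nodes
        final List<Node> children = new ArrayList<>();

        Node(String type, String text) {
            this.type = type;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Looks only at direct children, like a getNextSibling() loop over the body.
    static int countDirect(Node parent, String type) {
        int count = 0;
        for (Node child : parent.children) {
            if (child.type.equals(type)) {
                count++;
            }
        }
        return count;
    }

    // Descends into every child, like a deep search over the whole tree.
    static void collectDeep(Node node, String type, List<Node> found) {
        for (Node child : node.children) {
            if (child.type.equals(type)) {
                found.add(child);
            }
            collectDeep(child, type, found);
        }
    }

    // Body -> Paragraph -> (Run, Footnote -> Paragraph -> Run)
    static Node sampleBody() {
        Node footnote = new Node("FOOTNOTE", null)
                .add(new Node("PARAGRAPH", null)
                        .add(new Node("RUN", "this is a footnote.")));
        Node paragraph = new Node("PARAGRAPH", null)
                .add(new Node("RUN", "text"))
                .add(footnote);
        return new Node("BODY", null).add(paragraph);
    }

    public static void main(String[] args) {
        Node body = sampleBody();
        List<Node> found = new ArrayList<>();
        collectDeep(body, "FOOTNOTE", found);
        System.out.println("direct: " + countDirect(body, "FOOTNOTE")); // direct: 0
        System.out.println("deep: " + found.size());                    // deep: 1
    }
}
```

The footnote is a child of the paragraph, not a sibling of it, so a check for the footnote node type at body level cannot succeed.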

In this case we should be able to simply search for the Footnote nodes in the document and then work with the parent paragraph and the footnote text from there. Please see the sample implementation below, which does this.

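// Requires: import com.aspose.words.*;
Document doc = new Document("Aspose.docx");

// Deep search: collect every Footnote node in the document
NodeCollection nodes = doc.getChildNodes(NodeType.FOOTNOTE, true);

for (Object node : nodes) {
    Footnote footNote = (Footnote) node;

    // Text of the paragraph that contains the footnote reference
    String parentParaText = footNote.getParentParagraph().toTxt().trim();

    // Text of the footnote itself
    String footNoteText = footNote.toTxt().trim();

    // Append the footnote as a tag after the paragraph text
    String outputText = parentParaText + " " + "<footnote = \"" + footNoteText + "\">";

    System.out.println(outputText);
}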

There should be no problem with footnotes being referenced from text inside the header. The reason this was a problem with your original code is that headers and footers are separate nodes from the main body of text in a Document. You can see this in the screenshot above, as the HeaderFooter node is separate from the Body node.

In order to work with the header in the Document, you can call the getHeadersFooters method on each Section in your document. It returns the collection of HeaderFooter objects for that section, which you can then work with.

If you have any further inquiries, please feel free to ask.

Thanks,

OK, I implemented it your way and now it's working almost perfectly. There is just one last little issue I found in the output:

"…young adult ducks ready for roasting are sometimes labelled “duckling”. This is a footnote text. <footnote = “This is a footnote text.”>".

It prints the text of the footnote as part of the normal paragraph before writing it in the tag. Is there a way to avoid this, so I get clean output like:

“…young adult ducks ready for roasting are sometimes labelled “duckling”. <footnote = “This is a footnote text.”>”.

Thank you very much for your help :)

Hi Adam,

Sure, to do this we have to avoid getting the text of the Footnote nodes when getting the text of the Paragraph. Please see code implementation below which achieves this.

NodeCollection nodes = doc.getChildNodes(NodeType.FOOTNOTE, true);
for (Object node : nodes)
{
    Footnote footNote = (Footnote) node;
    Paragraph parentPara = footNote.getParentParagraph();
    StringBuilder paraText = new StringBuilder();

    // Iterate through all children of the paragraph and append the text of a node
    // as long as it's not a footnote.
    Node nextNode = parentPara.getFirstChild();
    while (nextNode != null)
    {
        if (nextNode.getNodeType() != NodeType.FOOTNOTE)
            paraText.append(nextNode.toTxt());

        nextNode = nextNode.getNextSibling();
    }

    String footNoteText = footNote.toTxt().trim();

    // Paragraph text without the footnote, followed by the footnote as a tag
    String outputText = paraText.toString().trim() + " " + "<footnote = \"" + footNoteText + "\">";

    System.out.println(outputText);
}
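As a library-independent illustration of the same idea, the sketch below walks the children of one paragraph and replaces each footnote in place with a tag instead of letting its text flow into the paragraph text. The `Node` class, the type strings and `render` are hypothetical names for this example, not part of Aspose.Words, and rendering the tag at the position of the footnote is a small variation on the code above, which appends the tag at the end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a paragraph with inline footnotes; NOT the Aspose.Words API.
class FootnoteMarkupSketch {

    static class Node {
        final String type;   // e.g. "PARAGRAPH", "RUN", "FOOTNOTE"
        final String text;   // only set on RUN nodes
        final List<Node> children = new ArrayList<>();

        Node(String type, String text) {
            this.type = type;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Concatenates the text of a node and all of its descendants.
    static String allText(Node node) {
        StringBuilder sb = new StringBuilder();
        if (node.text != null) {
            sb.append(node.text);
        }
        for (Node child : node.children) {
            sb.append(allText(child));
        }
        return sb.toString();
    }

    // Renders a paragraph: ordinary children contribute their text,
    // footnote children are written as a tag at the place they occur.
    static String render(Node paragraph) {
        StringBuilder sb = new StringBuilder();
        for (Node child : paragraph.children) {
            if (child.type.equals("FOOTNOTE")) {
                sb.append(" <footnote = \"")
                  .append(allText(child).trim())
                  .append("\">");
            } else {
                sb.append(allText(child));
            }
        }
        return sb.toString();
    }

    static Node sampleParagraph() {
        Node footnote = new Node("FOOTNOTE", null)
                .add(new Node("PARAGRAPH", null)
                        .add(new Node("RUN", " This is a footnote text.")));
        return new Node("PARAGRAPH", null)
                .add(new Node("RUN", "sometimes labelled "))
                .add(new Node("RUN", "\"duckling\"."))
                .add(footnote);
    }

    public static void main(String[] args) {
        // prints: sometimes labelled "duckling". <footnote = "This is a footnote text.">
        System.out.println(render(sampleParagraph()));
    }
}
```

Because every footnote is handled where it occurs, a paragraph with several footnotes is rendered once, with each tag next to its reference.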

Thanks,