VisitorAction.SKIP_THIS_NODE not working

alexmcmillan · February 16, 2013, 5:45am

I’m using the Visitor pattern to work my way through a Document. I have a class that extends DocumentVisitor and overrides the “visit” methods, e.g.:

@Override
public int visitFieldStart(FieldStart fieldStart) throws Exception
{
    LOG.trace("FieldStart");
    switch (fieldStart.getFieldType())
    {
        case FieldType.FIELD_TOC:
            // Skip Table-of-Contents nodes
            LOG.trace("Skipping TOC Nodes");
            return VisitorAction.SKIP_THIS_NODE;
    }

    return VisitorAction.CONTINUE;
}

However, even though the LOG message “Skipping TOC Nodes” appears in my log file, the child nodes of the FieldStart are passed in to their respective “visit” functions anyway.

My actual log file:

…ParagraphStart
…FieldStart
…Skipping TOC Nodes
…Run: TOC \o “1-3” \h \z \u
…FieldSeparator
…FieldStart
…Run: HYPERLINK \l “_Toc255301557”
…FieldSeparator
…Run: Page
…FieldStart (FIELD_PAGE_REF)
…Run: PAGEREF _Toc255301557 \h
…FieldSeparator
…Run: 2.
…FieldEnd
…FieldEnd
…FieldEnd
…ParagraphEnd

Expected Log file:

…ParagraphStart
…FieldStart
…Skipping TOC Nodes
…ParagraphEnd

What am I doing wrong, and how can I skip child nodes of a node passed into a “visit” function?

alexmcmillan · February 16, 2013, 11:28pm

It appears that the runs and fields inside the TOC are actually SIBLINGS of the FieldStart node, not children. How can I get the corresponding FieldEnd node for a specific FieldStart node?

tahir.manzoor · February 18, 2013, 3:44am

Hi Alex,

Thanks for your inquiry. A field in a Word document is a complex structure consisting of multiple nodes: field start, field code, field separator, field result and field end. Fields can be nested, can contain rich content, and can span multiple paragraphs or sections in a document. The Field class is a “facade” object whose properties and methods let you work with a field as a single object.

The Start, Separator and End properties point to the field start, separator and end nodes of the field respectively.

The content between the field start and separator is the field code. The content between the field separator and field end is the field result. The field code typically consists of one or more Run objects that specify instructions. The processing application is expected to execute the field code to calculate the field result.

You call the LOG.trace method before returning VisitorAction.SKIP_THIS_NODE, which is why the Field Start node still appears in your log. Please place the LOG.trace call at the appropriate position in your code.

The FieldChar.getField method returns the field that a field char belongs to. Once you have the Field object, you can use the Field.getEnd() method to get the node that represents the field end. You can view the DOM structure of your documents by opening them with DocumentExplorer.

Please see the following code snippet for your kind reference.

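// MyDir is the folder that contains the input document.
Document doc = new Document(MyDir + "in.docx");

// Get the first FieldStart node in the document (isDeep = true searches all descendants).
FieldStart fStart = (FieldStart) doc.getChild(NodeType.FIELD_START, 0, true);
// Get the Field facade for that node, then its matching FieldEnd node.
Field field = fStart.getField();
FieldEnd fEnd = field.getEnd();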
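
Because the nodes between a field start and its field end are siblings rather than children, one possible way to skip a whole TOC field while traversing is to keep a depth counter: increment it at the TOC field start and at every nested field start, decrement it at each field end, and ignore runs while it is above zero. The sketch below only models that idea on a plain list so it can run without the library. Kind, Node, sample and visibleRuns are hypothetical names for illustration, not part of Aspose.Words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for a paragraph's flat child list; NOT the Aspose.Words API.
class FieldSkipSketch {

    enum Kind { FIELD_START, FIELD_SEPARATOR, FIELD_END, RUN }

    // 'text' holds the field type for FIELD_START and the run text for RUN.
    static final class Node {
        final Kind kind;
        final String text;

        Node(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Returns the text of every run that lies outside a TOC field.
    static List<String> visibleRuns(List<Node> siblings) {
        List<String> kept = new ArrayList<>();
        int skipDepth = 0; // > 0 while inside a TOC field that is being skipped
        for (Node node : siblings) {
            switch (node.kind) {
                case FIELD_START:
                    // Start skipping at a TOC field; count nested fields while skipping.
                    if (skipDepth > 0 || "TOC".equals(node.text)) {
                        skipDepth++;
                    }
                    break;
                case FIELD_END:
                    if (skipDepth > 0) {
                        skipDepth--;
                    }
                    break;
                case RUN:
                    if (skipDepth == 0) {
                        kept.add(node.text);
                    }
                    break;
                default:
                    break; // separators carry no text
            }
        }
        return kept;
    }

    // A flat sequence shaped like the log above: a TOC field with two nested fields.
    static List<Node> sample() {
        return Arrays.asList(
            new Node(Kind.RUN, "Before"),
            new Node(Kind.FIELD_START, "TOC"),
            new Node(Kind.RUN, "TOC field code"),
            new Node(Kind.FIELD_SEPARATOR, ""),
            new Node(Kind.FIELD_START, "HYPERLINK"),
            new Node(Kind.RUN, "HYPERLINK field code"),
            new Node(Kind.FIELD_SEPARATOR, ""),
            new Node(Kind.FIELD_START, "PAGEREF"),
            new Node(Kind.RUN, "PAGEREF field code"),
            new Node(Kind.FIELD_SEPARATOR, ""),
            new Node(Kind.RUN, "2"),
            new Node(Kind.FIELD_END, ""),
            new Node(Kind.FIELD_END, ""),
            new Node(Kind.FIELD_END, ""),
            new Node(Kind.RUN, "After"));
    }

    public static void main(String[] args) {
        System.out.println(visibleRuns(sample())); // prints [Before, After]
    }
}
```

In a real DocumentVisitor subclass, the same counter would presumably be updated in visitFieldStart and visitFieldEnd and checked in visitRun, with each method returning VisitorAction.CONTINUE.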