How to find Paragraph ID?

Hello,

I need to run some checks on Word documents that customers create from a docm template I have no influence on.

I need to check if all items in the bibliography are being referred to.
The references are generated using the id of the paragraph containing the bibliography item as an anchor.

I can see the id in the reference.
I do not know how to get this id (the id of the paragraph the reference is referring to).

Is there any way to get this id from Aspose?

A zipped example file is enclosed.

Hi

Thanks for your request. I suppose that what you are calling the “ID of the paragraph” is just the name used in the REF field at the end of the paragraph (press Alt+F9 to see these fields). If so, you can try the following code to get these IDs:

import com.aspose.words.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Open document.
Document doc = new Document("C:\\Temp\\aspose_demo.docm");
// Get all paragraphs.
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
// Loop through all paragraphs in the document and get their IDs.
for (int i = 0; i < paragraphs.getCount(); i++)
{
    Paragraph currentParagraph = (Paragraph) paragraphs.get(i);
    System.out.printf("%d\t%s\n", i, GetParagraphId(currentParagraph));
}
/**
 * Returns the ID of the paragraph. If the paragraph does not have an ID, returns an empty string.
 */
private static String GetParagraphId(Paragraph par)
{
    // The "paragraph ID" is the name used in the REF field,
    // so we need to find the REF field in the paragraph.
    NodeCollection starts = par.getChildNodes(NodeType.FIELD_START, true);
    // If there are no fields return an empty string because the paragraph does not have any ID.
    if (starts.getCount() <= 0)
        return "";
    // Search for start of REF field.
    FieldStart refStart = null;
    for (int i = 0; i < starts.getCount(); i++)
    {
        FieldStart start = (FieldStart) starts.get(i);
        // Check field type. If this is start of REF field, stop searching.
        if (start.getFieldType() == FieldType.FIELD_REF)
        {
            refStart = start;
            break;
        }
    }
    // If there is no REF field, return an empty string.
    if (refStart == null)
        return "";
    // Now we should get the field code of the REF field.
    String refFieldCode = "";
    Node currentNode = refStart;
    while (currentNode != null &&
        currentNode.getNodeType() != NodeType.FIELD_SEPARATOR &&
        currentNode.getNodeType() != NodeType.FIELD_END)
    {
        if (currentNode.getNodeType() == NodeType.RUN)
            refFieldCode += ((Run) currentNode).getText();
        // Move to the next node.
        currentNode = currentNode.getNextSibling();
    }
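    // The field code of a REF field looks like the following:
    // REF ref_field_name [switches]
    // We need to get ref_field_name; we use the following regular expression to do this.
    // The switches are optional, so the pattern only requires the name after "REF".
    Pattern regex = Pattern.compile("\\s*REF\\s+(\\S+)");
    Matcher match = regex.matcher(refFieldCode);
    // Return an empty string if the field code does not have the expected form.
    if (!match.find())
        return "";
    return match.group(1);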
}

Hope this helps.
Best regards.

Hi, Alexey,

thank you for the prompt response.

It seems I haven’t made myself clear. Sorry for that.

The document attached to my first post contains 4 paragraphs of standard text. The two paragraphs containing “Example text[1].” and “Another example text[2].” hold cross-references to the two items in the list in the bibliography section.

These cross-references use a REF field which contains the ID of the paragraph they refer to. I can read this ID from the REF field just fine. To find out which paragraph the REF field refers to, I need a way to generate this ID from a paragraph (so that I can verify that, e.g., the first paragraph containing a REF field actually refers to the first paragraph in the “Bibliography” section). The textual content of the REF field is no help, since the author of the text could have changed it manually.

Can you think of a way to generate the ID of these paragraphs in the “Bibliography” section so that I can compare them to the contents of the REF-Fields?

Best regards,

Martin

Hi Martin,

Thank you for the additional information. The field code of a REF field looks like this:
{ REF _Ref255459859 \n \h }
Here “_Ref255459859” is the name of a bookmark, not a paragraph ID. There is no way to calculate this name from a paragraph. MS Word simply generates some kind of random name for the bookmark. The only requirement is that no other bookmark has the same name.
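For illustration, here is a minimal sketch of pulling the bookmark name out of such a field code with plain Java. The class and method names (RefFieldCode, bookmarkName) are hypothetical and not part of Aspose.Words; the sketch assumes the field code text has already been read from the document and that the bookmark name is not quoted.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class RefFieldCode {
    // "REF <bookmark_name>" followed by optional switches such as \n or \h.
    private static final Pattern REF_PATTERN =
            Pattern.compile("^\\s*REF\\s+(\\S+)(?:\\s+.*)?$");

    /**
     * Returns the bookmark name a REF field code points at,
     * or an empty Optional if the text is not a REF field code.
     */
    static Optional<String> bookmarkName(String fieldCode) {
        Matcher m = REF_PATTERN.matcher(fieldCode);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Calling RefFieldCode.bookmarkName(" REF _Ref255459859 \\n \\h ") yields the name _Ref255459859, which can then be compared with the names of the bookmarks in the document.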
Best regards.

Hello, Alexey,

thanks for the prompt reply.

I am a bit dissatisfied with your reply. Somehow there must be a connection between the bookmark and the element it is linking to. Otherwise I would not be able to update the field contents with the correct list number.
But no one seems to know the mechanism this is working with.

Thanks all the same.

Best regards, Martin

P.S. Should I stumble upon the mechanics of this, I will let you know.

Hi

I do not think there is any such mechanism. MS Word just generates a random name for the bookmark and places the bookmark where it is needed.
Best regards.

Hello, Alexey,

I feel a bit stupid now that I understand what you wanted to tell me all the time.

Looking at the document structure, I completely overlooked the name field of the Bookmark nodes.
Now I finally understand that a REF field links to the content of one bookmark, and all is fine.
Coding the check I need is easy now…
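Roughly, the check boils down to a set comparison. The following is only a sketch in plain Java under a few assumptions: the names of the bookmarks inside the bibliography paragraphs and the bookmark names read from the REF fields have already been collected into sets, and the class and method names (BibliographyCheck, unreferenced, unknownTargets) are made up for this example. A bibliography paragraph that contains no bookmark at all would have to be reported separately, since nothing can refer to it.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of any library.
final class BibliographyCheck {
    /** Bibliography bookmarks that no REF field points at, sorted by name. */
    static Set<String> unreferenced(Set<String> bibliographyBookmarks,
                                    Set<String> refTargets) {
        Set<String> result = new TreeSet<>(bibliographyBookmarks);
        result.removeAll(refTargets);
        return result;
    }

    /** REF targets that do not match any bibliography bookmark, sorted by name. */
    static Set<String> unknownTargets(Set<String> bibliographyBookmarks,
                                      Set<String> refTargets) {
        Set<String> result = new TreeSet<>(refTargets);
        result.removeAll(bibliographyBookmarks);
        return result;
    }
}
```

An empty result from unreferenced means every bookmarked bibliography item is referred to at least once; unknownTargets lists REF fields that point somewhere other than the bibliography.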

Sorry for inconveniencing you.

Best regards, Martin

Hi Martin,

It is great that you resolved your issue.
Best regards.