Extract broken cross-reference cases in a Word document

Hi,

  1. I have a cross-reference added to a particular text (1.4) in the document, and the reference target is set to a numbered list item in the document, as shown in the screenshot below.

  2. After adding the cross-reference, I edit the same document. At the position of numbered list item '1.4' I add one more line (This is inserted text) and make it a numbered list item, so the earlier target (from step 1) shifts to the next line (i.e., 1.5), as shown in the screenshot below.

After performing the above 2 steps, if I click on the cross-referenced text (Section 1.4), it takes me to numbered list item '1.5'.

This case is considered a broken reference. To handle it, I want to read the numbered list value of the target reference (bookmark) (i.e., 1.5) and compare it with the field reference display text (1.4). If the two differ, the reference is broken and I can flag that case. Could you please help me achieve this? Thanks!

@KCSR Could you please attach your input and output documents along with code that will allow us to reproduce the problem? We will check the issue and provide you with more information. Unfortunately, it is impossible to analyze the issue using screenshots, but from what I see I suspect that the bookmark has been moved into the 1.5 item, so the reference points to it.

Hi @alexey.noskov, Please find attached the test document.
SampleListItemTestDocument.docx (19.8 KB)

Yes, the bookmark has now moved to the 1.5 item, and I am trying to extract such broken-reference cases from the document.

Please let me know if you need more details. Thanks!

@alexey.noskov - It looks like I found a way to achieve this.

Could you please let me know how we can find the start and end index of a particular text in the document?

Our backend service identifies certain text patterns in the document and sends the list of matching text to the front end. The front end has to navigate to that particular text in the document, so it needs the start and end index of the text, which has to be sent from the backend.

Thanks.

@KCSR Thank you for the additional information. You can simply update the REF field and check whether its displayed text has changed. For example, see the following code:

Document doc = new Document(@"C:\Temp\in.docx");

List<FieldRef> refFields = doc.Range.Fields.Where(f => f.Type == FieldType.FieldRef)
    .Cast<FieldRef>().ToList();

foreach (FieldRef r in refFields)
{
    string oldText = r.DisplayResult;
    // Update the field.
    r.Update();
    string newText = r.DisplayResult;
    // Check whether the reference field's displayed text has changed.
    if (oldText != newText)
        Console.WriteLine($"Reference text has been changed from \"{oldText}\" to \"{newText}\"");
}

// The output document will have the correct (updated) value of the REF field.
doc.Save(@"C:\Temp\out.docx");

Regarding the start and end index, it is not quite clear what you mean. There is no index-based navigation within a document in Aspose.Words. The document is represented as a tree of nodes.

Hi @alexey.noskov - Great, this is a simple way to achieve my requirement, and I was doing it in a complex way!

Could you please let me know if we can find the paragraph number of the reference fields that have changed (in your code, the paragraph number of 'FieldRef r' or 'oldText')? Also, if the same text occurs multiple times in that paragraph, can we tell which occurrence has changed? Thanks

@KCSR Do you mean the index of the paragraph within the document? If so, you can use IndexOf:

Document doc = new Document(@"C:\Temp\in.docx");

// Get paragraphs.
NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);

// Get index of the particular paragraph.
// For demonstration purposes get index of the last paragraph 
Console.WriteLine(paragraphs.IndexOf(doc.LastSection.Body.LastParagraph));

If you need the list label of the paragraph, you can use the Paragraph.ListLabel property. Please note that you should call the Document.UpdateListLabels method before using this property.
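As a rough illustration of the list label note above, here is a minimal sketch that collects the label of every list item together with its paragraph index. The class and method names (`ListLabelSketch`, `GetListLabels`) are made up for this example and are not part of Aspose.Words; only `UpdateListLabels`, `ListFormat.IsListItem`, and `ListLabel.LabelString` come from the library.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;

// Hypothetical helper for illustration; not part of the Aspose.Words API.
public static class ListLabelSketch
{
    // Returns (paragraph index, list label) pairs for every list item in the document.
    public static List<KeyValuePair<int, string>> GetListLabels(Document doc)
    {
        // List labels are calculated only after this call.
        doc.UpdateListLabels();

        NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);
        var result = new List<KeyValuePair<int, string>>();

        int index = 0;
        foreach (Paragraph paragraph in paragraphs)
        {
            // Skip paragraphs that are not part of a list.
            if (paragraph.ListFormat.IsListItem)
                result.Add(new KeyValuePair<int, string>(index, paragraph.ListLabel.LabelString));
            index++;
        }

        return result;
    }
}
```

Calling `ListLabelSketch.GetListLabels(doc)` on the loaded document should give one entry per numbered or bulleted paragraph; the exact label text depends on the list format defined in the document.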

Hi @alexey.noskov -

I need the paragraph index and paragraph text of the parent paragraph of FieldRef 'r' (the paragraph in which that field reference exists).
This line is from your previous code → foreach (FieldRef r in refFields)

Thanks

I think this will work - paragraphs.IndexOf(reference.Start.ParentParagraph).ToString()

Thanks!

@KCSR Yes, your code is correct. It gets the parent paragraph of the field. Once you have the paragraph, you can get its index and text.
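Putting the pieces from this exchange together, a minimal sketch might look like the following. The class, method, and `RefLocation` type names are made up for this example; the Aspose.Words calls are the ones already used in this thread (`Start.ParentParagraph`, `IndexOf`, `ToString(SaveFormat.Text)`).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Aspose.Words;
using Aspose.Words.Fields;

// Hypothetical helper for illustration; not part of the Aspose.Words API.
public static class RefFieldLocationSketch
{
    // Simple holder for the location of one REF field.
    public sealed class RefLocation
    {
        public string BookmarkName;
        public int ParagraphIndex;
        public string ParagraphText;
    }

    // Returns the index and text of the paragraph that contains each REF field.
    public static List<RefLocation> Locate(Document doc)
    {
        NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);
        var result = new List<RefLocation>();

        foreach (FieldRef r in doc.Range.Fields.OfType<FieldRef>())
        {
            // The paragraph in which the field starts.
            Paragraph parent = r.Start.ParentParagraph;
            if (parent == null)
                continue;

            result.Add(new RefLocation
            {
                BookmarkName = r.BookmarkName,
                ParagraphIndex = paragraphs.IndexOf(parent),
                ParagraphText = parent.ToString(SaveFormat.Text).Trim()
            });
        }

        return result;
    }
}
```

The index is relative to all paragraphs in the document, including those inside tables, because the collection is built with `GetChildNodes(NodeType.Paragraph, true)`.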


@alexey.noskov - Thanks a lot again for helping on this. Much appreciated!


@alexey.noskov - Could you please let me know if there is an easy way to get the parent paragraph of the cross-reference target?

In your previous code -
oldText is the text where we added the cross-reference.
newText is the updated text, which differs if the referenced target has changed.

I was able to retrieve the parent paragraph of the text where we added the cross-reference. Now I am trying to retrieve the parent paragraph of the referenced target. Thanks.

@KCSR Sure, you can get the target bookmark and then get its parent paragraph:

Document doc = new Document(@"C:\Temp\in.docx");
doc.UpdateListLabels();

List<FieldRef> refFields = doc.Range.Fields.Where(f => f.Type == FieldType.FieldRef)
    .Cast<FieldRef>().ToList();

foreach (FieldRef r in refFields)
{
    Bookmark refBookmark = doc.Range.Bookmarks[r.BookmarkName];
    if (refBookmark != null)
    {
        Paragraph targetParagraph = (Paragraph)refBookmark.BookmarkStart.GetAncestor(NodeType.Paragraph);
        Console.WriteLine(targetParagraph.ToString(SaveFormat.Text).Trim());
    }
}
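For the comparison described at the start of this thread (list number of the target versus the text shown by the REF field), a rough sketch could build on the code above. It assumes the REF field displays only the paragraph number (for example "1.4"), which depends on the field switches used in the document. `BrokenRefSketch`, `LabelsDiffer`, and `FindMismatches` are made-up names, and trimming the trailing period is an assumption about how the two strings should be normalized.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Aspose.Words;
using Aspose.Words.Fields;

// Hypothetical helper for illustration; not part of the Aspose.Words API.
public static class BrokenRefSketch
{
    // Assumed normalization: ignore surrounding whitespace and trailing periods,
    // so that "1.5." and "1.5" are treated as the same label.
    public static bool LabelsDiffer(string fieldText, string targetLabel)
    {
        string a = (fieldText ?? "").Trim().TrimEnd('.');
        string b = (targetLabel ?? "").Trim().TrimEnd('.');
        return !string.Equals(a, b, StringComparison.Ordinal);
    }

    // Lists REF fields whose displayed text no longer matches the target's list label.
    public static List<string> FindMismatches(Document doc)
    {
        doc.UpdateListLabels();
        var mismatches = new List<string>();

        foreach (FieldRef r in doc.Range.Fields.OfType<FieldRef>())
        {
            if (string.IsNullOrEmpty(r.BookmarkName))
                continue;

            Bookmark target = doc.Range.Bookmarks[r.BookmarkName];
            if (target == null)
                continue;

            Paragraph paragraph = (Paragraph)target.BookmarkStart.GetAncestor(NodeType.Paragraph);
            // Only list items have a list label to compare against.
            if (paragraph == null || !paragraph.ListFormat.IsListItem)
                continue;

            string label = paragraph.ListLabel.LabelString;
            if (LabelsDiffer(r.DisplayResult, label))
                mismatches.Add($"{r.BookmarkName}: field shows \"{r.DisplayResult}\", target is \"{label}\"");
        }

        return mismatches;
    }
}
```

Unlike the Update-based check, this does not modify the fields, so it can be used when the document should be reported on but left unchanged.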

Much love, thanks @alexey.noskov !!


Hi @alexey.noskov - Just checking to understand what happens when we call "r.Update()". I see that the reference's DisplayResult is updated to the correct target reference, and by comparing the old and new text we can tell that the reference has changed. For my own understanding, what happens in the background when we execute "r.Update()"? Thanks.

@KCSR The Field.Update() method performs the field update operation, i.e. it updates the field's displayed text to the actual value.
