Unable to find text which has a hyperlink on one of the words

Hello,
Please use the attached document as a sample. Sample Find.docx (24.2 KB)

The file has two paragraphs. When I try to find this paragraph:
“A paragraph with a Click Me link in the sentence.”
none of the FindReplaceOptions settings work. Is there a specific way to find a sentence or text that contains a link, or is this a known issue?

Searching for this paragraph text works fine.
“A paragraph with no link in the sentence.”

Thank you !

Hi, @kml2020

I tried this snippet of code and it worked as you expected.

// Open the source word file using Document class
Document wordDoc = new Document(".\\Sample Find.docx");

// Initialize FindReplaceOptions class object to replace text string
FindReplaceOptions options = new FindReplaceOptions();

// Ignore field codes (here, the HYPERLINK field code) so the search sees only the displayed text.
options.setIgnoreFieldCodes(true);

// Find and Replace the text
wordDoc.getRange().replace("A paragraph with a Click Me link in the sentence.", "Replaced text", options);

// Save the replaced text result
wordDoc.save("<your_output_file_path>\\<file_name.docx>");

It seems that the method FindReplaceOptions.setIgnoreFieldCodes did the trick, but it would be interesting to test other scenarios besides plain text and a hyperlink.

I hope it helps.

Regards

Hello @rogerg, I had set both ignoreFieldCodes and ignoreFields to true, and it did not work. However, setting only ignoreFieldCodes to true and leaving ignoreFields as false did work, as you noted.

Going through the documentation, I could not see why setting only ignoreFieldCodes would work. Could you please provide a bit more detail on these two flags?

Also, I would like to confirm which Aspose version introduced these two flags.

I am attaching a document that has some data field bindings, where find and replace does not return a result for this text:
“This Supplier Agreement (the “Agreement”), is made this [EffectiveDate] (the “Effective Date”) between ABC (“ABC”) .”

TestFindReplace DataFields Sample.docx (33.5 KB)

I could not see any options in the FindReplaceOptions class that seem to accommodate searching on this.

Best Wishes !

@kml2020 Fields in MS Word documents are represented by special FieldStart, FieldSeparator, and FieldEnd nodes and the content between them. The content between FieldStart and FieldSeparator is the field code, which is normally not displayed in MS Word unless you press Alt+F9. The content between FieldSeparator and FieldEnd is the field result, that is, the displayed value. See our documentation to learn more about fields.
When you use the ignoreFieldCodes option, only the field code is ignored. When you use the ignoreFields option, the whole field is ignored, including its displayed result.
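
To make the difference between the two flags concrete, here is a small toy model in plain Java. It is only a sketch of the idea described above and not how Aspose.Words is implemented: the Piece class, the searchableText helper, and the example.com URL are all made up for illustration. It rebuilds the text a search would see for the sample paragraph under each flag combination.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: this is NOT the Aspose.Words implementation.
class FieldSearchSketch {

    // The three kinds of content a paragraph with a field can hold.
    enum Kind { TEXT, FIELD_CODE, FIELD_RESULT }

    static final class Piece {
        final Kind kind;
        final String text;
        Piece(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Builds the string a search would see under the given flags.
    static String searchableText(List<Piece> pieces, boolean ignoreFieldCodes, boolean ignoreFields) {
        StringBuilder sb = new StringBuilder();
        for (Piece p : pieces) {
            boolean isField = p.kind != Kind.TEXT;
            if (ignoreFields && isField) continue;                       // whole field skipped
            if (ignoreFieldCodes && p.kind == Kind.FIELD_CODE) continue; // only the code skipped
            sb.append(p.text);
        }
        return sb.toString();
    }

    // The sample paragraph from the first post; the URL is a placeholder (assumption).
    static List<Piece> sampleParagraph() {
        return Arrays.asList(
            new Piece(Kind.TEXT, "A paragraph with a "),
            new Piece(Kind.FIELD_CODE, " HYPERLINK \"https://example.com/\" "),
            new Piece(Kind.FIELD_RESULT, "Click Me"),
            new Piece(Kind.TEXT, " link in the sentence."));
    }

    // True if the full sentence is found under the given flag combination.
    static boolean found(boolean ignoreFieldCodes, boolean ignoreFields) {
        String needle = "A paragraph with a Click Me link in the sentence.";
        return searchableText(sampleParagraph(), ignoreFieldCodes, ignoreFields).contains(needle);
    }

    public static void main(String[] args) {
        System.out.println("neither flag:          " + found(false, false));
        System.out.println("ignoreFieldCodes only: " + found(true, false));
        System.out.println("both flags:            " + found(true, true));
    }
}
```

In this model only the "ignoreFieldCodes only" combination finds the sentence. With neither flag the HYPERLINK code sits in the middle of the text, and with ignoreFields the displayed "Click Me" is dropped along with the code, so the sentence no longer matches in either case.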

The ignoreFieldCodes flag was introduced in version 21.11 of Aspose.Words. See the release notes for more information.

The provided text in the MS Word document contains a structured document tag (SDT). I have managed to reproduce the problem with the SDT and logged it as WORDSNET-24241. We will keep you updated and let you know once it is resolved or we have more information for you.

@alexey.noskov Thanks for helping.

Hi @alexey.noskov, thank you for providing the additional details; that helps me understand the properties better.

What I found is that with the new properties, when searching for a text “abc” that is present in the body and also in the header/footer, the occurrence in the header or footer is not included in the count returned by find and replace. It was included in previous Aspose versions.

That is, if we use legacyMode(true), all text gets included. I assume the legacyMode property is provided for that reason? But then we lose the ability to use ignoreFieldCodes and ignoreFields.

I tried ignoreFootnotes(false), but that does not seem to help include footer text in the search results.

@kml2020 Could you please attach your sample document here for testing? We will check the issue and provide you more information.
I have tested with a simple document with "abc" text in the document’s body and header and both occurrences are properly replaced and counted. Here is my code for testing:

Document doc = new Document("C:\\Temp\\in.docx");
FindReplaceOptions options = new FindReplaceOptions();
options.setIgnoreFieldCodes(true);
int count = doc.getRange().replace("abc", "replaced", options);
System.out.println(count);

@alexey.noskov, Please find attached the sample test document. TestFindReplace DataFields Sample.docx (30.9 KB)

I can see the difference is not due to text in the footer. Rather, the hyperlink itself contains the string "abc", which is part of the search string.

  • With legacyMode = true, the search count is 6
  • If we have ignoreFieldCodes = false and legacyMode = false, the search count is 6.
  • With ignoreFieldCodes = true the search count is 4

However, in this case (as was the intent of the original post), we do need ignoreFieldCodes = true, because only then will the search find a paragraph or sentence that contains a hyperlink.

For example, in the same attached document, this search only works with ignoreFieldCodes = true:
“NOW, THEREFORE, in consideration of the promises with ABC Company herein, the parties mutually agree as follows”

This should suffice.

However, there is still the blocking open bug (WORDSNET-24241), where searching for a sentence that contains a content control or data binding does not return results. For example, with this search string:
“these Services to Client[CustomerName] based out of”

@kml2020 Thank you for additional information.

The behavior is correct. If you press Alt+F9 in MS Word, you will see the field codes and notice that they contain the string "abc", like this:
{ HYPERLINK "http://www.abc.xyz/" }
When you set ignoreFieldCodes = true, field codes are ignored and there are only 4 occurrences. When this option is disabled, occurrences inside field codes are also counted, which gives 6 occurrences.
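
The arithmetic can be reproduced with a few lines of plain Java. This is only an illustration with made-up content: the visible text is hypothetical and is not taken from the attached document, and the two field codes reuse the HYPERLINK example shown above. It is not the Aspose.Words counting logic.

```java
// Toy illustration only: counts "abc" with and without field codes.
class FieldCodeCountSketch {

    // Hypothetical visible text with four occurrences of "abc".
    static final String VISIBLE =
        "abc in the body, abc in a table, abc in the header, abc in the footer";

    // Two hyperlink field codes, normally hidden unless Alt+F9 is pressed.
    static final String[] FIELD_CODES = {
        " HYPERLINK \"http://www.abc.xyz/\" ",
        " HYPERLINK \"http://www.abc.xyz/\" "
    };

    // Counts non-overlapping occurrences of needle in haystack.
    static int count(String haystack, String needle) {
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length();
        }
        return count;
    }

    // Models ignoreFieldCodes = true: only the visible text is searched.
    static int countIgnoringFieldCodes(String needle) {
        return count(VISIBLE, needle);
    }

    // Models ignoreFieldCodes = false: field codes are searched as well.
    static int countIncludingFieldCodes(String needle) {
        int total = count(VISIBLE, needle);
        for (String code : FIELD_CODES) {
            total += count(code, needle);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countIgnoringFieldCodes("abc"));  // prints 4
        System.out.println(countIncludingFieldCodes("abc")); // prints 6
    }
}
```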

Yes, WORDSNET-24241 is currently in the queue for analysis. We will keep you updated and let you know once it is resolved or we have additional information for you.

Hi @alexey.noskov, do we have any update on WORDSNET-24241 from your analysis of this gap?

@kml2020 The issue is currently in the queue for analysis. We will be sure to update you once the issue is resolved or we have more information for you.
As a temporary workaround, you can remove the content controls while leaving their content untouched. For example, see the following code:

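// C# (Aspose.Words for .NET)
using System;
using System.Linq;
using Aspose.Words;
using Aspose.Words.Markup;
using Aspose.Words.Replacing;

Document doc = new Document(@"C:\Temp\in.docx");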

// Remove content controls leaving their content untouched.
doc.GetChildNodes(NodeType.StructuredDocumentTag, true)
    .Cast<StructuredDocumentTag>().ToList().ForEach(sdt => sdt.RemoveSelfOnly());

// This paragraph contains SDT.
Paragraph p = (Paragraph)doc.FirstSection.Body.GetChild(NodeType.Paragraph, 3, true);
string textToSearch = p.ToString(SaveFormat.Text).Trim();
Console.WriteLine(textToSearch);

FindReplaceOptions options = new FindReplaceOptions();
options.IgnoreFieldCodes = true;
doc.Range.Replace(textToSearch, "Replaced text", options);

doc.Save("C:\\Temp\\out.docx");

Hi @alexey.noskov, thank you for the workaround suggestion. However, this mutates the document being read, since the structured document tags are removed. We need to do further work on the document afterwards, such as adding comments or paragraphs, and once the structured document tags are removed there is no way to insert them back after the operation.

@kml2020 The issue is already resolved in the current codebase. The fix will be included in the upcoming 22.10 version of Aspose.Words. We will be sure to inform you once it is available.
The following option will be added to the FindReplaceOptions class:

/// <summary>
/// Gets or sets a boolean value indicating either to ignore content of <see cref="StructuredDocumentTag"/>.
/// The default value is <c>false</c>.
/// </summary>
/// <remarks>
/// <para>
/// When this option is set to <c>true</c>, the content of <see cref="StructuredDocumentTag"/>
/// will be treated as a simple text.
/// </para>
/// <para>
/// Otherwise, <see cref="StructuredDocumentTag"/> will be processed as standalone Story
/// and replacing pattern will be searched separately for each <see cref="StructuredDocumentTag"/>,
/// so that if pattern crosses a <see cref="StructuredDocumentTag"/>, then replacement will not
/// be performed for such pattern.
/// </para>
/// </remarks>
public bool IgnoreStructuredDocumentTags { get; set; }
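
As a rough illustration of what the remarks above describe, here is a toy model in plain Java. It is a sketch under assumptions and not the Aspose.Words implementation: the Segment class and found helper are made up, the model simplifies by searching every segment on its own when the option is false, and it assumes that [CustomerName] is the part of the earlier search string that sits inside the content control.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: this is NOT the Aspose.Words implementation.
class SdtSearchSketch {

    // A stretch of paragraph text, either plain or inside a content control (SDT).
    static final class Segment {
        final boolean insideSdt;
        final String text;
        Segment(boolean insideSdt, String text) { this.insideSdt = insideSdt; this.text = text; }
    }

    static boolean found(List<Segment> segments, String pattern, boolean ignoreSdt) {
        if (ignoreSdt) {
            // SDT content is treated as simple text: search the joined string.
            StringBuilder all = new StringBuilder();
            for (Segment s : segments) all.append(s.text);
            return all.toString().contains(pattern);
        }
        // Simplification: each segment is searched separately,
        // so a match can never cross an SDT boundary.
        for (Segment s : segments) {
            if (s.text.contains(pattern)) return true;
        }
        return false;
    }

    // Assumed split of the sentence fragment quoted earlier in the thread.
    static List<Segment> sample() {
        return Arrays.asList(
            new Segment(false, "these Services to Client"),
            new Segment(true, "[CustomerName]"),
            new Segment(false, " based out of"));
    }

    public static void main(String[] args) {
        String pattern = "these Services to Client[CustomerName] based out of";
        System.out.println("option = false: " + found(sample(), pattern, false));
        System.out.println("option = true:  " + found(sample(), pattern, true));
    }
}
```

In this model the pattern that crosses the content control is only found when the option is true, which matches the behavior the remarks describe. In the Java API the option would presumably be exposed through a setter named like setIgnoreFieldCodes, but please check the API reference for the exact name.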

The issues you found earlier (filed as WORDSNET-24241) have been fixed in the Aspose.Words for Java 22.10 update, which is also available on Maven.