Ignore Cross References Fields in Word Document during Find Range & Replace | C# .NET

Hi Team,

I'm performing a replace operation in which the cross-reference field codes are also treated as text to be replaced. How can we ignore cross-references and hyperlinks?
I need to capitalize the first letter of a word when it is followed by a number, and lowercase it otherwise.

The text that I get from Range.Text:
(See Section REF _Ref19104241 \r \h * MERGEFORMAT 6.1 for additional information)

The actual text shown in the document:
(See Section 6.1 for additional information)

@kkumaranil485,

Please compress the following resources into ZIP format and attach the .zip file here for testing:

  • A simplified source Word document
  • Aspose.Words v21.8 generated DOCX file showing the undesired behavior
  • Your expected DOCX file showing the desired output. You can create this output file manually by using MS Word.
  • Please also create a standalone simplified Console Application (source code without compilation errors) that helps us to reproduce this problem on our end and attach it here for testing. Please do not include Aspose.Words DLL files in it to reduce the file size.

As soon as you have these pieces of information ready, we will start investigating your particular scenario and provide you with code to achieve the expected output by using Aspose.Words.

ErrorOutputDOC.docx (30.6 KB)
ExpectedOutput.docx (13.8 KB)
InputDoc.docx (13.7 KB)

Please refer to the code below:

using Aspose.Words;
using Aspose.Words.Replacing;
using System.Drawing;
using System.Text.RegularExpressions;

namespace AsposeLibraryWord
{
    class Program
    {
        static void Main(string[] args)
        {
            string fileName = @"D:/PWS/Automation/BugsTest.docx";

            // Load the document from the absolute path on disk.
            Document doc = new Document(fileName);
            
            var wholeLowerCase = "section";
            var firstletterUpperCase = "Section";

            DocumentBuilder builder = new DocumentBuilder(doc);
            foreach (Paragraph p in doc.GetChildNodes(NodeType.Paragraph, true))
            {
                //string firstletterUpperCase = Thread.CurrentThread.CurrentCulture.TextInfo.ToTitleCase(NoCaptilizeNumberWord);
                //string wholeLowerCase = NoCaptilizeNumberWord.ToLower();

                FindReplaceOptions options = new FindReplaceOptions();
                options.UseSubstitutions = true;
                options.IgnoreDeleted = true;
                //options.IgnoreFields = true;
                //options.FindWholeWordsOnly = false;
                options.ApplyFont.HighlightColor = Color.Red;

                if (p.ParagraphFormat.StyleIdentifier != StyleIdentifier.Heading1 &&
                  p.ParagraphFormat.StyleIdentifier != StyleIdentifier.Heading2 &&
                  p.ParagraphFormat.StyleIdentifier != StyleIdentifier.Heading3 &&
                  p.ParagraphFormat.StyleIdentifier != StyleIdentifier.Heading4 &&
                  p.ParagraphFormat.StyleIdentifier != StyleIdentifier.Heading5 &&
                  p.ParagraphFormat.StyleIdentifier != StyleIdentifier.Heading6 &&
                  p.ParagraphFormat.StyleIdentifier != StyleIdentifier.Heading7 &&
                  p.ParagraphFormat.StyleIdentifier != StyleIdentifier.Heading8 &&
                  p.ParagraphFormat.StyleIdentifier != StyleIdentifier.Heading9 &&
                  p.ParagraphFormat.StyleIdentifier != StyleIdentifier.Title &&
                  p.ParagraphFormat.StyleIdentifier != StyleIdentifier.Subtitle &&
                  p.ParagraphFormat.StyleIdentifier != StyleIdentifier.Caption)
                {
                    // Pattern at the beginning of a sentence/paragraph; it covers [Subjects-12, Subject(s), Subject, Subject., Subject's, subject] and capitalizes the word.
                    string startOfParagraph = string.Format(@"(^|(\.\s))\b{0}(s)?\b", wholeLowerCase);
                    p.Range.Replace(new Regex(startOfParagraph), "$1" + firstletterUpperCase + "$3", options);

                    // Mid-paragraph, followed by a number: capitalize.
                    string midParagraphFollowedByNumber = string.Format(@"(?<!^|(\.\s))\b{0}(s)?(\s+)?(\d)", wholeLowerCase);

                    // Mid-paragraph, followed by a number after either a space or a non-breaking space: capitalize.
                    //string midParagraphFollowedByNumber = string.Format(@"(?<!^|(\.\s))\b{0}(s)?([\s+|\u00A0])?(\d)", wholeLowerCase);
                    p.Range.Replace(new Regex(midParagraphFollowedByNumber), "$1" + firstletterUpperCase + "$2 $4", options);

                    // Mid-paragraph, not followed by a number (letters only): de-capitalize.
                    string midParagraphNotFollowedByNumber = string.Format(@"(?<!^|(\.\s))\b{0}(s)?(\s+)([a-zA-Z])", firstletterUpperCase);
                    p.Range.Replace(new Regex(midParagraphNotFollowedByNumber), "$1" + wholeLowerCase + "$2$3$4", options);

                    // Mid-paragraph, not followed by a number (special characters only): de-capitalize.
                    string midParagraphNotFollowedByNumberAndSpecial = string.Format(@"(?<!^|(\.\s))\b{0}(s)?(\S)(\w)", firstletterUpperCase);
                    p.Range.Replace(new Regex(midParagraphNotFollowedByNumberAndSpecial), "$1" + wholeLowerCase + "$2$3$4", options);

                    // End of sentence, e.g. de-capitalize "Sponsor." or "Sponsor,".
                    string endOfSentence = string.Format(@"{0}(s)?(\.|,)", firstletterUpperCase);
                    p.Range.Replace(new Regex(endOfSentence), wholeLowerCase + "$1$2", options);

                }
                else
                {
                    // Anywhere in a title, subtitle, or caption: capitalize the first letter of the word.
                    //p.Range.Replace(wholeLowerCase, firstletterUpperCase, options);

                }

            }

            string dataDir = @"D:/PWS/Automation/ConvertedDOC.docx";
            doc.Save(dataDir);

        }
    }
}
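
The two mid-paragraph rules above can be tried on plain strings, without Aspose.Words, to see why the field code gets in the way. This is only a minimal sketch: the class name CasingRules and the method Apply are made up for illustration, the second input sentence is invented, and the third input is the Range.Text value quoted earlier with "Section" lowercased (the real Range.Text may also contain field control characters that are not shown here).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: not part of Aspose.Words, only plain System.Text.RegularExpressions.
public static class CasingRules
{
    // Same pattern as midParagraphFollowedByNumber above, with "section" filled in.
    static readonly Regex FollowedByNumber =
        new Regex(@"(?<!^|(\.\s))\bsection(s)?(\s+)?(\d)");

    // Same pattern as midParagraphNotFollowedByNumber above, with "Section" filled in.
    static readonly Regex NotFollowedByNumber =
        new Regex(@"(?<!^|(\.\s))\bSection(s)?(\s+)([a-zA-Z])");

    public static string Apply(string text)
    {
        // Capitalize when a digit follows, then de-capitalize when a letter follows.
        text = FollowedByNumber.Replace(text, "$1Section$2 $4");
        return NotFollowedByNumber.Replace(text, "$1section$2$3$4");
    }

    public static void Main()
    {
        // Plain text: the digit directly follows the word, so the rule applies.
        Console.WriteLine(Apply("(See section 6.1 for additional information)"));

        // A letter follows the word, so it is de-capitalized.
        Console.WriteLine(Apply("Refer to the Section about safety."));

        // With the field code between the word and the number, neither rule matches
        // and the text is returned unchanged.
        Console.WriteLine(Apply(@"(See section REF _Ref19104241 \r \h * MERGEFORMAT 6.1 for additional information)"));
    }
}
```

In the third case the word is followed by "REF" rather than a digit, which is the same situation the replace runs into when the field code is part of the searched text.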

@kkumaranil485,

We are checking this scenario and will get back to you soon.

Hi Team,

Any update on this please?

@kkumaranil485,

It seems that in this case the FindReplaceOptions.IgnoreFields property is currently not working as expected. We have logged this problem in our issue tracking system. Your ticket number is WORDSNET-22686. You will be informed here as soon as we have any further updates on WORDSNET-22686. Sorry for the inconvenience.

@kkumaranil485,

In the meantime, while you are waiting for the final resolution, please try the Document.UnlinkFields method as a workaround. We are also investigating whether we should change the behavior of the FindReplaceOptions.IgnoreFields property. We will keep you posted here on any further updates.
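
A minimal sketch of how this workaround might look is shown below. The file names and the simplified pattern are assumptions for illustration only; they are not taken from the attached project. Please note that Document.UnlinkFields replaces every field in the document with its current result, so the cross-references become static text and will no longer update.

```csharp
using Aspose.Words;
using Aspose.Words.Replacing;
using System.Text.RegularExpressions;

class UnlinkThenReplace
{
    static void Main()
    {
        // Hypothetical input path; adjust it to your environment.
        Document doc = new Document("InputDoc.docx");

        // Replace all fields with their results, so that Range.Text contains
        // "Section 6.1" instead of the REF field code.
        doc.UnlinkFields();

        FindReplaceOptions options = new FindReplaceOptions();
        // Required so that $1, $2, ... in the replacement refer to regex groups.
        options.UseSubstitutions = true;

        // Simplified pattern (an assumption): capitalize "section" when a number follows.
        Regex followedByNumber = new Regex(@"\bsection(s)?(\s+)?(\d)");
        doc.Range.Replace(followedByNumber, "Section$1 $3", options);

        // Hypothetical output path.
        doc.Save("Output.docx");
    }
}
```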

The issues you found earlier (filed as WORDSNET-22686) have been fixed in the Aspose.Words for .NET 21.11 update, which is also available on NuGet.