Range.Replace not working properly with breaks

I am trying to replace the text ‘Dummy Data’ with ‘Some new text’. Below is the original document text: \rSome text.\rThis is a demo application with Dummy\rData and real data. We also have other Dummy Data that are useful.

Code:

FindReplaceOptions options = new FindReplaceOptions(FindReplaceDirection.Forward)
{
    FindWholeWordsOnly = true,
    MatchCase = false,
    LegacyMode = false
};
doc.Range.Replace("Dummy Data", "Some new text", options);

There are two instances of the search text ‘Dummy Data’ in the document, but the replace code only replaces the last one. The first instance is left as it is because of the \r between ‘Dummy’ and ‘Data’. Is there a way to handle this?

@23manognya Could you please attach your source document here for testing? We will check it and provide you more information.

Sample doc.pdf (6.8 KB)

Attached is the PDF file. We convert this file to DOCX using the Aspose.Pdf save method with the format set to DocX, and then try to replace the text ‘Dummy Data’.

@23manognya As far as I can see, replace works fine with your document. Here is the simple code I used for testing:

Document doc = new Document(@"C:\Temp\in.pdf");
FindReplaceOptions options = new FindReplaceOptions(FindReplaceDirection.Forward);
options.ApplyFont.HighlightColor = Color.Yellow; // For demonstration purposes. 
doc.Range.Replace("Dummy Data", "Some new text", options);
doc.Save(@"C:\Temp\out.docx");

Actually, we convert the file to DOCX first and perform some other operations, so the replacement is also done after the conversion. Could you please try converting the same file to DOCX using Aspose and then running the replace on the DOCX file?

@23manognya The following code produces exactly the same result:

Document doc = new Document(@"C:\Temp\in.pdf");
doc.Save(@"C:\Temp\tmp.docx");

doc = new Document(@"C:\Temp\tmp.docx");
FindReplaceOptions options = new FindReplaceOptions(FindReplaceDirection.Forward);
options.ApplyFont.HighlightColor = Color.Yellow; // For demonstration purposes. 
doc.Range.Replace("Dummy Data", "Some new text", options);
doc.Save(@"C:\Temp\out.docx");

Could you please attach your intermediate DOCX document here for testing?

@alexey.noskov Thanks for all your inputs. I checked the code, and the issue occurs when the PDF file is converted to DOCX using Aspose.Pdf. If Aspose.Words is used to save the file as DOCX, the replacement works fine. This is the Aspose.Pdf conversion we use:

Aspose.Pdf.Document pdfFile = new Aspose.Pdf.Document(@"C:\Temp\in.pdf");
Aspose.Pdf.DocSaveOptions saveOptions = new Aspose.Pdf.DocSaveOptions
{
    Format = Aspose.Pdf.DocSaveOptions.DocFormat.DocX
};
pdfFile.Save(@"C:\Temp\tmp.docx", saveOptions);

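@23manognya The problem occurs because Aspose.PDF unnecessarily puts each line into a separate paragraph, so in the first instance ‘Dummy’ and ‘Data’ are separated by a paragraph break rather than a space.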

If you would like to match or replace a paragraph break, you should use the &p metacharacter. Please see our documentation for more information:
https://reference.aspose.com/words/net/aspose.words/range/replace/

You can modify your code like the following to get the desired result:

Aspose.Pdf.Document pdfFile = new Aspose.Pdf.Document(@"C:\Temp\in.pdf");
Aspose.Pdf.DocSaveOptions saveOptions = new Aspose.Pdf.DocSaveOptions
{
    Format = Aspose.Pdf.DocSaveOptions.DocFormat.DocX
};
pdfFile.Save(@"C:\Temp\tmp.docx", saveOptions);

Document doc = new Document(@"C:\Temp\tmp.docx");
FindReplaceOptions options = new FindReplaceOptions(FindReplaceDirection.Forward);
options.ApplyFont.HighlightColor = Color.Yellow; // For demonstration purposes. 
doc.Range.Replace(new Regex("Dummy((&p)|(\\s))Data"), "Some new text", options);
doc.Save(@"C:\Temp\out.docx");
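
If you need the same tolerance for other search phrases, the pattern can be built from the phrase instead of written by hand. Below is a minimal sketch, not part of Aspose.Words: the class BreakTolerantPattern and its Build method are hypothetical names introduced only for illustration. It escapes each word with Regex.Escape and joins the words with a group that accepts either the &p metacharacter or a whitespace character, assuming Range.Replace treats a non-capturing group the same way as the capturing groups in the pattern above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose.Words API): builds a pattern string in which
// every gap between words may be either a paragraph break (&p) or whitespace.
public static class BreakTolerantPattern
{
    public static string Build(string phrase)
    {
        // Split on any whitespace and drop empty entries, so repeated
        // spaces in the phrase collapse into a single gap.
        string[] words = phrase.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // Escape each word so characters such as '.' or '(' are matched literally,
        // then join the words with the "paragraph break or whitespace" alternative.
        return string.Join(@"(?:&p|\s)", words.Select(w => Regex.Escape(w)));
    }
}
```

With this helper, the replace call above could be written as doc.Range.Replace(new Regex(BreakTolerantPattern.Build("Dummy Data")), "Some new text", options); for ‘Dummy Data’ the helper produces Dummy(?:&p|\s)Data.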

@alexey.noskov This is helpful and it works. I will either use Aspose.Words to convert to DOCX or replace all the spaces in the search string with the metacharacter you suggested. Thank you.
