Malformed html in IReplacingCallback | Replace Text in Word Document while Ignoring the Meta Characters C# .NET

Hello,

We use the Aspose.Words IReplacingCallback to replace placeholders in our document with HTML.
The problem is that if our HTML contains &ldquo; characters, they appear as dquo in the resulting document.
Note that &rdquo; characters work correctly. Please see the attachment for an example.
AsposeDocQuotesTest.zip (163.2 KB)

@gconnect,

Thanks for your inquiry. We tested the scenario and reproduced the issue on our side. We have logged it in our issue tracking system as WORDSNET-15840, and you will be notified via this forum thread once it is resolved. We apologize for the inconvenience.

Please use the following code example as a workaround for this issue. Hope this helps.

using System.Collections;
using Aspose.Words;
using Aspose.Words.Replacing;

string html = @"<p>&ldquo;Tax-free intra community supply&rdquo;</p>";

// Initialize a Document.
Document doc = new Document();

// Use a document builder to add content to the document.
DocumentBuilder builder = new DocumentBuilder(doc);
builder.Write("{PLACEHOLDER}");

var findReplaceOptions = new FindReplaceOptions
{
    ReplacingCallback = new FindAndInsertHtml(html)
};

doc.Range.Replace("{PLACEHOLDER}", "", findReplaceOptions);

public class FindAndInsertHtml : IReplacingCallback
{
    private string html;
    public FindAndInsertHtml(string htmlstring)
    {
        html = htmlstring;
    }
    /// This method is called by the Aspose.Words find and replace engine for each match.
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;

        // The first (and maybe the only) run can contain text before the match;
        // in this case it is necessary to split the run.
        if (e.MatchOffset > 0)
        {
            currentNode = SplitRun((Run)currentNode, e.MatchOffset);
        }

        // This array is used to store all nodes of the match for further removing.
        ArrayList runs = new ArrayList();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.Match.Value.Length;

        while (
            remainingLength > 0 &&
            currentNode != null &&
            currentNode.GetText().Length <= remainingLength)
        {
            runs.Add(currentNode);

            remainingLength = remainingLength - currentNode.GetText().Length;

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            }
            while (currentNode != null && currentNode.NodeType != NodeType.Run);
        }

        // Split the last run that contains the match if there is any text left.
        if (currentNode != null && remainingLength > 0)
        {
            SplitRun((Run)currentNode, remainingLength);

            runs.Add(currentNode);
        }

        // Create a DocumentBuilder, move it to the last run of the match, and insert the HTML there.
        DocumentBuilder builder = new DocumentBuilder(e.MatchNode.Document as Document);

        builder.MoveTo((Run)runs[runs.Count - 1]);

        builder.InsertHtml(html);

        // Now remove all runs in the sequence.
        foreach (Run run in runs)
        {
            run.Remove();
        }

        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.Skip;
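    }

    // SplitRun is called above but was not included in the original post. This version follows
    // the helper commonly shown in Aspose.Words find-and-replace examples: it splits the text of
    // the run at the given position and returns the new run that holds the text after the split.
    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        run.ParentNode.InsertAfter(afterRun, run);
        return afterRun;
    }
}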

@gconnect,

Thanks for your patience. The issue you are facing is actually not a bug in Aspose.Words. The reason is that &l is reserved as the meta-character for a line break, so the beginning of &ldquo; is read as a line break and the remaining dquo is written as text. Please use the numeric character reference &#x201C; for this character instead. You may use the code shared in my previous post to get the desired output.
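As a rough illustration of the suggestion above, the named quote entities could be rewritten to numeric character references before the HTML is handed to the replace logic. This is only a sketch: EntityHelper, ToNumericEntities and the small mapping table are hypothetical names made up for this example, not part of Aspose.Words, and the table covers only a few quote entities.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EntityHelper
{
    // Hypothetical mapping from entity names to numeric character references.
    // Only a few quote entities are listed; extend it for other entities as needed.
    private static readonly Dictionary<string, string> NumericForms =
        new Dictionary<string, string>
        {
            { "ldquo", "&#x201C;" },
            { "rdquo", "&#x201D;" },
            { "lsquo", "&#x2018;" },
            { "rsquo", "&#x2019;" },
        };

    // Replaces every named entity found in the table with its numeric form.
    // Entities that are not in the table are left unchanged.
    public static string ToNumericEntities(string html)
    {
        return Regex.Replace(html, @"&([A-Za-z]+);", m =>
            NumericForms.TryGetValue(m.Groups[1].Value, out string numeric)
                ? numeric
                : m.Value);
    }
}
```

With this helper, the string from the earlier post would become <p>&#x201C;Tax-free intra community supply&#x201D;</p>, which no longer contains the &l sequence.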

Could you please share some details of your use case and explain why you use such a replacing callback? Thanks for your cooperation.

@gconnect,
The issues you have found earlier (filed as WORDSNET-15840) have been fixed in the Aspose.Words for .NET 18.1 update and the Aspose.Words for Java 18.1 update.

@gconnect,

Please use the latest version of the Aspose.Words for .NET or Aspose.Words for Java API to replace text in a Word document while ignoring the meta-characters. The following C# code example shows how to do this:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.Write("text&plain");
// In the search pattern, "&&" stands for a single literal "&" character.
doc.Range.Replace("&&", " & ");
Console.WriteLine(doc.GetText()); // The output is: text & plain\f
doc.Save("C:\\temp\\20.9.docx");
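When the search text comes from user input, the doubling shown above can be applied automatically. The following is a minimal sketch that assumes doubling the ampersand is the only escaping the pattern needs; MetaEscaper and EscapeAmpersands are hypothetical names for this example and are not part of Aspose.Words.

```csharp
using System;

public static class MetaEscaper
{
    // Hypothetical helper: doubles every "&" so that the text can be used
    // as a search pattern in which "&&" stands for a literal "&".
    public static string EscapeAmpersands(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return text.Replace("&", "&&");
    }
}
```

For example, doc.Range.Replace(MetaEscaper.EscapeAmpersands("text&plain"), "text & plain") would pass "text&&plain" as the pattern.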