Placeholder not replaced completely

Hey folks,

I encountered an issue when replacing a placeholder string in a .docx document (created with Microsoft Word) with a table by using Aspose.Words for Java.

The content of the document looks like this:

<owner>
<product>
<reportTable>
<reportCreator>
<dateOfReportCreation>

I want to replace “<reportTable>” with a table using this code:

FindReplaceOptions options = new FindReplaceOptions(new ProductChangeTableReplacingCallback(productChanges));
document.getRange().replace("<reportTable>", "", options);

The resulting table is inserted as expected, but the placeholder is not removed entirely: only the opening bracket is gone, and “reportTable>” remains after the table. When LibreOffice is used to create the document, the problem doesn’t seem to occur. However, many of my colleagues rely on Word, so switching editors is not really an option.

As a temporary workaround, we check whether parts of the placeholder remain after the replacement and delete them afterwards:

document.getRange().replace("reportTable>", "", new FindReplaceOptions());

Of course, this should not be the permanent solution, since it is fragile and ugly.

I’d appreciate any help to get this issue fixed, thanks in advance!
Henry

@GeoHenry

What is happening?

When a .docx file is created in Microsoft Word, the placeholder

<reportTable>

is often broken into several runs (a run is a stretch of text inside a paragraph that shares the same character formatting).
Typical reasons are:

  • automatic formatting of the “<” character,
  • a different style/character format applied to the opening bracket,
  • editing metadata that Word keeps per run, e.g. revision-session IDs (rsid) from different editing sessions or spelling/grammar (proofing) marks around the word.

Aspose.Words does find a match that spans several runs, but the replacing
callback receives only the run in which the match starts
(ReplacingArgs.getMatchNode()). If the callback inserts the table there and
removes only that node, just the “<” run is deleted and the remaining
“reportTable>” run stays behind the inserted table.
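
To make the effect concrete, here is a pure-Java model that uses no Aspose.Words classes: a paragraph is reduced to a list of run texts. The run layout ("<" followed by "reportTable>") and the class name NaiveCallbackModel are assumptions for illustration; the model mimics a callback that writes the table into the run where the match starts and leaves the following runs alone.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: a paragraph is a list of run texts (not the Aspose.Words object model).
class NaiveCallbackModel {

    // Mimics a callback that puts the table into the run where the match starts
    // and leaves every following run of the match untouched.
    static List<String> replaceStartRunOnly(List<String> runs, int startRun, String table) {
        List<String> result = new ArrayList<>(runs);
        result.set(startRun, table);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical run layout Word produced for "<reportTable>"
        List<String> runs = List.of("<", "reportTable>");
        // Prints "[TABLE]reportTable>": the tail of the placeholder survives
        System.out.println(String.join("", replaceStartRunOnly(runs, 0, "[TABLE]")));
    }
}
```

The leftover text in this model is exactly what the question describes, which is why the fix has to happen in the run handling rather than in the search string.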

The document created with LibreOffice does not split the placeholder,
therefore the replace works there.


How to make the replace work reliably

1. Make the callback remove every run of the match

FindReplaceOptions options = new FindReplaceOptions(new ProductChangeTableReplacingCallback(productChanges));

options.setMatchCase(true);              // optional, keep case-sensitive matching
options.setFindWholeWordsOnly(false);    // default, but left here for clarity

document.getRange().replace("<reportTable>", "", options);

FindReplaceOptions has no setting that merges runs, and none is needed: the
find engine already matches “<reportTable>” across run boundaries. The callback
(ProductChangeTableReplacingCallback) is invoked once per match, but
args.getMatchNode() is only the run in which the match starts. The callback
therefore has to delete that run plus the following runs that hold the rest of
the match (args.getMatch().group().length() characters in total). The table
insertion itself can stay as it is.

2. (Alternative) Use a regular expression

In the Java API a regular-expression search is done by passing a
java.util.regex.Pattern to Range.replace; FindReplaceOptions has no separate
regex switch:

import java.util.regex.Pattern;

FindReplaceOptions options = new FindReplaceOptions(new ProductChangeTableReplacingCallback(productChanges));

// Matches this one placeholder; a broader pattern such as "<\\w+>" matches every placeholder
Pattern pattern = Pattern.compile("<reportTable>");
document.getRange().replace(pattern, "", options);

A regular expression does not change which run the callback receives, so the
callback still has to remove every run that belongs to the match.
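
A pattern can be tried out on plain text before it is handed to Range.replace. This sketch uses only java.util.regex; the pattern <(\w+)> (a capturing variant of the <\w+> pattern used later in the thread) and the class PlaceholderScanner are illustrative assumptions. Here it is run over the placeholders from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderScanner {
    // Matches "<name>" where name consists of letters, digits or underscores (\w).
    static final Pattern PLACEHOLDER = Pattern.compile("<(\\w+)>");

    // Returns the placeholder names in the order they appear in the text.
    static List<String> placeholderNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(text);
        while (m.find())
            names.add(m.group(1));
        return names;
    }

    public static void main(String[] args) {
        String text = "<owner>\n<product>\n<reportTable>\n<reportCreator>\n<dateOfReportCreation>";
        System.out.println(placeholderNames(text));
    }
}
```

This only checks the pattern on a flat string; inside a document the callback still sees the match starting at one run.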

3. (Optional) Clean up any stray characters

If you still see stray characters after the replacement, you can run a
second, very simple replace:

document.getRange().replace("reportTable>", "", new FindReplaceOptions());

This line should not be required once the callback removes every run of the
match, but it can be kept as a safety net.


Full example (Java)

import com.aspose.words.Document;
import com.aspose.words.FindReplaceOptions;
import com.aspose.words.IReplacingCallback;

// Load the document created in Microsoft Word
Document document = new Document("Input.docx");

// Callback that inserts your table; it must remove every run of the matched placeholder
IReplacingCallback callback = new ProductChangeTableReplacingCallback(productChanges);

// Configure FindReplaceOptions
FindReplaceOptions options = new FindReplaceOptions(callback);
options.setMatchCase(true);          // optional

// Perform the replacement
document.getRange().replace("<reportTable>", "", options);

// Save the result
document.save("Output.docx");

Why the leftover text appears

Callback removes only the run returned by args.getMatchNode(): the “<” run is deleted, “reportTable>” stays after the table.
Callback removes every run of the match: the whole placeholder is replaced by the table.

If you pass the placeholder as a Pattern, no escaping is needed: it is a simple literal, and < and > are not regex metacharacters.
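
The no-escaping claim is easy to check with java.util.regex alone. In this sketch (class and method names are hypothetical), the placeholder compiled as a regex matches itself, and Pattern.quote is shown as the safe way to build a pattern for placeholder texts that do contain metacharacters; "report.Table" is a made-up example:

```java
import java.util.regex.Pattern;

class LiteralPatternCheck {

    // True if the placeholder, used directly as a regex, matches exactly its own text.
    static boolean matchesItself(String placeholder) {
        return Pattern.compile(placeholder).matcher(placeholder).matches();
    }

    // Builds a pattern that treats the placeholder as plain text, even if it contains metacharacters.
    static Pattern literal(String placeholder) {
        return Pattern.compile(Pattern.quote(placeholder));
    }

    public static void main(String[] args) {
        System.out.println(matchesItself("<reportTable>"));                         // true: < and > are not special
        System.out.println(literal("report.Table").matcher("reportXTable").find()); // false: the quoted dot is literal
    }
}
```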



If you have any further questions (e.g., the placeholder still appears split in a particular document), please share a minimal reproducible sample (the .docx file) and we’ll be happy to dive deeper.

@GeoHenry Most likely, the problem occurs because the placeholder consists of several Run nodes, but in your IReplacingCallback implementation only the first Run node is processed and replaced. There are two ways to resolve this problem:

  1. You can modify your IReplacingCallback implementation to process all matched nodes. See, for example, the following implementation:
import java.util.ArrayList;
import com.aspose.words.*;

public class ReplacingCallbackReplaceWithImage implements IReplacingCallback {
    
    /**
     * This method is called by the Aspose.Words find and replace engine for each match.
     */
    @Override
    public int replacing(ReplacingArgs args) throws Exception {
        Document doc = (Document)args.getMatchNode().getDocument();
        ArrayList<Run> matchedRuns = GetMatchedRuns(args);
        
        // Create DocumentBuilder to insert the image.
        DocumentBuilder builder = new DocumentBuilder(doc);
        // Move builder to the first run.
        builder.moveTo(matchedRuns.get(0));
        // Insert Image.
        builder.insertImage(args.getReplacement());
        
        // Delete matched runs
        for (Run run : matchedRuns)
            run.remove();
        
        // Tell the replace engine to skip its own replacement, because the callback has already done everything.
        return ReplaceAction.SKIP;
    }
    
    private static ArrayList<Run> GetMatchedRuns(ReplacingArgs args)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = args.getMatchNode();
        
        // The first (and possibly the only) run can contain text before the match;
        // in this case the run has to be split.
        if (args.getMatchOffset() > 0)
            currentNode = splitRun((Run)currentNode, args.getMatchOffset());
        
        // This list stores all runs of the match so that they can be deleted later.
        ArrayList<Run> runs = new ArrayList<Run>();
        
        // Find all runs that contain parts of the match string.
        int remainingLength = args.getMatch().group().length();
        while (
                remainingLength > 0 &&
                        currentNode != null &&
                        currentNode.getText().length() <= remainingLength)
        {
            runs.add((Run)currentNode);
            remainingLength -= currentNode.getText().length();
            
            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.getNextSibling();
            } while (currentNode != null && currentNode.getNodeType() != NodeType.RUN);
        }
        
        // Split the last run that contains the match if there is any text left.
        if (currentNode != null && remainingLength > 0)
        {
            splitRun((Run)currentNode, remainingLength);
            runs.add((Run)currentNode);
        }
        
        return runs;
    }
    
    private static Run splitRun(Run run, int position)
    {
        Run afterRun = (Run)run.deepClone(true);
        run.getParentNode().insertAfter(afterRun, run);
        afterRun.setText(run.getText().substring(position));
        run.setText(run.getText().substring(0, position));
        return afterRun;
    }
}

The above implementation replaces a placeholder with an image, but the idea is the same for a table.
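
The part of the callback above that actually gets rid of the leftover “reportTable>” is the run collection in GetMatchedRuns and splitRun. The following pure-Java model reproduces that logic on plain strings so it can be tried without a document. MatchedRunsModel and its methods are hypothetical names, the match position is passed in explicitly instead of coming from ReplacingArgs, and the model skips the handling of non-run siblings such as BookmarkStart:

```java
import java.util.ArrayList;
import java.util.List;

// Pure-Java model of GetMatchedRuns/splitRun: runs are plain strings, not Aspose.Words Run nodes.
class MatchedRunsModel {

    // Splits the run at index into [0, position) and [position, end); returns the index of the second part.
    static int splitRun(List<String> runs, int index, int position) {
        String text = runs.get(index);
        runs.set(index, text.substring(0, position));
        runs.add(index + 1, text.substring(position));
        return index + 1;
    }

    // Returns the indexes of the runs that together hold exactly the matched text.
    static List<Integer> matchedRuns(List<String> runs, int startRun, int matchOffset, int matchLength) {
        int current = startRun;
        // Text before the match is split off into its own run.
        if (matchOffset > 0)
            current = splitRun(runs, current, matchOffset);

        List<Integer> result = new ArrayList<>();
        int remaining = matchLength;
        // Collect whole runs while they fit into the rest of the match.
        while (remaining > 0 && current < runs.size() && runs.get(current).length() <= remaining) {
            result.add(current);
            remaining -= runs.get(current).length();
            current++;
        }
        // The last run also holds text after the match: split off that tail.
        if (current < runs.size() && remaining > 0) {
            splitRun(runs, current, remaining);
            result.add(current);
        }
        return result;
    }

    // Puts the inserted content into the first matched run and deletes the other matched runs.
    static String replace(List<String> runs, int startRun, int matchOffset, int matchLength, String insert) {
        List<Integer> matched = matchedRuns(runs, startRun, matchOffset, matchLength);
        runs.set(matched.get(0), insert);
        for (int i = matched.size() - 1; i >= 1; i--)
            runs.remove(matched.get(i).intValue());
        return String.join("", runs);
    }

    public static void main(String[] args) {
        // Hypothetical run layout: "Report: <" + "report" + "Table> end"
        List<String> runs = new ArrayList<>(List.of("Report: <", "report", "Table> end"));
        // "<reportTable>" starts at offset 8 of run 0 and is 13 characters long.
        System.out.println(replace(runs, 0, 8, 13, "[TABLE]")); // prints "Report: [TABLE] end"
    }
}
```

Because every matched run is removed, nothing of the placeholder is left next to the inserted content, whichever way Word split it.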

  2. Replace the placeholders with themselves before the actual processing. This preprocessing ensures that each placeholder is represented by a single Run node:
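import java.util.regex.Pattern;
import com.aspose.words.Document;
import com.aspose.words.FindReplaceOptions;

Document doc = new Document("C:\\Temp\\in.docx");
// Replace the placeholders with themselves so that each one is represented by a single run.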
FindReplaceOptions tmpOptions = new FindReplaceOptions();
tmpOptions.setUseSubstitutions(true);
doc.getRange().replace(Pattern.compile("<\\w+>"), "$0", tmpOptions);
    
// Here is the actual processing.
// ....................
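
Why the self-replacement helps can be modeled without Aspose.Words as well. The sketch below assumes, as the reply states, that after the pass each placeholder sits in a single run; it further assumes that text before the placeholder stays in the starting run and text after it keeps its own run. MergePlaceholderRunsModel is a hypothetical name, and runs are plain strings:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pure-Java model of the effect of replacing placeholders with themselves:
// afterwards every placeholder lives in a single run.
class MergePlaceholderRunsModel {

    static List<String> mergePlaceholders(List<String> runs, Pattern placeholder) {
        String text = String.join("", runs);

        // Run boundaries as character offsets into the concatenated paragraph text.
        TreeSet<Integer> cuts = new TreeSet<>();
        int offset = 0;
        for (int i = 0; i < runs.size() - 1; i++) {
            offset += runs.get(i).length();
            cuts.add(offset);
        }

        Matcher m = placeholder.matcher(text);
        while (m.find()) {
            // Drop every boundary strictly inside the match ...
            boolean hadInnerCut = !cuts.subSet(m.start(), false, m.end(), false).isEmpty();
            cuts.subSet(m.start(), false, m.end(), false).clear();
            // ... but keep any text after the match in its own run.
            if (hadInnerCut && m.end() < text.length())
                cuts.add(m.end());
        }

        // Rebuild the runs from the remaining boundaries.
        List<String> result = new ArrayList<>();
        int from = 0;
        for (int cut : cuts) {
            result.add(text.substring(from, cut));
            from = cut;
        }
        result.add(text.substring(from));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical split produced by Word for two placeholders in one paragraph
        List<String> runs = List.of("<", "report", "Table>", " by ", "<", "reportCreator>");
        // Result runs: "<reportTable>", " by ", "<reportCreator>"
        System.out.println(mergePlaceholders(runs, Pattern.compile("<\\w+>")));
    }
}
```

After such a pass, a callback that only looks at the match node sees the whole placeholder in that one run.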

Many thanks for the replies!

We are using aspose-words version 17.2.0; the FindReplaceOptions object only provides the setter setDirection(), so we have no way of specifying additional options.

However, after reading your replies, I played around with the placeholder names a little. It seems that our version of Aspose is quite picky about how a placeholder begins. Names starting with <, { or __ failed; names that start with a letter and use _ as a word delimiter work.
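
If templates are going to follow this naming rule, a quick check can reject names that broke on 17.2.0. This is a sketch under the assumption that the rule is exactly "letters, with single underscores between words" as described above; digits are not mentioned, so they are rejected here, and PlaceholderNameRule is a hypothetical name:

```java
import java.util.regex.Pattern;

class PlaceholderNameRule {
    // Assumed rule: letters only, with single underscores between words, e.g. REPORT_TABLE.
    private static final Pattern NAME = Pattern.compile("[A-Za-z]+(?:_[A-Za-z]+)*");

    static boolean isSafeName(String placeholder) {
        return NAME.matcher(placeholder).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSafeName("REPORT_TABLE"));  // true
        System.out.println(isSafeName("<reportTable>")); // false: starts with <
    }
}
```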

I think this is okay for us, thank you for your suggestions anyway :wink:
