@GeoHenry
What is happening?
When a .docx file is created in Microsoft Word, the placeholder <reportTable> is often split into separate runs (Word’s internal “pieces” of a paragraph).
Typical reasons are:
- automatic formatting of the “<” character,
- a different style/character format applied to the opening bracket,
- hidden Word markup (e.g. proofing marks or revision IDs) that Word stores between “<” and the rest of the placeholder.
A match for the placeholder then spans several runs.
If only the run in which the match starts is removed, the “<” run is
deleted and the remaining “reportTable>” run stays behind the inserted
table, which is the symptom in your case.
The document created with LibreOffice does not split the placeholder,
therefore the replace works there.
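To make the run split concrete, here is a toy model in plain Java. Each string stands for the text of one run; the class and method names are made up for illustration and none of this is Aspose.Words API. It only shows why a per-run search misses the placeholder while a search over the joined paragraph text finds it.

```java
import java.util.List;

// Toy model only: each String stands for the text of one Word run.
// This is NOT the Aspose.Words API; all names here are hypothetical.
final class RunSplitDemo {

    // True if any single run contains the whole placeholder.
    static boolean foundInSingleRun(List<String> runs, String placeholder) {
        for (String run : runs) {
            if (run.contains(placeholder)) {
                return true;
            }
        }
        return false;
    }

    // Index of the placeholder in the concatenated paragraph text, or -1.
    static int indexInJoinedText(List<String> runs, String placeholder) {
        return String.join("", runs).indexOf(placeholder);
    }

    public static void main(String[] args) {
        String placeholder = "<reportTable>";
        // How Word might store the paragraph: "<" sits in its own run.
        List<String> wordRuns = List.of("Report: ", "<", "reportTable>");
        // How LibreOffice might store it: the placeholder is one run.
        List<String> libreOfficeRuns = List.of("Report: ", "<reportTable>");

        System.out.println(foundInSingleRun(wordRuns, placeholder));        // false
        System.out.println(indexInJoinedText(wordRuns, placeholder));       // 8
        System.out.println(foundInSingleRun(libreOfficeRuns, placeholder)); // true
    }
}
```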
How to make the replace work reliably
1. Tell Aspose.Words to ignore run‑to‑run formatting differences
FindReplaceOptions options = new FindReplaceOptions(new ProductChangeTableReplacingCallback(productChanges));
// Important:
options.setIgnoreFormatting(true); // <-- let the engine match across runs (omit if your version has no such setter)
options.setMatchCase(true); // optional, keep case-sensitive matching
options.setFindWholeWordsOnly(false); // default, but left here for clarity
document.getRange().replace("<reportTable>", "", options);
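Setting IgnoreFormatting to true is meant to make the search engine treat the
whole placeholder as a single logical string, even if Word split it into
multiple runs. The callback (ProductChangeTableReplacingCallback) is still
invoked only once, so your table insertion logic stays unchanged.
If your version of Aspose.Words for Java does not expose setIgnoreFormatting
on FindReplaceOptions, omit that line: Range.replace searches the text of the
range, so a match can already span runs, and returning ReplaceAction.REPLACE
from the callback lets Aspose.Words remove every run fragment the match covers.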
2. (Alternative) Use a regular expression that matches the placeholder
regardless of run breaks. In Java, Range.replace performs a regex search when
it is given a java.util.regex.Pattern instead of a String:
FindReplaceOptions options = new FindReplaceOptions(new ProductChangeTableReplacingCallback(productChanges));
options.setIgnoreFormatting(true); // as in option 1 (omit if your version has no such setter)
// Passing a Pattern (not a String) selects regex matching.
// \s* tolerates whitespace between the brackets and the name.
Pattern pattern = Pattern.compile("<\\s*reportTable\\s*>");
document.getRange().replace(pattern, "", options);
The (?s) flag is not needed here: it only makes the dot match line
terminators, and this pattern contains no dot. Allowing optional whitespace
with \s* is what helps if a break character sits between the runs.
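A quick check with plain java.util.regex shows what each pattern variant actually matches. This is a sketch outside Aspose.Words: the class name is hypothetical, and representing a manual line break as a vertical tab (character 11) in the searched text is an assumption you should verify against your document.

```java
import java.util.regex.Pattern;

// Plain java.util.regex demo; no Aspose.Words types are involved.
final class PlaceholderPatternDemo {

    // Assumption: a manual line break shows up as a vertical tab (char 11).
    static final String LINE_BREAK = String.valueOf((char) 11);

    // Literal match; Pattern.quote guards against regex metacharacters.
    static final Pattern LITERAL = Pattern.compile(Pattern.quote("<reportTable>"));

    // (?s) only changes how "." matches, and this pattern has no dot.
    static final Pattern WITH_DOTALL = Pattern.compile("(?s)<reportTable>");

    // Allows optional whitespace after "<" and before ">".
    static final Pattern TOLERANT = Pattern.compile("<\\s*reportTable\\s*>");

    static boolean found(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        String clean = "Intro <reportTable> outro";
        String broken = "Intro <" + LINE_BREAK + "reportTable> outro";

        System.out.println(found(LITERAL, clean));      // true
        System.out.println(found(WITH_DOTALL, broken)); // false
        System.out.println(found(TOLERANT, broken));    // true
    }
}
```

The tolerant pattern also matches the clean placeholder, so it can be used for both kinds of documents.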
3. (Optional) Clean up any stray characters
If you still see stray characters after the replacement (which can happen
only for very unusual Word documents), you can run a second, very simple
replace:
document.getRange().replace("reportTable>", "", new FindReplaceOptions());
This line should never be required when IgnoreFormatting is set
correctly, but it can be kept as a safety net.
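Where the stray “reportTable>” comes from can be sketched with the same kind of toy model: runs are plain strings, and all names below are hypothetical rather than Aspose.Words API. Trimming only the run where the match starts reproduces the leftover, while cutting the match out of every run it overlaps leaves nothing behind.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: each String stands for the text of one run.
final class CrossRunRemovalDemo {

    // Cuts the match out of every run it overlaps.
    static List<String> removeWholeMatch(List<String> runs, String placeholder) {
        return remove(runs, placeholder, false);
    }

    // Mimics the symptom: only the run where the match starts is trimmed.
    static List<String> removeFirstRunOnly(List<String> runs, String placeholder) {
        return remove(runs, placeholder, true);
    }

    private static List<String> remove(List<String> runs, String placeholder, boolean firstRunOnly) {
        List<String> result = new ArrayList<>(runs);
        int start = String.join("", runs).indexOf(placeholder);
        if (start < 0) {
            return result; // nothing to remove
        }
        int end = start + placeholder.length();
        int pos = 0;
        for (int i = 0; i < runs.size(); i++) {
            String run = runs.get(i);
            int runStart = pos;
            int runEnd = pos + run.length();
            pos = runEnd;
            // Overlap of the match with this run, in paragraph coordinates.
            int cutFrom = Math.max(start, runStart);
            int cutTo = Math.min(end, runEnd);
            if (cutFrom >= cutTo) {
                continue; // the match does not touch this run
            }
            result.set(i, run.substring(0, cutFrom - runStart) + run.substring(cutTo - runStart));
            if (firstRunOnly) {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> runs = List.of("Changes:", "<", "reportTable>");
        System.out.println(String.join("", removeFirstRunOnly(runs, "<reportTable>"))); // Changes:reportTable>
        System.out.println(String.join("", removeWholeMatch(runs, "<reportTable>")));   // Changes:
    }
}
```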
Full example (Java)
import com.aspose.words.Document;
import com.aspose.words.FindReplaceOptions;
import com.aspose.words.IReplacingCallback;

// Load the document created in Microsoft Word
Document document = new Document("Input.docx");
// Callback that inserts your table
IReplacingCallback callback = new ProductChangeTableReplacingCallback(productChanges);
// Configure FindReplaceOptions
FindReplaceOptions options = new FindReplaceOptions(callback);
options.setIgnoreFormatting(true); // <-- critical (omit if your version has no such setter)
options.setMatchCase(true); // optional
// Passing a String (not a java.util.regex.Pattern) keeps this a plain-text search
// Perform the replacement
document.getRange().replace("<reportTable>", "", options);
// Save the result
document.save("Output.docx");
Why IgnoreFormatting solves the problem
| Setting | Effect on the search engine |
| --- | --- |
| Default (false) | The engine matches only within the same run; different runs break the match. |
| true | Formatting differences (font, style, run boundaries) are ignored, so the text is treated as a continuous string. |
Because the placeholder is a simple literal (no special regex symbols), you do not need any additional escaping.
If you have any further questions (e.g., the placeholder still appears split in a particular document), please share a minimal reproducible sample (the .docx file) and we’ll be happy to dive deeper.