How to remove special formatting symbol in MS Word

arunavayyala · July 13, 2023, 7:31am

Hi, I have a document that needs formatting, and it contains a special formatting character that needs to be removed. How can I remove it?

alexey.noskov · July 13, 2023, 7:52am

@arunavayyala There is no way to remove formatting marks; you can only disable their display in MS Word.

arunavayyala · July 13, 2023, 7:54am

@alexey.noskov Even if I disable them, it appears as an additional line break. Is there a way to remove that?
Capture1.PNG (2.0 KB)

alexey.noskov · July 13, 2023, 7:59am

@arunavayyala This is a paragraph break. You can remove empty paragraphs using code like this:

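using System.Linq;
using Aspose.Words;

Document doc = new Document(@"C:\Temp\in.docx");

// Remove empty paragraphs (paragraphs without child nodes) from the document.
// ToList() takes a snapshot so nodes can be removed while iterating.
doc.GetChildNodes(NodeType.Paragraph, true).Cast<Paragraph>().Where(p => !p.HasChildNodes)
    .ToList().ForEach(p => p.Remove());

doc.Save(@"C:\Temp\out.docx");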

arunavayyala · July 14, 2023, 2:23am

Hi @alexey.noskov, thank you for your reply, but it's not exactly empty paragraphs we are dealing with. There is a line break (“\r” or “\f”) that needs to be removed from the last paragraph of a document.
screenshot-2 (3).png (13.2 KB)
screenshot-1 (7).png (241.3 KB)

alexey.noskov · July 14, 2023, 4:46am

@arunavayyala Could you please attach your input and expected output documents here for testing? We will check the documents and provide you more information.

arunavayyala · July 14, 2023, 5:26am

The placeholder {{ts.Title.Description.docx}} needs to be replaced with the corresponding document in the parentdoc. After the replacement, there is always an extra line.
parentdoc.docx (26.0 KB)
ts.Title.Description.docx (14.6 KB)
code.zip (1.4 KB)

alexey.noskov · July 14, 2023, 6:31am

@arunavayyala Please try using the following code to replace the placeholder with a document's content:

Document doc = new Document("C:\\Temp\\dst.docx");
    
FindReplaceOptions findReplaceOptions = new FindReplaceOptions();
findReplaceOptions.setDirection(FindReplaceDirection.BACKWARD);
findReplaceOptions.setReplacingCallback(new ReplaceWithDocumentCallback());
doc.getRange().replace("{{ts.Title.Description.docx}}", "C:\\Temp\\ts.Title.Description.docx", findReplaceOptions);
        
doc.save("C:\\Temp\\out.docx");

import com.aspose.words.*;
import java.util.ArrayList;

public class ReplaceWithDocumentCallback implements IReplacingCallback {
    
    /**
     * This method is called by the Aspose.Words find and replace engine for each match.
     */
    @Override
    public int replacing(ReplacingArgs e) throws Exception {
        
        Document doc = (Document)e.getMatchNode().getDocument();
        
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.getMatchNode();
        
        // The first (and maybe the only) run can contain text before the match;
        // in this case it is necessary to split the run.
        if (e.getMatchOffset() > 0)
            currentNode = splitRun((Run)currentNode, e.getMatchOffset());
        
        // This list stores all runs of the match so they can be deleted later.
        ArrayList<Run> runs = new ArrayList<Run>();
        
        // Find all runs that contain parts of the match string.
        int remainingLength = e.getMatch().group().length();
        while (
                remainingLength > 0 &&
                        currentNode != null &&
                        currentNode.getText().length() <= remainingLength)
        {
            runs.add((Run)currentNode);
            remainingLength -= currentNode.getText().length();
            
            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.getNextSibling();
            } while (currentNode != null && currentNode.getNodeType() != NodeType.RUN);
        }
        
        // Split the last run that contains the match if there is any text left.
        if (currentNode != null && remainingLength > 0)
        {
            splitRun((Run)currentNode, remainingLength);
            runs.add((Run)currentNode);
        }
        
        // Create DocumentBuilder to insert Document.
        DocumentBuilder builder = new DocumentBuilder(doc);
        // Move builder to the first run.
        builder.moveTo(runs.get(0));
        builder.insertDocument(new Document(e.getReplacement()), ImportFormatMode.KEEP_SOURCE_FORMATTING);
        
        Paragraph currentParagraph = builder.getCurrentParagraph();
        // Delete matched runs
        for (Run run : runs)
            run.remove();
        // Remove current paragraph if it is empty.
        if(!currentParagraph.hasChildNodes())
            currentParagraph.remove();
        
        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.SKIP;
    }
    
    private static Run splitRun(Run run, int position)
    {
        Run afterRun = (Run)run.deepClone(true);
        run.getParentNode().insertAfter(afterRun, run);
        afterRun.setText(run.getText().substring(position));
        run.setText(run.getText().substring(0, position));
        return afterRun;
    }
}

out.docx (26.1 KB)

arunavayyala · July 17, 2023, 6:36am

Hi @alexey.noskov,
this solution works. Thank you.

arunavayyala · September 7, 2023, 5:51am

Hi @alexey.noskov,
in the method above that removes the matching runs, suppose the current node contains text like this:

Each period from, and including, a Fixed Amount Payer Period End Date to, but excluding, the next following applicable Fixed Amount Payer Period End Date, except that the initial Fixed Amount Payer Calculation Period will commence on, and include, {{EffectiveDate.txt#!FullCouponFix & IsAdjustedFix}}{{[FirstCalcPeriodStartDateFix]#FullCouponFix}}{{[EffectiveDateFormatted]#!FullCouponFix & !IsAdjustedFix}} and the final Fixed Amount Payer Calculation Period will end on, but exclude, {{TerminationDate.txt#IsAdjustedFix}}{{[TerminationDateFormatted]#!IsAdjustedFix}}.

When we remove the term {{EffectiveDate.txt#!FullCouponFix & IsAdjustedFix}}, the text of the current node gets split into two nodes. So when we then try to remove the next placeholder, {{[FirstCalcPeriodStartDateFix]#FullCouponFix}}, it is not found in the current node.

How do we make sure the node remains the same?

alexey.noskov · September 7, 2023, 6:20am

@arunavayyala Unfortunately, the match node might contain more than just the searched text, and conversely the searched text might span several runs. That is why the code provided above splits the matched run(s) into parts. So there is no way to keep the original node structure exactly the same after performing the replace operation.
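
To illustrate the splitting described here, below is a minimal plain-Java sketch. It does not use the Aspose.Words API: each String in the list is an assumed stand-in for the text of one Run, and the names RunSplitSketch and removeMatch are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model (NOT Aspose.Words): one String stands in for one Run's text.
class RunSplitSketch {

    // Removes the first occurrence of `match` from the combined run texts.
    // Runs that only partly overlap the match are split, and the parts
    // outside the match survive as separate runs.
    static List<String> removeMatch(List<String> runs, String match) {
        String paragraphText = String.join("", runs);
        int start = paragraphText.indexOf(match);
        if (start < 0) {
            return new ArrayList<>(runs); // nothing to remove
        }
        int end = start + match.length();

        List<String> result = new ArrayList<>();
        int pos = 0;
        for (String run : runs) {
            int runStart = pos;
            int runEnd = pos + run.length();
            pos = runEnd;

            if (runEnd <= start || runStart >= end) {
                result.add(run); // run lies entirely outside the match
                continue;
            }
            if (runStart < start) {
                result.add(run.substring(0, start - runStart)); // text before the match
            }
            if (runEnd > end) {
                result.add(run.substring(end - runStart)); // text after the match
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> runs = List.of("will commence on, {{A}}{{B}} and end");
        List<String> once = removeMatch(runs, "{{A}}");
        List<String> twice = removeMatch(once, "{{B}}");
        System.out.println(once.size() + " runs, then " + twice.size() + " runs");
    }
}
```

In this model, the second placeholder no longer sits in the original node after the first removal, but it is still found because the search runs over the combined text of all runs rather than over a single node.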

arunavayyala · September 7, 2023, 7:30am

Hi @alexey.noskov,
is there any way to remove the matched text while preserving the node structure? We need to remove all the matched nodes in the paragraph.

alexey.noskov · September 7, 2023, 11:12am

@arunavayyala You can set the matched runs' text to an empty string instead of removing them in the IReplacingCallback. However, since the matched runs are still split in the IReplacingCallback, the original node structure will not be preserved. So I am afraid there is no way to preserve the exact node structure after editing the document.
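
As a rough illustration of the "empty string instead of removal" idea, here is a small plain-Java sketch. It is only a model and does not call Aspose.Words: each String is an assumed stand-in for one Run's text, and BlankRunsSketch and blankMatch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model (NOT Aspose.Words): one String stands in for one Run's text.
class BlankRunsSketch {

    // Splits runs at the match boundaries, then keeps the matched pieces
    // as empty runs instead of dropping them.
    static List<String> blankMatch(List<String> runs, String match) {
        String paragraphText = String.join("", runs);
        int start = paragraphText.indexOf(match);
        if (start < 0) {
            return new ArrayList<>(runs); // nothing to blank
        }
        int end = start + match.length();

        List<String> result = new ArrayList<>();
        int pos = 0;
        for (String run : runs) {
            int runStart = pos;
            int runEnd = pos + run.length();
            pos = runEnd;

            // Overlap of this run with the match, in paragraph coordinates.
            int from = Math.max(start, runStart);
            int to = Math.min(end, runEnd);
            if (from >= to) {
                result.add(run); // no overlap, run is kept unchanged
                continue;
            }
            if (runStart < from) {
                result.add(run.substring(0, from - runStart)); // text before the match
            }
            result.add(""); // matched piece stays as an empty run
            if (to < runEnd) {
                result.add(run.substring(to - runStart)); // text after the match
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> result = blankMatch(List.of("before {{A}} after"), "{{A}}");
        System.out.println(result.size() + " runs after blanking");
    }
}
```

Even in this simplified model, one original run becomes three (text before, an empty run, text after), which matches the point that blanking avoids deleting nodes but does not restore the original structure.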