Question about FindAndReplace feature of Aspose.Words for Java

Pintutrnt · June 26, 2019, 10:41am

Hi Team,

I am using the find-and-replace feature of Aspose.Words for Java to find tokens in a given template and replace them with the proper data. In my use case, I need to extract the format (font size, type, color, etc.) of the token, retain/apply the same format on the replaced text, and additionally set the Heading style so that the table of contents picks up the replaced text. I don’t intend to apply any of the default formatting of the Heading style; I just need the Heading style so that the Table of Contents picks the text up.

Below is the code snippet. The issue I am facing is that the font size of my token is not retained; instead, the default Heading font size of 16 is set. How can I control the font size as well? Or, in general, is there an easier way to apply the token’s format exactly as is to the replaced text and then add additional formats/styles?

public String TOKEN_REGEX = "(\\|[a-zA-z]+[\\{[a-zA-Z0-9\\,\\-\\/\\=\\:\\s*]\\}]\\|)";

public class FindAndReplaceCallBack implements IReplacingCallback {
    @Override
    public int replacing(ReplacingArgs e) throws Exception {
        String token = e.getMatch().group();
        DocumentBuilder builder = new DocumentBuilder((Document) e.getMatchNode().getDocument());
        logger.info("token : " + token);
        logger.info("Match Node Text : " + e.getMatchNode().getText());
        // Skipping in the main replace engine, do it at the end
        Node currentNode = e.getMatchNode();
        builder.moveTo(e.getMatchNode());

        switch (tokenName) {
            case "token1":
                builder.insertHtml("Meeting", useBuilderFormatting);
                break;
            case "token2":
                builder.insertHtml("25-06-2019", useBuilderFormatting);
                break;
            case "token3":
                Font font = builder.getFont();
                font.setBold(true);

                ParagraphFormat paragraphFormat = builder.getParagraphFormat();
                paragraphFormat.setAlignment(ParagraphAlignment.LEFT);
                paragraphFormat.setKeepTogether(true);
                // This is the default (original) style of the token's paragraph
                int defaultStyleIdentifier = builder.getParagraphFormat().getStyleIdentifier();

                // Apply the heading style so the text is reflected in the TOC;
                // it should still retain the token format captured above
                builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_1);
                if (null != Object.getTitle()) {
                    builder.writeln((null != serialNumValue ? serialNumValue + ONE_SPACE : "") + ObjectsSize + ": " + Object.getTitle());
                }

                builder.getParagraphFormat().setStyleIdentifier(defaultStyleIdentifier);
                // Commented out here intentionally; it is called at table of contents
                // generation time, when all replacements are complete
                // builder.getDocument().updateFields();
                font.setBold(false);
                break;
            case "asdf":
                break;

            ....so on
        }

        return ReplaceAction.REPLACE;
    }
}

Pintutrnt · June 26, 2019, 5:49am

Hi Team,

I am using the find-and-replace feature of Aspose.Words for Java.

My task is to find tokens based on a regex and replace each whole token with a new dynamic value (it could be a table, a single word, or one or more paragraphs). During replacement, the position of the replaced text changes. The attached files include both the input docx and the output docx, with the tokens and the replaced text highlighted in yellow. The expected position of the replaced text is highlighted in green. We need help on this ASAP, as we are generating corporate documents that cannot have such issues.

The attached files contain the input template/document (input_temp.docx) with tokens and the output document (output_temp.docx) containing the replaced tokens. I have highlighted the misplaced tokens with a blue background and marked the expected position with a green background. I am also briefly describing the issue below.
1.) In the input template (input_temp.docx), we have put the tokens |agStartT| to |agEndT|, but the output docx (output_temp.docx) has the replaced text as 11:58 AM, 05:30 PM to <expected position>.

2.) In the input template (input_temp.docx), we have put the tokens |cinAddr|, |cinSName|, but the output docx (output_temp.docx) has the replaced text as 159 B K PAUL AVENUE KOLKATA WB 700005 IN SHEKHAR, <expected_position>.

Code snippet:
public static final String TOKEN_REGEX = "(\\|[a-zA-z]+[\\{[a-zA-Z0-9\\,\\-\\/\\=\\:\\s*]\\}]\\|)";

public class FindAndReplaceCallBack implements IReplacingCallback {
    @Override
    public int replacing(ReplacingArgs e) throws Exception {
        String token = e.getMatch().group();
        DocumentBuilder builder = new DocumentBuilder((Document) e.getMatchNode().getDocument());
        logger.info("token : " + token);
        logger.info("Match Node Text : " + e.getMatchNode().getText());
        // Skipping in the main replace engine, do it at the end
        Node currentNode = e.getMatchNode();
        builder.moveTo(e.getMatchNode());

        switch (tokenName) {
            case "agTp":
                builder.insertHtml("Meeting", useBuilderFormatting);
                break;
            case "agNtDt":
                builder.insertHtml("25-06-2019", useBuilderFormatting);
                break;
            case "agDt":
                ........ so on
        }

        return ReplaceAction.REPLACE;
    }
}

New folder (2).zip (14.2 KB)
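
A token pattern like the one above can be tried out with plain java.util.regex before it is passed to Aspose.Words. The following is only a minimal sketch: the class name TokenRegexSketch and the assumed token shape (|name| or |name{args}|) are illustrative and not taken from the attached project. Note that the character range [a-zA-z] in the posted pattern also matches the characters [ \ ] ^ _ and `, so the sketch uses [a-zA-Z] instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts token names from text such as "|agStartT| to |agEndT|".
class TokenRegexSketch {

    // Assumed token shape: a pipe, a name made of letters, an optional {args} part, a pipe.
    static final Pattern TOKEN =
            Pattern.compile("\\|([a-zA-Z]+)(\\{[a-zA-Z0-9,\\-/=:\\s]*\\})?\\|");

    // Returns the names (capture group 1) of all tokens, in order of appearance.
    static List<String> tokenNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(tokenNames("from |agStartT| to |agEndT|"));
    }
}
```

Keeping the token name in its own capture group makes it easy to drive a switch statement on the name without stripping the pipes by hand.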

awais.hafeez · June 26, 2019, 11:15am

@Pintutrnt,

Please also create a simplified standalone runnable Java application (source code without compilation errors) that helps us to reproduce your current problem on our end and attach it here for testing. Please do not include Aspose.Words JAR files in it to reduce the file size. Thanks for your cooperation.

Pintutrnt · June 27, 2019, 10:21am

@awais.hafeez,
final_input_output_docx.zip (29.7 KB)
SampleCode.zip (1.7 KB)

Please find attached the runnable code and the input and generated output templates.

Pintutrnt · June 27, 2019, 10:23am

@awais.hafeez,

FYI,

The attached source code and the input and generated output files cover both of the above-mentioned issues.

awais.hafeez · June 27, 2019, 2:58pm

@Pintutrnt,

You can build logic on the following code that inserts HTML at the exact position where the pattern string is found:

Document document = new Document("E:\\temp\\final_input_output_docx\\input_temp.docx");

Pattern pattern = Pattern.compile("(\\|[a-zA-z]+[\\{[a-zA-Z0-9\\,\\-\\/]*\\}]*\\|)", Pattern.CASE_INSENSITIVE);
FindReplaceOptions opts = new FindReplaceOptions();
opts.setDirection(FindReplaceDirection.BACKWARD);
opts.setReplacingCallback(new ReplaceEvaluator());

document.getRange().replace(pattern, "", opts);

document.save("E:\\temp\\final_input_output_docx\\19.6.docx");

static class ReplaceEvaluator implements IReplacingCallback {
    int i = 1;
    public int replacing(ReplacingArgs e) throws Exception {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.getMatchNode();

        // The first (and may be the only) run can contain text before the match,
        // in this case it is necessary to split the run.
        if (e.getMatchOffset() > 0)
            currentNode = splitRun((Run) currentNode, e.getMatchOffset());

        ArrayList runs = new ArrayList();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.getMatch().group().length();
        while ((remainingLength > 0) && (currentNode != null) && (currentNode.getText().length() <= remainingLength)) {
            runs.add(currentNode);
            remainingLength = remainingLength - currentNode.getText().length();

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do {
                currentNode = currentNode.getNextSibling();
            } while ((currentNode != null) && (currentNode.getNodeType() != NodeType.RUN));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0)) {
            splitRun((Run) currentNode, remainingLength);
            runs.add(currentNode);
        }

        //// to insert Table
        DocumentBuilder builder = new DocumentBuilder((Document) e.getMatchNode().getDocument());
        builder.moveTo((Run) runs.get(runs.size() - 1));

        builder.insertHtml("<b>bold</b>" + i);
        i++;

        for (Run run : (Iterable<Run>) runs)
            run.remove();

        // Signal to the replace engine to do nothing because we have already done all what we wanted.
        return ReplaceAction.SKIP;
    }

    /**
     * Splits text of the specified run into two runs. Inserts the new run just
     * after the specified run.
     */
    private Run splitRun(Run run, int position) throws Exception {
        Run afterRun = (Run) run.deepClone(true);
        afterRun.setText(run.getText().substring(position));
        run.setText(run.getText().substring(0, position));
        run.getParentNode().insertAfter(afterRun, run);
        return afterRun;
    }
}

Hope this helps.
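
A match can start in the middle of one Run and end in the middle of another, which is why the callback above splits the first and the last run before removing anything. The following is a minimal sketch of that bookkeeping on plain strings rather than on Aspose.Words nodes; the class RunSplitSketch and the method replaceAcrossRuns are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-string model of the run-splitting idea; no Aspose.Words types are used.
class RunSplitSketch {

    /**
     * Replaces `length` characters that start at `offset` inside runs.get(startRun)
     * and may continue into the following runs. Returns a new list of runs.
     */
    static List<String> replaceAcrossRuns(List<String> runs, int startRun, int offset,
                                          int length, String replacement) {
        List<String> out = new ArrayList<>(runs.subList(0, startRun));

        // Text before the match stays in its own run (the first split).
        if (offset > 0) {
            out.add(runs.get(startRun).substring(0, offset));
        }
        // The replacement goes exactly where the match started.
        out.add(replacement);

        String current = runs.get(startRun).substring(offset);
        int next = startRun + 1;   // index of the next run not yet looked at
        int remaining = length;    // matched characters still to be dropped

        // Drop every run (or run tail) that lies entirely inside the match.
        while (remaining > 0 && current.length() <= remaining) {
            remaining -= current.length();
            if (next == runs.size()) {
                current = "";
                break;
            }
            current = runs.get(next++);
        }
        // Keep whatever follows the match in the current run (the second split).
        if (!current.isEmpty()) {
            out.add(current.substring(remaining));
        }
        out.addAll(runs.subList(next, runs.size()));
        return out;
    }

    public static void main(String[] args) {
        List<String> runs = List.of("Time: |agS", "tart", "T| to end");
        System.out.println(replaceAcrossRuns(runs, 0, 6, 10, "11:58 AM"));
    }
}
```

In this model the replacement is placed at the start of the match, so the text before and after the token keeps its order.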

Pintutrnt · June 28, 2019, 6:02am

Thank you @awais.hafeez, it worked.

Could you please also reply to the other problem from my first post above (retaining the token’s font size when applying the Heading 1 style)?

awais.hafeez · June 29, 2019, 4:59am

@Pintutrnt,

Please replace the ‘ReplaceEvaluator’ class code that I shared in my previous post with the following code and then observe the behavior:

static class ReplaceEvaluator implements IReplacingCallback {
    int i = 1;
    public int replacing(ReplacingArgs e) throws Exception {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.getMatchNode();

        // The first (and may be the only) run can contain text before the match,
        // in this case it is necessary to split the run.
        if (e.getMatchOffset() > 0)
            currentNode = splitRun((Run) currentNode, e.getMatchOffset());

        ArrayList runs = new ArrayList();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.getMatch().group().length();
        while ((remainingLength > 0) && (currentNode != null) && (currentNode.getText().length() <= remainingLength)) {
            runs.add(currentNode);
            remainingLength = remainingLength - currentNode.getText().length();

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do {
                currentNode = currentNode.getNextSibling();
            } while ((currentNode != null) && (currentNode.getNodeType() != NodeType.RUN));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0)) {
            splitRun((Run) currentNode, remainingLength);
            runs.add(currentNode);
        }

        DocumentBuilder builder = new DocumentBuilder((Document) e.getMatchNode().getDocument());
        Run tempRun = (Run) runs.get(0);
        builder.moveTo(tempRun);

        builder.insertHtml("<b>bold</b>" + i, true);
        i++;

        for (Run run : (Iterable<Run>) runs)
            run.remove();

        // Signal to the replace engine to do nothing because we have already done all what we wanted.
        return ReplaceAction.SKIP;
    }

    /**
     * Splits text of the specified run into two runs. Inserts the new run just
     * after the specified run.
     */
    private Run splitRun(Run run, int position) throws Exception {
        Run afterRun = (Run) run.deepClone(true);
        afterRun.setText(run.getText().substring(position));
        run.setText(run.getText().substring(0, position));
        run.getParentNode().insertAfter(afterRun, run);
        return afterRun;
    }
}

Hope this helps.

Pintutrnt · July 2, 2019, 6:30am

@awais.hafeez

I tried your updated code, but the second issue that I described above in this thread still persists.

Could you guide me in applying a custom format on top of the predefined “StyleIdentifier.HEADING_1”, so that it is reflected in the TOC as well as in the dynamically generated paragraph title?

awais.hafeez · July 3, 2019, 3:36am

@Pintutrnt,

Generally, you can iterate through the collection of Paragraphs in the document, check whether each one is a Heading 1 paragraph, and then apply custom formatting to each Run inside it by using the following code:

Document document = new Document("E:\\temp\\in.docx");

for (Paragraph para : (Iterable<Paragraph>) document.getChildNodes(NodeType.PARAGRAPH, true)){
    if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1){
        for (Run run : (Iterable<Run>) para.getRuns()){
            run.getFont().setColor(Color.RED);
            run.getFont().setItalic(true);
            run.getFont().setSize(18);
        }
    }
}

document.save("E:\\temp\\awjava-19.6.docx");

Pintutrnt · July 3, 2019, 11:08am

@awais.hafeez,

I have one more query regarding the TOC. I am using the code snippet below to insert it:

builder.insertTableOfContents("\\o \"1-3\" \\h \\z \\u");
builder.getDocument().updateFields();

The above code generates the TOC based on the headings applied in the template (DOCX file), but my requirement is to customize the Table of Contents (e.g. set a different font, size, etc.). Can you please guide me?

Thank you…
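
The string passed to insertTableOfContents is a set of Word TOC field switches, and both the backslashes and the inner quotes must be escaped in a Java string literal. As a small sketch (the helper name tocSwitches is hypothetical, not part of Aspose.Words), the switch string for a range of heading levels could be built like this:

```java
// Hypothetical helper that builds the TOC field switch string used above.
class TocSwitchesSketch {

    // \o "a-b" : build the TOC from heading levels a to b
    // \h       : make the entries hyperlinks
    // \z       : hide tab leaders and page numbers in Web layout view
    // \u       : use the applied paragraph outline levels
    static String tocSwitches(int fromLevel, int toLevel) {
        return "\\o \"" + fromLevel + "-" + toLevel + "\" \\h \\z \\u";
    }

    public static void main(String[] args) {
        // Prints the raw field switches as Word would see them.
        System.out.println(tocSwitches(1, 3));
    }
}
```

The returned value is the same text as the literal in the snippet above when called with levels 1 and 3.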

awais.hafeez · July 4, 2019, 3:29am

@Pintutrnt,

I think you can meet this requirement by using the following code:

Document doc = new Document("E:\\temp\\toc.docx");
// Insert TOC Field and then add the following lines
for (Field field : doc.getRange().getFields()) {
    if (field.getType() == (FieldType.FIELD_HYPERLINK)) {
        FieldHyperlink hyperlink = (FieldHyperlink) field;
        if (hyperlink.getSubAddress() != null && hyperlink.getSubAddress().startsWith("_Toc")) {
            Paragraph tocItem = (Paragraph) field.getStart().getAncestor(NodeType.PARAGRAPH);
            System.out.println("processing..." + tocItem.toString(SaveFormat.TEXT).trim());

            for (Run run : tocItem.getRuns()) {
                run.getFont().setColor(Color.RED);
                run.getFont().setItalic(true);
                run.getFont().setSize(18);
            }
        }
    }
}

doc.save("E:\\temp\\awjava-19.6.docx");

Pintutrnt · July 4, 2019, 6:13am

@awais.hafeez,

Thank you !!
It resolved my problem.

Pintutrnt · July 9, 2019, 7:19am

Hi Awais,

I am using the code snippet below to preserve the original style of the token present in the template.
int defaultStyleIdentifier = builder.getParagraphFormat().getStyleIdentifier();
builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_1); // Apply Heading 1 to the text below
builder.write("Text inserted here"); // Then write some text
builder.getParagraphFormat().setStyleIdentifier(defaultStyleIdentifier); // Restore the token's original style from the template
builder.insertHtml("Html Inserted");
…

My issue: sometimes we are able to generate the file, but sometimes I get the exception below:

java.lang.IllegalArgumentException: Cannot return user defined styles by style identifier.
com.aspose.words.StyleCollection.zzZv(Unknown Source)
com.aspose.words.StyleCollection.getByStyleIdentifier(Unknown Source)
com.aspose.words.StyleCollection.zzze(Unknown Source)
com.aspose.words.ParagraphFormat.setStyleIdentifier(Unknown Source)

Please help me out.

Thank you…

awais.hafeez · July 9, 2019, 11:09am

@Pintutrnt,

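Instead of using the getStyleIdentifier/setStyleIdentifier members, please try using the getStyle()/setStyle() or getStyleName()/setStyleName("") members of ParagraphFormat. This exception typically occurs when the paragraph uses a user-defined style: getStyleIdentifier() then returns StyleIdentifier.USER, which setStyleIdentifier() cannot map back to a particular style. Hope this helps.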