Alternative way to extract a MergeField from a TextBox to its relative position

I am using the source code below to identify text boxes that contain a MergeField:

Pattern pattern = Pattern.compile("(MERGEFIELD.+\\* MERGEFORMAT)");
try {
	// Find all text boxes containing merge fields
	DocumentBuilder builder = new DocumentBuilder(document);
	AtomicInteger bmCount = new AtomicInteger(1);
	for (Object obj : document.getChildNodes(NodeType.SHAPE, true)) {
		if (obj instanceof Shape) {
			int bm = bmCount.getAndIncrement();
			Shape shape = (Shape) obj;
			if (shape.isInline()) {
				NodeCollection<Paragraph> paragraphs = shape.getChildNodes(NodeType.PARAGRAPH, true);
				Paragraph refNode = shape.getParentParagraph();
				builder.moveTo(refNode);
				builder.startBookmark("BM_" + bm);
				builder.endBookmark("BM_" + bm);

				// Extract text from text box and insert into relative position
				boolean isFound = false;
				for (int i = 0; i < paragraphs.getCount(); i++) {
					Matcher matcher = pattern.matcher(paragraphs.get(i).getText());
					// Case merge field
					if (matcher.find()) {
						builder.moveTo(shape);
						builder.insertField(matcher.group(1));
						isFound = true;
					}
				}

				// Remove text box
				if (paragraphs.getCount() > 0 && isFound) {
					shape.remove();
				}
			}
		}

		for (Bookmark bookmark : document.getRange().getBookmarks()) {
			if (bookmark.getName().startsWith("BM_")) {
				Node node = bookmark.getBookmarkStart().getParentNode();
				if (StringUtil.isNullOrEmpty(node.toString(SaveFormat.TEXT), true)) {
					node.remove();
				}
			}
		}
	}
	document.getMailMerge().execute(keys, datas);
} catch (Exception e) {
	throw new RuntimeException(e);
}

However, because it relies on a regex, this approach breaks down if the requirement changes to include other types of fields, or when the field code is malformed (in some cases the MergeField in the text box is missing the MERGEFORMAT suffix). I also tried document.getRange().getFields(), which returns com.aspose.words.Field objects, but I am not sure whether such a field can be copied to another location.
So I want to ask, are there any alternative ways to:

  • Extract text from a text box and determine whether its content is indeed a MergeField
  • Convert the content of the text box into a com.aspose.words.Field instance

Thank you in advance

@m16vip123

To extract a MergeField from a TextBox and convert its content into a com.aspose.words.Field class, you can utilize the DocumentBuilder and the Field class provided by Aspose.Words. Here’s a step-by-step approach to achieve this:

  1. Identify Text Boxes: Loop through the shapes in the document to find text boxes that may contain MergeFields.

  2. Extract Text: For each text box, check if it contains any MergeFields. You can use the getRange().getFields() method to retrieve all fields in the document.

  3. Insert Fields: If a MergeField is found, you can insert it into the desired location in the document using the DocumentBuilder.

  4. Remove Text Box: After extracting the MergeField, you can remove the text box if it is no longer needed.

Here’s a code example that demonstrates this process:

import com.aspose.words.*;

public void extractMergeFieldsFromTextBox(Document document) {
    try {
        DocumentBuilder builder = new DocumentBuilder(document);
        for (Shape shape : (Iterable<Shape>) document.getChildNodes(NodeType.SHAPE, true)) {
            if (shape.isInline()) {
                NodeCollection<Paragraph> paragraphs = shape.getChildNodes(NodeType.PARAGRAPH, true);
                boolean isFound = false;

                for (Paragraph paragraph : paragraphs) {
                    for (Field field : document.getRange().getFields()) {
                        if (field.getType() == FieldType.FIELD_MERGE_FIELD) {
                            // Move to the shape and insert the field
                            builder.moveTo(shape);
                            builder.insertField(field.getFieldCode());
                            isFound = true;
                            break;
                        }
                    }
                    if (isFound) {
                        break;
                    }
                }

                // Remove the text box if a MergeField was found
                if (isFound) {
                    shape.remove();
                }
            }
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Explanation:

  • The code iterates through all shapes in the document and processes only those positioned inline with text (isInline() checks the layout, not whether the shape is a text box).
  • It retrieves paragraphs within each shape and checks for MergeFields.
  • If a MergeField is found, it uses the DocumentBuilder to insert the field at the position of the text box and then removes the text box.

This approach allows you to dynamically handle MergeFields without relying on regex, making it more robust against variations in field formatting or missing suffixes like MERGEFORMAT.

For further details, you can refer to the Aspose documentation on working with fields and document builders.
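
If the regex route has to stay for some reason, the missing-suffix case from the question can be covered by not requiring the switch at all. The following is a minimal sketch using only java.util.regex. The class name MergeFieldCodes and the sample strings are made up for illustration, and the pattern is an assumption about what the field codes look like, not a full field-code parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
final class MergeFieldCodes {
    // MERGEFIELD keyword followed by a quoted or bare field name.
    // Switches such as \* MERGEFORMAT may follow but are not required.
    private static final Pattern MERGE_FIELD =
            Pattern.compile("\\bMERGEFIELD\\s+(\"[^\"]+\"|[^\\s\\\\]+)");

    // Returns the merge field name, or empty if the text is not a MERGEFIELD code.
    static Optional<String> fieldName(String text) {
        Matcher m = MERGE_FIELD.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        String name = m.group(1);
        if (name.startsWith("\"")) {
            // Strip the surrounding quotes from a quoted name.
            name = name.substring(1, name.length() - 1);
        }
        return Optional.of(name);
    }

    public static void main(String[] args) {
        System.out.println(fieldName(" MERGEFIELD FirstName \\* MERGEFORMAT "));
        System.out.println(fieldName("MERGEFIELD \"Last Name\""));
        System.out.println(fieldName("PAGE \\* MERGEFORMAT"));
    }
}
```

This only relaxes the suffix requirement. It still does not recognise other field types, which is where checking the field type through the API is the more robust choice.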

Are there any other ways to do it besides using DocumentBuilder.insertField()?

Also, I’m not sure about the performance of the above solution, which calls document.getRange().getFields() inside a Paragraph loop inside a Shape loop.
Assuming there’s always one Paragraph inside the Shape, does this call return only the fields inside the target Shape, or every field in the Document itself?

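@m16vip123 document.getRange().getFields() returns every field in the document, not only those inside the shape. To check whether a particular shape contains MERGEFIELD fields, query the shape's own range instead: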

Document doc = new Document("C:\\Temp\\in.docx");

// Iterate over all shapes in the document.
for (Shape s : (Iterable<Shape>)doc.getChildNodes(NodeType.SHAPE, true))
{
    boolean hasMergefields = false;
    for (Field f : s.getRange().getFields())
        hasMergefields |= f.getType() == FieldType.FIELD_MERGE_FIELD;

    System.out.println(hasMergefields);
}

You cannot manipulate a Field as a Node because it is not a Node. A field in an MS Word document is represented by several nodes, so moving the field requires moving every node that represents it. This can be quite a complex task if the field spans several block-level nodes. The easiest way is therefore to recreate the field at the new location.
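
To make the "several nodes" point concrete, here is a toy model in plain Java. None of these classes are Aspose.Words types: Kind and Inline are hypothetical stand-ins for the FieldStart, FieldSeparator, FieldEnd and Run nodes that make up a field. The sketch shows that the field code is spread over the runs between the start and the separator, and that collecting it as a single string is all that is needed to recreate the field elsewhere. Nested fields are deliberately not handled.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only; these are not Aspose.Words classes.
final class FieldSpanSketch {
    enum Kind { RUN, FIELD_START, FIELD_SEPARATOR, FIELD_END }

    static final class Inline {
        final Kind kind;
        final String text;

        Inline(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Joins the run text between the first FIELD_START and the
    // FIELD_SEPARATOR (or FIELD_END) that follows it.
    static String fieldCode(List<Inline> nodes) {
        StringBuilder code = new StringBuilder();
        boolean inCode = false;
        for (Inline n : nodes) {
            if (n.kind == Kind.FIELD_START) {
                inCode = true;
            } else if (n.kind == Kind.FIELD_SEPARATOR || n.kind == Kind.FIELD_END) {
                if (inCode) {
                    break;
                }
            } else if (inCode) {
                code.append(n.text);
            }
        }
        return code.toString().trim();
    }

    public static void main(String[] args) {
        // One paragraph: plain text, then a field whose code is split over two runs.
        List<Inline> paragraph = Arrays.asList(
                new Inline(Kind.RUN, "Dear "),
                new Inline(Kind.FIELD_START, ""),
                new Inline(Kind.RUN, " MERGEFIELD First"),
                new Inline(Kind.RUN, "Name "),
                new Inline(Kind.FIELD_SEPARATOR, ""),
                new Inline(Kind.RUN, "field result"),
                new Inline(Kind.FIELD_END, ""));
        System.out.println(fieldCode(paragraph)); // prints "MERGEFIELD FirstName"
    }
}
```

In the toy paragraph, moving the field would mean relocating six nodes in the right order, whereas recreating it needs only the recovered code string.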