Removing marker fields and inserting content in their position

vchau · June 25, 2024, 2:11pm

I am trying to implement a scenario where I have to remove some marker fields together with all the content between them, and I was able to achieve that. Let's say the document contains many instances of a group of these fields:

<<UserNoteStart>>This is content<<UserNoteDescriptionStart>>This is description<<UserNoteDescriptionEnd>><<UserNoteEnd>>

The goal is to replace the group at a specific position. Say it is the first instance: I have to remove everything from UserNoteStart to UserNoteEnd and insert in its place a value that arrives in the incoming payload under the key UserNote1, where 1 is the position of the group being removed.

How should I do this for every group of these marker fields (1, 2, 3, etc.)?

What is the right approach? Should I remove the whole group, insert a new merge field called << UserNote1 >>, and then replace its value, or is there a better way to do it?

Please let us know how we can achieve this. We are completely blocked.

vyacheslav.deryushev · June 25, 2024, 8:56pm

@vchau Answered in another thread: Content Extraction and Removing Nested Merge Fields - #15 by vyacheslav.deryushev

joeberth · June 25, 2024, 9:13pm

This is one of the ways we tried to solve this issue.
Does anyone know how we can solve it, that is, replace <<XPTOStart>> with <<XPTO2>>? We also need to handle the case where the markers sit inside IF fields instead of being direct merge fields.

// Assumes mergeCounter is a static Map<String, Integer> declared elsewhere in this class,
// holding the 1-based position of the group to replace for each tag name.
public static void processUserNotes(final Document doc, final String tagName, final String currTag) throws Exception {
    final String endTagName = tagName.replace("Start", "End");
    final int target = mergeCounter.get(tagName);

    // Get start and end merge fields.
    Node startNode = null;
    Node endNode = null;
    int startCount = 0;
    int endCount = 0;
    for (Field f : doc.getRange().getFields()) {
        if (f instanceof FieldMergeField) {
            String fieldName = ((FieldMergeField) f).getFieldName();
            // Count start and end markers separately so the Nth end pairs with the Nth start.
            if (fieldName.contains(tagName)) {
                startCount += 1;
                if (startCount == target) {
                    startNode = f.getStart();
                }
            } else if (fieldName.contains(endTagName)) {
                endCount += 1;
                if (endCount == target) {
                    endNode = f.getEnd();
                }
            }
        } else if (f instanceof FieldIf) {
            // The markers may also sit inside an IF field, so check its field code.
            String fieldCode = f.getFieldCode();
            if (fieldCode.contains(currTag)) {
                startNode = f.getStart();
            } else if (fieldCode.contains(currTag.replace("Start", "End"))) {
                endNode = f.getEnd();
            }
        }
    }

    // Insert the replacement merge field in front of the start marker.
    if (startNode != null && endNode != null) {
        DocumentBuilder builder = new DocumentBuilder(doc);
        builder.moveTo(startNode);
        builder.insertField("MERGEFIELD " + currTag + " \\* MERGEFORMAT", null);
    }

   // Extract content between mergefields.
    int counterToRun = 1;
    for (Section section : doc.getSections()) {
        boolean insideUserNote = false;
        Run startRun = null;
        for (Paragraph paragraph : section.getBody().getParagraphs()) {

            for (Run run : paragraph.getRuns()) {
                String text = run.getText();
                if (text.contains(tagName)) {
                    if (Objects.equals(mergeCounter.get(tagName), counterToRun)) {
                        insideUserNote = true;
                        startRun = run;
      
                        continue;
                    }
                    counterToRun += 1;
                }

                if (insideUserNote) {
                    if (text.contains(tagName.replace("Start", "End"))) {
                        insideUserNote = false;

                        // Remove the start and end markers
                        run.remove();

                        startRun.remove();
                        return;
                    } else {
                        run.remove();
                    }
                }
            }
        }
    }
}

vyacheslav.deryushev · June 26, 2024, 6:47am

@joeberth Please check my post here: Content Extraction and Removing Nested Merge Fields - #16 by vyacheslav.deryushev

To insert new fields with the content, you can wrap the original content in a bookmark, then remove that content and insert the new content, including the new fields. Alternatively, you can keep the original <<UserNoteStart>> and <<UserNoteEnd>> fields and just rename them to <<UserNote1>> or any other name you want. In that case, add the bookmark only around the content between the fields.

It is also not clear what you need to do with the IF fields. Just remove them?

vchau · June 30, 2024, 6:49am

@vyacheslav.deryushev - We have almost solved the problem by renaming the marker fields. Each << UserNoteStart >>…<< UserNoteEnd >> group, and any other type of user note whose name contains UserNoteStart (for example << MandatoryUserNoteStart >>…<< MandatoryUserNoteEnd >>), is replaced with a new merge field named after the position of that type of user note, such as << UserNote1 >> or << MandatoryUserNote1 >>. One MandatoryUserNote group, whose DOM structure is more complex and holds more data and formatting, is not being renamed correctly. Can you help us resolve this last bit? Attached are input.docx and output.docx.
input.docx (643.2 KB)

output.docx (454.1 KB)
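
For readers following the naming scheme described above, here is a minimal sketch of just the numbering logic, without any Aspose.Words calls. The class and method names (`MarkerRenamer`, `rename`) are hypothetical, and the input is assumed to be the merge field names in document order. Note that it matches the whole field name, whereas the code posted in this thread searches inside the field code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkerRenamer {
    // Matches start markers such as UserNoteStart or MandatoryUserNoteStart
    // and captures the optional prefix in group 1.
    private static final Pattern START = Pattern.compile("(\\w*)UserNoteStart");

    /** Maps each start-marker name, in document order, to its numbered replacement name. */
    public static List<String> rename(List<String> fieldNames) {
        Map<String, Integer> counters = new HashMap<>(); // one counter per prefix
        List<String> renamed = new ArrayList<>();
        for (String name : fieldNames) {
            Matcher m = START.matcher(name);
            if (!m.matches()) {
                continue; // not a start marker (for example an End marker)
            }
            String prefix = m.group(1); // "" for a plain UserNoteStart
            int position = counters.merge(prefix, 1, Integer::sum);
            renamed.add(prefix + "UserNote" + position);
        }
        return renamed;
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "UserNoteStart", "UserNoteEnd",
                "MandatoryUserNoteStart", "MandatoryUserNoteEnd",
                "UserNoteStart", "UserNoteEnd");
        System.out.println(rename(names)); // prints [UserNote1, MandatoryUserNote1, UserNote2]
    }
}
```

Keeping one counter per prefix is what lets UserNote and MandatoryUserNote groups be numbered independently of each other.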

vchau · June 30, 2024, 6:51am

@vyacheslav.deryushev - Here is the code. It works for everything except the places where the start and end marker fields have more formatted data between them.

 @GetMapping("/mapDocument")
    public Document mapDocument() throws Exception {
        Document doc = new Document("input.docx");
        //To keep track of each type of UserNoteStart
        Map<String, Integer> counters = new HashMap<>();

        processAllFields(doc, counters);

        //update all fields in document
        // just in case if anything got missed after replace
        doc.updateFields();

        doc.save("output.docx");

        return doc;

    }

private void processAllFields(final Document doc, final Map<String, Integer> counters) throws Exception {

        NodeCollection<FieldStart> fieldStarts = doc.getChildNodes(NodeType.FIELD_START, true);
        DocumentBuilder builder = new DocumentBuilder(doc);

        for (FieldStart fieldStart: fieldStarts) {
            if (fieldStart.getFieldType() == FieldType.FIELD_MERGE_FIELD) {
                processUserNoteField(fieldStart, builder, counters);
            }
        }
    }

private void processUserNoteField(final FieldStart fieldStart, final DocumentBuilder builder,
                                      final Map<String, Integer> counters) throws Exception {

        //Start looking for only mergeFields of certain name pattern *UserNoteStart
        String fieldCode = fieldStart.getField().getFieldCode();
        Pattern pattern = Pattern.compile("(\\w*)UserNoteStart");
        Matcher matcher = pattern.matcher(fieldCode);

        //Only if mergeField name matches we will take action otherwise nothing
        if (matcher.find()) {
            String prefix = matcher.group(1);
            FieldEnd endField = findEndFieldOfFieldStart(fieldStart, prefix);
            if (endField != null) {
                //Read the current counter for this UserNote prefix (the prefix is "" when the field is plain UserNoteStart)
                int counter = counters.getOrDefault(prefix, 1);

                //Move to the startField
                builder.moveTo(fieldStart);

                //Insert a new field at the FieldStartPosition
                //This matters because the tagMap received for the mail merge numbers each UserNote type in sequence,
                //like UserNote1, UserNote2 or MandatoryUserNote1, MandatoryUserNote2
                Field newField = builder.insertField("MERGEFIELD " + prefix + "UserNote" + counter);

                //Ensure this newly added field is updated
                newField.update();

                //Remove all nodes between the start and end fields: once the new merge field is inserted, the marker fields are no longer needed
                removeNodesBetweenFieldStartAndEnd(fieldStart, endField);

                //Update counter
                counters.put(prefix, counter + 1);
            }

        }
    }

private FieldEnd findEndFieldOfFieldStart(final FieldStart fieldStart, final String prefix) {
        Node currentNode = fieldStart.getNextSibling();
        while (currentNode != null) {
            if (currentNode.getNodeType() == NodeType.FIELD_END) {
                FieldEnd fieldEnd = (FieldEnd) currentNode;
                if (fieldEnd.getFieldType() == FieldType.FIELD_MERGE_FIELD
                        && fieldEnd.getField().getFieldCode().contains(prefix + "UserNoteEnd")) {
                    return fieldEnd;
                }
            }
            currentNode = currentNode.getNextSibling();
        }
        return null;
    }

private void removeNodesBetweenFieldStartAndEnd(final Node start, final Node end) {
        Node currentNode = start.getNextSibling();
        //Remove anything between start and end
        while (currentNode != null && currentNode != end) {
            Node nextNode = currentNode.getNextSibling();
            currentNode.remove();
            currentNode = nextNode;
        }
        //Remove start and end in the last
        start.remove();
        end.remove();
    }

vchau · June 30, 2024, 6:55am

@vyacheslav.deryushev When debugging, we found that for the third MandatoryUserNoteStart in the document, findEndFieldOfFieldStart() does not find the matching MandatoryUserNoteEnd end field and returns null, so removeNodesBetweenFieldStartAndEnd() is never called and the field is skipped.

Please help us with this last edge case.

vyacheslav.deryushev · June 30, 2024, 10:14pm

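@vchau If there are several paragraphs between the start and end nodes, getNextSibling never reaches the end field, because it only visits nodes that share the same parent. In that case you can use nextPreOrder instead. Change your code like this: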

private FieldEnd findEndFieldOfFieldStart(final FieldStart fieldStart, final String prefix) {
    Node currentNode = fieldStart.getNextSibling();
    while (currentNode != null) {
        if (currentNode.getNodeType() == NodeType.FIELD_END) {
            FieldEnd fieldEnd = (FieldEnd) currentNode;
            if (fieldEnd.getFieldType() == FieldType.FIELD_MERGE_FIELD
                    && fieldEnd.getField().getFieldCode().contains(prefix + "UserNoteEnd")) {
                return fieldEnd;
            }
        }
        currentNode = currentNode.nextPreOrder(fieldStart);
    }
    return null;
}

private void removeNodesBetweenFieldStartAndEnd(final Node start, final Node end) {
    // List to collect paragraphs for later deletion.
    // Removing a paragraph during the walk would detach the child nodes still to be visited, so paragraphs are removed at the end.
    ArrayList<Node> paragraphs = new ArrayList<>();

    Node currentNode = start.getNextSibling();
    //Remove anything between start and end
    while (currentNode != null && currentNode != end) {
        Node nextNode = currentNode.nextPreOrder(start);
        if (currentNode.getNodeType() != NodeType.PARAGRAPH)
            currentNode.remove();
        else
            paragraphs.add(currentNode);

        currentNode = nextNode;
    }

    // Remove the collected paragraphs
    for (Node para : paragraphs)
        para.remove();

    //Remove start and end in the last
    start.remove();
    end.remove();
}
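
To illustrate why the sibling walk returned null while a pre-order walk finds the end field, here is a small self-contained sketch. `TreeNode` is a hypothetical stand-in for a document node and is not the Aspose.Words `Node` class. Its `nextPreOrder()` takes no root argument and only mimics the general idea of pre-order traversal.

```java
import java.util.ArrayList;
import java.util.List;

public class PreOrderDemo {
    /** Hypothetical minimal tree node, used only to illustrate the traversal order. */
    static final class TreeNode {
        final String name;
        TreeNode parent;
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(String name, TreeNode... kids) {
            this.name = name;
            for (TreeNode kid : kids) {
                kid.parent = this;
                children.add(kid);
            }
        }

        /** Next node under the same parent, or null if this is the last child. */
        TreeNode nextSibling() {
            if (parent == null) return null;
            int i = parent.children.indexOf(this);
            return i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;
        }

        /** Next node in document order: first child, else the next sibling of the nearest ancestor that has one. */
        TreeNode nextPreOrder() {
            if (!children.isEmpty()) return children.get(0);
            for (TreeNode n = this; n != null; n = n.parent) {
                TreeNode sibling = n.nextSibling();
                if (sibling != null) return sibling;
            }
            return null;
        }
    }

    /** Sibling-only search: never leaves the parent of the starting node. */
    static TreeNode findBySiblings(TreeNode from, String target) {
        for (TreeNode n = from.nextSibling(); n != null; n = n.nextSibling()) {
            if (n.name.equals(target)) return n;
        }
        return null;
    }

    /** Pre-order search: continues into the following paragraphs. */
    static TreeNode findPreOrder(TreeNode from, String target) {
        for (TreeNode n = from.nextPreOrder(); n != null; n = n.nextPreOrder()) {
            if (n.name.equals(target)) return n;
        }
        return null;
    }

    public static void main(String[] args) {
        TreeNode start = new TreeNode("start");
        TreeNode end = new TreeNode("end");
        // The start and end markers live in different paragraphs of the same body.
        new TreeNode("body",
                new TreeNode("para1", start, new TreeNode("run")),
                new TreeNode("para2", new TreeNode("run"), end));
        System.out.println(findBySiblings(start, "end") == null); // prints true
        System.out.println(findPreOrder(start, "end") == end);    // prints true
    }
}
```

The sibling search stops at the end of the first paragraph, which matches the null result seen when debugging, while the pre-order search climbs to the parent and continues into the next paragraph.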

Here is my output:

output.docx (453.9 KB)

vchau · July 1, 2024, 12:26am

@vyacheslav.deryushev - Thanks so much - It works for me.