I have been struggling to remove nested merge fields from the content extracted between a start and an end field. I have tried various approaches, but I am always left with one end merge field, and when I try to output the result as HTML I get an invalid documentModel error.
Here is what I have tried so far:
I used the helper method provided by the Aspose team to extract the content, and it worked great.
Here is my merge field setup. I would like to pull the content between the merge fields UserNoteStart and UserNoteEnd. The content may contain another pair of merge fields, UserNoteDescriptionStart and UserNoteDescriptionEnd, and I need to remove these nested merge fields from the content.
I need to return the extracted content, with the nested UserNoteDescriptionStart and UserNoteDescriptionEnd merge fields removed, so that the response can be passed to a rich text editor as HTML.
@vchau In your code you are trying to convert each extracted node to HTML. This is not quite correct, since a field's start can belong to one extracted node while its end belongs to another, so the DOM of an individual extracted node might be incomplete. I would suggest generating a separate document from the extracted nodes and then converting the whole document to HTML:
ArrayList<Node> extractedMainNodes = ExtractContentHelper.extractContent(startField, endField, false);
Document extractedDocument = ExtractContentHelper.generateDocument(srcDoc, extractedMainNodes);
// Remove merge fields if required.
for (Field f : extractedDocument.getRange().getFields())
{
    if (f.getType() == FieldType.FIELD_MERGE_FIELD)
        f.remove();
}
// Get HTML.
String html = extractedDocument.toString(SaveFormat.HTML);
I tried this, but it removes only the FIELD_MERGE_FIELD fields in the new document, not the content between them. I do not want that content in the response: the nested start and end merge fields should be removed along with the content they enclose.
@vchau Could you please save the extracted content as DOCX document and attach it here for our reference? The provided code should remove all merge fields from the document.
For example, if the marker fields are set up like this:
«UserNoteStart»«UserNoteDescriptionStart»This is Description«UserNoteDescriptionEnd»This is formatted main text«UserNoteEnd»
I would like to pull the text content between UserNoteDescriptionStart and UserNoteDescriptionEnd, which is fairly easy using the extractContent helper function. But when I do the same for UserNoteStart and UserNoteEnd, I have to remove the inner UserNoteDescriptionStart and UserNoteDescriptionEnd along with their content “This is Description” and return HTML only for “This is formatted main text”.
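To make the expected result concrete, here is a minimal sketch that works on the plain text of the example above rather than on the Aspose.Words document model. The class MarkerTextSketch and its methods between and removeRegion are hypothetical names used only for illustration, and the «…» markers are treated as ordinary substrings, so this only models which text should survive, not how the fields are actually removed.

```java
// Plain-string model of the desired result. This does NOT use Aspose.Words;
// the class and method names are hypothetical.
final class MarkerTextSketch {

    // Returns the text between the markers, or null if either marker is missing.
    static String between(String text, String startName, String endName) {
        String open = "\u00AB" + startName + "\u00BB";
        String close = "\u00AB" + endName + "\u00BB";
        int from = text.indexOf(open);
        if (from < 0) {
            return null;
        }
        from += open.length();
        int to = text.indexOf(close, from);
        if (to < 0) {
            return null;
        }
        return text.substring(from, to);
    }

    // Removes the region from the start marker to the end marker, markers included.
    // Returns the text unchanged when either marker is missing.
    static String removeRegion(String text, String startName, String endName) {
        String open = "\u00AB" + startName + "\u00BB";
        String close = "\u00AB" + endName + "\u00BB";
        int from = text.indexOf(open);
        if (from < 0) {
            return text;
        }
        int to = text.indexOf(close, from + open.length());
        if (to < 0) {
            return text;
        }
        return text.substring(0, from) + text.substring(to + close.length());
    }

    public static void main(String[] args) {
        String source = "\u00ABUserNoteStart\u00BB\u00ABUserNoteDescriptionStart\u00BBThis is Description"
                + "\u00ABUserNoteDescriptionEnd\u00BBThis is formatted main text\u00ABUserNoteEnd\u00BB";
        String note = between(source, "UserNoteStart", "UserNoteEnd");
        // Description text only.
        System.out.println(between(note, "UserNoteDescriptionStart", "UserNoteDescriptionEnd"));
        // Main text with the nested description region removed.
        System.out.println(removeRegion(note, "UserNoteDescriptionStart", "UserNoteDescriptionEnd"));
    }
}
```

In other words, the description is read from inside the nested markers, while the main text is whatever remains of the outer region once the nested region is cut out.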
@vchau Thank you for the additional information. The easiest way to achieve this is to wrap the content that should be removed in a bookmark and then remove the bookmark’s content. For example, see the following code:
Document doc = new Document("C:\\Temp\\in.docx");
// Get start and end merge fields.
FieldMergeField start = null;
FieldMergeField end = null;
for (Field f : doc.getRange().getFields())
{
    if (f.getType() == FieldType.FIELD_MERGE_FIELD)
    {
        FieldMergeField mf = (FieldMergeField)f;
        if (mf.getFieldName().equals("UserNoteStart"))
            start = mf;
        if (mf.getFieldName().equals("UserNoteEnd"))
            end = mf;
    }
}
// Extract content between merge fields.
ArrayList<Node> extractedNodes = ExtractContentHelper.extractContent(start.getEnd(), end.getStart(), false);
Document extractedDocument = ExtractContentHelper.generateDocument(doc, extractedNodes);
// Wrap the content that should be removed in a bookmark to make it easier to remove.
// Find the nested merge fields that delimit that content.
FieldMergeField removeStart = null;
FieldMergeField removeEnd = null;
for (Field f : extractedDocument.getRange().getFields())
{
    if (f.getType() == FieldType.FIELD_MERGE_FIELD)
    {
        FieldMergeField mf = (FieldMergeField)f;
        if (mf.getFieldName().equals("UserNoteDescriptionStart"))
            removeStart = mf;
        if (mf.getFieldName().equals("UserNoteDescriptionEnd"))
            removeEnd = mf;
    }
}
String tmpBkName = "tmpBkName";
removeStart.getStart().getParentNode().insertBefore(new BookmarkStart(extractedDocument, tmpBkName), removeStart.getStart());
removeEnd.getEnd().getParentNode().insertAfter(new BookmarkEnd(extractedDocument, tmpBkName), removeEnd.getEnd());
// Remove content inside bookmark.
extractedDocument.getRange().getBookmarks().get(tmpBkName).setText("");
// Remove tmp bookmark.
extractedDocument.getRange().getBookmarks().get(tmpBkName).remove();
extractedDocument.save("C:\\Temp\\out.docx");
Thanks @alexey.noskov, it worked. The only problem I am seeing is that I would like the returned HTML to use regular HTML formatting tags rather than inline styles.
For example:
<span style="font-family:Arial; font-weight:bold; letter-spacing:-0.1pt">days from when the notice was sent or (2) the date services will change)</span>
should be
<span><b>days from when the notice was sent or (2) the date services will change)</b></span>
Thank you @alexey.noskov for your help on this. We have run into another challenge: what if we need to extract the content between the start and end marker fields for each instance of those merge fields in the document? Let's say UserNoteStart and UserNoteEnd are used 3 times in the document, with different content each time. We would like to extract the content of the 1st, 2nd and 3rd occurrences and record each one separately.
Right now the code loops and ends up with the content between the last occurrence of those marker fields.
We could possibly pass a name such as UserNote1 or UserNote2 from the source so that the backend retrieves the content from that occurrence of the marker fields.
@vchau Yes, the easiest way to achieve this is to give the marker fields different names, for example by adding a counter: UserNoteStart1…UserNoteEnd1, UserNoteStart2…UserNoteEnd2, etc. In this case you will be able to distinguish the different notes. Also, in the loop you can take the first occurrence of the marker fields, extract the content, and then move on to the next occurrence. The technique is the same.
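The "first occurrence, then move to the next" idea can be sketched independently of Aspose.Words. The example below assumes the merge field names have already been read in document order into a plain list; MarkerPairingSketch and pairOccurrences are hypothetical names, and the returned index pairs simply stand in for the start and end fields you would pass to the extraction helper.

```java
import java.util.ArrayList;
import java.util.List;

// Pairs each start marker with the next end marker, in document order.
// Works on plain field names only; this is a model, not Aspose.Words code.
final class MarkerPairingSketch {

    // Returns one {startIndex, endIndex} pair per completed occurrence.
    static List<int[]> pairOccurrences(List<String> fieldNames, String startName, String endName) {
        List<int[]> pairs = new ArrayList<>();
        int start = -1; // index of the currently open start marker, -1 if none
        for (int i = 0; i < fieldNames.size(); i++) {
            String name = fieldNames.get(i);
            if (name.equals(startName)) {
                start = i;
            } else if (name.equals(endName) && start >= 0) {
                pairs.add(new int[] {start, i});
                start = -1; // reset so the next occurrence is handled separately
            }
        }
        return pairs;
    }
}
```

Resetting the open start marker after each match is what keeps the loop from ending up with only the last occurrence. Nested markers with other names, such as UserNoteDescriptionStart, are simply skipped.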
Thanks @alexey.noskov. How should we apply the text or HTML stored under UserNoteStart1, UserNote2… to these instances:
1st instance «UserNoteStart»«UserNoteDescriptionStart»This is Description«UserNoteDescriptionEnd»This is formatted main text«UserNoteEnd»
2nd instance «UserNoteStart»«UserNoteDescriptionStart»This is Description«UserNoteDescriptionEnd»This is formatted main text«UserNoteEnd»
Should we completely remove the marker fields and inject a new field named UserNote1 or UserNote2…? Or should we inject the content of UserNote1 and UserNote2 at the first position of «UserNoteStart», and so on? What would be the best way to achieve this?
We also have these marker fields inside an IF statement, so we need an approach that works within IF as well.
@vchau I think Alexey meant that you need to manually set different field names. If I understand you correctly, you need to get content from every UserNoteStart...UserNoteEnd in the document. You can do it in the following way:
Document doc = new Document("input.docx");
// Get start and end merge fields.
FieldMergeField start = null;
FieldMergeField end = null;
int i = 1;
for (Field f : doc.getRange().getFields()) {
    if (f.getType() == FieldType.FIELD_MERGE_FIELD) {
        FieldMergeField mf = (FieldMergeField) f;
        if (mf.getFieldName().equals("UserNoteStart")) {
            start = mf;
        }
        if (mf.getFieldName().equals("UserNoteEnd")) {
            end = mf;
        }
    }
    // Once both markers of an occurrence are found, extract it and reset for the next one.
    if (start != null && end != null) {
        Document extractedDocument = extractContent(doc, start, end);
        removeContent(extractedDocument);
        extractedDocument.save(getArtifactsDir() + String.format("out%d.docx", i));
        i++;
        start = null;
        end = null;
    }
}

private Document extractContent(Document doc, FieldMergeField start, FieldMergeField end) throws Exception {
    ArrayList<Node> extractedNodes = ExtractContentHelper.extractContent(start.getEnd(), end.getStart(), false);
    Document extractedDocument = ExtractContentHelper.generateDocument(doc, extractedNodes);
    return extractedDocument;
}

private void removeContent(Document extractedDocument) throws Exception {
    // Wrap the content that should be removed in a bookmark to make it easier to remove.
    // Find the nested merge fields that delimit that content.
    FieldMergeField removeStart = null;
    FieldMergeField removeEnd = null;
    for (Field f : extractedDocument.getRange().getFields()) {
        if (f.getType() == FieldType.FIELD_MERGE_FIELD) {
            FieldMergeField mf = (FieldMergeField) f;
            if (mf.getFieldName().equals("UserNoteDescriptionStart"))
                removeStart = mf;
            if (mf.getFieldName().equals("UserNoteDescriptionEnd"))
                removeEnd = mf;
        }
    }
    // Nothing to remove if this note has no nested description markers.
    if (removeStart == null || removeEnd == null)
        return;
    String tmpBkName = "tmpBkName";
    removeStart.getStart().getParentNode().insertBefore(new BookmarkStart(extractedDocument, tmpBkName), removeStart.getStart());
    removeEnd.getEnd().getParentNode().insertAfter(new BookmarkEnd(extractedDocument, tmpBkName), removeEnd.getEnd());
    // Remove content inside bookmark.
    extractedDocument.getRange().getBookmarks().get(tmpBkName).setText("");
    // Remove tmp bookmark.
    extractedDocument.getRange().getBookmarks().get(tmpBkName).remove();
}
If you need to update field names using Aspose.Words, you can use the following code:
for (Field f : doc.getRange().getFields()) {
    if (f.getType() == FieldType.FIELD_MERGE_FIELD) {
        FieldMergeField mf = (FieldMergeField) f;
        if (mf.getFieldName().equals("UserNoteStart")) {
            mf.setFieldName("UserNoteStart1");
            mf.update();
            start = mf;
        }
        if (mf.getFieldName().equals("UserNoteEnd")) {
            mf.setFieldName("UserNoteEnd1");
            mf.update();
            end = mf;
        }
    }
}
You can also use an incrementing counter for the numeric suffix instead of a fixed value.
Another option is to create a HashMap in which you collect all merge fields, and then retrieve them by field name.
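As a rough illustration of the HashMap option, the sketch below groups document-order positions by field name, so the n-th occurrence of a marker can be looked up directly. It uses plain strings and integer positions as stand-ins; FieldIndexSketch and indexByName are hypothetical names, and in a real document the map values would presumably be the FieldMergeField objects themselves.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Groups field positions by field name. A plain-Java model of the idea,
// not Aspose.Words code; the names here are hypothetical.
final class FieldIndexSketch {

    // Maps each field name to the positions where it occurs, in document order.
    static Map<String, List<Integer>> indexByName(List<String> fieldNames) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < fieldNames.size(); i++) {
            index.computeIfAbsent(fieldNames.get(i), k -> new ArrayList<>()).add(i);
        }
        return index;
    }
}
```

With such a map, the list position plus one plays the role of the counter: the second entry under UserNoteStart and the second entry under UserNoteEnd would delimit the second note.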