Remove field codes from Word documents

newinformatics · February 25, 2010, 10:25am

Hi,
Users sometimes need to clean up a Word document so they can send it to a production system or other publication platform without problematic elements like field codes.
How can I go through a Word document (.doc) and replace each field with its value using Aspose.Words? Could you post a code snippet or a link? This is very important to me. Thank you very much.

newinformatics · February 25, 2010, 10:42am

I just found this code snippet, which is useful, but it is not complete for my case: I need to replace each field with its value, not just remove it.

// Open document
Document doc = new Document(@"Test007\RemoveFields.doc");
// Get collection of FieldStarts from the document
NodeCollection starts = doc.FirstSection.Body.GetChildNodes(NodeType.FieldStart, true);
NodeCollection ends = doc.FirstSection.Body.GetChildNodes(NodeType.FieldEnd, true);
// Loop through all field starts and remove content between field start and end
foreach(FieldStart start in starts)
{
    Node currentnode = start.NextSibling;
    // Remove content until the field end (or a nested field start) is reached
    while (currentnode != null &&
           currentnode.NodeType != NodeType.FieldEnd &&
           currentnode.NodeType != NodeType.FieldStart)
    {
        Node next = currentnode.NextSibling;
        currentnode.Remove();
        currentnode = next;
    }
}
// Remove field starts and ends
starts.Clear();
ends.Clear();
// Save result document
doc.Save(@"Test007\out.doc");

How can I keep the field values in the document while removing the fields themselves?
Thanks

AndreyN · February 25, 2010, 10:47am

Hi

Thanks for your inquiry. Yes, you can do that. You need to remove the field code together with the FieldStart, FieldSeparator and FieldEnd nodes, which leaves only the field value in the document. Please try the following code:

// Open source document
Document doc = new Document("in.doc");
// Unlink fields in the document
UnlinkFields(doc);
// Save output document
doc.Save("out.doc");

private void UnlinkFields(Document doc)
{
    // Get collection of FieldStart nodes
    NodeCollection fieldStarts = doc.GetChildNodes(NodeType.FieldStart, true);
    // Get collection of FieldSeparator nodes
    NodeCollection fieldSeparators = doc.GetChildNodes(NodeType.FieldSeparator, true);
    // And get collection of FieldEnd nodes
    NodeCollection fieldEnds = doc.GetChildNodes(NodeType.FieldEnd, true);
    // Loop through all FieldStart nodes
    foreach(FieldStart start in fieldStarts)
    {
        // Search for the FieldSeparator node; everything before it is the field code to remove
        Node curNode = start;
        while (curNode.NodeType != NodeType.FieldSeparator && curNode.NodeType != NodeType.FieldEnd)
        {
            curNode = curNode.NextPreOrder(doc);
            if (curNode == null)
                break;
        }
        // Remove all nodes between FieldStart and FieldSeparator (or FieldEnd, depending on the field type)
        if (curNode != null)
        {
            RemoveSequence(start, curNode);
        }
    }
    // Now we can remove FieldStart, FieldSeparator and FieldEnd nodes
    fieldStarts.Clear();
    fieldSeparators.Clear();
    fieldEnds.Clear();
}
/// <summary>
/// Remove all nodes between start and end nodes, except start and end nodes
/// </summary>
/// <param name="start">The start node</param>
/// <param name="end">The end node</param>
public void RemoveSequence(Node start, Node end)
{
    Node curNode = start.NextPreOrder(start.Document);
    while (curNode != null && !curNode.Equals(end))
    {
        // Move to next node
        Node nextNode = curNode.NextPreOrder(start.Document);
        // Remove a composite node only if it contains neither the start nor the end node
        if (curNode.IsComposite)
        {
            if (!(curNode as CompositeNode).GetChildNodes(NodeType.Any, true).Contains(end) &&
                !(curNode as CompositeNode).GetChildNodes(NodeType.Any, true).Contains(start))
            {
                nextNode = curNode.NextSibling;
                curNode.Remove();
            }
        }
        else
        {
            curNode.Remove();
        }
        curNode = nextNode;
    }
}

Hope this helps.
Best regards,

newinformatics · February 25, 2010, 11:11am

Thank you very much, Andrey! It works fine. It is perfect, exactly what I was looking for.
Thanks very much again!

chrisl · April 20, 2010, 10:51am

Andrey,

This works for all fields except ‘IF’ fields, where it outputs the entire conditional statement as text.

e.g. Apple = "Apple" "An Apple" " Not an Apple"

Is there a way to remove the field, but retain the evaluated value for IF fields?

Chris.

AndreyN · April 20, 2010, 11:06am

Hi
Thanks for your request. A field in an MS Word document has the following structure:
[FieldStart] here is field code [FieldSeparator] here is field value [FieldEnd]
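To illustrate why this structure matters for IF fields, here is a minimal sketch. It assumes the IF field contains a nested field inside its field code (a common case with mail merge, but an assumption about your document). It is a toy model, not Aspose.Words code: `Kind`, `Tok` and `FieldModel.Unlink` are hypothetical names used only for illustration. The idea is to track every open field on a stack and keep text only when no enclosing field is still in its code part, instead of stopping at the first separator found after a field start.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Toy model only: these are NOT Aspose.Words types.
public enum Kind { Start, Separator, End, Text }

public sealed class Tok
{
    public Kind Kind { get; }
    public string Text { get; }
    public Tok(Kind kind, string text = "") { Kind = kind; Text = text; }
}

public static class FieldModel
{
    // Returns the text left after unlinking all fields, including nested ones.
    public static string Unlink(IEnumerable<Tok> tokens)
    {
        // One entry per open field: true while we are in its code part,
        // false once its separator has been seen (value part).
        var inCode = new Stack<bool>();
        var result = new StringBuilder();
        foreach (Tok t in tokens)
        {
            switch (t.Kind)
            {
                case Kind.Start:
                    inCode.Push(true);
                    break;
                case Kind.Separator:
                    if (inCode.Count > 0) { inCode.Pop(); inCode.Push(false); }
                    break;
                case Kind.End:
                    if (inCode.Count > 0) inCode.Pop();
                    break;
                case Kind.Text:
                    // Keep text only if no enclosing field is still in its code part.
                    if (!inCode.Contains(true)) result.Append(t.Text);
                    break;
            }
        }
        return result.ToString();
    }
}

public static class Demo
{
    public static void Main()
    {
        // Models: Fruit: { IF { MERGEFIELD Fruit } = "Apple" "An Apple" "Not an Apple" }
        var tokens = new List<Tok>
        {
            new Tok(Kind.Text, "Fruit: "),
            new Tok(Kind.Start), new Tok(Kind.Text, "IF "),
            new Tok(Kind.Start), new Tok(Kind.Text, "MERGEFIELD Fruit"),
            new Tok(Kind.Separator), new Tok(Kind.Text, "Apple"), new Tok(Kind.End),
            new Tok(Kind.Text, " = \"Apple\" \"An Apple\" \"Not an Apple\""),
            new Tok(Kind.Separator), new Tok(Kind.Text, "An Apple"), new Tok(Kind.End)
        };
        Console.WriteLine(FieldModel.Unlink(tokens)); // prints "Fruit: An Apple"
    }
}
```

In this model, the value of the nested field ("Apple") is dropped because it still belongs to the code of the outer IF field, and only the value of the outer field survives. The same nesting-aware check would have to be applied when walking the real document nodes.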
You can also open your document in DocumentExplorer (an Aspose.Words demo application) to investigate its structure. DocumentExplorer is included in the Aspose.Words installation package. You can find it here:
C:\Program Files\Aspose\Aspose.Words\Demos\CSharp\DocumentExplorer
In your case, I think you can use a DocumentVisitor. Please follow this link to learn more:
https://docs.aspose.com/words/net/how-to-extract-selected-content-between-nodes-in-a-document/
You can find a code example here:
https://forum.aspose.com/t/76601
Best regards,