Removing Corrupted DocProperties from a document

Hi,

In the attached document, you can see that a couple of the document properties have been corrupted.

They have the text: Error! Unknown document property name.

Is it possible to scan the document and remove these properties?

Cheers

Paul

Hi

Thanks for your request. Please try using the following code:

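// Needs System.Text.RegularExpressions (Regex) and Aspose.Words;
// in recent Aspose.Words versions FieldStart and FieldType are in Aspose.Words.Fields.
// Open document.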
Document doc = new Document("YYYY_0_1.doc");
// Get all FieldStart from the document.
Node[] fieldStarts = doc.GetChildNodes(NodeType.FieldStart, true).ToArray();
// Loop through all FieldStart.
foreach(FieldStart fieldStart in fieldStarts)
{
    if (fieldStart.FieldType == FieldType.FieldDocProperty)
    {
        string fieldCode = string.Empty;
        Node currentNode = fieldStart;
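        // Get the field code: concatenate the runs between FieldStart and FieldSeparator.
        // Also stop at FieldEnd or at the end of the paragraph, so a field without
        // a separator cannot cause a NullReferenceException.
        while (currentNode != null
            && currentNode.NodeType != NodeType.FieldSeparator
            && currentNode.NodeType != NodeType.FieldEnd)
        {
            if (currentNode.NodeType == NodeType.Run)
                fieldCode += ((Run)currentNode).Text;
            currentNode = currentNode.NextSibling;
        }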
        currentNode = fieldStart;
        // We should get Property name from field code
        // First token is the field keyword (DOCPROPERTY), second token is the property name.
        Regex regex = new Regex(@"\s*\S+\s+(?<propname>\S+)");
        Match match = regex.Match(fieldCode);
        string propertyName = match.Groups["propname"].Value;
        // Check whether this property exists in BuiltInDocumentProperties or CustomDocumentProperties
        if (doc.BuiltInDocumentProperties[propertyName] == null && doc.CustomDocumentProperties[propertyName] == null)
        {
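            // Remove every node of this field, from FieldStart through FieldEnd.
            while (currentNode != null && currentNode.NodeType != NodeType.FieldEnd)
            {
                Node nextNode = currentNode.NextSibling;
                currentNode.Remove();
                currentNode = nextNode;
            }
            // currentNode is null if FieldEnd was not found in this paragraph.
            if (currentNode != null)
                currentNode.Remove();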
        }
    }
}
doc.Save("Out.doc");
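
One thing to watch: the regular expression above takes the second whitespace-delimited token as the property name, so it will not handle a quoted name that contains spaces. If your documents have such fields, something along these lines might help. This is only a sketch that works on the field code string itself; FieldCodeParser and GetPropertyName are hypothetical names, not part of Aspose.Words, and it assumes the field code starts with the DOCPROPERTY keyword.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose.Words class): extracts the property
// name from the text of a DOCPROPERTY field code.
public static class FieldCodeParser
{
    // After the DOCPROPERTY keyword, accept either a quoted name (may contain
    // spaces) or a bare token. .NET allows the same group name in both branches.
    private static readonly Regex PropertyNamePattern = new Regex(
        @"^\s*DOCPROPERTY\s+(?:""(?<propname>[^""]+)""|(?<propname>\S+))",
        RegexOptions.IgnoreCase);

    // Returns the property name, or null if the text is not a DOCPROPERTY field code.
    public static string GetPropertyName(string fieldCode)
    {
        Match match = PropertyNamePattern.Match(fieldCode);
        return match.Success ? match.Groups["propname"].Value : null;
    }
}
```

You could then replace the regex lines in the loop with string propertyName = FieldCodeParser.GetPropertyName(fieldCode); and skip the field when the result is null, instead of treating it as a missing property.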

Best regards,

Excellent.

Thanks a million for that.