Remove Metadata from Word DOCX File using Java | Ink annotations, Track Changes, Attached Templates, etc.

I want to remove the metadata from a DOCX file. Can you please provide me with the steps or sample code to do so?

@gs562,

Please ZIP and attach the following resources here for testing:

  • Your simplified Word DOCX document you want to remove metadata from
  • Screenshot(s) of the area(s) showing the unwanted metadata that you want to delete
  • Your expected DOCX document showing the desired output. You can create this document manually by using MS Word.

As soon as you have these ready, we will start investigating your scenario and provide you with more information.

Sample.zip (6.5 MB)

I have uploaded the sample DOCX files from which I want to remove the metadata, along with a list of all the metadata attributes that I want to remove.
Please let me know if you need any other information.

Sample.zip (6.51 MB)

@gs562,

We are working on your query and will get back to you soon.

@gs562,

Regarding removing Track Changes from a Word document, you can accept or reject all revisions at once, or accept/reject individual revisions, by using the following code:

Document doc = new Document("C:\\temp\\sample\\TestDocWMetaData.docm");

// Accept all tracked changes...
doc.AcceptAllRevisions();
// ...or reject all of them instead:
// doc.Revisions.RejectAll();

// ...or process the revisions one at a time:
// foreach (Revision revision in doc.Revisions)
//     revision.Reject(); // or revision.Accept();

doc.Save("C:\\temp\\sample\\21.3.docm");
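If you want to confirm that the saved file no longer contains tracked changes without going through Aspose.Words, one option is to look at the raw `word/document.xml` part inside the DOCX ZIP. The following is a minimal JDK-only sketch under that assumption; the class name `RevisionMarks` is hypothetical, it only counts the usual revision elements (insertions, deletions, moves and run/paragraph property changes), and it assumes Word's standard `w:` prefix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// JDK-only sketch (not Aspose.Words): counts tracked-change elements in the text of
// word/document.xml, assumed to have been read out of the DOCX ZIP already.
final class RevisionMarks {
    // The [\s>/] guard stops look-alike elements such as <w:insideH> (table borders)
    // and <w:delText> (text inside a deletion) from being counted.
    private static final Pattern REVISION =
            Pattern.compile("<w:(ins|del|moveFrom|moveTo|rPrChange|pPrChange)[\\s>/]");

    static int count(String documentXml) {
        Matcher m = REVISION.matcher(documentXml);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String xml = "<w:body><w:p>"
                + "<w:ins w:id=\"1\" w:author=\"A\"><w:r><w:t>new</w:t></w:r></w:ins>"
                + "<w:del w:id=\"2\" w:author=\"A\"><w:r><w:delText>old</w:delText></w:r></w:del>"
                + "</w:p><w:tbl><w:tblPr><w:tblBorders><w:insideH w:val=\"single\"/>"
                + "</w:tblBorders></w:tblPr></w:tbl></w:body>";
        System.out.println(count(xml)); // 2
    }
}
```

A result of 0 only covers the main body; headers, footers, footnotes and comments live in separate parts of the package and would need the same check.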

Regarding removing Comments from Word document, please use the following code:

static void RemoveComments(Document doc)
{
    // Collect all comments in the document
    NodeCollection comments = doc.GetChildNodes(NodeType.Comment, true);
    // Remove all comments.
    comments.Clear();
}
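Comments also carry personal information in their `w:author` attribute. As a JDK-only way to see which comments (and authors) a saved file still contains, the sketch below parses `word/comments.xml`, assumed to have been extracted from the DOCX ZIP beforehand. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// JDK-only sketch (not Aspose.Words): lists the author of every <w:comment> in word/comments.xml.
final class CommentAuthors {
    private static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static List<String> authors(String commentsXml) {
        NodeList comments = parse(commentsXml).getElementsByTagNameNS(W, "comment");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < comments.getLength(); i++) {
            // getAttributeNS returns "" when the attribute is missing
            result.add(((Element) comments.item(i)).getAttributeNS(W, "author"));
        }
        return result;
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups above
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:comments xmlns:w=\"" + W + "\">"
                + "<w:comment w:id=\"0\" w:author=\"Jane Doe\"><w:p/></w:comment>"
                + "</w:comments>";
        System.out.println(authors(xml)); // [Jane Doe]
    }
}
```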

Regarding removing text smaller than 5 pt, hidden text and white text, please use the following code:

// ToArray() takes a snapshot, so removing runs does not disturb the iteration
foreach (Run run in doc.GetChildNodes(NodeType.Run, true).ToArray())
    if (run.Font.Size < 5 || run.Font.Hidden || run.Font.Color.ToArgb() == Color.White.ToArgb())
        run.Remove();
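The same three tests can be expressed directly against the OOXML run properties, which may help when checking a file by hand. A minimal JDK-only sketch, assuming you pass it a single `<w:rPr>` element with the `w` namespace declared: font size is stored in `w:sz` in half-points (so 5 pt is `w:val="10"`), hidden text is the `w:vanish` toggle, and color is a hex `RRGGBB` value in `w:color`. It only looks at direct formatting; styles and theme colors are not resolved. The class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// JDK-only sketch (not Aspose.Words): flags a run whose direct formatting makes it
// tiny (< 5 pt), hidden, or white.
final class ConcealedRunCheck {
    private static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static boolean isConcealed(String rPrXml) {
        Element rPr = parse(rPrXml).getDocumentElement();

        // w:sz is measured in half-points, so divide by 2 to get points
        Element size = first(rPr, "sz");
        if (size != null && Integer.parseInt(size.getAttributeNS(W, "val")) / 2.0 < 5) {
            return true;
        }
        // <w:vanish/> is an on/off toggle: present means on unless val is false/0/off
        Element vanish = first(rPr, "vanish");
        if (vanish != null) {
            String val = vanish.getAttributeNS(W, "val");
            if (!(val.equals("false") || val.equals("0") || val.equals("off"))) {
                return true;
            }
        }
        // "auto" is not treated as white here
        Element color = first(rPr, "color");
        return color != null && color.getAttributeNS(W, "val").equalsIgnoreCase("FFFFFF");
    }

    private static Element first(Element parent, String localName) {
        NodeList list = parent.getElementsByTagNameNS(W, localName);
        return list.getLength() == 0 ? null : (Element) list.item(0);
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String rPr = "<w:rPr xmlns:w=\"" + W + "\"><w:sz w:val=\"8\"/></w:rPr>";
        System.out.println(isConcealed(rPr)); // true (4 pt)
    }
}
```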

To remove all user information from comments, revisions and document properties, please use the following code:

Document doc = new Document("C:\\temp\\input.docx");
doc.RemovePersonalInformation = true;
doc.Save("C:\\temp\\21.3.docx");
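To see which author names remain in the document properties of a saved file, you can read `docProps/core.xml` from the DOCX ZIP. The JDK-only sketch below returns `dc:creator` and `cp:lastModifiedBy` (empty strings when absent). The class name is hypothetical, and which fields `RemovePersonalInformation` clears exactly has not been verified here; the sketch only reports what the file contains.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// JDK-only sketch (not Aspose.Words): reads the author fields of docProps/core.xml.
final class CorePropsPeople {
    private static final String CP =
            "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";
    private static final String DC = "http://purl.org/dc/elements/1.1/";

    static Map<String, String> people(String coreXml) {
        Document xml = parse(coreXml);
        Map<String, String> result = new LinkedHashMap<>();
        result.put("creator", text(xml, DC, "creator"));
        result.put("lastModifiedBy", text(xml, CP, "lastModifiedBy"));
        return result;
    }

    private static String text(Document xml, String ns, String localName) {
        NodeList nodes = xml.getElementsByTagNameNS(ns, localName);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<cp:coreProperties xmlns:cp=\"" + CP + "\" xmlns:dc=\"" + DC + "\">"
                + "<dc:creator>Jane Doe</dc:creator>"
                + "<cp:lastModifiedBy>John Smith</cp:lastModifiedBy>"
                + "</cp:coreProperties>";
        System.out.println(people(xml)); // {creator=Jane Doe, lastModifiedBy=John Smith}
    }
}
```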

To remove built-in and custom document properties, please use the following code:

Document doc = new Document("C:\\temp\\input.docx");

// to remove an individual built-in or custom property
doc.CustomDocumentProperties.Remove("Authorized By");
doc.BuiltInDocumentProperties.Remove("some prop");

// to remove all built-in or custom properties
doc.CustomDocumentProperties.Clear();
doc.BuiltInDocumentProperties.Clear();

doc.Save("C:\\temp\\21.3.docx");
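Custom properties such as "Authorized By" are stored in `docProps/custom.xml`, one `property` element per entry with the name in an unqualified `name` attribute. A JDK-only sketch for listing whatever custom properties a saved file still has, assuming you have already extracted that part from the ZIP; the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// JDK-only sketch (not Aspose.Words): lists custom property names in docProps/custom.xml.
final class CustomPropertyNames {
    private static final String CUSTOM =
            "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties";

    static List<String> names(String customXml) {
        NodeList props = parse(customXml).getElementsByTagNameNS(CUSTOM, "property");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < props.getLength(); i++) {
            // "name" has no namespace prefix, so plain getAttribute is enough
            names.add(((Element) props.item(i)).getAttribute("name"));
        }
        return names;
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Properties xmlns=\"" + CUSTOM + "\">"
                + "<property pid=\"2\" name=\"Authorized By\"><value>x</value></property>"
                + "</Properties>";
        System.out.println(names(xml)); // [Authorized By]
    }
}
```

If the part is missing from the package altogether, there are no custom properties left to remove.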

To remove macros from a Word document, please use the following code:

Document doc = new Document("C:\\temp\\input.docm");
doc.RemoveMacros();
doc.Save("C:\\temp\\21.3.docx");
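In a macro-enabled file the VBA project is normally stored as a binary part named `word/vbaProject.bin`. A JDK-only sketch for checking whether a saved package still contains such a part; the class name and the `zipWith` test helper are hypothetical, and matching on the file name is an assumption that covers the usual layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// JDK-only sketch (not Aspose.Words): looks for a VBA project part in a DOCX/DOCM package.
final class MacroPartCheck {
    static boolean hasVbaProject(InputStream docx) {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().toLowerCase(Locale.ROOT).endsWith("vbaproject.bin")) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small in-memory ZIP with empty entries; needs at least one name.
    static byte[] zipWith(String... entryNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] docm = zipWith("[Content_Types].xml", "word/document.xml", "word/vbaProject.bin");
        System.out.println(hasVbaProject(new ByteArrayInputStream(docm))); // true
    }
}
```

For a real file, pass `Files.newInputStream(Paths.get(...))` for the saved document instead of the in-memory ZIP.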

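To remove footnotes from a Word document, please use the following code snippet:

// ToArray() takes a snapshot, so removing footnotes does not disturb the iteration
foreach (Footnote footnote in doc.GetChildNodes(NodeType.Footnote, true).ToArray())
    if (footnote.FootnoteType == FootnoteType.Footnote) // endnotes are left in place
        footnote.Remove();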
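When verifying the result in `word/footnotes.xml`, note that Word keeps a few special entries there (with `w:type` set to `separator` or `continuationSeparator`) that are layout helpers rather than footnotes, so an empty-looking document can still have `w:footnote` elements. A JDK-only sketch that counts only the real footnotes, assuming the part has already been extracted from the ZIP; the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// JDK-only sketch (not Aspose.Words): counts real footnotes in word/footnotes.xml.
final class FootnoteCount {
    private static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static int countRealFootnotes(String footnotesXml) {
        NodeList notes = parse(footnotesXml).getElementsByTagNameNS(W, "footnote");
        int count = 0;
        for (int i = 0; i < notes.getLength(); i++) {
            String type = ((Element) notes.item(i)).getAttributeNS(W, "type");
            // no w:type (or "normal") means an ordinary footnote; anything else is a separator
            if (type.isEmpty() || type.equals("normal")) {
                count++;
            }
        }
        return count;
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:footnotes xmlns:w=\"" + W + "\">"
                + "<w:footnote w:type=\"separator\" w:id=\"-1\"/>"
                + "<w:footnote w:type=\"continuationSeparator\" w:id=\"0\"/>"
                + "<w:footnote w:id=\"1\"><w:p/></w:footnote>"
                + "</w:footnotes>";
        System.out.println(countRealFootnotes(xml)); // 1
    }
}
```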

To remove fields from a Word document, please try the following code:

// removes all types of fields from Word document
// doc.Range.Fields.Clear();

// removes all instances of a particular type of field from Word document;
// iterate backwards so removing a field does not shift the ones still to be visited
FieldCollection fields = doc.Range.Fields;
for (int i = fields.Count - 1; i >= 0; i--)
    if (fields[i].Type == FieldType.FieldMergeField) // removes all merge fields
        fields[i].Remove();
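In `word/document.xml` a field appears either as a single `<w:fldSimple>` element or as a complex field that starts with `<w:fldChar w:fldCharType="begin"/>` and ends with a matching `"end"` fldChar. A JDK-only sketch that counts field starts of every type, assuming Word's usual `w:` prefix and double-quoted attributes; nested fields are counted individually, and the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// JDK-only sketch (not Aspose.Words): counts fields in the text of word/document.xml.
final class FieldStarts {
    // Either a simple field, or the "begin" marker of a complex field
    private static final Pattern FIELD_START =
            Pattern.compile("<w:fldSimple\\b|<w:fldChar\\b[^>]*w:fldCharType=\"begin\"");

    static int count(String documentXml) {
        Matcher m = FIELD_START.matcher(documentXml);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String xml = "<w:p><w:fldSimple w:instr=\" MERGEFIELD Name \"/>"
                + "<w:r><w:fldChar w:fldCharType=\"begin\"/></w:r>"
                + "<w:r><w:instrText> DATE </w:instrText></w:r>"
                + "<w:r><w:fldChar w:fldCharType=\"separate\"/></w:r>"
                + "<w:r><w:fldChar w:fldCharType=\"end\"/></w:r></w:p>";
        System.out.println(count(xml)); // 2
    }
}
```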

To attach the Normal template to a document (i.e., drop any other attached template), assign an empty string to the Document.AttachedTemplate property.
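In the file itself, an attached template is recorded in `word/settings.xml` as a `<w:attachedTemplate r:id="..."/>` element whose relationship (in `word/_rels/settings.xml.rels`) points at the template path. A JDK-only sketch for checking whether a saved file still has one, assuming `settings.xml` has already been read out of the ZIP; the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// JDK-only sketch (not Aspose.Words): returns the relationship id of <w:attachedTemplate>
// in word/settings.xml, or null when no template is attached.
final class AttachedTemplateCheck {
    private static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    private static final String R =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    static String attachedTemplateRelId(String settingsXml) {
        NodeList nodes = parse(settingsXml).getElementsByTagNameNS(W, "attachedTemplate");
        return nodes.getLength() == 0 ? null : ((Element) nodes.item(0)).getAttributeNS(R, "id");
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:settings xmlns:w=\"" + W + "\" xmlns:r=\"" + R + "\">"
                + "<w:attachedTemplate r:id=\"rId1\"/></w:settings>";
        System.out.println(attachedTemplateRelId(xml)); // rId1
    }
}
```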

If you have further inquiries or need any help, please let us know.

Thank you so much, Awais. I really appreciate your efforts.
I will try to implement these changes in the next few days. If any issues come up, I will let you know.
Once again, thanks for your support.

@gs562,

Sure. Please let us know if you need any more information in the future; we are always glad to help you.

Hello Awais,

I am not able to call some of the built-in methods that you mentioned in your previous post, such as:

doc.Revisions.RejectAll();
doc.RemoveMacros();
doc.RemovePersonalInformation = true;

I am able to remove built-in and custom properties, but I cannot call the other methods. Please let me know how to resolve this issue. I have added the JAR and imported the library as shown in the image.

image.png

Hello,

I am able to remove all the mentioned attributes except two, both of which use the GetChildNodes method. How can I fix this?

The error is "Type mismatch: cannot convert from element type Object to Footnote". How can I convert the elements of the collection to Footnote objects? There is a similar issue with:

foreach (Run run in doc.GetChildNodes(NodeType.Run, true))
    if (run.Font.Size < 5 || run.Font.Hidden || run.Font.Color == Color.FromArgb(255, 255, 255, 255))
        run.Remove();

image.png (30.9 KB)

image.png (33.8 KB)

image.png (14.5 KB)

@gs562,

Please use the following Java equivalent of the above lines:

doc.getRevisions().rejectAll();
doc.removeMacros();
doc.setRemovePersonalInformation(true);

For the GetChildNodes loops, please use the following Java code; casting each node to Footnote/Run resolves the type mismatch:

for (Node node : doc.getChildNodes(NodeType.FOOTNOTE, true).toArray()) { // snapshot, safe to remove
    Footnote footnote = (Footnote) node;
    if (footnote.getFootnoteType() == FootnoteType.FOOTNOTE) // endnotes are left in place
        footnote.remove();
}

and

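for (Node node : doc.getChildNodes(NodeType.RUN, true).toArray()) { // snapshot, safe to remove
    Run run = (Run) node;
    // java.awt.Color is an object, so compare it with equals(), not ==
    if (run.getFont().getSize() < 5 || run.getFont().getHidden() || Color.WHITE.equals(run.getFont().getColor()))
        run.remove();
}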
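Regarding the white-text check in the loop above: in Java, `==` on two `java.awt.Color` objects compares object identity, so comparing a run's color with a freshly created `new Color(255, 255, 255, 255)` is never true. `Color.equals()` compares the packed ARGB value instead. A small JDK-only demonstration, where a locally constructed color stands in for `run.getFont().getColor()`:

```java
import java.awt.Color;

// Shows why the white-text test needs equals() rather than ==.
final class ColorCompare {
    // equals() compares the packed ARGB value; null is simply "not white"
    static boolean isWhite(Color c) {
        return Color.WHITE.equals(c);
    }

    public static void main(String[] args) {
        Color fromRun = new Color(255, 255, 255, 255); // stand-in for run.getFont().getColor()
        System.out.println(fromRun == new Color(255, 255, 255, 255)); // false: different objects
        System.out.println(isWhite(fromRun));                          // true: same ARGB value
    }
}
```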

Hello Awais,

I am stuck on removing the items below from a Word DOCX file. I am also attaching the sample DOCX along with screenshots of its properties. Most of items 1-7 may not be present in the attached document, because I don't have the tools to add them. If you could add them on your end for testing, that would be very helpful.
Can you please provide a workaround or methods to remove the following metadata?
Please let me know if you need any other information.
1 - White text on any background
2 - Track Changes
3 - Document variables
4 - Document reviewers
5 - Ink annotations
6 - Attached templates
7 - Smart tag words (Word 2003/2007)

PFA,

image005.jpg (25.3 KB)

image006.jpg (19.3 KB)

(Attachment InputTestDocWMetaData2_updated_OutputFile.docx is missing)

@gs562,

I am afraid we do not see any Word documents attached to your previous post. Can you please ZIP and reattach those Word files here for our reference?

PFA,

InputTestDocWMetaData2_updated_OutputFile.zip (701 KB)

@gs562,

We are working on your query and will get back to you soon.

Thank you for your response. I will be waiting for your feedback.

@gs562,

Please try running the following code:

Document doc = new Document("C:\\temp\\InputTestDocWMetaData2_updated_OutputFile\\InputTestDocWMetaData2_updated_OutputFile.docx");

// To remove attached template from Word document
doc.setAttachedTemplate("");

// To remove built-in and custom document properties
// (iterate backwards by index so that removing a property does not disturb the loop)
BuiltInDocumentProperties builtInProps = doc.getBuiltInDocumentProperties();
for (int i = builtInProps.getCount() - 1; i >= 0; i--) {
    builtInProps.remove(builtInProps.get(i).getName());
}
builtInProps.clear();

CustomDocumentProperties customProps = doc.getCustomDocumentProperties();
for (int i = customProps.getCount() - 1; i >= 0; i--) {
    customProps.remove(customProps.get(i).getName());
}
customProps.clear();

// To remove all headers/footers along with their content (watermarks, etc.)
for (Section sec : doc.getSections()) {
    sec.deleteHeaderFooterShapes();
    sec.clearHeadersFooters();
}

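// To remove content controls from the Word document (toArray() snapshots the collection first)
for (Node node : doc.getChildNodes(NodeType.STRUCTURED_DOCUMENT_TAG, true).toArray()) {
    StructuredDocumentTag contentControl = (StructuredDocumentTag) node;
    contentControl.removeAllChildren();
    // a nested content control may already have been detached together with its parent's children
    if (contentControl.getParentNode() != null)
        contentControl.remove();
}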

doc.save("C:\\Temp\\InputTestDocWMetaData2_updated_OutputFile\\21.4.docx");

If the problem persists, please also create a screenshot highlighting the problematic area(s) in MS Word that you want to remove programmatically with Aspose.Words and attach it here for our reference. We will then investigate the scenario further and provide you with more information.
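The code above does not touch items 3, 5 and 7 from the list (document variables, ink annotations and smart tags). As a JDK-only way to check whether a file still contains them, the sketch below runs plain substring checks: `<w:docVars>` in `word/settings.xml`, `<w:smartTag>` in `word/document.xml`, and any part under `word/ink/`. Storing ink as parts under `word/ink/` is an assumption here, as is Word's usual `w:` prefix; the class name is hypothetical. For removal, Aspose.Words appears to expose document variables through `Document.getVariables()`, which is worth confirming in the API reference.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// JDK-only sketch (not Aspose.Words): reports which of items 3, 5 and 7 are still present,
// given the package's part names and the text of settings.xml and document.xml.
final class LeftoverMetadataScan {
    static Set<String> scan(List<String> partNames, String settingsXml, String documentXml) {
        Set<String> found = new TreeSet<>();
        if (settingsXml.contains("<w:docVars")) {
            found.add("document variables"); // item 3
        }
        if (documentXml.contains("<w:smartTag")) {
            found.add("smart tags");         // item 7
        }
        for (String name : partNames) {
            if (name.startsWith("word/ink/")) {
                found.add("ink annotations"); // item 5 (assumed part location)
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("word/document.xml", "word/settings.xml");
        String settings = "<w:settings><w:docVars><w:docVar w:name=\"a\" w:val=\"b\"/></w:docVars></w:settings>";
        String document = "<w:body><w:p><w:smartTag w:element=\"place\"/></w:p></w:body>";
        System.out.println(scan(parts, settings, document)); // [document variables, smart tags]
    }
}
```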

Hello Awais,

Thanks for your help. I will try to implement this and let you know.