Remove Filter Malicious Content inside Word Documents | PowerShell Commands inside DOCX XML | dc:title Shape alt Target C# Java

Hi,

I have some docx documents, but they might be malicious, so I am not going to attach them here. When I unzip a docx and open core.xml, it contains a PowerShell command like this:

<dc:title > poWErsHell.eXe -EXEcUtIONpOLiCY bYpAsS -nOprOFiLe -windoWsTYle hiddEn -encODeDcOMmAnD

In another document, document.xml contains a shape tag whose alt attribute tries to run a command.

Example:

<v:shape id="_x0000_i1032" type="#_x0000_t75" alt="cmD.exe /c P^O^W^E^R^S^H^E^L^L ^-^N^o^P^r^o^f^i^l^e^ -^E^x^e^cutionPolicy

Some other documents contain links to an external location, which I think might be malicious, like this:

Target="file://62.8.193.206/Normal.dotm" TargetMode="External"

How can I check whether such content exists in my document? Does Aspose have a function call to access it? If so, how can I use it to remove the content?

thanks

@zhilin39,

The value of dc:title can be altered by using the Document.BuiltInDocumentProperties.Title property. You can check the old value and override it with a new value.

The alt attribute of a shape can be controlled through the Shape.AlternativeText property. You can loop through all shapes in the Word document and reset their alternative text values.
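As a rough illustration of both suggestions, here is a minimal sketch. The `LooksLikeCommand` heuristic, the class name, and the file paths are assumptions made for this example and are not part of Aspose.Words; only `BuiltInDocumentProperties.Title`, `Shape.AlternativeText`, and `GetChildNodes` are library members. The heuristic strips the `^` escape characters seen in the alt sample before matching, and it is not a complete malware check.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Drawing;

public static class PropertyScrubber
{
    // Hypothetical heuristic: remove cmd.exe caret escapes, then look for
    // interpreter names case-insensitively. Extend the list as needed.
    public static bool LooksLikeCommand(string value)
    {
        if (string.IsNullOrEmpty(value)) return false;
        string normalized = value.Replace("^", string.Empty);
        return normalized.IndexOf("powershell", StringComparison.OrdinalIgnoreCase) >= 0
            || normalized.IndexOf("cmd.exe", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        var doc = new Document("input.docx"); // hypothetical path

        // dc:title in core.xml maps to the built-in Title property.
        if (LooksLikeCommand(doc.BuiltInDocumentProperties.Title))
            doc.BuiltInDocumentProperties.Title = string.Empty;

        // The alt attribute of each shape maps to AlternativeText.
        foreach (Shape shape in doc.GetChildNodes(NodeType.Shape, true))
        {
            if (LooksLikeCommand(shape.AlternativeText))
                shape.AlternativeText = string.Empty;
        }

        doc.Save("output.docx"); // hypothetical path
    }
}
```

Resetting only the values that match keeps legitimate titles and alt text intact; if you prefer to be strict, you could clear every value unconditionally instead.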

Target is an internal value read from the package, and there does not seem to be a way to change its value.
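Detecting such targets, at least, does not need Aspose.Words, because a .docx file is a ZIP package whose .rels parts list every relationship. The following is only a sketch that uses the standard .NET libraries; the class and method names are hypothetical, and it reports relationships marked TargetMode="External" without modifying the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class ExternalTargetScanner
{
    // XML namespace used by package relationship parts (*.rels).
    static readonly XNamespace Rel =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Returns (part name, target) pairs for every relationship marked External.
    public static List<KeyValuePair<string, string>> FindExternalTargets(Stream docx)
    {
        var hits = new List<KeyValuePair<string, string>>();
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                if (!entry.FullName.EndsWith(".rels", StringComparison.OrdinalIgnoreCase))
                    continue;

                using (Stream part = entry.Open())
                {
                    XDocument xml = XDocument.Load(part);
                    foreach (XElement r in xml.Descendants(Rel + "Relationship"))
                    {
                        string mode = (string)r.Attribute("TargetMode");
                        string target = (string)r.Attribute("Target");
                        if (target != null &&
                            string.Equals(mode, "External", StringComparison.OrdinalIgnoreCase))
                        {
                            hits.Add(new KeyValuePair<string, string>(entry.FullName, target));
                        }
                    }
                }
            }
        }
        return hits;
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: pass the path of the .docx file to inspect.");
            return;
        }
        using (FileStream fs = File.OpenRead(args[0]))
        {
            foreach (var hit in FindExternalTargets(fs))
                Console.WriteLine(hit.Key + " -> " + hit.Value);
        }
    }
}
```

Ordinary hyperlinks are also stored as external relationships, so the output needs a review step (for example, flagging file:// targets or unknown hosts) before anything is treated as malicious.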

Please let me know if I can be of any further assistance.

@zhilin39,

It may be possible to control the target by using the IResourceLoadingCallback interface. Could you please ZIP and upload a simplified Word document that shows this, for our reference? We will then investigate this document and scenario on our end and provide you with more information.

Hi @awais.hafeez,

Sorry for my super late reply. I have attached the samples with password “sample” and please read the README file inside.

sample_unzip_doc.zip (41.9 KB)

I would like to remove the Target="file://62.8.193.206/Normal.dotm" link in the relationship tag.

thanks

@zhilin39,

We have logged your requirement in our issue tracking system. Your ticket number is WORDSNET-21143. We will further look into the details of this requirement as to how we can filter such malicious content inside Word documents. We will keep you updated here on the status of the linked issue.

@zhilin39,

Regarding WORDSNET-21143: unfortunately, there is no generic way to check for and filter malicious content in Word documents using Aspose.Words.

In this case, the first issue (Normal.dotm target from 7234800d9fe43ba9edea1d7435a1b030712e7bce035334c4a8ed76ed573dbfa1 sample) can be fixed by using the following code:

// Clearing the attached template drops the external Normal.dotm reference.
var doc = new Document(@"723...docx");
doc.AttachedTemplate = string.Empty;
doc.Save(@"723...-out.docx");

And the second issue (image.html target from 5074bb1fafbfc3863b0c43c8ad9d2cdf95ea56e91b06c0853ce5aa9e28fa594c sample) can be fixed by using the following code:

var doc = new Document(@"507...docx");
// Shape attribute key that holds the full name (link) of an external image source.
const int ImageSourceFullName = 4104;
foreach (ShapeBase shape in doc.GetChildNodes(NodeType.Shape, true))
{
    var imageSource = shape.FetchShapeAttr(ImageSourceFullName) as string;
    // Only clear links that match the known-bad pattern from this sample.
    if ((imageSource != null) && imageSource.Contains("canarytokens"))
    {
        Console.WriteLine(imageSource);
        shape.SetShapeAttr(ImageSourceFullName, string.Empty);
    }
}
doc.Save(@"507...-out.docx");

@awais.hafeez

Thanks for the solution, I’m able to remove these values. One question about shape.SetShapeAttr(ImageSourceFullName, string.Empty): if I set the attribute to empty for every shape, without checking imageSource.Contains(“canarytokens”), will it affect the normal images/pictures in the document?

thanks

@zhilin39,

We are working on your query and will get back to you soon.

@zhilin39,

ImageSourceFullName represents an external image link and is used to load external image data into Aspose.Words. We do not recommend removing all of these attributes, because the links will then be lost after saving the document. Moreover, after saving to some formats the image may become unavailable.
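One possible middle ground between matching a single known string and clearing every link is to clear only the links whose host is not on an allowlist. The sketch below shows just that decision using the standard `System.Uri` class; the class name, the `ShouldClear` helper, the allowlist contents, and the sample URLs are all assumptions made for illustration, not Aspose.Words API.

```csharp
using System;
using System.Collections.Generic;

public static class ImageSourcePolicy
{
    // Hypothetical allowlist; replace with the hosts you actually trust.
    static readonly HashSet<string> TrustedHosts =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "intranet.example.com" };

    // True when the link is an absolute URI whose host is not on the allowlist.
    // Empty, relative, and host-less values are left alone (an assumption of
    // this sketch; a stricter policy could clear those as well).
    public static bool ShouldClear(string imageSource)
    {
        if (string.IsNullOrWhiteSpace(imageSource)) return false;

        Uri uri;
        if (!Uri.TryCreate(imageSource, UriKind.Absolute, out uri)) return false;
        if (string.IsNullOrEmpty(uri.Host)) return false;

        return !TrustedHosts.Contains(uri.Host);
    }

    public static void Main()
    {
        // Hypothetical sample values.
        Console.WriteLine(ShouldClear("http://canarytokens.example/image.html"));
        Console.WriteLine(ShouldClear("https://intranet.example.com/logo.png"));
    }
}
```

In the loop from the earlier reply, the condition imageSource.Contains("canarytokens") could then be swapped for ImageSourcePolicy.ShouldClear(imageSource), so that trusted linked images keep their source while unknown hosts are cleared.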