Remove Filter Malicious Content inside Word Documents | PowerShell Commands inside DOCX XML | dc:title Shape alt Target C# Java

Hi,

I have some DOCX documents that might be malicious, so I am not going to attach them here. When I unzip a DOCX and open core.xml, it contains a PowerShell command like this:

<dc:title > poWErsHell.eXe -EXEcUtIONpOLiCY bYpAsS -nOprOFiLe -windoWsTYle hiddEn -encODeDcOMmAnD

In another document, document.xml contains a shape tag, and an attribute of this tag is trying to run a cmd command.

Example:

<v:shape id="_x0000_i1032" type="#_x0000_t75" alt="cmD.exe /c P^O^W^E^R^S^H^E^L^L ^-^N^o^P^r^o^f^i^l^e^ -^E^x^e^cutionPolicy

Some other documents contain links to an external location, which I think might be malicious, like this:

Target="file://62.8.193.206/Normal.dotm" TargetMode="External"

How can I check whether such content exists in my documents? Does Aspose have any function call to access it? If so, how can I use it to remove the content?

thanks

@zhilin39,

The value of dc:title can be changed through the Document.BuiltInDocumentProperties.Title property. You can check the existing value and overwrite it with a new one.

The alt attribute of a shape is exposed by the Shape.AlternativeText property. You can loop through all shapes in the Word document and reset their alternative text values.
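
To decide whether a title or an alternative text needs resetting, the obfuscation seen in the samples (random casing, `^` carets, backticks) has to be undone before comparing. The following is only a minimal sketch of such a check and is not part of Aspose.Words: the class `CommandHeuristics`, its marker list and both method names are hypothetical and would need tuning for real documents.

```csharp
using System;
using System.Linq;

public static class CommandHeuristics
{
    // Hypothetical marker list (an assumption, not an exhaustive signature set).
    private static readonly string[] Markers =
    {
        "powershell", "cmd.exe", "-encodedcommand", "-executionpolicy"
    };

    // Drops cmd.exe caret escapes and PowerShell backtick escapes, then lower-cases.
    public static string Normalize(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return string.Empty;
        }
        char[] kept = value.Where(c => c != '^' && c != '`').ToArray();
        return new string(kept).ToLowerInvariant();
    }

    // True when the normalized text contains any of the markers.
    public static bool LooksLikeCommand(string value)
    {
        string normalized = Normalize(value);
        return Markers.Any(marker => normalized.Contains(marker));
    }
}
```

The check could then be applied to the value of Document.BuiltInDocumentProperties.Title and to Shape.AlternativeText of each shape, setting the property to string.Empty whenever LooksLikeCommand returns true.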

Target is an internal value read from the package, and there does not seem to be a way to change its value.
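
Independently of Aspose.Words, the presence of such external targets can at least be detected by reading the relationship parts of the package directly. The sketch below uses only the .NET base class library; the class name `ExternalTargetScanner` and the method `FindExternalTargets` are hypothetical, and the code only lists the targets without removing them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class ExternalTargetScanner
{
    // Namespace used by the .rels parts of an Open Packaging Conventions package.
    private static readonly XNamespace Rel =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Returns "part -> target" for every relationship with TargetMode="External".
    public static List<string> FindExternalTargets(Stream docxStream)
    {
        var hits = new List<string>();
        // The document is untrusted, so DTD processing is switched off.
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Prohibit };

        using (var zip = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries.Where(
                e => e.FullName.EndsWith(".rels", StringComparison.OrdinalIgnoreCase)))
            {
                using (Stream part = entry.Open())
                using (XmlReader reader = XmlReader.Create(part, settings))
                {
                    XDocument xml = XDocument.Load(reader);
                    foreach (XElement r in xml.Descendants(Rel + "Relationship"))
                    {
                        string mode = (string)r.Attribute("TargetMode");
                        if (string.Equals(mode, "External", StringComparison.OrdinalIgnoreCase))
                        {
                            hits.Add(entry.FullName + " -> " + (string)r.Attribute("Target"));
                        }
                    }
                }
            }
        }
        return hits;
    }
}
```

Calling FindExternalTargets on a stream opened with File.OpenRead for the DOCX file would list entries such as the file://62.8.193.206/Normal.dotm target, so that each one can be reviewed. Not every external target is malicious; ordinary hyperlinks are stored the same way.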

Please let me know if I can be of any further assistance.

@zhilin39,

It may be possible to control the 'Target' value by using the IResourceLoadingCallback interface. Could you please ZIP and upload a simplified Word document of this kind here for our reference? We will then investigate this particular document and scenario on our end and provide you with more information.

Hi @awais.hafeez,

Sorry for my very late reply. I have attached the samples (password: "sample"); please read the README file inside.

sample_unzip_doc.zip (41.9 KB)

I would like to remove the Target="file://62.8.193.206/Normal.dotm" link in the relationship tag.

thanks

@zhilin39,

We have logged your requirement in our issue tracking system. Your ticket number is WORDSNET-21143. We will further look into the details of this requirement as to how we can filter such malicious content inside Word documents. We will keep you updated here on the status of the linked issue.

@zhilin39,

Regarding WORDSNET-21143, unfortunately, there is not a generic/common way to check and filter malicious content in Word documents using Aspose.Words.

In this case, the first issue (Normal.dotm target from 7234800d9fe43ba9edea1d7435a1b030712e7bce035334c4a8ed76ed573dbfa1 sample) can be fixed by using the following code:

var doc = new Document(@"723...docx");
doc.AttachedTemplate = string.Empty;
doc.Save(@"723...-out.docx");

And the second issue (image.html target from 5074bb1fafbfc3863b0c43c8ad9d2cdf95ea56e91b06c0853ce5aa9e28fa594c sample) can be fixed by using the following code:

var doc = new Document(@"507...docx");
const int ImageSourceFullName = 4104;
foreach (ShapeBase shape in doc.GetChildNodes(NodeType.Shape, true))
{
    var imageSource = shape.FetchShapeAttr(ImageSourceFullName) as string;
    if ((imageSource != null) && imageSource.Contains("canarytokens"))
    {
        Console.WriteLine(imageSource);
        shape.SetShapeAttr(ImageSourceFullName, string.Empty);
    }
}
doc.Save(@"507...-out.docx");

@awais.hafeez

Thanks for the solution, I am able to remove these values. A question about the line shape.SetShapeAttr(ImageSourceFullName, string.Empty);: if I set the attribute to empty for every shape, that is, without checking imageSource.Contains("canarytokens"), will it affect the normal images/pictures in the document?

thanks

@zhilin39,

We are working on your query and will get back to you soon.

@zhilin39,

ImageSourceFullName represents an external image link and is used to load external image data into Aspose.Words. We do not recommend removing all of these attributes, because the links will then be lost after saving the document. Moreover, after saving to some formats the image may be unavailable.

Hi,

for this part of code:

const int ImageSourceFullName = 4104;
foreach (ShapeBase shape in doc.GetChildNodes(NodeType.Shape, true))
{
    var imageSource = shape.FetchShapeAttr(ImageSourceFullName) as string;
    if ((imageSource != null) && imageSource.Contains("canarytokens"))
    {
        Console.WriteLine(imageSource);
        shape.SetShapeAttr(ImageSourceFullName, string.Empty);
    }
}

FetchShapeAttr and SetShapeAttr are deprecated and were removed from version 24.5 onwards. Is there an alternative? I would also like to know where the integer "4104" comes from.

Thanks

@raine93 You can access this value through the ImageData.SourceFullName property. The integer "4104" is an attribute key for internal use.

Thanks for the help.

My code now looks like this:

foreach (ShapeBase shapebase in doc.GetChildNodes(NodeType.Shape, true))
{
    if (shapebase is Aspose.Words.Drawing.Shape shape)
    {
        string imageSource = shape.ImageData.SourceFullName;
        if ((imageSource != null) && imageSource.Contains("canarytokens"))
        {
            shape.Remove();
            //shapebase.SetShapeAttr(ImageSourceFullName, string.Empty);
        }
    }
    else
    {
        Console.WriteLine("shapebase is not a shape");
    }
}

It should never reach the else branch, since doc.GetChildNodes(NodeType.Shape, true) only returns nodes that are Shape.

@raine93 ShapeBase is the base class of the Shape and GroupShape classes. doc.GetChildNodes(NodeType.Shape, true) returns only Shape nodes, so there is no need for the following condition:

if (shapebase is Aspose.Words.Drawing.Shape shape)

GroupShape nodes have a different node type, i.e. NodeType.GroupShape.

You can modify your code like this:

foreach (Shape shape in doc.GetChildNodes(NodeType.Shape, true))
{
    string imageSource = shape.ImageData.SourceFullName;
    if ((imageSource != null) && imageSource.Contains("canarytokens"))
    {
        shape.Remove();
    }
}

Thanks, it works :smiling_face: