I have some DOCX documents, but they might be malicious, so I am not going to attach them here. When I unzip a DOCX and open core.xml, it contains a PowerShell command like so:
How can I check whether such a thing exists in my document? Does Aspose have a function call to access it? If so, how can I use it to remove it?
It may be possible to control the ‘target’ by using the IResourceLoadingCallback interface. Could you please ZIP and upload a simplified Word document of this kind here for our reference? We will then investigate this scenario on our end and provide you with more information.
We have logged your requirement in our issue tracking system. Your ticket number is WORDSNET-21143. We will further look into the details of this requirement as to how we can filter such malicious content inside Word documents. We will keep you updated here on the status of the linked issue.
Regarding WORDSNET-21143, unfortunately, there is no generic way to check for and filter malicious content in Word documents using Aspose.Words.
In this case, the first issue (Normal.dotm target from 7234800d9fe43ba9edea1d7435a1b030712e7bce035334c4a8ed76ed573dbfa1 sample) can be fixed by using the following code:
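// Clear the attached template so the document no longer references the external Normal.dotm target.
var doc = new Document(@"723...docx");
doc.AttachedTemplate = string.Empty;
doc.Save(@"723...-out.docx");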
And the second issue (image.html target from 5074bb1fafbfc3863b0c43c8ad9d2cdf95ea56e91b06c0853ce5aa9e28fa594c sample) can be fixed by using the following code:
var doc = new Document(@"507...docx");
// Key of the shape attribute that stores the full name (link) of an external image.
const int ImageSourceFullName = 4104;
foreach (ShapeBase shape in doc.GetChildNodes(NodeType.Shape, true))
{
    var imageSource = shape.FetchShapeAttr(ImageSourceFullName) as string;
    // Clear the link only when it points at the suspicious host.
    if ((imageSource != null) && imageSource.Contains("canarytokens"))
    {
        Console.WriteLine(imageSource);
        shape.SetShapeAttr(ImageSourceFullName, string.Empty);
    }
}
doc.Save(@"507...-out.docx");
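Both targets above (the attached template and the linked image) are stored in the package as relationships marked TargetMode="External". To check whether a document contains such targets before opening it, one option is to list them straight from the ZIP package. The following is only a minimal sketch that uses the .NET base class library rather than Aspose.Words; PackageScanner and FindExternalTargets are hypothetical names introduced for illustration, and the scan covers relationship parts only (it does not inspect core.xml or macros).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of Aspose.Words.
static class PackageScanner
{
    // Namespace of the Open Packaging Conventions relationship parts (*.rels).
    static readonly XNamespace Rel =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Returns (relationship part, target) for every relationship
    // marked TargetMode="External" in the package.
    public static List<(string Part, string Target)> FindExternalTargets(Stream docx)
    {
        var found = new List<(string Part, string Target)>();
        // The input is untrusted, so DTDs are rejected (a DTD raises XmlException).
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Prohibit,
            XmlResolver = null
        };

        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            var relsParts = zip.Entries.Where(e =>
                e.FullName.EndsWith(".rels", StringComparison.OrdinalIgnoreCase));

            foreach (var entry in relsParts)
            {
                using (var stream = entry.Open())
                using (var reader = XmlReader.Create(stream, settings))
                {
                    var xml = XDocument.Load(reader);
                    foreach (var r in xml.Descendants(Rel + "Relationship"))
                    {
                        if ((string)r.Attribute("TargetMode") == "External")
                            found.Add((entry.FullName, (string)r.Attribute("Target")));
                    }
                }
            }
        }
        return found;
    }
}

// Example usage (the file name is a placeholder):
// using (var fs = File.OpenRead("input.docx"))
//     foreach (var hit in PackageScanner.FindExternalTargets(fs))
//         Console.WriteLine(hit.Part + " -> " + hit.Target);
```

Note that ordinary hyperlinks are also stored as external relationships, so the result is a list of candidates to review rather than a verdict.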
Thanks for the solution, I’m able to remove these values. One question about shape.SetShapeAttr(ImageSourceFullName, string.Empty): if I set the attribute to empty for every shape, without checking imageSource.Contains("canarytokens"), will it affect the normal images/pictures in the document?
ImageSourceFullName represents the external image link and is used to load external image data into Aspose.Words. We do not recommend removing all of these attributes, because the links will then be lost after saving the document. Moreover, after saving to some formats, the image may be unavailable.
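If the goal is to clear only untrusted links while keeping legitimate linked images, the substring check on imageSource can be tightened into a host-based test. The following is a minimal sketch and not part of Aspose.Words: ImageLinkFilter and IsBlockedLink are hypothetical names, and the host canarytokens.com is an assumption based on the "canarytokens" substring in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Aspose.Words.
static class ImageLinkFilter
{
    // Returns true when imageSource is an absolute URI whose host is one of
    // blockedHosts or a subdomain of one of them. Empty values and strings
    // that are not absolute URIs are never reported as blocked.
    public static bool IsBlockedLink(string imageSource, IEnumerable<string> blockedHosts)
    {
        if (string.IsNullOrEmpty(imageSource))
            return false;
        if (!Uri.TryCreate(imageSource, UriKind.Absolute, out Uri uri))
            return false;

        return blockedHosts.Any(host =>
            uri.Host.Equals(host, StringComparison.OrdinalIgnoreCase) ||
            uri.Host.EndsWith("." + host, StringComparison.OrdinalIgnoreCase));
    }
}

// Example usage inside the shape loop (the host list is an assumption):
// var blocked = new[] { "canarytokens.com" };
// if (ImageLinkFilter.IsBlockedLink(imageSource, blocked))
//     shape.SetShapeAttr(ImageSourceFullName, string.Empty);
```

Matching on the parsed host rather than on a substring avoids clearing a legitimate link that merely contains the same text elsewhere in its path or query.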