Free Support Forum - aspose.com

Extract only pictures of persons from Word files

Hello,

I have already succeeded in extracting all kinds of images from Word files. What I specifically need, however, is to extract only images of persons, mostly application photos, and not certificates containing text and the like.

Is there a way to do that in C#/.NET with Aspose.Words?

Best regards,

Julian

Hi,

Thanks for your request. There is no way to tell programmatically which images should be extracted and which should not. However, if your document contains some kind of placeholder, for example a bookmark, you can use it to locate the images you need. Could you please attach your document for testing and indicate which images you would like to extract?
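
As a rough sketch only, a bookmark-based approach might look like the code below. It assumes the picture is placed in the same paragraph as a bookmark; the class name, method name, bookmark name, and file paths are hypothetical and are not taken from your document.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Drawing;

public class BookmarkImageExtractor
{
    // Hypothetical helper: saves every image found in the paragraph that
    // contains the start of the given bookmark. Returns the number saved.
    public static int ExtractImagesAtBookmark(string docPath, string bookmarkName, string outPrefix)
    {
        Document doc = new Document(docPath);

        // The indexer returns null when no bookmark with this name exists.
        Bookmark bookmark = doc.Range.Bookmarks[bookmarkName];
        if (bookmark == null)
            return 0;

        // Find the paragraph that holds the bookmark start.
        Paragraph par = bookmark.BookmarkStart.GetAncestor(NodeType.Paragraph) as Paragraph;
        if (par == null)
            return 0;

        int saved = 0;
        foreach (Shape shape in par.GetChildNodes(NodeType.Shape, true))
        {
            // Skip shapes that carry no picture data (lines, text boxes, ...).
            if (!shape.HasImage)
                continue;

            // The ".jpg" extension is an assumption kept for simplicity.
            shape.ImageData.Save(String.Format("{0}_{1}.jpg", outPrefix, saved));
            saved++;
        }
        return saved;
    }
}
```

A call such as ExtractImagesAtBookmark(@"in.doc", "ApplicantPhoto", @"out") would then save only the pictures next to that (hypothetical) bookmark.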

Best regards.

Hello again,

Thank you for your quick answer.

I have attached a sample Word test file containing one picture of the person who applied and one certificate belonging to that person. As described, I would like to extract only the picture of the person.

Best regards,

Julian

Hi,

Thank you for the additional information. If the image you would like to extract is placed in the same paragraph as the word “name”, you can achieve this using ReplaceEvaluator. See the following code for an example:

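using System;
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Drawing;

public void Test017()
{
    // Open the document
    Document doc = new Document(@"Test017\in.doc");

    // Search for the word "name"; the evaluator is called for every match
    Regex regex = new Regex("name");
    doc.Range.Replace(regex, new ReplaceEvaluator(ReplaceExtractImage), false);
}

// Counter used to give each extracted image a unique file name
int imgCounter = 0;

private ReplaceAction ReplaceExtractImage(object sender, ReplaceEvaluatorArgs e)
{
    // Get the paragraph that contains the match
    Paragraph par = e.MatchNode.ParentNode as Paragraph;

    // Skip the match if its parent is not a paragraph
    if (par == null)
        return ReplaceAction.Skip;

    // Get the collection of shapes in this paragraph
    NodeCollection shapes = par.GetChildNodes(NodeType.Shape, true);

    foreach (Shape sp in shapes)
    {
        // Save only shapes that are images
        if (sp.ShapeType == ShapeType.Image)
        {
            sp.ImageData.Save(String.Format(@"Test017\out_{0}.jpg", imgCounter));
            imgCounter++;
        }
    }

    // Leave the document text unchanged
    return ReplaceAction.Skip;
}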

Otherwise, there is no way to tell the images apart programmatically.

Best regards.

Hi again,

Thank you very much, this is already an interesting solution. I have also tried it with different words, but so far I do not fully understand where the paragraph boundaries are, that is, which words belong to the same paragraph as the picture I am searching for.

Could you please help me with that as well?

Best regards.

Hi,

Thanks for your request. You can enable “Show All Formatting Marks” in MS Word to see the end of each paragraph: select “Options” from the “Tools” menu, open the “View” tab, and check the “All” checkbox in the “Formatting Marks” section.

You can also use DocumentExplorer (an Aspose.Words demo project) to inspect the structure of the document.
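
The paragraph boundaries can also be listed in code. The following is only a minimal sketch; the class name, the method name DumpParagraphs, and the file path passed to it are placeholders chosen for illustration. It prints the text of every paragraph together with the number of shapes it contains, which shows which words share a paragraph with which image:

```csharp
using System;
using Aspose.Words;

public class ParagraphDump
{
    // Hypothetical helper: prints each paragraph's text and shape count.
    public static void DumpParagraphs(string path)
    {
        Document doc = new Document(path);

        // true = search the whole document tree, including tables and text boxes.
        NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);

        int index = 0;
        foreach (Paragraph par in paragraphs)
        {
            // Shapes (including inline pictures) that are descendants of this paragraph.
            int shapeCount = par.GetChildNodes(NodeType.Shape, true).Count;

            // GetText() includes the paragraph end character; Trim() removes it.
            Console.WriteLine("Paragraph {0}: {1} shape(s), text: \"{2}\"",
                index, shapeCount, par.GetText().Trim());
            index++;
        }
    }
}
```

Any word printed on the same line as a non-zero shape count is a candidate search word for the regular expression used in the ReplaceEvaluator example.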

Best regards.