Trouble extracting shapes from Word docs

pfoster · March 23, 2012, 12:34pm

I'm trying to parse a Word document and, in particular, extract its figures. Figures in these documents are a set of Shapes on the drawing layer and InlineShapes on the text layer. They aren't grouped, nor are they children of a particular canvas.

Before I started trying Aspose.Words, I was using the Microsoft Interop API to first turn the InlineShapes into Shapes (a conversion method is available on the InlineShape interface), then reason about each shape's location to determine which shapes make up a single figure. That's where I stopped with Interop. I need to be able to extract the set of shapes as a single SVG (preferably) or bitmap. Word has no such functionality in the Interop API.
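To illustrate the location-based reasoning described above, here is a minimal Python sketch that clusters shape rectangles into figures when they overlap or sit within a small gap of one another. It is not Interop or Aspose.Words code: the `(left, top, width, height)` tuples, the `gap` tolerance, and the function names are all assumptions made for the example, and the real positions would have to be read from whichever API is in use.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (left, top, width, height) in any consistent unit.
Rect = Tuple[float, float, float, float]


def near(a: Rect, b: Rect, gap: float) -> bool:
    """Return True if the rectangles overlap or lie within `gap` of each other."""
    return (
        a[0] - gap <= b[0] + b[2]
        and b[0] - gap <= a[0] + a[2]
        and a[1] - gap <= b[1] + b[3]
        and b[1] - gap <= a[1] + a[3]
    )


def group_into_figures(rects: List[Rect], gap: float = 5.0) -> List[List[int]]:
    """Cluster rectangle indices into figures using a simple union-find."""
    parent = list(range(len(rects)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair of rectangles that are close enough.
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if near(rects[i], rects[j], gap):
                parent[find(i)] = find(j)

    # Collect the indices belonging to each cluster root.
    groups: Dict[int, List[int]] = {}
    for i in range(len(rects)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The pairwise comparison is quadratic in the number of shapes, which should be acceptable for the handful of shapes on a typical page. The right `gap` value depends on how tightly the PDF converter packs the pieces of a figure.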

Enter Aspose.Words. I've attempted to group the shapes under a single ShapeGroup object, but when I call GetShapeRenderer().Save(...) the output image files are blank. I added the individual Shapes to the ShapeGroup using AppendChild(), then set the Width and Height of the ShapeGroup to a boundary that encompasses all shapes.
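The "boundary that encompasses all shapes" step can be sketched in plain Python as below. The sketch also re-expresses each child rectangle relative to the group's top-left corner. Whether Aspose.Words expects child coordinates in the group's own coordinate space is an assumption worth checking against its documentation, not something confirmed here. The tuple layout and function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (left, top, width, height) in any consistent unit.
Rect = Tuple[float, float, float, float]


def union_bounds(rects: List[Rect]) -> Rect:
    """Return the smallest rectangle that contains every rectangle in `rects`."""
    left = min(r[0] for r in rects)
    top = min(r[1] for r in rects)
    right = max(r[0] + r[2] for r in rects)
    bottom = max(r[1] + r[3] for r in rects)
    return (left, top, right - left, bottom - top)


def to_group_space(rects: List[Rect], bounds: Rect) -> List[Rect]:
    """Shift each rectangle so it is relative to the top-left corner of `bounds`."""
    group_left, group_top, _, _ = bounds
    return [(l - group_left, t - group_top, w, h) for (l, t, w, h) in rects]
```

If the children keep page-relative positions while the group defines its own origin and size, they could end up drawn outside the rendered area, which is one possible (unverified) explanation for a blank image.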

What am I doing wrong?

BTW, these Word docs are the result of PDF->DOCX conversion. I've done this because I find the Word APIs easier than the PDF ones. However, if the Aspose PDF API is as easy to use and understand as the Aspose.Words Document Object Model, I may read the PDF directly. Thoughts on that?

Update:

I tried using .GetChildNodes(Aspose.Words.NodeType.Shape, true) to find all Shapes in the document or section, and was surprised to find I only got a small fraction of them. Trying a snapshot versus a live collection didn't seem to make a difference.

awais.hafeez · March 26, 2012, 12:01pm

Hi Paul,

Thanks for your inquiry. Could you please attach the input Word document that you want to extract the shapes from, so we can test with it? I will investigate the structure of your document and provide you with a code snippet.

Best Regards,

pfoster · March 26, 2012, 4:15pm

Here’s a sample document with the types of objects I’m referring to.

adam.skelton · March 29, 2012, 11:03pm

Hi there,

Thanks for your input document.

Could you also attach your code here for testing?

Thanks,

pfoster · March 30, 2012, 6:24am

I've moved past this issue using Microsoft Interop. I've decided to use it instead of Aspose.

Regardless,

Thanks for your attention to this issue. You can consider it closed.

adam.skelton · April 2, 2012, 12:21am

Hi there,

It’s great that you were able to find a solution to your problem. Please feel free to ask us if you require anything else.

Thanks,