Get formatted text out of shapes

Hello!

I’m converting docx files to HTML. My source files contain a variety of shapes with text in them (text boxes, WordArt, etc.).

By default, Aspose appears to convert these shapes to PNG images. I’ve been able to replace those images with the extracted text by doing something like this:

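// parentNode is the node that receives the extracted text (defined elsewhere)
NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape shape : (Iterable<Shape>) shapes) {
    // SaveFormat.TEXT yields plain text only, so character formatting is lost
    parentNode.appendChild(new Run(doc, shape.toString(SaveFormat.TEXT)));
}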

However, this approach only gets the raw text, with no formatting preserved. Is there something similar I can do that keeps the formatting? By formatting I mean things like bold and underline, and hopefully it can be extended to cover links, tables, lists, and so on.

I tried casting each Shape result to a CompositeNode (the getChildNodes method returns objects of type Node), then calling getChildNodes, and calling parent.appendChild with each child Node, but that results in “Cannot insert a node of this type at this location” errors.
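For context, that error is consistent with the document tree enforcing containment rules: the content of a text box is block-level (paragraphs), while a paragraph only accepts inline nodes such as runs and shapes. The sketch below is a toy model in plain Java, not Aspose.Words code. ContainmentSketch, ToyNode, Kind, and liftShapeParagraphs are hypothetical names, and the rule table is an assumption made for illustration. It shows the append failing, and then one possible workaround: moving the shape's paragraphs so that they become siblings of the paragraph that hosts the shape.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Toy model only: hypothetical types, NOT the Aspose.Words classes. */
final class ContainmentSketch {

    enum Kind { BODY, PARAGRAPH, RUN, SHAPE }

    /** Child kinds each parent kind accepts (assumed rules for this sketch). */
    static Set<Kind> allowedChildren(Kind parent) {
        switch (parent) {
            case BODY:      return EnumSet.of(Kind.PARAGRAPH);       // block-level only
            case PARAGRAPH: return EnumSet.of(Kind.RUN, Kind.SHAPE); // inline only
            case SHAPE:     return EnumSet.of(Kind.PARAGRAPH);       // text box content
            default:        return EnumSet.noneOf(Kind.class);       // runs are leaves
        }
    }

    static final class ToyNode {
        final Kind kind;
        ToyNode parent;
        final List<ToyNode> children = new ArrayList<>();

        ToyNode(Kind kind) { this.kind = kind; }

        /** Appends child, moving it from its old parent; throws if the kind is not allowed here. */
        ToyNode appendChild(ToyNode child) {
            checkAllowed(child);
            child.detach();
            children.add(child);
            child.parent = this;
            return child;
        }

        /** Inserts newChild directly after refChild, moving it from its old parent. */
        ToyNode insertAfter(ToyNode newChild, ToyNode refChild) {
            checkAllowed(newChild);
            if (newChild == refChild || !children.contains(refChild)) {
                throw new IllegalArgumentException("refChild must be a different, existing child");
            }
            newChild.detach();
            children.add(children.indexOf(refChild) + 1, newChild);
            newChild.parent = this;
            return newChild;
        }

        private void checkAllowed(ToyNode child) {
            if (!allowedChildren(kind).contains(child.kind)) {
                throw new IllegalArgumentException("Cannot insert " + child.kind + " into " + kind);
            }
        }

        private void detach() {
            if (parent != null) {
                parent.children.remove(this);
                parent = null;
            }
        }
    }

    /**
     * Moves every paragraph out of the shape and places it after the paragraph
     * that hosts the shape, keeping the original order.
     * Assumes the shape sits in a paragraph that itself has a parent.
     */
    static void liftShapeParagraphs(ToyNode shape) {
        ToyNode hostParagraph = shape.parent;
        ToyNode container = hostParagraph.parent;
        ToyNode previous = hostParagraph;
        // Iterate over a snapshot because moving nodes edits shape.children.
        for (ToyNode child : new ArrayList<>(shape.children)) {
            container.insertAfter(child, previous);
            previous = child;
        }
    }
}
```

In this model, appending one of the shape's paragraphs to the host paragraph throws, while liftShapeParagraphs succeeds because the paragraphs land in a parent that accepts block-level nodes. Whether the same move is appropriate in a real document depends on where the text should end up, so treat it as an illustration of the rule rather than a recommended fix.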

Hi Dave,

Thanks for your inquiry. You might be able to achieve this by getting the HTML representation of each Shape and then inserting that HTML at parentNode, as follows:

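NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);

DocumentBuilder builder = new DocumentBuilder(doc);

for (Shape shape : (Iterable<Shape>) shapes)
{
    // parentNode is the variable from your snippet, not a method call
    builder.moveTo(parentNode);
    // Re-import the shape's HTML so that formatting comes along with the text
    builder.insertHtml(shape.toString(SaveFormat.HTML));
}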

Hope this helps.

Best regards,