# Images from HTML + captions

Hi there,
I’ve been banging my head against this for a day or so now, so I thought I would ask the forum for ideas. I appreciate that Aspose.Words doesn’t support this natively, but I would like any workarounds or clues on how to achieve the following.
I have some HTML that I am inserting into a document. The HTML has img tags that display fine, but I want them to have captions in the Word doc. In the HTML I can add alt, title, paragraph, whatever is necessary to store the text.
What I would like to be able to do is to iterate over the images in the document, and if they have the alt/title/para (some images will not, like ones originally in the doc template), add it as the image’s caption. Nothing too complicated I would have thought, but my tiny brain can’t get it to work (only eval’ing for the past week - very impressed thus far).
Thanks for any help that anyone can give me.
– Mike.

Hi
Thanks for your request. Captions in MS Word are represented as “Label” + “SEQ field”. You can use DocumentBuilder to insert captions. For example, see the following code:

using Aspose.Words;
using Aspose.Words.Drawing;

Document doc = new Document(@"Test213\in.doc");
DocumentBuilder builder = new DocumentBuilder(doc);
// Get the shapes in the document. ToArray() takes a snapshot so that
// inserting nodes inside the loop does not disturb the enumeration.
Node[] shapes = doc.GetChildNodes(NodeType.Shape, true).ToArray();
// Loop through all shapes
foreach (Shape shape in shapes)
{
    if (shape.ShapeType == ShapeType.Image)
    {
        // Move the DocumentBuilder cursor to the shape.
        // The cursor will be placed after the shape.
        if (shape.NextSibling != null)
            builder.MoveTo(shape.NextSibling);
        else
            builder.MoveTo(shape.ParentParagraph);
        // Insert a paragraph break
        builder.Writeln();
        // Insert the label (trailing space separates it from the number)
        builder.Write("Picture ");
        // Insert the SEQ field
        builder.InsertField(@"SEQ Picture \* ARABIC", "");
    }
}
// Save the resulting document
doc.Save(@"Test213\out.doc");


But note that Aspose.Words does not evaluate SEQ fields, so you should update the fields inside the document manually in MS Word (Ctrl+A, then F9). You can also use a macro to update fields. See the following link to learn more.
https://support.microsoft.com/en-us/topic/the-filename-field-does-not-automatically-update-when-you-open-a-document-in-word-de2bfb95-d990-1ced-a618-5ac0a2ec1be4
Also, if you need some text to appear just after the image, you can try using the following HTML:

<html>
<body>
<img src="http://www.aspose.com/Images/aspose-logo.jpg" />
<p><span>This is caption</span></p>
</body>
</html>


Hope this helps.
Best regards.

Many thanks Alexey, this works really well, other than in the following condition…
Say I have an HTML doc where I want some images to have captions, and others not. If all images need captions, your solution above works really well. However, if I only want some to have captions, I can’t easily just pull the next paragraph and use it as the caption (it might just be “normal” text).
The first (preferred) option is perhaps to use the HTML Fig markup http://www.w3.org/MarkUp/html3/figures.html - that way there is a clear association between the figure and the caption. Then <fig>s would get captions, whereas <img>s would not. However, I can’t see how Aspose.Words deals with this kind of markup, or how to identify it.
The other method is to use the alt attribute of the image - if there is an alt attribute, then that becomes the caption (or it could dictate that the next paragraph is/is not a caption - I don’t really mind either way). Once again, I can’t see how the library deals with attributes, as the Title and AlternativeText properties always seem to be empty.
Any pointers on what would be the best method?
Cheers,
Mike.

Hi
Thanks for your request. I think you can insert a placeholder right after the image and, once the HTML has been inserted, replace this placeholder with a caption. The placeholder is just plain text. See the following HTML:

<html>
<body>
<img src="http://www.aspose.com/Images/aspose-logo.jpg" />
<p><span>==CAPTION==</span></p>
</body>
</html>


After inserting the HTML, you can replace the placeholder with a caption using ReplaceEvaluator. See the following code:

using System.Text.RegularExpressions;
using Aspose.Words;

public void Test006()
{
    // HTML with the ==CAPTION== placeholder right after the image
    string html = "<html><body>" +
        "<img src=\"http://www.aspose.com/Images/aspose-logo.jpg\" />" +
        "<p><span>==CAPTION==</span></p>" +
        "</body></html>";
    // Create Document and DocumentBuilder
    Document doc = new Document();
    DocumentBuilder builder = new DocumentBuilder(doc);
    // Insert HTML into the document
    builder.InsertHtml(html);
    // Replace placeholders with captions
    Regex regex = new Regex("==CAPTION==");
    doc.Range.Replace(regex, new ReplaceEvaluator(ReplaceInsertCaption), false);
    // Save output document
    doc.Save(@"Test006\out.doc");
}

private ReplaceAction ReplaceInsertCaption(object sender, ReplaceEvaluatorArgs e)
{
    // Create a document builder for the document that contains the match
    DocumentBuilder builder = new DocumentBuilder(e.MatchNode.Document);
    // Move to the matched node
    builder.MoveTo(e.MatchNode);
    // Insert the caption: first the label
    builder.Write("Picture ");
    // Then the SEQ field
    builder.InsertField(@"SEQ Picture \* ARABIC", "");
    // Remove the placeholder text
    e.Replacement = string.Empty;
    return ReplaceAction.Replace;
}


Hope this helps.
Best regards.
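
For the case where only some images should get captions, one possible way to combine the alt attribute idea with the ==CAPTION== placeholder is to pre-process the HTML before it is passed to InsertHtml. The sketch below is only an illustration under several assumptions: the class name `CaptionPlaceholders`, the method `AddPlaceholders` and the `==CAPTION:text==` placeholder format are hypothetical and not part of Aspose.Words; alt values are assumed to be double-quoted and not to contain "=="; and the img tags are assumed not to sit inside another paragraph. It uses only the .NET regex classes.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Aspose.Words): adds a placeholder
// paragraph after every <img> tag that has a non-empty alt attribute.
public static class CaptionPlaceholders
{
    // Matches an <img ...> tag with a non-empty, double-quoted alt attribute.
    // \s before "alt" prevents matching attributes such as data-alt.
    private static readonly Regex ImgWithAlt = new Regex(
        @"<img\b[^>]*?\salt\s*=\s*""(?<alt>[^""]+)""[^>]*>",
        RegexOptions.IgnoreCase);

    public static string AddPlaceholders(string html)
    {
        // Keep the original tag and append a paragraph that carries the
        // alt text in the assumed ==CAPTION:text== format.
        return ImgWithAlt.Replace(html, m =>
            m.Value + "<p><span>==CAPTION:" + m.Groups["alt"].Value + "==</span></p>");
    }
}
```

Images without an alt attribute (or with an empty one) are left untouched, so they would get no caption. After InsertHtml, the replace step could then search for a pattern such as `==CAPTION:(.*?)==` instead of the fixed `==CAPTION==` text and write the captured text after the SEQ field, assuming the evaluator arguments expose the regex match.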