Images from HTML + captions

Hi There,
I’ve been banging my head against this for a day or so now, so I thought I would ask the forum for ideas. I appreciate that Aspose.Words doesn’t support this natively, but I would like any workarounds or clues on how to achieve the following.
I have some HTML that I am inserting into a document. The HTML has img tags that display fine, but I want them to have captions in the Word doc. In the HTML I can add alt, title, paragraph, whatever is necessary to store the text.
What I would like to be able to do is to iterate over the images in the document, and if they have the alt/title/para (some images will not, like ones originally in the doc template), add it as the image’s caption. Nothing too complicated I would have thought, but my tiny brain can’t get it to work (only eval’ing for the past week - very impressed thus far).
Thanks for any help that anyone can give me.
– Mike.

Hi
Thanks for your request. Captions in MS Word are represented as a “Label” followed by a “SEQ field”. You can use DocumentBuilder to insert captions. For example, see the following code:

// Requires: using Aspose.Words; using Aspose.Words.Drawing;
Document doc = new Document(@"Test213\in.doc");
DocumentBuilder builder = new DocumentBuilder(doc);
// Get the shapes in the document as an array snapshot, so that
// inserting nodes inside the loop does not affect the iteration
Node[] shapes = doc.GetChildNodes(NodeType.Shape, true).ToArray();
// Loop through all shapes
foreach (Shape shape in shapes)
{
    if (shape.ShapeType == ShapeType.Image)
    {
        // Move DocumentBuilder cursor to shape
        // Cursor will be placed after shape
        if (shape.NextSibling != null)
            builder.MoveTo(shape.NextSibling);
        else
            builder.MoveTo(shape.ParentParagraph);
        // Insert paragraph break
        builder.Writeln();
        // Insert label (the trailing space separates it from the number)
        builder.Write("Picture ");
        // Insert SEQ field
        builder.InsertField(@"SEQ Picture \* ARABIC", "");
    }
}
// Save result document
doc.Save(@"Test213\out.doc");

Note, however, that Aspose.Words does not evaluate SEQ fields, so the caption numbers will not appear until you update the fields in the document manually in MS Word (Ctrl+A, then F9). You can also use a macro to update the fields. See the following link to learn more.
https://support.microsoft.com/en-us/topic/the-filename-field-does-not-automatically-update-when-you-open-a-document-in-word-de2bfb95-d990-1ced-a618-5ac0a2ec1be4
Also, if you just need some text to appear right after the image, you can try the following HTML:

<html>
<body>
<img src="http://www.aspose.com/Images/aspose-logo.jpg" />
<p><span>This is caption</span></p>
</body>
</html>

Hope this helps.
Best regards.

Many thanks Alexey, this works really well, other than in the following condition…
Say I have an HTML doc where I want some images to have captions, and others not. If all images need captions, your solution above works really well :) However, if I only want some to have captions, I can’t simply pull the next paragraph and use it as the caption (it might just be “normal” text).
Thinking about this, I could use two approaches…
The first (preferred) is perhaps to use the HTML FIG markup http://www.w3.org/MarkUp/html3/figures.html - that way there is a clear association between the figure and its caption. Then <fig>s get captions, whereas plain <img>s would not. However, I can’t see how Aspose.Words deals with this kind of markup, or how to identify it.
The other method is to use the alt attribute of the image - if there is an alt attribute, then that becomes the caption (or it could dictate whether the next paragraph is or is not a caption - I don’t really mind either way). Once again, I can’t see how the library deals with attributes, as the Title and AlternativeText properties always seem to be empty.
Any pointers on what would be the best method?
Cheers,
Mike.

Hi
Thanks for your request. I think you can put a placeholder right after each image that needs a caption and, once the HTML has been inserted, replace this placeholder with the caption. The placeholder is just plain text. See the following HTML:

<html>
<body>
<img src="http://www.aspose.com/Images/aspose-logo.jpg" />
<p><span>==CAPTION==</span></p>
</body>
</html>

After inserting the HTML, you can replace the placeholder with a caption using a ReplaceEvaluator. See the following code:

// Requires: using System.IO; using System.Text.RegularExpressions; using Aspose.Words;
public void Test006()
{
    // Create Document and DocumentBuilder
    Document doc = new Document();
    DocumentBuilder builder = new DocumentBuilder(doc);
    // Read HTML from file
    string html = File.ReadAllText(@"Test006\test.html");
    // Insert HTML into the document
    builder.InsertHtml(html);
    // Replace placeholders with captions
    Regex regex = new Regex("==CAPTION==");
    doc.Range.Replace(regex, new ReplaceEvaluator(ReplaceInsertCaption), false);
    // Save output document
    doc.Save(@"Test006\out.doc");
}
private ReplaceAction ReplaceInsertCaption(object sender, ReplaceEvaluatorArgs e)
{
    // Create document builder
    DocumentBuilder builder = new DocumentBuilder(e.MatchNode.Document);
    // Move to matched node
    builder.MoveTo(e.MatchNode);
    // Insert caption
    // Insert label
    builder.Write("Picture ");
    // Insert SEQ field
    builder.InsertField(@"SEQ Picture \* ARABIC", "");
    // Remove placeholder
    e.Replacement = string.Empty;
    return ReplaceAction.Replace;
}
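
If you would rather keep the caption text in the alt attribute, one possible variation is to generate the placeholder paragraph from the HTML yourself before calling InsertHtml. The following is only a rough sketch and does not use Aspose.Words at all: the names CaptionPlaceholders and AddPlaceholders and the ==CAPTION:text== placeholder format are made up for illustration, and it assumes double-quoted alt attributes and images that are not nested inside a paragraph or link.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Aspose.Words): adds a placeholder
// paragraph after every <img> that has a non-empty alt attribute.
public static class CaptionPlaceholders
{
    // Matches an <img ...> tag that carries a double-quoted alt attribute.
    private static readonly Regex ImgWithAlt = new Regex(
        @"<img\b[^>]*?\salt\s*=\s*""(?<alt>[^""]*)""[^>]*>",
        RegexOptions.IgnoreCase);

    public static string AddPlaceholders(string html)
    {
        return ImgWithAlt.Replace(html, m =>
        {
            string alt = m.Groups["alt"].Value;
            // Images with an empty alt are left without a caption
            if (alt.Trim().Length == 0)
                return m.Value;
            // Keep the original tag and append the placeholder paragraph
            return m.Value + "<p><span>==CAPTION:" + alt + "==</span></p>";
        });
    }
}
```

You would pass the result of CaptionPlaceholders.AddPlaceholders(html) to builder.InsertHtml. Images without an alt attribute are left untouched, so they get no caption. To pick up the caption text, the pattern used for the replacement would have to become something like ==CAPTION:(.*?)== with the captured group written after the SEQ field; I have not tested that part.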

Hope this helps.
Best regards.