How to inline images with text using Aspose.Words for .NET

I want to find all the images in an HTML document and align them with the paragraph.

My HTML document has several image paragraphs and text paragraphs. I want to find all the images and inline them with the paragraph text. Could you please help me with that?
Part of the HTML is shown below.

<p>
	<img width="334" title="imagetest" imageid="fa197b43-5b2a-4c53-bddd" />
</p>

<p>
	<strong>
		SLOW
		THINKING
	</strong>&nbsp;
</p>

<p>
	eryyfhjc djhhjdc adjhcs,jhcsjk adkjhschaihilahdlia ashaihailh ahcsahahiahc ahcwsacjahcjas ahchahihd shahdahd
	eryyfhjc djhhjdc adjhcs,jhcsjk adkjhschaihilahdlia ashaihailh ahcsahahiahc ahcwsacjahcjas ahchahihd shahdahd
	eryyfhjc djhhjdc adjhcs,jhcsjk adkjhschaihilahdlia ashaihailh ahcsahahiahc ahcwsacjahcjas ahchahihd shahdahd
	eryyfhjc djhhjdc adjhcs,jhcsjk adkjhschaihilahdlia ashaihailh ahcsahahiahc ahcwsacjahcjas ahchahihd shahdahd
	eryyfhjc djhhjdc adjhcs,jhcsjk adkjhschaihilahdlia ashaihailh ahcsahahiahc ahcwsacjahcjas ahchahihd shahdahd

</p>

@nethmi Could you please describe your requirement in more detail, or attach your current and expected output documents? I used the following simple code:

Document doc = new Document(@"C:\Temp\in.html");
doc.Save(@"C:\Temp\out.docx");

The image in the output document is inline with the text: out.zip (10.0 KB)

Your in.html is correct, and this is the output I want (see the attached image). My HTML document contains many images and paragraphs like this, and I want the same result for every image.
out.PNG (186.2 KB)

My HTML document looks like this, with many images and paragraphs:
inHTML.docx (15.0 KB)

My current output looks like this:
current.docx (10.5 KB)

I want the output below for each and every image in the document:
out.PNG (186.2 KB)

Hi team,

Is this possible? Could you please help me?

@nethmi When you import an HTML document, each image in it is imported as an inline image. What you need is a shape with WrapType.Square. You can change the WrapType using code like the following:

Document doc = new Document(@"C:\Temp\in.html");
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
foreach (Shape s in shapes)
{
    s.WrapType = WrapType.Square;
    s.Top += s.Height + 10;
}
doc.Save(@"C:\Temp\out.docx");

The code also shifts each shape down to get the output you expect: out.docx (10.7 KB)

Hi team,

Thank you very much for the support. I have another question.

  1. After the HTML is converted to Word, is there a way to store the HTML id and class of the image as metadata on the Shape object?

@nethmi

No, there is no such ability.

Hi,

OK, thank you very much. I have another question.

  1. Is there a way to add an authentication header to the image requests made for the HTML? Currently the Word document includes the images from the src defined in the HTML only when no authentication header is needed. (Please find my current HTML document below.)
    CurrentHTML.zip (5.9 KB)

@nethmi

You can achieve this using IResourceLoadingCallback. In the callback you can implement code that fetches the requested resources from a source that requires authentication.

This is my convert method. I want to get the image source path so that I can load the image.

public List<byte[]> Convert(ContentsDto content, IDictionary<string, string> options, System.Func<DependentContent, Task<ContentsDto>> GetDependency = null)
{
    License htmlLicense1 = new License();
    htmlLicense1.SetLicense("Aspose.Words.NET.lic");

    var byteArray = new List<byte[]>();

    using (var dataStream = new MemoryStream(content.Data))
    {
        HtmlLoadOptions docOptions = new HtmlLoadOptions();
        var document = new Aspose.Words.Document(dataStream, docOptions);

        using (var outputStream = new MemoryStream())
        {
            Node node = document;
            NodeCollection shapes = document.GetChildNodes(NodeType.Shape, true);

            foreach (Shape shape in shapes)
            {
                if (shape.ShapeType == ShapeType.Image)
                {
                    WebClient webClient = new WebClient();
                    AssetteAuthenticationHelper assetteAuthenticationHelper = new AssetteAuthenticationHelper(_authSettings, _settingsProvider.User.UserId, _settingsProvider.User.ClientId);
                    var token = assetteAuthenticationHelper.GetAuthToken();
                    webClient.Headers.Add("Authorization", $"Bearer {token}");

                    var result = webClient.DownloadData(shape.ImageData.SourceFullName);

                    Image x = null;
                    // Convert the downloaded bytes to an image.
                    using (var ms = new MemoryStream(result))
                    {
                        x = Image.FromStream(ms);
                    }
                    shape.ImageData.SetImage(x);

                    shape.WrapType = WrapType.Square;
                }
            }
            document.Save(outputStream, SaveFormat.Docx);
            byteArray.Add(outputStream.ToArray());
        }
    }
    return byteArray;
}

I am trying to get the image source path with the line “var result = webClient.DownloadData(shape.ImageData.SourceFullName);”, but it is not working. How can I get the image source path so that I can set the authentication header?

@nethmi As I mentioned, in your case you should use IResourceLoadingCallback. Please see the modified code:

public List<byte[]> Convert(ContentsDto content, IDictionary<string, string> options, System.Func<DependentContent, Task<ContentsDto>> GetDependency = null)
{
    License htmlLicense1 = new License();
    htmlLicense1.SetLicense("Aspose.Words.NET.lic");

    var byteArray = new List<byte[]>();

    using (var dataStream = new MemoryStream(content.Data))
    {
        HtmlLoadOptions docOptions = new HtmlLoadOptions();
        docOptions.ResourceLoadingCallback = new ResourceLoadingCallback();
        var document = new Aspose.Words.Document(dataStream, docOptions);

        using (var outputStream = new MemoryStream())
        {
            NodeCollection shapes = document.GetChildNodes(NodeType.Shape, true);
            foreach (Shape shape in shapes)
            {
                if (shape.ShapeType == ShapeType.Image)
                    shape.WrapType = WrapType.Square;
            }
            document.Save(outputStream, SaveFormat.Docx);
            byteArray.Add(outputStream.ToArray());
        }
    }
    return byteArray;
}

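private class ResourceLoadingCallback : IResourceLoadingCallback
{
    // Note: _authSettings and _settingsProvider must be accessible from this class,
    // for example by passing them in through a constructor. A nested class cannot
    // read the outer class's instance fields directly.
    public ResourceLoadingAction ResourceLoading(ResourceLoadingArgs args)
    {
        if (args.ResourceType == ResourceType.Image)
        {
            AssetteAuthenticationHelper assetteAuthenticationHelper = new AssetteAuthenticationHelper(_authSettings, _settingsProvider.User.UserId, _settingsProvider.User.ClientId);
            var token = assetteAuthenticationHelper.GetAuthToken();

            // Download the image with the authorization header and pass the bytes to Aspose.Words.
            using (WebClient webClient = new WebClient())
            {
                webClient.Headers.Add("Authorization", $"Bearer {token}");
                byte[] result = webClient.DownloadData(args.OriginalUri);
                args.SetData(result);
            }

            return ResourceLoadingAction.UserProvided;
        }

        // Let Aspose.Words load all other resources in the default way.
        return ResourceLoadingAction.Default;
    }
}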
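
As a side note, the callback above uses WebClient, which is marked obsolete in .NET 6 and later. A minimal sketch of how the same bearer header could be attached with the HttpClient types is shown below. The class name AuthenticatedImageRequest, the method Build, and the token value are hypothetical and are not part of Aspose.Words or the original code; the sketch only builds the request and does not send it.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class AuthenticatedImageRequest
{
    // Hypothetical helper: builds a GET request for an image URI
    // and attaches the token as an "Authorization: Bearer ..." header.
    public static HttpRequestMessage Build(string imageUri, string token)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, imageUri);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", token);
        return request;
    }
}
```

Inside ResourceLoading, such a request would be built from args.OriginalUri, sent with an HttpClient instance, and the response bytes passed to args.SetData in the same way as in the code above.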

Hi team,

Thank you very much for your support.
Imageissue.PNG (109.8 KB)

How can I fix this image issue? Could you please help me?

@nethmi Could you please attach your input and output documents and provide the expected output? We will check your documents and give you more information. Unfortunately, the screenshot does not make your expected output clear.

In Imageissue.PNG (109.8 KB), the image extends outside the margin.

I want to adjust it so that it fits within the margins: expectedOutput.PNG (117.6 KB)

@nethmi Could you please attach the actual documents? It is impossible to analyze the problem without the documents you are processing.

This zip file includes both the HTML file and the converted Word file. On the second page, the image extends outside the margin. Could you please help me fix it?
documents.zip (230.5 KB)

@nethmi I think the only way to achieve what you need is to use layout information, which is available through the LayoutCollector and LayoutEnumerator classes. I created a code example that post-processes your DOCX document to produce the expected output:

Document doc = new Document(@"C:\Temp\currentOutput.docx");

// LayoutCollector and LayoutEnumerator will be used to calculate position of shapes.
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);

NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
foreach (Shape s in shapes)
{
    // LayoutCollector and LayoutEnumerator do not work with nodes in header and footer.
    // Skip them.
    if (s.GetAncestor(NodeType.HeaderFooter) != null)
        continue;

    PageSetup ps = ((Section)s.GetAncestor(NodeType.Section)).PageSetup;
    // Rectangle inside page margin.
    float width = (float)(ps.PageWidth - ps.LeftMargin - ps.RightMargin);
    float height = (float)(ps.PageHeight - ps.TopMargin - ps.BottomMargin);
    RectangleF rect = new RectangleF((float)ps.LeftMargin, (float)ps.TopMargin, width, height);

    // Get shape rectangle on the page.
    enumerator.Current = collector.GetEntity(s);
    RectangleF shapeRect = enumerator.Rectangle;

    // Update shape position to place it inside page margins.
    if (shapeRect.Left < rect.Left)
        s.Left += (rect.Left - shapeRect.Left);

    if (shapeRect.Right > rect.Right)
        s.Left -= (shapeRect.Right - rect.Right);

    if (shapeRect.Top < rect.Top)
        s.Top += (rect.Top - shapeRect.Top);

    if (shapeRect.Bottom > rect.Bottom)
        s.Top -= (shapeRect.Bottom - rect.Bottom);
}

doc.Save(@"C:\Temp\out.docx");
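
The position adjustment in the loop above is plain rectangle arithmetic, so it can be checked separately from Aspose.Words. The following is only a sketch of that arithmetic: MarginClamp and Offset are hypothetical names, all values are assumed to be in points, and unlike the loop above it uses else-if, so a shape that sticks out on both sides is only aligned to the left or top edge.

```csharp
using System;

public static class MarginClamp
{
    // Returns the horizontal and vertical shift that moves a shape rectangle
    // inside a bounding rectangle (for example, the area inside the page margins).
    public static (float Dx, float Dy) Offset(
        float shapeLeft, float shapeTop, float shapeRight, float shapeBottom,
        float boundsLeft, float boundsTop, float boundsRight, float boundsBottom)
    {
        float dx = 0f;
        float dy = 0f;

        // Horizontal: shift right if the shape crosses the left edge,
        // otherwise shift left if it crosses the right edge.
        if (shapeLeft < boundsLeft)
            dx = boundsLeft - shapeLeft;
        else if (shapeRight > boundsRight)
            dx = boundsRight - shapeRight;

        // Vertical: shift down if the shape crosses the top edge,
        // otherwise shift up if it crosses the bottom edge.
        if (shapeTop < boundsTop)
            dy = boundsTop - shapeTop;
        else if (shapeBottom > boundsBottom)
            dy = boundsBottom - shapeBottom;

        return (dx, dy);
    }
}
```

For example, for a shape spanning 400 to 600 points horizontally on a page whose right margin edge is at 562 points, Offset returns a horizontal shift of -38 points, which corresponds to s.Left -= 38 in the code above.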

Hi team,

The above code is not working for me.
My full code is below. Could you please check it?

public List<byte[]> Convert(ContentsDto content, IDictionary<string, string> options, System.Func<DependentContent, Task<ContentsDto>> GetDependency = null)
{
    License htmlLicense1 = new License();
    htmlLicense1.SetLicense("Aspose.Words.NET.lic");

    HtmlDocument htmlDocument = new HtmlDocument();
    using (var htmlStream = new MemoryStream(content.Data))
    {
        htmlDocument.Load(htmlStream);
    }

    var byteArray = new List<byte[]>();

    using (var dataStream = new MemoryStream(content.Data))
    {
        HtmlLoadOptions docOptions = new HtmlLoadOptions();

        docOptions.ResourceLoadingCallback = new AsposeWordsAuthHandler(_settingsProvider, _authSettings, _logger);

        var document = new Aspose.Words.Document(dataStream, docOptions);

        using (var outputStream = new MemoryStream())
        {
            using (var htmlStream = new MemoryStream(content.Data))
            {
                htmlDocument.Load(htmlStream);
            }

            ApplyImageFormatting(document, htmlDocument);
            FormatOutput(GetDependency, document, htmlDocument);
            document.Save(outputStream, SaveFormat.Docx);
            byteArray.Add(outputStream.ToArray());
        }
    }
    return byteArray;
}

private static void FormatOutput(Func<DependentContent, Task<ContentsDto>> GetDependency, Aspose.Words.Document document, HtmlDocument htmlDocument)
{
    document.Sections[0].PageSetup.LeftMargin = 50f;
    document.Sections[0].PageSetup.RightMargin = 50f;
    document.Sections[0].PageSetup.TopMargin = 50f;
    document.Sections[0].PageSetup.BottomMargin = 60f;

    AddHeader(document, GetDependency);
    AddFooter(document);
}

private static void AddHeader(Document document, System.Func<DependentContent, Task<ContentsDto>> GetDependency)
{

    var dependentContent = new DependentContent()
    {
        Id = "fa197b43-5b2a-4c53-bddd-3dad534f0284",
    };
    var content = GetDependency(dependentContent).Result;
    if (content.Found)
    {

        DocumentBuilder builder = new DocumentBuilder(document);

        Section currentSection = builder.CurrentSection;
        PageSetup pageSetup = currentSection.PageSetup;

        pageSetup.HeaderDistance = 20;
        pageSetup.DifferentFirstPageHeaderFooter = true;

        builder.MoveToHeaderFooter(HeaderFooterType.HeaderFirst);
        builder.ParagraphFormat.Alignment = ParagraphAlignment.Center;

        //Initialize a Header Instance

        using (var inputStream = new MemoryStream(content.Data))
        {

            var inputImageFromStream = Image.FromStream(inputStream);

            builder.InsertImage(inputImageFromStream, RelativeHorizontalPosition.Page, 10, RelativeVerticalPosition.Page, 10, 50, 50, WrapType.Through);
            builder.ParagraphFormat.Alignment = ParagraphAlignment.Right;

        }
    }
}

private static void AddFooter(Document document)
{
    DocumentBuilder builder = new DocumentBuilder(document);

    builder.MoveToHeaderFooter(HeaderFooterType.FooterPrimary);
    builder.Write("Page ");
    builder.InsertField("PAGE", "");
    builder.Write(" of ");
    builder.InsertField("NUMPAGES", "");
}

private static void ApplyImageFormatting(Document document, HtmlDocument htmlDocument)
{
    HtmlNodeCollection images = htmlDocument.DocumentNode.SelectNodes("//img");
    Node node = document;
    NodeCollection shapes = document.GetChildNodes(NodeType.Shape, true);
    DocumentBuilder builder = new DocumentBuilder(document);
    var img = 0;

    if (images != null)
    {
        var imgCount = 0;
        foreach (Shape shape in shapes)
        {
            var image = images.ElementAt(imgCount);
            if (image.HasClass("fr-fil"))
            {
                shape.WrapType = WrapType.Square;

            }
            else
            {
                shape.HorizontalAlignment = HorizontalAlignment.Center;
            }
            shape.AllowOverlap = false;
            imgCount++;
        }
    }
}

@nethmi Thank you for the additional information, but could you please be more specific and describe your problem in a bit more detail? It would also be great if you could create a simple console application that allows us to reproduce the problem, since some classes used in your code are not available on my side and I cannot simply run the code. And as usual, please provide your input, output, and expected output documents.