I want to find all the images in an HTML document and align them with the paragraphs.
My HTML document has several image paragraphs and text paragraphs. I want to find all the images and place them in line with the paragraph text. Could you please help me with that?
I have included part of the HTML code below.
@nethmi Could you please describe your requirement in more detail, or attach your current and expected output documents? I used the following simple code:
Document doc = new Document(@"C:\Temp\in.html");
doc.Save(@"C:\Temp\out.docx");
The image in the output document is in line with the text: out.zip (10.0 KB)
Your in.html is correct. This is the output I want (see the attached image). My HTML document has many images and paragraphs like this, and I want the same output for every image. out.PNG (186.2 KB)
@nethmi When you import an HTML document, the images in it are imported as inline images. What you need is a shape with WrapType.Square. You can change the WrapType using code like the following:
Document doc = new Document(@"C:\Temp\in.html");
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
foreach (Shape s in shapes)
{
    s.WrapType = WrapType.Square;
    s.Top += s.Height + 10;
}
doc.Save(@"C:\Temp\out.docx");
The code also shifts the shape down to get the output you expect: out.docx (10.7 KB)
Is there a way to add an authentication header to the requests for the images referenced in the HTML? Currently, the Word document includes images from the src defined in the HTML only when no authentication header is needed. (Please find my current HTML document below.) CurrentHTML.zip (5.9 KB)
You can achieve this using IResourceLoadingCallback. In the callback, you can implement code that fetches the requested resources from the location that requires authentication.
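As a rough illustration of that suggestion, a callback along the following lines might work. It is only a sketch: the class name AuthImageLoadingCallback, the LoadWithAuth helper, and the token parameter are hypothetical, it assumes the image server accepts a bearer token in the Authorization header, and it uses WebClient only because the other code in this thread does.

```csharp
using System.IO;
using System.Net;
using Aspose.Words;
using Aspose.Words.Loading;

// Hypothetical callback: downloads images with an Authorization header
// and hands the bytes to Aspose.Words instead of letting it fetch them.
class AuthImageLoadingCallback : IResourceLoadingCallback
{
    private readonly string _token; // assumed to be a valid bearer token

    public AuthImageLoadingCallback(string token)
    {
        _token = token;
    }

    public ResourceLoadingAction ResourceLoading(ResourceLoadingArgs args)
    {
        // Let Aspose.Words load everything except images in the default way.
        if (args.ResourceType != ResourceType.Image)
            return ResourceLoadingAction.Default;

        using (WebClient client = new WebClient())
        {
            client.Headers.Add("Authorization", $"Bearer {_token}");
            // args.Uri is the resolved address of the requested resource.
            args.SetData(client.DownloadData(args.Uri));
        }

        // Tell Aspose.Words to use the data supplied via SetData.
        return ResourceLoadingAction.UserProvided;
    }
}

static class Example
{
    // Hypothetical helper showing how the callback is attached at load time.
    public static Document LoadWithAuth(Stream html, string token)
    {
        HtmlLoadOptions options = new HtmlLoadOptions();
        options.ResourceLoadingCallback = new AuthImageLoadingCallback(token);
        return new Document(html, options);
    }
}
```

With this approach the images are fetched while the HTML is being loaded, so there should be no need to read the image source path from each shape afterwards.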
This is my convert method. I want to get the image source path so that I can load the image.
public List<byte[]> Convert(ContentsDto content, IDictionary<string, string> options, System.Func<DependentContent, Task<ContentsDto>> GetDependency = null)
{
    License htmlLicense1 = new License();
    htmlLicense1.SetLicense("Aspose.Words.NET.lic");
    var byteArray = new List<byte[]>();
    using (var dataStream = new MemoryStream(content.Data))
    {
        HtmlLoadOptions docOptions = new HtmlLoadOptions();
        var document = new Aspose.Words.Document(dataStream, docOptions);
        using (var outputStream = new MemoryStream())
        {
            Node node = document;
            NodeCollection shapes = document.GetChildNodes(NodeType.Shape, true);
            foreach (Shape shape in shapes)
            {
                if (shape.ShapeType == ShapeType.Image)
                {
                    WebClient webClient = new WebClient();
                    AssetteAuthenticationHelper assetteAuthenticationHelper = new AssetteAuthenticationHelper(_authSettings, _settingsProvider.User.UserId, _settingsProvider.User.ClientId);
                    var token = assetteAuthenticationHelper.GetAuthToken();
                    webClient.Headers.Add("Authorization", $"Bearer {token}");
                    var result = webClient.DownloadData(shape.ImageData.SourceFullName);
                    Image x = null;
                    // Convert the downloaded bytes to an image.
                    using (var ms = new MemoryStream(result))
                    {
                        x = Image.FromStream(ms);
                    }
                    shape.ImageData.SetImage(x);
                    shape.WrapType = WrapType.Square;
                }
            }
            document.Save(outputStream, SaveFormat.Docx);
            byteArray.Add(outputStream.ToArray());
        }
    }
    return byteArray;
}
I am trying to get the image source path with the line "var result = webClient.DownloadData(shape.ImageData.SourceFullName);", but it is not working. How can I get the image source path so that I can set the authentication header?
@nethmi Could you please attach your input and output documents and provide the expected output? We will check your documents and provide you with more information. Unfortunately, it is not quite clear from the screenshot what your expected output is.
This zip file includes both the HTML file and the converted Word file. On the second page, the image extends outside the margin. Could you please help me fix it? documents.zip (230.5 KB)
@nethmi I think the only way to achieve what you need is to use layout information, which is available through the LayoutCollector and LayoutEnumerator classes. I created a code example that post-processes your DOCX document to produce the expected output:
Document doc = new Document(@"C:\Temp\currentOutput.docx");
// LayoutCollector and LayoutEnumerator will be used to calculate position of shapes.
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
foreach (Shape s in shapes)
{
    // LayoutCollector and LayoutEnumerator do not work with nodes in header and footer.
    // Skip them.
    if (s.GetAncestor(NodeType.HeaderFooter) != null)
        continue;
    PageSetup ps = ((Section)s.GetAncestor(NodeType.Section)).PageSetup;
    // Rectangle inside page margin.
    float width = (float)(ps.PageWidth - ps.LeftMargin - ps.RightMargin);
    float height = (float)(ps.PageHeight - ps.TopMargin - ps.BottomMargin);
    RectangleF rect = new RectangleF((float)ps.LeftMargin, (float)ps.TopMargin, width, height);
    // Get shape rectangle on the page.
    enumerator.Current = collector.GetEntity(s);
    RectangleF shapeRect = enumerator.Rectangle;
    // Update shape position to place it inside page margins.
    if (shapeRect.Left < rect.Left)
        s.Left += (rect.Left - shapeRect.Left);
    if (shapeRect.Right > rect.Right)
        s.Left -= (shapeRect.Right - rect.Right);
    if (shapeRect.Top < rect.Top)
        s.Top += (rect.Top - shapeRect.Top);
    if (shapeRect.Bottom > rect.Bottom)
        s.Top -= (shapeRect.Bottom - rect.Bottom);
}
doc.Save(@"C:\Temp\out.docx");
@nethmi Thank you for the additional information, but could you please be more specific and elaborate on your problem a bit more? Also, it would be great if you could create a simple console application that allows us to reproduce the problem, since some classes used in your code are not available on my side and I cannot simply run the code. And as usual, please provide your input, output, and expected output documents.