Free Support Forum - aspose.com

Issue in Word to Html and Back to Word File Conversion

Hi team,

I am using Aspose.Words for .NET on .NET 4.7.
I am converting a Word file to HTML. The HTML is not perfectly formatted, but it is similar to the Web Layout view in MS Word, so we can't expect better HTML than that, and that's fine.
The issue is that when I regenerate the DOCX file from the same HTML file, I get a distorted DOCX file. I expect the regenerated DOCX file to match the original.

This is the code sample I am using for docx to html conversion.

Aspose.Words.Saving.HtmlSaveOptions options = new Aspose.Words.Saving.HtmlSaveOptions();
options.Encoding = System.Text.Encoding.UTF8;
options.AllowNegativeIndent = true;
options.ExportImagesAsBase64 = true;
options.ExportFontResources = true;
options.ExportFontsAsBase64 = true;
options.ExportTocPageNumbers = true;
options.ExportOriginalUrlForLinkedImages = true;
options.ExportPageSetup = true;
options.ExportLanguageInformation = true;
options.ExportRelativeFontSize = true;
options.ExportTextBoxAsSvg = true;
options.ExportHeadersFootersMode = Aspose.Words.Saving.ExportHeadersFootersMode.PerSection;
//options.ExportPageMargins = true;

And this is the sample code I am using to convert the HTML file back to DOCX.

Document doc = new Document(filepath);
doc.Save(docxfile,SaveFormat.Docx);

Please find the attached files for your reference (original file, HTML file, and the file regenerated from the HTML).
There is a vast difference between the regenerated file and the original file.
Documents.zip (1005.6 KB)

Can you suggest whether I need to improve the code, or whether there is any other way to improve the result?

@kotharishrey,

You are using a very old version, i.e. Aspose.Words for .NET 17.12. We suggest you upgrade to the latest version of Aspose.Words for .NET, i.e. 18.12, and see how it goes on your end. Hope this helps.

Please also set HtmlSaveOptions.ExportRoundtripInformation to true. Please see these output documents, which were produced on our end using the following code:

Attachment: 18.12-html-docx-outputs.zip (755.8 KB)

Document doc = new Document("E:\\Documents\\CTC_Process_Flows__28final_29__2800_29.docx");

HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.Html);
options.ExportRoundtripInformation = true;
options.Encoding = System.Text.Encoding.UTF8;
options.AllowNegativeIndent = true;
options.ExportImagesAsBase64 = true;
options.ExportFontResources = true;
options.ExportFontsAsBase64 = true;
options.ExportTocPageNumbers = true;
options.ExportOriginalUrlForLinkedImages = true;
options.ExportPageSetup = true;
options.ExportLanguageInformation = true;
options.ExportRelativeFontSize = true;
options.ExportTextBoxAsSvg = true;
options.ExportHeadersFootersMode = Aspose.Words.Saving.ExportHeadersFootersMode.PerSection;

doc.Save("E:\\Documents\\18.12.html", options);

Document html = new Document("E:\\Documents\\18.12.html");
html.Save("E:\\Documents\\18.12.docx");
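
A rough way to quantify how far a regenerated document drifts from the original is to compare node counts before and after the HTML round-trip. The following is only a minimal sketch, assuming the Aspose.Words for .NET package is referenced; the `RoundTripCheck` class, the `Summarize` helper, and the command-line path argument are hypothetical names introduced for illustration, and matching counts do not prove that the layout is identical.

```csharp
using System;
using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;

class RoundTripCheck
{
    // Hypothetical helper: counts a few node types so two documents can be compared roughly.
    static string Summarize(Document doc)
    {
        return string.Format(
            "sections={0}, paragraphs={1}, tables={2}, shapes={3}, groupShapes={4}",
            doc.Sections.Count,
            doc.GetChildNodes(NodeType.Paragraph, true).Count,
            doc.GetChildNodes(NodeType.Table, true).Count,
            doc.GetChildNodes(NodeType.Shape, true).Count,
            doc.GetChildNodes(NodeType.GroupShape, true).Count);
    }

    static void Main(string[] args)
    {
        // args[0] is assumed to be the path of the original .docx file.
        Document original = new Document(args[0]);

        HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.Html);
        options.ExportRoundtripInformation = true;
        // Embed images so the whole HTML output fits in a single stream.
        options.ExportImagesAsBase64 = true;

        using (MemoryStream html = new MemoryStream())
        {
            // DOCX -> HTML in memory.
            original.Save(html, options);

            // HTML -> document model again; the format is detected on load.
            html.Position = 0;
            Document regenerated = new Document(html);

            Console.WriteLine("original:    " + Summarize(original));
            Console.WriteLine("regenerated: " + Summarize(regenerated));
        }
    }
}
```

Large differences in the shape or group shape counts would point at drawing objects as the part of the document that does not survive the round-trip.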

Hi @awais.hafeez

Thanks for your reply.

options.ExportRoundtripInformation is true by default.

After updating the package to 18.12 the issue is still not resolved. I still see many distortions in the regenerated DOCX.

I am facing one more issue in the HTML conversion: an image appears in front of the text, which I do not see in the Web Layout view of the original DOCX (MS Office).
Please find the attached document and a screenshot showing the error.

Docx3.zip (7.1 MB)

@kotharishrey,

Thanks for your inquiry. When you save this .docx to .html format using Microsoft Word 2019, you will observe the same behavior. Please see the attached Microsoft Word 2019 generated .html document (msw-2019.zip (696.8 KB)). So this appears to be expected behavior, as Aspose.Words mimics Microsoft Word in this case. If we can help you with anything else, please feel free to ask.

However, you may want to convert this document to the HTML Fixed format instead. Converting the Word document to HTML using absolutely positioned elements (HtmlFixed) may produce the output you expect. The code below shows how to use HtmlFixedSaveOptions to save in the HtmlFixed format. Please note that the HTML Fixed format cannot be used for Word-HTML-Word round-trips.

Document doc = new Document("E:\\Docx3\\ArrMaz_PS_May 28 2015 (1).docx");

HtmlFixedSaveOptions opts = new HtmlFixedSaveOptions();
opts.ExportEmbeddedImages = true;
opts.ExportEmbeddedCss = true;
opts.ShowPageBorder = false;

doc.Save("E:\\Docx3\\18.12.html", opts);

Hi Awais,

When I save the file from MS Office using the Save As option with the Web Page format, I get the correct output (the image is not in front of the text).

Please find the attached files.
File.zip (8.6 MB)

@kotharishrey,

We have converted your ArrMaz_PS_May 28 2015 (1).docx document to plain HTML format (see 18.12.File.zip (3.8 MB)). Please open this 18.12.html file with MS Word and set Web Layout from View tab. You will notice that the images in question will cause no content overlapping. The overlapping occurs only when viewing the HTML files with web browsers such as Chrome.

Yes, that's the issue. I don't want overlapping in the browser. Is there any solution for that?
My requirement is to convert DOCX to HTML, but some content of the HTML is not accessible because of the image overlapping.
How can I avoid that?

An additional issue I am facing in the HTML conversion of this file:
As seen in the screenshot, the “Executive Summary” box is treated as an image in the HTML, but it is a text box in the original DOCX file.
How can I get access to such text?

@kotharishrey,

Regarding the ‘ArrMaz_PS_May 28 2015 (1).docx’ file, we have logged the overlapping problem and TextBox being converted to Image in HTML in our issue tracking system. Your ticket number is WORDSNET-17943. We will further look into the details of this problem and will keep you updated on the status of the linked issue. We apologize for your inconvenience.

Hi Awais,

I found a solution for the text box being converted to an image in the HTML.

If you refer to the screenshot I provided, the image, rectangle, and text box are grouped together in the source file.

If we manually ungroup them and then convert the file to HTML, we get access to the text box.

So the pending issue is the overlapping: when I save the DOCX to HTML manually with MS Office, I do not see any image overlapping, but when I convert with Aspose, the image overlaps the preceding text.
I hope to get a solution for this soon.
Thanks for your help.
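
The grouped text boxes described above can be located before conversion, so that the groups needing manual ungrouping are known in advance. This is a minimal sketch, assuming Aspose.Words for .NET is referenced; the `FindGroupedTextBoxes` class name and the command-line path argument are hypothetical, and the sketch only reports the groups, it does not ungroup anything.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Drawing;

class FindGroupedTextBoxes
{
    static void Main(string[] args)
    {
        // args[0] is assumed to be the path of the source .docx file.
        Document doc = new Document(args[0]);

        // Visit every group shape in the document, including nested groups.
        foreach (GroupShape group in doc.GetChildNodes(NodeType.GroupShape, true))
        {
            // Look only at direct children; nested groups are visited by the outer loop.
            foreach (Shape shape in group.GetChildNodes(NodeType.Shape, false))
            {
                string text = shape.GetText().Trim();
                if (text.Length > 0)
                {
                    // This text sits inside a group and may end up rendered as an image in HTML.
                    Console.WriteLine("Group '{0}' contains text: {1}", group.Name, text);
                }
            }
        }
    }
}
```

Each reported group is a candidate for ungrouping in MS Word before the document is converted to HTML.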

@kotharishrey,

It is great that you were able to resolve this issue on your end.

Sure, we will inform you via this thread as soon as this issue is resolved. We apologize for any inconvenience.

Hi Team,

I am not able to track the ticket status for my defect. Can you tell me the current status of the ticket raised here?

@kotharishrey,

Your issue is currently ‘pending for analysis’ and is in the queue. We will inform you via this thread as soon as this issue is resolved. We apologize for any inconvenience.