Save As Word DOCX Document to HTML C# .NET | Images As Base64 | Export Headers Footers | Update Word Count

Hi,

I am using an evaluation copy of Aspose.Words for .NET (targeting the .NET 4.0 runtime), version 16.11.0.0.

I get “Object reference not set to an instance of an object.” when I save the attached Word file to an HTML file with the Document.Save method, as shown below.

Other files that I was testing worked fine for me.

The error appears to be caused by the chart in the document: if I remove the chart, saving to HTML works fine. Can you let me know how to save to HTML with the chart included?

// Resolve the server-side path of the source document.
var sourceFilePath = HostingEnvironment.MapPath(pathtothefile);
string htmlString = string.Empty;

// Configure the HTML export: embed images as Base64 and export headers/footers per section.
Aspose.Words.Saving.HtmlSaveOptions htmlSaveOptions = new Aspose.Words.Saving.HtmlSaveOptions();
htmlSaveOptions.ExportImagesAsBase64 = true;
htmlSaveOptions.ExportHeadersFootersMode = Aspose.Words.Saving.ExportHeadersFootersMode.PerSection;
htmlSaveOptions.SaveFormat = Aspose.Words.SaveFormat.Html;
htmlSaveOptions.PrettyFormat = true;

// Load the document and save it as HTML; the exception occurs during Save.
Aspose.Words.Document doc = new Aspose.Words.Document(sourceFilePath);
doc.Save(destinationFileName, htmlSaveOptions);

Could you let me know what the issue is and how to fix it?

Hi,

Thanks for your inquiry.

Using the latest version of Aspose.Words (16.11.0), we managed to reproduce this issue on our end and have logged it in our bug tracking system as WORDSNET-14487. Your request has been linked to this issue, and you will be notified as soon as it is resolved. Sorry for the inconvenience.

Best regards,

Hi,

In addition to the issue above, I came across a few more issues when saving a document as HTML.

1> Whenever a Word document contains an image whose formatting is set to “Line” with a solid line, the converted HTML always shows a double border around the image instead of the single solid line in the Word document.
You can check the attached document 1.doc and 1.html.

2> Whenever page numbers are set up in the Word document, the generated HTML shows the first page number at the end, after some space. Is it possible to ignore page numbers when converting to HTML?

I have attached the example file 2.doc.

3> We have a use case that restricts files based on their character count, say 15000 characters including spaces. I used the code below to get the number of characters with spaces:
Aspose.Words.Document doc = new Aspose.Words.Document(sourceFilePath);
var charactersWithSpace = doc.BuiltInDocumentProperties.CharactersWithSpaces;

However, the value always differs from what Word shows. How can we make sure the character count matches Word's so that we can validate the document? You can use 1.docx in the attached file to see that the count Aspose provides differs from Word's.

All the files that I referenced are in the attached zip file.

Can you please get back to us soon on these 4 issues (including the original)?

Hi,

1 - We have logged this issue as WORDSNET-14537. We will inform you as soon as this issue is resolved. We apologize for any inconvenience.

2 - Please use the following code:

Document doc = new Document(MyDir + @"2.docx.docx");
HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.Html);
opts.ExportHeadersFootersMode = ExportHeadersFootersMode.None; 
doc.Save(MyDir + @"16.11.0.html", opts);

3 - Please use the following code:

Document doc = new Document(MyDir + @"1.docx.docx");
doc.UpdateWordCount();
Console.WriteLine(doc.BuiltInDocumentProperties.CharactersWithSpaces);
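
As an illustration of how the recalculated count could feed the 15000-character check described earlier in this thread, here is a minimal sketch. It assumes the Aspose.Words for .NET assembly is referenced; the class name, the `MaxCharactersWithSpaces` constant, and reading the file path from the command line are hypothetical choices made for this example only.

```csharp
using System;
using Aspose.Words;

class CharacterLimitCheck
{
    // Hypothetical limit, taken from the use case described in this thread.
    const int MaxCharactersWithSpaces = 15000;

    static void Main(string[] args)
    {
        // args[0] is assumed to be the path of the document to validate.
        Document doc = new Document(args[0]);

        // Recalculate the document statistics before reading them,
        // because the values stored in the file may be out of date.
        doc.UpdateWordCount();

        int count = doc.BuiltInDocumentProperties.CharactersWithSpaces;

        if (count > MaxCharactersWithSpaces)
            Console.WriteLine("Rejected: " + count + " characters with spaces.");
        else
            Console.WriteLine("Accepted: " + count + " characters with spaces.");
    }
}
```

It is worth comparing the printed value against Word's own statistics for a few sample documents before relying on it for validation.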

Hope this helps.

Best regards,

Thanks for your answers.

I was able to figure out the answer to issue 3 when I looked at another answer in the forum.

Thanks for your answer to issue 2. I hope it affects only page numbers and not the text in the header and footer?

Yes, please let me know once you have an answer for the first issue.

Hi,

1 - We will inform you via this thread as soon as this issue is resolved.

2 - The ExportHeadersFootersMode.None option will not export any content (page numbers, text, images, tables etc) in headers/footers to HTML format. Here are possible options:

| Member Name | Description |
| --- | --- |
| None | Headers and footers are not exported. |
| PerSection | Primary headers and footers are exported at the beginning and the end of each section. |
| FirstSectionHeaderLastSectionFooter | Primary header of the first section is exported at the beginning of the document and primary footer is at the end. |
| FirstPageHeaderFooterPerSection | First page header and footer are exported at the beginning and the end of each section. |
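
To see how each of these modes affects a particular document, one option is to save the document once per enum value and compare the resulting files. The sketch below assumes Aspose.Words for .NET is referenced; the class name, the command-line argument, and the output file naming are hypothetical.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Saving;

class HeaderFooterModeComparison
{
    static void Main(string[] args)
    {
        // args[0] is assumed to be the path of the source Word document.
        string inputPath = args[0];

        // Iterate over every value defined by the ExportHeadersFootersMode enum.
        foreach (ExportHeadersFootersMode mode in Enum.GetValues(typeof(ExportHeadersFootersMode)))
        {
            // Reload the document for each run so every export starts from the same state.
            Document doc = new Document(inputPath);

            HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.Html);
            opts.ExportHeadersFootersMode = mode;

            // Writes, for example, "report.docx.None.html" next to the source file.
            doc.Save(inputPath + "." + mode + ".html", opts);
        }
    }
}
```

Opening the generated files side by side shows which header and footer content each mode keeps.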

Best regards,

The issues you have found earlier (filed as WORDSNET-14487) have been fixed in this Aspose.Words for .NET 16.12.0 update and this Aspose.Words for Java 16.12.0 update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.

Thanks for the fix and updated version.

I have used this version and can confirm that it fixed our issue. We are definitely buying a licence for this product.

In my second message in this thread, I reported the problem below, where some images in the Word document appear with a double border when converted to HTML:

1> Whenever a Word document contains an image whose formatting is set to “Line” with a solid line, the converted HTML always shows a double border around the image instead of the single solid line in the Word document.

I have re-attached the file that causes this issue. Can you please look into this as well and see whether there is a solution for it?

Thanks
Swamy

Hi Swamy,

Thanks for your inquiry.

Unfortunately, your other issue (WORDSNET-14537) is not resolved yet. We will inform you via this thread as soon as it is resolved. We apologize for the inconvenience.

Best regards,

Hi, thanks for looking into the issue. Our customer has tested the fix you made for my original post and has come back with the following issues.

  1. If an image and text are next to each other, they sometimes switch places from left to right. See the original document and the item highlighted in red in the generated HTML.

  2. Some text from the text area is missing in the generated HTML. Please look at the item highlighted in green in the output HTML and compare it with the original Word document.

  3. The chart size seems to be altered and the gap between the text is larger; see the item highlighted in yellow.

Please let us know which of the above issues can be fixed.

Thanks in advance,
Swamy

Hi Swamy,

Thanks for your inquiry.

Aspose.Words seems to mimic the way Microsoft Word works. I converted your ‘source html.docx’ file to HTML using Microsoft Word 2016 and have attached the resulting file (msw-2016.htm) here for your reference. You can see that Aspose.Words 16.12.0 produces output that is almost the same as that of Microsoft Word 2016. I used the following code on my end:

Document doc = new Document(MyDir + @"source+html.docx");
Aspose.Words.Saving.HtmlSaveOptions htmlSaveOptions = new Aspose.Words.Saving.HtmlSaveOptions();
htmlSaveOptions.ExportImagesAsBase64 = true;
htmlSaveOptions.ExportHeadersFootersMode = Aspose.Words.Saving.ExportHeadersFootersMode.PerSection;
htmlSaveOptions.SaveFormat = Aspose.Words.SaveFormat.Html;
htmlSaveOptions.PrettyFormat = true;
doc.Save(MyDir + @"16.12.0.html", htmlSaveOptions);

Please create a comparison screenshot that shows the problematic or unacceptable areas in the HTML generated by Aspose.Words (as compared to msw-2016.htm) and attach it here for our reference. We will then investigate the issues further on our end and provide you with more information.

Best regards,

Hi,

Thanks for your response. I have attached the HTML that Aspose generated and a differences.PNG file that shows how it differs from the HTML you uploaded.

You can see the differences mainly in the converted image and in the position of some of the images: sometimes images on the left appear on the right, as pointed out in my differences.png file.

Please let me know if these will be fixed.

Thanks,
Swamy

Hi Swamy,

Thanks for your inquiry.

While using the latest version of Aspose.Words i.e. 16.12.0, we managed to reproduce these issues on our end. We have logged the following issues in our bug tracking system.

WORDSNET-14704: Text in image is missing when exporting to HTML
WORDSNET-14705: More vertical spacing between lines added when exporting to HTML
WORDSNET-14706: Chart image size seems bigger than original when exporting to HTML
WORDSNET-14707: Image position is different when exporting to HTML

Your thread has also been linked to the appropriate issues and you will be notified as soon as they are resolved. Sorry for the inconvenience.

Best regards,

The issues you have found earlier (filed as WORDSNET-14537) have been fixed in this Aspose.Words for .NET 17.2.0 update and this Aspose.Words for Java 17.2.0 update.

The issues you have found earlier (filed as WORDSNET-14705) have been fixed in this Aspose.Words for .NET 17.3.0 update and this Aspose.Words for Java 17.3.0 update.

The issues you have found earlier (filed as WORDSNET-14704, WORDSNET-14705) have been fixed in this Aspose.Words for .NET 17.3.0 update and this Aspose.Words for Java 17.3.0 update.

Hi, thanks for the update.

This is to let you know that version 17.2.0, which fixed the issue reported in WORDSNET-14537, works fine.

However, version 17.3.0, which supposedly fixed WORDSNET-14705 and WORDSNET-14704, is not working as expected. The same issue that I originally reported in my reply Id 18897 (in reply to 818735) still occurs; this update has not fixed it.

Additionally, 17.3.0 also reverted the fix that was made in 17.2.0 for issue WORDSNET-14537, so I had to revert to 17.2.0, which seems to be stable. Please update me on the other issues that I reported.

In addition to the above issues, I also want to report one new issue that I have encountered.

It looks like, if a table in the Word document has its alignment set to center, the converted HTML does not carry over cell-level styles even when the cell values are left-aligned. Only the table’s style is carried over, and all cells in the converted HTML appear center-aligned.

Please test the attached Word file (test file.docx). I have also attached the resulting HTML; please note the difference in cell value alignment.

Thanks,
Swamy

Hi Swamy,

Thanks for your inquiry.

*Swamy:
In addition to the above issues, I also want to report one new issue that I have encountered.

It looks like, if a table in the Word document has its alignment set to center, the converted HTML does not carry over cell-level styles even when the cell values are left-aligned. Only the table’s style is carried over, and all cells in the converted HTML appear center-aligned.

Please test the attached Word file (test file.docx). I have also attached the resulting HTML; please note the difference in cell value alignment.*

In this case, Aspose.Words for .NET 17.4 mimics the behavior of MS Word 2016. I have converted this document to HTML using MS Word 2016 and attached it here for your reference. You can see that MS Word 2016 and Aspose.Words for .NET 17.4 both produce similar HTML.

Also, we are working on your other queries and will get back to you soon.

Best regards,

Hi Swamy,

*Swamy:
However, version 17.3.0, which supposedly fixed WORDSNET-14705 and WORDSNET-14704, is not working as expected. The same issue that I originally reported in my reply Id 18897 (in reply to 818735) still occurs; this update has not fixed it.

Additionally, 17.3.0 also reverted the fix that was made in 17.2.0 for issue WORDSNET-14537, so I had to revert to 17.2.0, which seems to be stable. Please update me on the other issues that I reported.*

We have verified that these problems do not occur when using Aspose.Words for .NET 17.4 on our end. Please upgrade to the latest version of Aspose.Words and see how it goes on your end. Hope this helps. We have also attached the output HTML files here for your reference.

Best regards,

The issues you have found earlier (filed as WORDSNET-14706) have been fixed in this Aspose.Words for .NET 17.5 update and this Aspose.Words for Java 17.5 update.
