FileCorruptedException when converting HTML to DOCX - Verified Bug

Hello,


We (Docstoc Inc., Los Angeles) use Aspose products to convert documents to different formats such as DOCX, HTML and PDF.

We have a flow where our internal system generates a valid MS Word document (.docx format, not created with Aspose), and we convert this document to HTML using Aspose.Words for .NET 14.7.0. We then let people customize this HTML document in a limited WYSIWYG editor, and finally convert the HTML back to MS Word.

The flow is:
1. DOCX to HTML
2. HTML to DOCX

We get an “Aspose.Words.FileCorruptedException - The document appears to be corrupted and cannot be loaded” exception when converting the HTML (which was itself generated by Aspose.Words for .NET) back to DOCX.

I created a sample console app to find out what was breaking the HTML to DOCX conversion, and found that the HTML file had some elements with inline styles such as -aw-headerfooter-type: header-primary; and -aw-different-first-page: true;.

I noticed that these styles were not defined anywhere in the HTML file, so I stripped out every inline style starting with ‘-aw-’ using a regular expression, after which the HTML file converted to DOCX properly. I assume these styles are used to preserve Word formatting in HTML somehow, but they were causing the FileCorruptedException.
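The regex workaround described above can be sketched roughly as follows. This is not the poster's actual code (which was .NET and was not included in the thread); it is a minimal Python illustration, assuming that only declarations inside `style="..."` / `style='...'` attributes need cleaning and that declarations are separated by semicolons. The function name `strip_aw_styles` is hypothetical. Note that removing the `-aw-` properties may also discard whatever Word-specific information they carry (for example, which header or footer a block belongs to).

```python
import re

# Matches an inline style attribute; group 1 is the quote character,
# group 2 is the attribute value between the quotes.
STYLE_ATTR = re.compile(r'''\sstyle\s*=\s*(["'])(.*?)\1''', re.IGNORECASE | re.DOTALL)


def strip_aw_styles(html: str) -> str:
    """Remove CSS declarations whose property name starts with '-aw-'
    from every inline style attribute, keeping all other declarations."""

    def clean(match: re.Match) -> str:
        quote, value = match.group(1), match.group(2)
        # Split on ';' and keep only non-empty declarations not starting with '-aw-'.
        kept = [
            decl.strip()
            for decl in value.split(";")
            if decl.strip() and not decl.strip().lower().startswith("-aw-")
        ]
        if not kept:
            return ""  # every declaration was '-aw-': drop the attribute entirely
        return f' style={quote}{"; ".join(kept)}{quote}'

    return STYLE_ATTR.sub(clean, html)
```

This is a text-level regex, not an HTML parser: it would also rewrite `style="..."` text appearing outside tags, and splitting on `;` mishandles values that themselves contain semicolons. For one-off cleanup of generated HTML before loading it back into Aspose.Words, those limits are usually acceptable.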

I am attaching the sample HTML and the code so you guys can verify the issue. We are using .NET 4.5 and Aspose.Words 14.7.0, which is the latest version at the moment.

Thanks,
Cihan

Hi Cihan,


Thanks for your inquiry.

Using the latest version of Aspose.Words (14.7.0), I was able to reproduce this exception on my side. I have logged this issue in our bug tracking system. The ID of this issue is WORDSNET-10597. Your thread has also been linked to this issue and you will be notified as soon as it is resolved. Sorry for the inconvenience.

Best regards,

Thanks, Awais. I look forward to downloading the next version as soon as possible.

Best,

Cihan

Hi Cihan,


Thanks for your inquiry. We will inform you via this thread as soon as this issue is resolved. We apologize for any inconvenience.

Best regards,

The issue you found earlier (filed as WORDSNET-10597) has been fixed in this .NET update and this Java update.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.