Word to HTML conversion using Java | The document appears to be corrupted and cannot be loaded

Hi,
We are using Aspose.Words for Java 11.4 to convert Word documents to HTML.

We are trying to load the Word document as follows:
Document srcDoc = new Document(srcDocPath);

but we get the exception below:
Exception in thread "main" com.aspose.words.FileCorruptedException: The document appears to be corrupted and cannot be loaded.
at com.aspose.words.FileFormatUtil.Z(Unknown Source)
at com.aspose.words.Document.Z(Unknown Source)
at com.aspose.words.Document.Ã(Unknown Source)
at com.aspose.words.Document.<init>(Unknown Source)
at com.aspose.words.Document.<init>(Unknown Source)
at GenerateHtml.getDocuments(GenerateHtml.java:106)
at GenerateHtml.main(GenerateHtml.java:79)
Caused by: java.lang.IllegalArgumentException: duplicate
at asposewobfuscated.G2.Z(Unknown Source)
at com.aspose.words.CustomXmlPropertyCollection.add(Unknown Source)
at com.aspose.words.NZ.Rj(Unknown Source)
at com.aspose.words.ZXQ.Ã(Unknown Source)
at com.aspose.words.ZXQ.Ã(Unknown Source)
at com.aspose.words.ZXQ.Y(Unknown Source)
at com.aspose.words.ZXQ.Ã(Unknown Source)
at com.aspose.words.ZXQ.Ã(Unknown Source)
at com.aspose.words.ZXQ.Zm(Unknown Source)
at com.aspose.words.ZXQ.parse(Unknown Source)
at com.aspose.words.VV.read(Unknown Source)
at com.aspose.words.Document.Z(Unknown Source)
… 5 more

We tested with the latest version, 11.6, as well and get the same exception.
We will attach a sample document later. Meanwhile, please let us know whether there is any way (in 11.4) to ignore file corruption exceptions, or to load the document while ignoring the XML markup nodes it contains.
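For context on what "ignoring" the exception can mean in practice: catching the failure does not recover the document, it only lets a batch job skip the file and carry on. Below is a minimal sketch of that pattern. The `SafeLoad` class and its `Loader` interface are hypothetical names introduced here for illustration, not part of Aspose.Words; the assumption is that the real loader would be something like `path -> new Document(path)`, the constructor call shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: loads each path, skipping the ones that fail to load.
final class SafeLoad {

    // Stand-in for the real loading call (assumed: path -> new Document(path)).
    public interface Loader<T> {
        T load(String path) throws Exception;
    }

    // Outcome of a batch: the items that loaded and the paths that did not.
    public static final class Result<T> {
        public final List<T> loaded = new ArrayList<>();
        public final List<String> failed = new ArrayList<>();
    }

    public static <T> Result<T> loadAll(List<String> paths, Loader<T> loader) {
        Result<T> result = new Result<>();
        for (String path : paths) {
            try {
                result.loaded.add(loader.load(path));
            } catch (Exception e) {
                // A FileCorruptedException like the one in the trace would be caught here.
                // The file is not repaired; it is only recorded and skipped.
                System.err.println("Skipping " + path + ": " + e.getMessage());
                result.failed.add(path);
            }
        }
        return result;
    }
}
```

With this in place, the documents listed in `failed` can be reported or retried after a library upgrade, while the rest of the batch is still converted.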

Thank you.

Hi Sonali,

Thanks for your query. Please share your document so that we can investigate. We will let you know the details of this issue once we have the sample document.

Hi Sonali,


Thanks for your inquiry. We always encourage our customers to use the latest release of Aspose.Words, as it contains newly introduced features, enhancements, and fixes for issues reported earlier. I suggest you upgrade to the latest version (11.6.0) and let us know how it goes on your side. You can download it from the following link:
http://www.aspose.com/community/files/72/java-components/aspose.words-for-java/default.aspx


Best Regards,

Hi Awais, Tahir,
As mentioned earlier, the same issue exists with the latest version, 11.6, as well.
The sample document is attached.

We suspect it is due to the XML tags in the document.
Please let us know how to ignore the exception, or the XML tag nodes, while loading the document. We do not use the XML nodes later in our application, so we do not want processing to stop when there is a problem with these tags during loading.

If the exception is due to some other issue, please let us know the cause and a possible workaround with Aspose. (We have no control over the input documents, so we cannot alter them manually before processing.)

Please let us know at your earliest convenience.

Hi Sonali,

Thanks for sharing the details. Yes, this issue is caused by the first XML tags (researchcomponent) in the document. I have managed to reproduce the issue on my side and have logged it as WORDSNET-6844 in our issue tracking system. I have linked this forum thread to that issue, and you will be notified via this thread once it is resolved.

We apologize for the inconvenience.

The issues you have found earlier (filed as WORDSNET-6844) have been fixed in this .NET update and this Java update.