Word document modified with Apache POI is not readable by Aspose

We do some automatic document processing, at the moment with Apache POI and conversion services in the cloud to create PDFs from the documents. We are now looking for software that can do the PDF conversion on premise.

Aspose.Words seems to work fine, better than most of the other libraries I have tested so far. Unfortunately, it is not able to open a docx modified with the Apache POI library. The document can be read by other libraries/services, LibreOffice and MS Word without any problems.

I just get this error message:

com.aspose.words.FileCorruptedException: The document appears to be corrupted and cannot be loaded.
	at com.aspose.words.FileFormatUtil.zzZ(Unknown Source)

Is there any way to get more information about why this fails? Or is there a “fix this document” option in Aspose.Words?
I uploaded the file here: https://drive.google.com/file/d/1Lf6HTMhU2OqRniRg9PJeBsJusyd7ecWx/view?usp=sharing
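
One way to narrow down a failure like this independently of Aspose.Words is to inspect the docx as the ZIP package it is. The following is only a minimal sketch using the Java standard library: it checks that a few commonly expected parts are present and that every XML part is well-formed. The class name `DocxPackageCheck` and the list of required parts are assumptions made for illustration; the sketch does not reproduce whatever validation Aspose.Words performs, so a clean result does not mean the file will load.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Aspose.Words or Apache POI.
final class DocxPackageCheck {

    // Parts a .docx package is commonly expected to contain (assumed list).
    private static final String[] REQUIRED_PARTS = {
        "[Content_Types].xml", "_rels/.rels", "word/document.xml"
    };

    /** Returns a list of problems found; an empty list means these checks passed. */
    static List<String> check(String path) throws IOException, ParserConfigurationException {
        List<String> problems = new ArrayList<>();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);

        // Opening the file fails with an IOException if it is not a readable ZIP archive.
        try (ZipFile zip = new ZipFile(path)) {
            for (String required : REQUIRED_PARTS) {
                if (zip.getEntry(required) == null) {
                    problems.add("missing part: " + required);
                }
            }
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                boolean isXml = name.endsWith(".xml") || name.endsWith(".rels");
                if (entry.isDirectory() || !isXml) {
                    continue;
                }
                // Parse each XML part only to see whether it is well-formed.
                try (InputStream in = zip.getInputStream(entry)) {
                    factory.newDocumentBuilder().parse(in);
                } catch (SAXException e) {
                    problems.add("not well-formed: " + name + " (" + e.getMessage() + ")");
                }
            }
        }
        return problems;
    }
}
```

Calling `DocxPackageCheck.check("out.docx")` and printing the returned list gives a first hint about whether the package structure or one of its XML parts is the problem.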

Thanks!

@paulwellnerbou,

We tested the scenario and reproduced the problem on our end. We have logged it in our issue tracking system as WORDSNET-18712. We will look further into the details and keep you updated on the status of the fix. We apologize for the inconvenience.

Thank you very much.

As we will need the on-premise solution soon, can you give us a rough estimate of when we may hear back from you? (Just to know whether there is any chance of going with Aspose.Words or whether we have to look for an alternative.)

By the way: we are using Java (Kotlin), and I am not sure whether the “NET” in WORDSNET has anything to do with .NET.

Thank you very much and kind regards
Paul

@paulwellnerbou,

This issue is currently in the queue, pending analysis. I am afraid there is no ETA (time frame) available at the moment. We will inform you via this thread as soon as any estimates or further updates are available. We apologize for the inconvenience.

Secondly, the latest version of Aspose.Words for Java is completely auto-ported from .NET, i.e. we do not write code for Aspose.Words for Java by hand; it is generated automatically from the C# code of Aspose.Words for .NET. An issue logged with the WORDSNET prefix is therefore resolved automatically in the Java variant as well: your problem (WORDSNET-18712) will be fixed in Aspose.Words for Java as soon as the linked issue is resolved.

Thank you very much.

I will look for alternatives then as well. If there’s any workaround we could do programmatically to “fix” the document to make it work with Aspose, please let me know.

@paulwellnerbou,

I am afraid you will not be able to fix this document programmatically by using Aspose.Words. However, as a workaround you can re-save the document with MS Word so that Aspose.Words can load it: open ‘out.docx’ in MS Word, go to File | Save As, specify another file name and choose *.docx as the type, then click Save. Then try loading the newly saved document with Aspose.Words. I hope this helps.
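
If a manual round trip through MS Word is not practical for an automated pipeline, the same re-save idea could in principle be scripted with LibreOffice, which was reported above to open the file without problems. The following is only a sketch under assumptions: LibreOffice is installed, its `soffice` executable is on the PATH, and the input file name ends in `.docx`. The class name `DocxResave` is hypothetical, and whether the re-saved file then loads in Aspose.Words has not been verified in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper; assumes LibreOffice's `soffice` is available on the PATH.
final class DocxResave {

    /** Builds the headless LibreOffice command that re-saves the input as docx into outDir. */
    static List<String> command(Path input, Path outDir) {
        return List.of("soffice", "--headless", "--convert-to", "docx",
                "--outdir", outDir.toString(), input.toString());
    }

    /**
     * Runs the conversion and returns the path of the re-saved file.
     * outDir should differ from the input's directory so the original is not overwritten.
     */
    static Path resave(Path input, Path outDir) throws IOException, InterruptedException {
        Files.createDirectories(outDir);
        Process process = new ProcessBuilder(command(input, outDir)).inheritIO().start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("soffice exited with code " + exitCode);
        }
        // LibreOffice keeps the base name, so a *.docx input yields the same file name in outDir.
        Path result = outDir.resolve(input.getFileName());
        if (!Files.exists(result)) {
            throw new IOException("expected output not found: " + result);
        }
        return result;
    }
}
```

The file returned by `DocxResave.resave(...)` would then be the one to try loading with Aspose.Words, in place of the document saved manually from MS Word.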

Rest assured, we will inform you via this thread as soon as this issue is resolved. We apologize for any inconvenience.

@paulwellnerbou,

We have good news for you: WORDSNET-18712 has now been resolved. The fix will be included in the next version of Aspose.Words, 19.7. We will inform you via this thread as soon as that version is released at the start of next month.

The issue you reported earlier (filed as WORDSNET-18712) has been fixed in the Aspose.Words for .NET 19.7 update and the Aspose.Words for Java 19.7 update.