Aspose.Words - Opening non-Word files behaves differently in newer versions

We’ve recently renewed our Aspose.Total for Java license and were also looking into upgrading to the latest versions of the Aspose.Xxx product families.

One of our post-processing flows converts Office documents to PDF, and I’m noticing a difference in behavior between versions 20.11 and 23.3.

Previously, if we sent a document to a library that did NOT support it, the library threw an exception. We relied on that behavior in a strategy pattern to decide whether to try converting the document with a different Aspose.Xxx family. Here is the gist of the code:

package com.teamnorthwoods.serverless.office

import com.amazonaws.services.s3.model.S3ObjectInputStream
import com.aspose.words.Document
import com.aspose.words.SaveFormat

class WordConverter(stream: S3ObjectInputStream) : PdfConverter() {
    private var document: Document? = null

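    init {
        try {
            // In 20.11, Document(stream) throws for formats Aspose.Words
            // cannot read; that exception is what marks the file unsupported.
            this.document = Document(stream)
        } catch (ignored: Exception) {
            // Leave document null so isSupported() returns false.
        }
    }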

    override fun isSupported() = document != null

    override fun savePdf(outputPath: String) {
        document?.save(outputPath, SaveFormat.PDF)
    }
}

Note that isSupported relies on the Document constructor in init throwing an exception, which effectively says “No, this is not supported.” With 23.3, instead of throwing, loading succeeds and the output is a PDF full of garbage. A canonical example: run the code above (it’s Kotlin), feed it a ZIP file, and you will see what we mean.
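For anyone reproducing this in Java, here is a minimal sketch of the kind of dispatch loop a converter like WordConverter plugs into. PdfConverter, Stub, and ConverterChain are hypothetical names that mirror the Kotlin class above; stubs stand in for the Aspose-backed converters so the sketch runs without the libraries. It illustrates why one library that stops throwing breaks the whole chain: the first converter that claims support wins, and later converters are never consulted.

```java
import java.util.List;
import java.util.Optional;

class ConverterChain {
    // Hypothetical interface mirroring the Kotlin PdfConverter base class.
    interface PdfConverter {
        String name();
        boolean isSupported();
        void savePdf(String outputPath);
    }

    // Returns the first converter that claims the input, if any.
    // If one library accepts input it cannot really handle (e.g. a ZIP),
    // it is picked here and the converters after it are never asked.
    static Optional<PdfConverter> pick(List<PdfConverter> converters) {
        for (PdfConverter c : converters) {
            if (c.isSupported()) {
                return Optional.of(c);
            }
        }
        return Optional.empty();
    }

    // Stub converter used only to make this sketch runnable (Java 16+ record).
    record Stub(String name, boolean supported) implements PdfConverter {
        public boolean isSupported() { return supported; }
        public void savePdf(String outputPath) { /* no-op in the sketch */ }
    }

    public static void main(String[] args) {
        List<PdfConverter> chain = List.of(
            new Stub("words", false),
            new Stub("cells", true),
            new Stub("slides", true));
        System.out.println(pick(chain).map(PdfConverter::name).orElse("none")); // cells
    }
}
```

The order of the list matters: putting the strictest converters first reduces the chance that a permissive one swallows a file meant for another library.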

Is there something in the latest releases that would let us get similar behavior?

Thanks in advance.

@leviwilson We have already fixed this problem in the current codebase, and the fix will be included in the next version, 23.4, of Aspose.Words.
As a temporary solution, you can use FileFormatUtil to detect the file format before loading the file:

import com.aspose.words.*;

FileFormatInfo info = FileFormatUtil.detectFileFormat("C:\\Temp\\in.zip");
System.out.println(LoadFormat.toString(info.getLoadFormat())); // Unknown formats report LoadFormat.UNKNOWN
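If you want a guard for the ZIP case that does not depend on the Aspose version at all, one complementary option is a stdlib-only check with java.util.zip. This is a sketch, not part of the Aspose API: ZipGuard and its methods are hypothetical names. It relies on two container conventions: Office Open XML packages (DOCX/XLSX/PPTX) are ZIPs with a root [Content_Types].xml entry, and OpenDocument/EPUB containers have a root mimetype entry. Anything that is a ZIP but has neither is treated as a plain archive that no Office converter should be asked to render. It does not replace FileFormatUtil for other formats.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipGuard {
    // Local file header signature "PK\3\4" that starts a typical ZIP archive.
    static boolean looksLikeZip(byte[] data) {
        return data.length >= 4
            && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
    }

    // True when the bytes are a ZIP archive with no OOXML or ODF/EPUB marker
    // entry, i.e. a plain ZIP rather than an Office document container.
    static boolean isPlainZip(byte[] data) throws IOException {
        if (!looksLikeZip(data)) {
            return false; // not a ZIP at all; leave it to the other checks
        }
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("[Content_Types].xml") || name.equals("mimetype")) {
                    return false; // looks like DOCX/XLSX/PPTX or ODT/ODS/EPUB
                }
            }
        }
        return true;
    }

    // Builds an in-memory ZIP with the given entry names (demo helper only).
    static byte[] zipWith(String... names) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String n : names) {
                zip.putNextEntry(new ZipEntry(n));
                zip.write(new byte[] {'x'});
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isPlainZip(zipWith("readme.txt")));                              // true
        System.out.println(isPlainZip(zipWith("[Content_Types].xml", "word/document.xml"))); // false
        System.out.println(isPlainZip("not a zip".getBytes()));                             // false
    }
}
```

In a chain like the one in the question, this check would run once before any converter is tried, and a true result would short-circuit straight to "unsupported".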

Does this only affect Aspose.Words? If so, which version introduced it? We’re a few versions behind, so if I can move to a reasonably recent version that still works, that would be fine too (without adding these extra checks).

@leviwilson Yes, this affects only Aspose.Words. The problem started after version 22.10, so while you wait for 23.4, you can use 22.10.

It still seems to happen in 22.10, by the way.

@leviwilson Could you please attach your problematic file here for testing?

It was just a ZIP file, so any ZIP should reproduce it. I don’t know whether it was Aspose.Words or one of the other flavors: we basically loop through each library, ask “Can you give me a PDF from this?”, and move on to the next one if the file is unsupported. So I didn’t check which library actually accepted it.
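To find out which library is accepting the ZIP, one option is to ask every library instead of stopping at the first match, and log all of the ones that claim the file. A minimal sketch, assuming each library’s check can be wrapped as a Predicate over the file bytes; WhoClaimsIt, claimants, and the libraryA/B/C probes are hypothetical stand-ins for wrappers that try loading the bytes with one Aspose.Xxx library each.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class WhoClaimsIt {
    // Runs every probe (not just until the first match) and returns the names
    // of all probes that accept the input, in registration order.
    static List<String> claimants(Map<String, Predicate<byte[]>> probes, byte[] input) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Predicate<byte[]>> e : probes.entrySet()) {
            boolean accepted;
            try {
                accepted = e.getValue().test(input);
            } catch (RuntimeException ex) {
                accepted = false; // a probe that throws counts as "not supported"
            }
            if (accepted) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical probes; real ones would attempt a load with one library.
        Map<String, Predicate<byte[]>> probes = new LinkedHashMap<>();
        probes.put("libraryA", b -> true); // permissive: accepts anything
        probes.put("libraryB", b -> { throw new IllegalStateException("unsupported"); });
        probes.put("libraryC", b -> false);
        byte[] zip = "PK\u0003\u0004...".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(claimants(probes, zip)); // [libraryA]
    }
}
```

Running this once against the problem ZIP would point directly at the library whose behavior changed, which is the piece of information missing from the thread at this point.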

@leviwilson Thank you for the additional information. I rechecked with a simple ZIP file, and the following code throws an exception in version 22.10:

Document doc = new Document("C:\\Temp\\in.zip");

So most likely the problem is caused by some other Aspose library.

The issue you found earlier (filed as WORDSNET-24982) has been fixed in this Aspose.Words for .NET 23.4 update, which is also available on NuGet.

@alexey.noskov is there a fix for Java or just .NET?

@leviwilson Currently the fix is available only in the .NET version. It will also be included in the next version of Aspose.Words for Java, 23.4, which will be released within a couple of weeks.