Loading a corrupted file does not throw FileCorruptedException

Hello,
I have an issue when loading a corrupted file.
Before 22.12, loading a document using

new Document(new ClassPathResource("path/to/corrupted.docx"))

threw a FileCorruptedException, but since that release it no longer does.
However, loading the document using

new Document("path/to/corrupted.docx")

still throws the exception as expected.

Is that the expected behavior?
Thank you
Best regards

@concord_tech Could you please attach the problematic document here for testing? Do I understand correctly that the same document is used in both cases? Please also try loading the document from an input stream.

@alexey.noskov thanks for the fast answer.
I wrote three quick tests to illustrate:

@Test
public void corruptedFileTest() {
	final String resourcePath = "path/to/corrupted.docx";
	assertThatExceptionOfType(UnsupportedFileFormatException.class)
			.isThrownBy(() -> {
				new com.aspose.words.Document(new ByteArrayInputStream(IOUtils.toByteArray(new ClassPathResource(resourcePath).getInputStream())));
			});
}

@Test
public void corruptedFileTest2() {
	final String resourcePath = "path/to/corrupted.docx";
	assertThatExceptionOfType(UnsupportedFileFormatException.class)
			.isThrownBy(() -> {
				new com.aspose.words.Document(new ClassPathResource(resourcePath).getInputStream());
			});
}

@Test
public void corruptedFileTest3() {
	final String absolutePath = "/absolute/path/to/corrupted.docx";
	assertThatExceptionOfType(UnsupportedFileFormatException.class)
			.isThrownBy(() -> {
				new com.aspose.words.Document(absolutePath);
			});
}

In 22.5, all three tests pass.
In 22.12, the first two fail, but the third passes.

I have attached the corrupted file used in those tests:
corrupted.docx (4.0 KB)

@concord_tech
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-26218

@concord_tech We have completed analyzing the issue. The behavior of document format detection changed in version 22.12: if the format cannot be determined and no file extension is available, the document is opened as a plain-text (TXT) document. That is why no exception is thrown when the document is loaded from a stream.
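
The rule described above can be sketched as a small decision function. This is only a hypothetical model of the behavior reported in this thread, not Aspose.Words code: the class, enum, and method names are invented for illustration, and the branch for "extension available" is an assumption based solely on the third test above (loading the corrupted .docx by path still throws).

```java
import java.util.Optional;

// Hypothetical model of the reported 22.12 behavior; not part of Aspose.Words.
final class FormatFallbackModel {

    enum Outcome { LOAD_AS_DETECTED, LOAD_AS_TEXT, THROW }

    // formatDetected: whether the content could be recognized as a known format.
    // extension: the file extension, empty when loading from a stream (no file name).
    static Outcome resolve(boolean formatDetected, Optional<String> extension) {
        if (formatDetected) {
            return Outcome.LOAD_AS_DETECTED;
        }
        if (!extension.isPresent()) {
            // No format and no extension to go by: fall back to plain text.
            return Outcome.LOAD_AS_TEXT;
        }
        // Assumption: matches the path-based test, where an exception is still thrown.
        return Outcome.THROW;
    }
}
```

Under this model, the two stream-based tests land in the plain-text branch, which would explain why they no longer see an exception, while the path-based test still does.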

Hello @alexey.noskov, thanks for your answer.
I tried setting the file type in LoadOptions, but the result is the same: the following test fails (tried on 22.12 and 23.11).

@Test
public void corruptedFileTest() throws Exception {
	final String resourcePath = "path/to/corrupted.docx";
	final var docStream = new ClassPathResource(resourcePath).getInputStream();
	final var loadOptions = new LoadOptions();
	loadOptions.setLoadFormat(LoadFormat.DOCX);
	assertThatExceptionOfType(Exception.class)
			.isThrownBy(() -> new com.aspose.words.Document(docStream, loadOptions));
}

Is it still possible to detect or force a file extension on Document loading when the file is corrupted?

@concord_tech You can use the FileFormatUtil.detectFileFormat method to detect the file format of a given file or stream. In your case, Aspose.Words returns LoadFormat.UNKNOWN for the corrupted document you attached earlier:

FileFormatInfo info = FileFormatUtil.detectFileFormat("C:\\Temp\\corrupted.docx");
System.out.println(LoadFormat.toString(info.getLoadFormat()));
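
Independently of Aspose.Words, a stream that is expected to contain a DOCX can also be sanity-checked with the JDK alone, since a DOCX file is a ZIP package that contains a [Content_Types].xml entry. The following is a minimal sketch under that assumption; OoxmlPreCheck, looksLikeOoxmlPackage, and zipWith are hypothetical helper names, not library API, and the check only confirms a readable OOXML-style package, not a valid Word document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper; uses only the JDK, not Aspose.Words.
final class OoxmlPreCheck {

    // Returns true if the bytes form a readable ZIP that contains [Content_Types].xml.
    static boolean looksLikeOoxmlPackage(byte[] data) {
        // A ZIP archive starts with the local file header signature "PK\3\4".
        if (data == null || data.length < 4
                || data[0] != 'P' || data[1] != 'K' || data[2] != 3 || data[3] != 4) {
            return false;
        }
        boolean hasContentTypes = false;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            // Walking the entries also decompresses them, so damaged data surfaces here.
            while ((entry = zip.getNextEntry()) != null) {
                if ("[Content_Types].xml".equals(entry.getName())) {
                    hasContentTypes = true;
                }
            }
        } catch (IOException | IllegalArgumentException e) {
            // Truncated or malformed archive: treat as corrupted.
            return false;
        }
        return hasContentTypes;
    }

    // Builds a tiny in-memory ZIP with a single entry, for trying out the check.
    static byte[] zipWith(String entryName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Because a stream can only be read once, the bytes would need to be read up front (for example with IOUtils.toByteArray, as in the first test above) and, if the check passes, wrapped in a new ByteArrayInputStream before being passed to the Document constructor.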

The issues you found earlier (filed as WORDSNET-26218) have been fixed in the Aspose.Words for Java 24.11 update.