Hello,
I have an issue when loading a corrupted file.
Before 22.12, loading a document using
new Document(new ClassPathResource("path/to/corrupted.docx"))
threw a FileCorruptedException, but since that release it no longer does.
However, loading the document using
new Document("path/to/corrupted.docx")
still throws the exception as expected.
Is that the expected behavior?
Thank you
Best regards
@concord_tech Could you please attach the problematic document here for testing? Do I understand correctly that the same document is used in both cases? Please also try loading the document from an input stream.
@alexey.noskov thanks for the fast answer.
I wrote 3 quick tests to illustrate:
@Test
public void corruptedFileTest() {
    final String resourcePath = "path/to/corrupted.docx";
    assertThatExceptionOfType(UnsupportedFileFormatException.class)
        .isThrownBy(() -> {
            new com.aspose.words.Document(new ByteArrayInputStream(IOUtils.toByteArray(new ClassPathResource(resourcePath).getInputStream())));
        });
}

@Test
public void corruptedFileTest2() {
    final String resourcePath = "path/to/corrupted.docx";
    assertThatExceptionOfType(UnsupportedFileFormatException.class)
        .isThrownBy(() -> {
            new com.aspose.words.Document(new ClassPathResource(resourcePath).getInputStream());
        });
}

@Test
public void corruptedFileTest3() {
    final String absolutePath = "/absolute/path/to/corrupted.docx";
    assertThatExceptionOfType(UnsupportedFileFormatException.class)
        .isThrownBy(() -> {
            new com.aspose.words.Document(absolutePath);
        });
}
In 22.5, all three tests pass.
In 22.12, the first two fail, but the third passes.
I have attached the corrupted file used in these tests:
corrupted.docx (4.0 KB)
@concord_tech
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): WORDSNET-26218
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
@concord_tech We have completed analyzing the issue. Document format detection changed in version 22.12: if the format cannot be determined and no file extension is available, the document is now opened as a plain-text (TXT) document. That is why no exception is thrown when the document is loaded from a stream.
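To make the rule described above easier to follow, here is a minimal sketch that models it in plain Java. It is only an illustration of the behavior as stated in this thread, not Aspose.Words source code: the class `FormatFallbackModel`, the `Format` enum, and the `IllegalStateException` standing in for the library's exception are all hypothetical names introduced for this example.

```java
import java.util.Optional;

/** Illustrative model of the 22.12 fallback rule; not Aspose.Words code. */
final class FormatFallbackModel {

    /** Hypothetical format labels used only in this sketch. */
    enum Format { DOCX, TEXT }

    /**
     * @param detected  format recognized from the content, or empty if detection failed
     * @param extension file extension when loading from a path, empty when loading from a stream
     */
    static Format resolve(Optional<Format> detected, Optional<String> extension) {
        if (detected.isPresent()) {
            return detected.get();
        }
        if (!extension.isPresent()) {
            // Rule as described in the thread: unknown content and no extension -> open as text.
            return Format.TEXT;
        }
        // A file path still carries an extension, so a corrupted file is rejected.
        // IllegalStateException is a stand-in for whatever the library actually throws.
        throw new IllegalStateException("Unrecognized content for extension: " + extension.get());
    }
}
```

Under this model, the stream-based tests see no extension and silently get a text document, while the path-based test still fails, which matches the results reported for 22.12.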
Hello @alexey.noskov, thanks for your answer.
I tried setting the file type in LoadOptions, but the result is the same: the following test fails (tried on 22.12 and 23.11).
@Test
public void corruptedFileTest() throws Exception {
    final String resourcePath = "path/to/corrupted.docx";
    final var docStream = new ClassPathResource(resourcePath).getInputStream();
    final var loadOptions = new LoadOptions();
    loadOptions.setLoadFormat(LoadFormat.DOCX);
    assertThatExceptionOfType(Exception.class)
        .isThrownBy(() -> new com.aspose.words.Document(docStream, loadOptions));
}
Is it still possible to detect or force a file extension on Document loading when the file is corrupted?
@concord_tech You can use the FileFormatUtil.detectFileFormat method to detect the file format of the specified file or stream. In your case, Aspose.Words returns LoadFormat.Unknown for the corrupted document you attached earlier:
FileFormatInfo info = FileFormatUtil.detectFileFormat("C:\\Temp\\corrupted.docx");
System.out.println(LoadFormat.toString(info.getLoadFormat()));
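For the stream case, one possible way to apply this check before loading is sketched below. It is a minimal sketch under assumptions, not an official recipe: `StreamFormatGuard` and its `Detector` interface are hypothetical helpers, and the detection call is passed in as a parameter so the example stays independent of the library. In real code the detector would presumably be something like `in -> FileFormatUtil.detectFileFormat(in).getLoadFormat()` and the unknown value `LoadFormat.UNKNOWN`; please verify both against your Aspose.Words version. The stream is buffered first because detection consumes it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: rejects a stream whose format cannot be detected. */
final class StreamFormatGuard {

    /** Hypothetical hook for the real detection call (assumed to return an int format code). */
    interface Detector {
        int detect(InputStream in) throws Exception;
    }

    /**
     * Buffers the source, runs the detector on a copy, and returns a fresh
     * stream over the same bytes if the format is recognized.
     */
    static InputStream requireKnownFormat(InputStream source, Detector detector, int unknownFormat)
            throws Exception {
        byte[] bytes = readAll(source);
        int format = detector.detect(new ByteArrayInputStream(bytes));
        if (format == unknownFormat) {
            throw new IOException("Unrecognized or corrupted document");
        }
        // Detection consumed its own copy, so hand back an unread one for loading.
        return new ByteArrayInputStream(bytes);
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

The returned stream can then be passed to the Document constructor, so a corrupted file fails early with an exception instead of being opened as text.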
The issues you found earlier (filed as WORDSNET-26218) have been fixed in the Aspose.Words for Java 24.11 update.