OutOfMemory exception during detection of file format (943)

Hi,

I’m trying to detect the file format of the attached .rtf file.


If I load the file via FileInputStream, I have to increase the Java heap space (-Xmx) to 145 MB to avoid an OutOfMemory exception.

If I just pass the file path as the argument to FileFormatUtil.detectFileFormat, it needs only 3 MB of heap space to detect the format successfully.

Here’s the code for both cases:

InputStream:

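String path = "InfoForSite0005.rtf";
try (InputStream inputStream = new FileInputStream(path)) {
    // Detect the format from a stream (the case that needs the larger heap)
    FileFormatUtil.detectFileFormat(inputStream);
} catch (Exception e) {
    // Keep the original exception as the cause so the stack trace is not lost
    throw new RuntimeException(e);
}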

Filepath:

String path = "InfoForSite0005.rtf";
FileFormatUtil.detectFileFormat(path);

Can you check this?

Thanks,
Zeljko

Hi Zeljko,


Thanks for your inquiry. I have tested the scenario in the Eclipse IDE with the following heap size, but I was unable to reproduce the OutOfMemory exception. Please share some more details so that we can reproduce the issue.


-Xms32m

-Xmx64m


Please note that memory usage depends entirely on document size and document complexity. Usually, Aspose.Words needs 10 times more memory than the original document size to build its DOM in memory. It is recommended to set the heap space (-Xmx) to 2048 MB to process most documents successfully.
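To confirm which heap limit a test run is actually using, a small check like the following may help. It is only a sketch: the class name HeapLimitCheck and the method maxHeapMegabytes are made up for illustration, and the value reported by the JVM can be slightly lower than the -Xmx setting.

```java
public class HeapLimitCheck {

    // Returns the maximum heap the JVM will attempt to use, in megabytes.
    // This reflects -Xmx when it is set, otherwise the JVM default.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: "
                + maxHeapMegabytes() + " MB");
    }
}
```

Running it with the same JVM arguments as the test (for example -Xms32m -Xmx64m) shows whether the IDE launch configuration really applies them.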

Furthermore, Aspose.Words needs two times more memory to load a document from a stream than from a file path. It uses a two-way scrollable (seekable) stream to read DOC, RTF and a few other formats, so it copies the source input stream into an internal stream. The Document constructor's contract does not allow it to close the source stream. As a result, the stream constructor uses more memory than the file path constructor.
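The general effect can be illustrated with plain JDK classes. This is not the Aspose.Words implementation, only a sketch of the idea: a file can be read at any offset without holding it in memory, while a forward-only InputStream has to be copied into a buffer before it can be read back and forth. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Path;

public class SeekableAccessSketch {

    // File path case: seek directly to an offset and read one byte.
    // Nothing beyond that byte has to be held on the heap.
    static int readByteFromFile(Path file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            return raf.read();
        }
    }

    // Stream case: a plain InputStream cannot seek backwards, so the
    // whole content is copied into memory first. The source stream is
    // left open on purpose; closing it is the caller's responsibility.
    static byte[] bufferWholeStream(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        // toByteArray() returns a copy, so two copies exist briefly.
        return buffer.toByteArray();
    }
}
```

With the buffered copy, the heap holds the full document bytes in addition to whatever the parser builds from them, which is consistent with the higher memory use reported for the stream case.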

Regards,

Hi,
You said:

tilal.ahmad:
Aspose Words needs two times more memory to load document from stream as compared to file path.


Is this also the case with Aspose.Slides, Aspose.Diagram and Aspose.Cells?

Thanks,
Zeljko.

Zeljko:

Is this also the case with Aspose.Slides, Aspose.Diagram and Aspose.Cells?
Hi Zeljko,

Thanks for contacting support.

If you are facing similar issues while using the above-mentioned APIs, please share the input documents so that we can test the scenario in our environment. We are sorry for the inconvenience.

Hi,

I don’t have any specific files, I just need information about memory usage (the difference between loading a document from a stream and from a path) in the mentioned APIs.

Thanks,
Zeljko.

Zeljko:
Hi,

I don’t have any specific files, I just need information about memory usage (the difference between loading a document from a stream and from a path) in the mentioned APIs.
Hi Zeljko,

Thanks for contacting support.

When a file is loaded through a Stream object, or from a location on the file system, the complete document is loaded into memory so that it can be manipulated further. However, loading a file with these APIs does not necessarily produce an exception or error every time; it varies from document to document, depending on the structure and complexity of the input file.

In case you encounter any issue while using our APIs, please share the sample files so that we can investigate them further in our environment.

Hi,

You said this regarding Aspose.Words:

tilal.ahmad:


So stream constructor uses more memory than file path constructor.



I just need confirmation that this is the case with all APIs (Aspose.Slides, Aspose.Diagram and Aspose.Cells).

Thanks,
Zeljko.

Zeljko:


I just need confirmation that this is the case with all APIs (Aspose.Slides, Aspose.Diagram and Aspose.Cells).
Hi,

When loading a document either from a Stream object or from a file path, the amount of memory required remains the same for the above-mentioned APIs. However, we are looking further into the details and will keep you updated with our findings.

Zeljko:

Is this also the case with Aspose.Slides, Aspose.Diagram and Aspose.Cells?
Hi Zeljko,

Thanks for your patience.

I have discussed the scenarios further with the related teams, and their comments are given below.

Aspose.Cells

Aspose.Cells may use more memory when loading a document from a stream, but the difference in memory used between loading a workbook from a file and loading it from a stream is very minor. Furthermore, when the user first builds the stream in memory, for example as a ByteArrayInputStream (Java) or MemoryStream (.NET), and then uses it to load the workbook, that in-memory stream has its own memory cost, so the total memory used by the program will certainly be somewhat higher.

Aspose.Slides

Loading from a stream uses slightly more resources than loading from disk. Our Java APIs are ported from .NET. In .NET, streams are bidirectional: we can seek, read and write. Java, however, has separate input and output streams, so we use an internal Aspose.MS library that mimics the .NET behavior of bidirectional streams. This is in fact a limitation, but the internal implementation of the API works this way because the Java release is auto-ported from .NET. When loading files from disk, on the other hand, the .NET code ports to Java directly and there is no internal overhead. For this reason, loading from disk in Java consumes less memory than loading from an InputStream.

The observation is similar for the other APIs. Should you have any further queries, please feel free to contact us.
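For cases where the data only arrives as an InputStream and heap space is tight, one possible workaround that follows from the comments above is to spool the stream to a temporary file and pass the file path instead. The sketch below uses only standard JDK calls; the class name and the helper spoolToTempFile are hypothetical, and the Aspose call is shown only as a comment, taken from the code earlier in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SpoolToTempFileSketch {

    // Copies the stream to a temporary file on disk. Files.copy streams
    // the data through a small buffer, so heap use does not grow with
    // the size of the document. The caller deletes the file when done.
    static Path spoolToTempFile(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("spooled-", suffix);
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(tmp); // do not leave a partial file behind
            throw e;
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a stream received from elsewhere (assumed sample data).
        byte[] sample = "{\\rtf1 hello}".getBytes(StandardCharsets.US_ASCII);
        Path tmp = spoolToTempFile(new ByteArrayInputStream(sample), ".rtf");
        try {
            // With Aspose.Words on the classpath, the path overload from
            // this thread could be called here:
            // FileFormatUtil.detectFileFormat(tmp.toString());
            System.out.println("Spooled " + Files.size(tmp) + " bytes to " + tmp);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

This trades heap usage for disk I/O, so whether it helps depends on the environment. It has not been measured against any of the APIs discussed here.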