Aspose Pdf reading from memory stream

jepinfotjenester.no · September 7, 2015, 10:03am

It looks like your implementation of reading from a memory stream is rather bad and gives poor performance.

We have written our own implementation of the System.IO.Stream class, which reads chunks of binary data from SQL Server.

1) The Stream interface has a CanSeek property. Even when this returns false, you do not seem to respect it, and you call the Seek method anyway.

2) Aspose.Pdf reads a lot of single-byte values from the stream, and we can see from the traces that you read the same bytes from the stream several times. It looks as if you go back to the stream far too often, instead of keeping values or larger portions in memory in private variables. What is worse, there is a lot of jumping between the start of the stream and the end, where the same values get read over and over. If this is a large file, this will result in a lot of traffic.
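
To illustrate point 1, here is a minimal sketch of what respecting the setting could look like. It is written in Python rather than .NET and is not Aspose code: Python's `seekable()` stands in for `CanSeek`, and `ForwardOnlyStream` and `as_random_access` are hypothetical names made up for this example. The idea is only that a consumer checks the flag first and, if seeking is not supported, copies the stream into memory once instead of calling seek on it.

```python
import io


class ForwardOnlyStream(io.RawIOBase):
    """Hypothetical forward-only source; seekable() is the analogue of CanSeek = false."""

    def __init__(self, data: bytes):
        self._inner = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False

    def readinto(self, buffer) -> int:
        # Sequential reads only; seek() is left unimplemented on purpose.
        return self._inner.readinto(buffer)


def as_random_access(stream):
    """Return a stream that is safe to seek on (hypothetical helper)."""
    if stream.seekable():
        return stream
    # Not seekable: read it once, front to back, and seek on the copy.
    return io.BytesIO(stream.read())
```

With this, `as_random_access(ForwardOnlyStream(pdf_bytes))` returns an in-memory copy that can be searched from either end, and the original stream only ever sees sequential reads.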

codewarior · September 8, 2015, 5:39am

Hi Jens Erik,

Thanks for contacting support.

Can you please share a code snippet and any details that would help us replicate the issue in our environment, i.e. the steps we can follow to verify the inappropriate memory utilization during file reading?

jepinfotjenester.no · September 24, 2015, 9:16am

Hi,

Have a look at the attached debug.txt file, which shows what happens when the attached PDF file is opened with Aspose.Pdf. In it I have documented how you read and seek in the stream.

Look especially at the end of the text file. Here you will see that your component reads one byte from the start of the file, then one byte from the end of the file, and this goes on and on, several thousand times for this file.

Now, consider what happens when this file is stored in a database, and the stream reads chunks of data from the database (this is the whole purpose of a stream: you can't pump a whole gigabyte of, for example, a movie down to the client in a browser, you have to feed it small chunks at a time). The jumping between the start and the end of the file results in thousands of calls to the database server, just for this relatively small file.

This is, in my opinion, a serious design flaw. It looks as if the person who implemented this has no understanding of what a stream is and how to use it. What you should have done is read the whole stream into memory, and then do the seeking on that byte array.
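
To make the cost concrete, here is a small simulation. It is a sketch in Python, not our actual .NET stream and not Aspose code: `ChunkedDbStream` is a hypothetical stand-in that keeps one chunk in memory and counts a "database call" each time it needs a different chunk, and the 4096-byte chunk size and 100,000-byte file are made-up numbers. It compares the start/end jumping seen in the trace against reading the whole stream into memory first.

```python
import io


class ChunkedDbStream(io.RawIOBase):
    """Hypothetical stream that fetches fixed-size chunks from a database."""

    def __init__(self, data: bytes, chunk_size: int = 4096):
        self._data = data              # pretend this lives on the server
        self._chunk_size = chunk_size
        self._pos = 0
        self._cached_index = None      # index of the single chunk held locally
        self._cached_chunk = b""
        self.fetches = 0               # simulated database round trips

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            base = 0
        elif whence == io.SEEK_CUR:
            base = self._pos
        else:
            base = len(self._data)
        self._pos = max(0, base + offset)
        return self._pos

    def _chunk(self, index: int) -> bytes:
        # Only a change of chunk costs a round trip.
        if index != self._cached_index:
            self.fetches += 1
            start = index * self._chunk_size
            self._cached_chunk = self._data[start:start + self._chunk_size]
            self._cached_index = index
        return self._cached_chunk

    def readinto(self, buffer) -> int:
        if self._pos >= len(self._data):
            return 0
        index, offset = divmod(self._pos, self._chunk_size)
        piece = self._chunk(index)[offset:offset + len(buffer)]
        buffer[:len(piece)] = piece
        self._pos += len(piece)
        return len(piece)


def ping_pong(stream, rounds: int) -> None:
    """Read one byte at the start, then one byte at the end, `rounds` times."""
    for _ in range(rounds):
        stream.seek(0)
        stream.read(1)
        stream.seek(-1, io.SEEK_END)
        stream.read(1)


data = bytes(100_000)

# Access pattern from the trace, applied directly to the chunked stream.
direct = ChunkedDbStream(data)
ping_pong(direct, 1000)

# Same pattern, but after one sequential pass into memory.
source = ChunkedDbStream(data)
buffered = io.BytesIO(source.read())
ping_pong(buffered, 1000)

print(direct.fetches, source.fetches)  # prints: 2000 25
```

In this model, every jump between the first and last chunk forces a new fetch, so 1000 rounds cost 2000 round trips, while the buffered version pays only the 25 sequential chunk reads needed to load the file once. The trade-off is memory: the whole file is held in a byte array, which is fine for a relatively small PDF but would need a size limit for very large files.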

codewarior · September 28, 2015, 5:10pm

Hi Jens Erik,

Thanks for sharing the details and the in-depth analysis of the Stream handling mechanism. However, can you please share a code snippet or sample application that would help us reproduce the irregular file reading behavior? We will surely consider reviewing the design of our Stream handling. We are sorry for the inconvenience.