Problems with closing memory streams after using Concatenate in static method

JonBaggaley · December 12, 2011, 6:09am

Thanks for your reply. I have just tested the new version with the sample I provided and the problem still exists. To replicate, simply set the variable closeStreamsAfterUsing = true and then run it.

Press “G” a couple of times to move through the memory capture points then watch it crash….

Exactly the same error is returned i.e. stream already closed.

Please can you look into this ASAP…

Thanks

Jon

codewarior · December 16, 2011, 3:55am

Hello Jon,

I have tested the scenario again using the sample application you shared earlier, and I am able to replicate the problem. In my earlier attempt I had used an internal code snippet rather than your sample to verify the scenario. I have re-opened the issue and asked the development team to investigate further. Please be patient and give us a little time. We are sorry for the inconvenience.

JonBaggaley · December 16, 2011, 4:30am

Great, thanks for that. Is there any ETA for when the developers can schedule this in? We hope to go live with our app next week, and whilst we can live with resetting the service every day to clear the leaks, it is obviously not a long (or even short) term solution!

Thanks

Jon

codewarior · January 25, 2012, 10:28am

Hello Jon,

Thanks for your patience.

We have investigated this problem further and found that it does not appear to be an issue in Aspose.Pdf for .NET. As we understand it, you are performing the following steps in the MergeRenderDocsToPDF method:

//1. calls PdfEditor.Concatenate and saves its result into memory stream:

MemoryStream finalDoc = new MemoryStream();

pdfEditor.Concatenate(itemLists, finalDoc);

//2. Creates Aspose.Pdf.Document by this memory stream data:

returnItem = new Aspose.Pdf.Document(finalDoc);

//3. Closes memory stream:

finalDoc.Close();

//4. returns created document.

return returnItem;

and after getting this returnItem as the result, you try to work with the document (save it to a file). Please note that this approach does not work, because all the document data is stored in the memory stream: even after the Document object has been created, it still needs the source stream to work with.

In simpler terms, the following code demonstrates this behavior:

MemoryStream stream = new MemoryStream(File.ReadAllBytes("InFile.pdf"));

Document doc = new Document(stream);

stream.Close();

//do something with document (add new page) and save

doc.Pages.Add(); // <-- exception "Closed stream" here

doc.Save("outfile.pdf");

Please note that the stream must be closed only once the Document object is no longer needed. The appropriate code would be something like:

MemoryStream stream = new MemoryStream(File.ReadAllBytes("InFile.pdf"));

Document doc = new Document(stream);

//do something with document (add new page) and save

doc.Pages.Add();

doc.Save("outfile.pdf");

stream.Close(); // close the stream once the document is no longer needed

You are closing the memory stream in order to avoid memory leaks, but you need to redesign your code so that the MemoryStream object is not closed while it is still required. You may also return the stream with the document data from MergeRenderDocsToPDF instead of a Document object.

If you only need to save the data to disk in the main program, you can simply save this stream data instead of creating a document (this is also more efficient because no time is wasted parsing the document). And if you need to perform additional operations on the document, you can create a Document object in the main program and pass the returned stream to its constructor. If this does not resolve your problem or you have any further queries, please feel free to contact us. We apologize for the inconvenience.
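A minimal sketch of the "just save the stream data" option, using only System.IO so that no PDF parsing is involved. The names MergedPdfStorage and SaveStreamToFile are hypothetical and not part of Aspose.Pdf, and the rewind step assumes the stream position may have been left at the end after the merged data was written.

```csharp
using System;
using System.IO;

// Hypothetical helper, not an Aspose.Pdf type.
public static class MergedPdfStorage
{
    // Copies the full contents of a stream to a file without parsing it as a PDF.
    // The caller still owns the stream and is responsible for closing it.
    public static void SaveStreamToFile(Stream data, string path)
    {
        // Assumption: the writer may have left the position at the end, so rewind first.
        if (data.CanSeek)
        {
            data.Position = 0;
        }

        // The file handle is released when the using block ends.
        using (FileStream file = File.Create(path))
        {
            data.CopyTo(file);
        }
    }
}
```

The stream returned by the merge method would be passed in as data and closed by the caller once the file has been written.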

JonBaggaley · January 25, 2012, 11:18am

Hi there,

Thanks for getting back to me but I don’t think I understand your answer. From what I can gather, you want me to keep track of the memory stream even after I have done a save (which must internally fill in all the body of the PDF to your Document object). Whilst I understand the need to keep track of the stream up to that point, after it, the document should be standalone. I HAVE to close the stream because it is a static function.

If I load a PDF document from disk file using the standard Load(filename) method and then delete the physical file from my hard disk, will the object then fail to work if I attempt to add pages to it or carry out any other operation on it? If not then how is this significantly different to creating with the stream and then calling Save([no parameters])?

When merging pages in my application, I do not know how the final document will be used because that is dynamic, which is why I call a static function. Additionally, there may be multiple PDFs created as a result of merging different groups of documents together before being saved to a database, emailed, saved to an archive folder, sent to the browser or a combination of all of them (obviously something sent to the browser would have to be merged again into a single document). E.g. a statement to our customer could consist of two PDF files, each of which has been generated by merging combinations of Word and Excel files together, and these are then mailed as a set of complete documents with a copy added to an archive folder and the original versions still in Words and Cell sent to the printer.

Within that static function, I call the Document.Save() method (please see the code example).

At this point, I am quite reasonably expecting ALL dependency on the initial stream should cease and the document should internally fill itself out into a complete object tree (otherwise what possible use is that save() function for items loaded as a memory stream?) This could be quite easily resolved by internally modifying that save function to convert a memory stream to a full in memory document which you have to call if you write to disk anyway. Save would continue to work as before where the document was created from a disk file. Nothing should be broken because Save() doesn’t seem to currently have any purpose when used with a memory stream.

Perhaps you could refactor the static function in the sample code to show a method that makes more sense but still works how I need it to?

I hope this makes sense.

Thanks

Jon

JonBaggaley · February 20, 2012, 4:05pm

Any progress with this as it has now been almost another month?

codewarior · February 22, 2012, 11:13am

Hello Jon,

Sorry for the delay in response.

We are working on this query and trying to resolve the issue as soon as possible. Please be patient and give us a little time. We will update you soon with the status of the fix. We apologize for the inconvenience.

andrey.nekrasov · March 7, 2012, 3:31pm

Hello Jon!

The main problem in the scenario we are discussing is that all the document data is stored in the memory stream.

You load the returnItem document from the finalDoc memory stream:

public static Aspose.Pdf.Document MergeRenderDocsToPDF(Dictionary itemsToMerge)

{

...

MemoryStream finalDoc = new MemoryStream();

... //some action...

pdfEditor.Concatenate(itemLists, finalDoc);

... //some action...

returnItem = new Aspose.Pdf.Document(finalDoc);

finalDoc.Close(); // after the memory stream is closed, the document data is no longer accessible

// and we can get a "Stream can not be read" error.

return returnItem;

}

Please note that not all data from the memory stream is loaded into the Document object when the document is created and loaded from the stream. Some parts of the document are read from the document data (the memory stream, in our case) only when they are required, i.e. when those parts are used.

The purpose of this approach is better performance and lower memory use. For example, if we work with a large file (say 500 MB), we should not load the whole document into memory; otherwise we would allocate excess memory and lose performance, because we would need an additional 500 MB of memory and would waste time loading all the document data.

That is why the document data is still required even after the document has been created from the stream.

And yes, if your document was loaded from a file and the file was then deleted or damaged (or, for example, the removable device holding the file was removed), we will get the error "Stream can not be read" when we try to work with the document.
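The effect can be reproduced without any PDF library. The sketch below is only an illustration of on-demand reading in general and makes no claim about how Aspose.Pdf is implemented: LazyDocument and LazyLoadingDemo are hypothetical names, and the "document" here is just three bytes in a MemoryStream.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a lazily loading document; not an Aspose.Pdf type.
public sealed class LazyDocument
{
    private readonly Stream source;

    public LazyDocument(Stream source)
    {
        // Only the reference is kept; no bytes are copied at construction time.
        this.source = source;
    }

    // Reads one byte on demand, the way a lazy parser fetches a part only when it is used.
    public int ReadByteAt(long offset)
    {
        source.Position = offset; // throws ObjectDisposedException if the stream was closed
        return source.ReadByte();
    }
}

public static class LazyLoadingDemo
{
    // Returns true if reading after Close() fails with ObjectDisposedException.
    public static bool FailsAfterClose()
    {
        MemoryStream stream = new MemoryStream(new byte[] { 10, 20, 30 });
        LazyDocument doc = new LazyDocument(stream);

        stream.Close(); // closed too early: the document still depends on it

        try
        {
            doc.ReadByteAt(1);
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```

Because the object holds only a reference to its source, it cannot outlive that source, which is the same ordering constraint described for Document and its stream.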

You can avoid this problem by returning a Stream from your MergeRenderDocsToPDF method instead of a Document:

public static Stream MergeRenderDocsToPDF(Dictionary itemsToMerge)

{

...

MemoryStream finalDoc = new MemoryStream();

... //some action...

pdfEditor.Concatenate(itemLists, finalDoc);

... //some action...

return finalDoc; //return memory stream instead of document

//remove these lines.

//returnItem = new Aspose.Pdf.Document(finalDoc);

//finalDoc.Close();

//return returnItem;

}

and create a new document from this data when you need it:

public void UseOfDocument()

{

//get data from method

Stream data = MyClass.MergeRenderDocsToPDF(itemsToMerge);

//load document from data

Document doc = new Document(data);

//make some operations with document.....

//......

//and save it for example into file

doc.Save("myfile.pdf");

//we no longer need the document data (or the document, because it is already saved), so close the stream

data.Close();

}

Please note that you still need to close the stream in your code, since it was created in your code too.

max-3 · February 3, 2016, 8:22pm

seems like a good case for the “using” statement

What are the uses of "using" in C#? - Stack Overflow

codewarior · February 6, 2016, 12:41pm

max-3:

seems like a good case for the "using" statement

http://stackoverflow.com/questions/75401/uses-of-using-in-c-sharp

Hi Max,

Thanks for sharing the details.

Yes, an object declared in a using statement is disposed as soon as the using block completes. Should you have any further queries related to our APIs, please feel free to contact us.
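For reference, a rough sketch of how the using suggestion could be applied to the "return a stream, consume it, then close it" pattern from the earlier posts. StreamScope and Use are hypothetical names and the helper relies only on System.IO, so it shows the disposal order rather than any Aspose.Pdf behavior.

```csharp
using System;
using System.IO;

// Hypothetical helper, not an Aspose.Pdf type.
public static class StreamScope
{
    // Runs work against the stream produced by createStream and disposes the
    // stream afterwards, even if work throws an exception.
    public static void Use(Func<Stream> createStream, Action<Stream> work)
    {
        using (Stream data = createStream())
        {
            // Everything that depends on the stream must finish inside this block.
            work(data);
        }
    }
}
```

With the names used earlier in this thread, the call would look roughly like StreamScope.Use(() => MyClass.MergeRenderDocsToPDF(itemsToMerge), data => new Document(data).Save("myfile.pdf")); a plain using block around the returned stream in the calling code does the same job.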