OutOfMemory Issue

Hello,

We are trying to process a ~15 MB Word document. Our requirement is to read each line together with its paragraph number and page number. However, using the Aspose.Words library we are getting an “OutOfMemory” error.
We learnt from https://docs.aspose.com/words/net/memory-requirements/ that processing can take up to 20x memory. However, we are not doing any complex operations such as rendering (converting to fixed-page formats), updating fields, or splitting pages. We are just reading each line page by page and getting its paragraph number.

Is there any config or setting missing?
Please find below the code block we are using to perform the operations.

Document document = new(memStream);
var pageCount = document.PageCount;
for (int pageIndex = 0; pageIndex < pageCount; pageIndex++)
{
    // Paragraph numbering restarts at 1 on every page.
    int paragraphNumber = 1;
    int pageNumber = pageIndex + 1;
    // Extract the current page into its own Document.
    var extractedPage = document.ExtractPages(pageIndex, 1);
    NodeCollection paragraphs = extractedPage.GetChildNodes(NodeType.Paragraph, true);
    foreach (Paragraph paragraph in paragraphs)
    {
        string paragraphText = paragraph.GetText().Trim();
        paragraphNumber++;
    }
}

We are getting the error at the second line:
var pageCount = document.PageCount;
Could you please let us know why it is not working?

We have 2 GB of RAM in our pod.

Regards,
Sandip.

@SandipC87 Actually, your code performs a document layout operation. As you may know, MS Word documents are flow documents and do not contain any information about their layout. Consumer applications build the document layout on the fly. To calculate the number of pages in the document and then use the Document.ExtractPages method, Aspose.Words needs to build the whole document layout, which is quite a resource-consuming operation.
Also, memory consumption depends on the document content. If possible, please share your problematic document here for testing; we will check it on our side and provide you with more information. Generally, it is not recommended to use huge documents, because even “native” consumers like MS Word might not be able to process such documents.
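
For reference, since the layout has to be built in any case, a paragraph's page can also be looked up without extracting pages, using the LayoutCollector class from Aspose.Words.Layout. This is only a sketch of an alternative technique, not what the original code does: the class and method names ParagraphPages and Print are hypothetical, memStream is assumed to be the stream from the original post, and the paragraph numbering here runs across the whole document rather than restarting on each page. It still triggers the full layout build, so it does not avoid that cost.

```csharp
using System;
using System.IO;
using Aspose.Words;
using Aspose.Words.Layout;

static class ParagraphPages
{
    // Hypothetical helper: prints each paragraph with the page it starts on.
    public static void Print(Stream memStream)
    {
        Document document = new Document(memStream);

        // The collector maps document nodes to pages; its lookup methods
        // build the page layout of the document when it is needed.
        LayoutCollector collector = new LayoutCollector(document);

        int paragraphNumber = 0;
        foreach (Paragraph paragraph in document.GetChildNodes(NodeType.Paragraph, true))
        {
            paragraphNumber++;

            // 1-based page index; 0 means the node could not be mapped to a page.
            int startPage = collector.GetStartPageIndex(paragraph);
            string text = paragraph.GetText().Trim();

            Console.WriteLine($"page {startPage}, paragraph {paragraphNumber}: {text}");
        }
    }
}
```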

Thank you for the quick response.
We are trying with just a 15 MB document.
https://docs.google.com/document/d/1P-aP8OJkKE0Y4PCcyL-eRhPzMJLS3syC/edit?usp=sharing&ouid=111033042152519713311&rtpof=true&sd=true

Please get the sample document from the link above.

@SandipC87 On my side, an OutOfMemoryException occurs upon extracting page 454.
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-25864

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

As a temporary workaround for this issue, you can call the extractedPage.Cleanup(); operation. In this case the OutOfMemoryException does not occur on my side.
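
Applied to the loop from the original post, the workaround might look like the sketch below. The reply does not say where in the loop the call belongs, so calling Cleanup() right after the extraction is an assumption; the names PageReader and ReadParagraphsPerPage are hypothetical, and memStream is assumed to be the stream from the original post.

```csharp
using System;
using System.IO;
using Aspose.Words;

static class PageReader
{
    // Hypothetical helper: reads every paragraph page by page.
    public static void ReadParagraphsPerPage(Stream memStream)
    {
        Document document = new Document(memStream);
        int pageCount = document.PageCount;

        for (int pageIndex = 0; pageIndex < pageCount; pageIndex++)
        {
            Document extractedPage = document.ExtractPages(pageIndex, 1);

            // Workaround suggested above (placement is an assumption):
            // clean up the extracted single-page document before reading it.
            extractedPage.Cleanup();

            int paragraphNumber = 0;
            foreach (Paragraph paragraph in extractedPage.GetChildNodes(NodeType.Paragraph, true))
            {
                paragraphNumber++;
                string text = paragraph.GetText().Trim();
                Console.WriteLine($"page {pageIndex + 1}, paragraph {paragraphNumber}: {text}");
            }
        }
    }
}
```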

Thank you…

It worked.

I have another 26 MB document.
Its location is different.
https://docs.google.com/document/d/1LOtSKBpjuP2Pni92FvRufTs8v-Zq8mwd/edit?usp=sharing&ouid=111033042152519713311&rtpof=true&sd=true

Could you please check it and suggest a fix?
We are getting the error below.

at System.Globalization.CultureInfo..ctor(String name, Boolean useUserOverride)
at ng..cctor()    --- End of inner exception stack trace ---
at ng.c(String a)
at Tf..cctor()    --- End of inner exception stack trace ---
at Tf.a(Double a, Int32 b)
at Tf.g(Double a)
at tQ.a(Double a)
at m1.a(Double a, Double b, String c)
at m1.U0A0pYi()
at Q2.a(U0[] a)
at Q2.c(U0[] a, Int32 b)
at Q2.b(U0[] a, Int32 b)
at Q2.a(U0[] a, Int32 b)
at Q2.N2A0pYa(U0[] a)
at N2.b()
at N2.d()
at N2.k()
at kQ..ctor(X2 a, D2 b, sQ c, RectangleF d, Boolean e)
at qQ.a(D2 a)
at qQ.c()
at qQ.a()
at vQ.a()
at vQ.a(ShapeBase a, sQ b)
at GT.bTA0pYa(NT a)
at YT.a(ShapeBase a)
at kW.a(ShapeBase a, yX b)
at kW.a(yX a, Boolean b)
at Iqa.c(dqa a)
at Iqa.a(ppa a)
at Mma.MmaA0pYx()
at Mma.a()
at Nma.MmaA0pYc()
at ppa.JoaA0pYHa()
at Joa.ZkaA0pYv()
at Yma.g()
at Yma.a(Boolean a, Int32 b)
at Yma.a(Boolean a)
at ena.a(Yma a)
at ena.a(dna a, Int32 b)
at Tma.c(dna a)
at Tma.a(dna a)
at Vma.b(dna a)
at Vma.b(dna a, Int32 b, Boolean c)
at Hla.b()
at Hla.b(Gla a, Int32 b, Boolean c)
at Foa.b()
at Foa.b(bla a, Int32 b, Boolean c, Boolean d)
at vla.b()
at vla.b(ula a, Int32 b)
at vla.a(ula a, Int32 b)
at Una.b(ula a)
at Una.d()
at Una.BmaA0pYa(Zka a)
at zla.a(Zka a)
at Nla.a(Tna a)
at Mla.h()
at Pqa.c()
at Aspose.Words.Document.UpdatePageLayout()
at Aspose.Words.Document.get_PageCount()

@SandipC87 Unfortunately, I cannot reproduce the problem on my side using version 23.8 of Aspose.Words. Do you use the same code?

Yes. We are using the latest version, 23.8.

Could you please check it on Linux servers?
On Windows Server we are not getting the error. The error occurs only in Linux pods.

Thank you for your prompt response and efforts.

@SandipC87 I have tested on a Linux Debian 11 Docker image and the code works without any issues. What Linux distribution do you use? Does the problem occur only with the provided document or with all documents?

The issues you found earlier (filed as WORDSNET-25864) have been fixed in the Aspose.Words for .NET 23.10 update, which is also available on NuGet.