Joining multiple documents into one

After we do the Word merge with Aspose, we need to combine all the documents into one, and we also need to preserve the page numbers.

The code I have works perfectly, except that when we reach about 100 documents to join, we get a 'System.OutOfMemoryException'.

I ran the same thing on my development computer. While it executes, the IIS worker process w3wp.exe peaks at about 2.2 GB of memory; it takes a long time to process, but it finishes without errors and returns the combined document. The final document is 1.1 MB and over 150 pages. Why does it take over 2.2 GB of RAM to process if the document is only 1.1 MB?

The same process (same documents) on the server returns the "out of memory" error.

Any idea how to make it use less memory?

This is the code I am using:
The first function was written by me; the rest were copied and pasted from your forum.

Private Sub cmd_CombineForms()
    Dim TL As cSelectTemplateList = CType(Session("SelectForms_TemplateList"), cSelectTemplateList)
    Dim GeneratedPath As String = CType(Session("GeneratedPath"), String)
    Dim FilesAdded As Boolean = False
    Dim FirstDoc As Boolean = True
    impersonator.BeginImpersonation() ' impersonate user to have access to folders on the network
    Dim doc As Aspose.Words.Document = Nothing

    For Each TI As cSelectTemplateList.cTemplateInfo In TL.TemplateList
        If TI.GeneratedFileName.Length() > 0 AndAlso System.IO.File.Exists(GeneratedPath & TI.GeneratedFileName) Then

            If FirstDoc Then
                doc = New Aspose.Words.Document(GeneratedPath & TI.GeneratedFileName)

                FirstDoc = False
            Else
                Dim append_doc As Aspose.Words.Document = New Aspose.Words.Document(GeneratedPath & TI.GeneratedFileName)
                append_doc.FirstSection.HeadersFooters.LinkToPrevious(False)

                append_doc.FirstSection.PageSetup.SectionStart = SectionStart.NewPage
                append_doc.FirstSection.PageSetup.RestartPageNumbering = True
                append_doc.FirstSection.PageSetup.PageStartingNumber = 1

                doc.AppendDocument(append_doc, ImportFormatMode.KeepSourceFormatting)

                append_doc = Nothing
            End If

            FilesAdded = True
        End If
    Next

    If FilesAdded AndAlso doc IsNot Nothing Then
        Response.ContentType = "application/vnd.ms-word.document"

        Dim FileName As String = ""

        If Me.txtPolicyNumber.Text.Trim().Length() = 0 Then
            FileName = "Default.docx"
        Else
            FileName = Split(Me.txtPolicyNumber.Text.Trim(), " ")(0) & ".docx"
        End If
        Response.AddHeader("content-disposition", "attachment;filename=" & Context.Server.HtmlEncode(FileName))

        ConvertNumPageFieldsToPageRef(doc)
        doc.UpdatePageLayout()
        doc.Save(Response.OutputStream, Aspose.Words.SaveFormat.Docx)

        Response.Flush()
        Response.End()
    End If

    impersonator.EndImpersonation()
    doc = Nothing

End Sub

''' <summary>
''' Replaces all NUMPAGES fields in the document with PAGEREF fields. The replacement field displays the total number
''' of pages in the sub document instead of the total pages in the document.
''' </summary>
''' <param name="doc">The combined document to process</param>
Private Shared Sub ConvertNumPageFieldsToPageRef(ByVal doc As Document)
    ' This is the prefix for each bookmark which signals where page numbering restarts.
    ' The underscore "_" at the start inserts this bookmark as hidden in MS Word.
    Const bookmarkPrefix As String = "_SubDocumentEnd"
    ' Field name of the NUMPAGES field.
    Const numPagesFieldName As String = "NUMPAGES"
    ' Field name of the PAGEREF field.
    Const pageRefFieldName As String = "PAGEREF"

    ' Create a new DocumentBuilder which is used to insert the bookmarks and replacement fields.
    Dim builder As New DocumentBuilder(doc)
    ' Defines the number of page restarts that have been encountered and therefore the number of "sub" documents
    ' found within this document.
    Dim subDocumentCount As Integer = 0

    ' Iterate through all sections in the document.
    For Each section As Section In doc.Sections
        ' This section has its page numbering restarted so we will treat this as the start of a sub document.
        ' Any NUMPAGES fields in this inner document must be converted to special PAGEREF fields to correct numbering.
        If section.PageSetup.RestartPageNumbering Then
            ' Don't do anything if this is the first section in the document. This part of the code will insert the bookmark marking
            ' the end of the previous sub document so therefore it is not applicable for first section in the document.
            If (Not section.Equals(doc.FirstSection)) Then
                ' Get the previous section and the last node within the body of that section.
                Dim prevSection As Section = CType(section.PreviousSibling, Section)
                Dim lastNode As Node = prevSection.Body.LastChild

                ' Use the DocumentBuilder to move to this node and insert the bookmark there.
                ' This bookmark represents the end of the sub document.
                builder.MoveTo(lastNode)
                builder.StartBookmark(bookmarkPrefix & subDocumentCount)
                builder.EndBookmark(bookmarkPrefix & subDocumentCount)

                ' Increase the subdocument count to insert the correct bookmarks.
                subDocumentCount += 1
            End If
        End If

        ' The last section simply needs the ending bookmark to signal that it is the end of the current sub document.
        If section.Equals(doc.LastSection) Then
            ' Insert the bookmark at the end of the body of the last section.
            ' Don't increase the count this time as we are just marking the end of the document.
            Dim lastNode As Node = doc.LastSection.Body.LastChild
            builder.MoveTo(lastNode)
            builder.StartBookmark(bookmarkPrefix & subDocumentCount)
            builder.EndBookmark(bookmarkPrefix & subDocumentCount)
        End If

        ' Iterate through each NUMPAGES field in the section and replace the field with a PAGEREF field referring to the bookmark of the current subdocument.
        ' This bookmark is positioned at the end of the sub document but does not exist yet. It is inserted when a section with restart page numbering or the last
        ' section is encountered.
        Dim nodes() As Node = section.GetChildNodes(Aspose.Words.NodeType.FieldStart, True).ToArray()
        For Each fieldStart As FieldStart In nodes
            If fieldStart.FieldType = FieldType.FieldNumPages Then
                ' Get the field code.
                Dim fieldCode As String = GetFieldCode(fieldStart)
                ' Since the NUMPAGES field does not take any additional parameters we can assume the remaining part of the field
                ' code after the fieldname are the switches. We will use these to help recreate the NUMPAGES field as a PAGEREF field.
                Dim fieldSwitches As String = fieldCode.Replace(numPagesFieldName, "").Trim()

                ' Inserting the new field directly at the FieldStart node of the original field will cause the new field to
                ' not pick up the formatting of the original field. To counter this insert the field just before the original field
                Dim previousNode As Node = fieldStart.PreviousSibling

                ' If a previous run cannot be found then we are forced to use the FieldStart node.
                If previousNode Is Nothing Then
                    previousNode = fieldStart
                End If

                ' Insert a PAGEREF field at the same position as the field.
                builder.MoveTo(previousNode)
                ' This will insert a new field with a code like " PAGEREF _SubDocumentEnd0 \* MERGEFORMAT ".
                Dim newField As Field = builder.InsertField(String.Format(" {0} {1}{2} {3} ", pageRefFieldName, bookmarkPrefix, subDocumentCount, fieldSwitches))

                ' The field will be inserted before the referenced node. Move the node before the field instead.
                previousNode.ParentNode.InsertBefore(previousNode, newField.Start)

                ' Remove the original NUMPAGES field from the document.
                RemoveField(fieldStart)
            End If
        Next fieldStart
    Next section

End Sub

''' <summary>
''' Retrieves the field code from a field.
''' </summary>
''' <param name="fieldStart">The field start of the field from which to gather the field code</param>
''' <returns>The text of the field code</returns>
Private Shared Function GetFieldCode(ByVal fieldStart As FieldStart) As String
    Dim builder As New StringBuilder()

    Dim node As Node = fieldStart
    Do While node IsNot Nothing AndAlso node.NodeType <> Aspose.Words.NodeType.FieldSeparator AndAlso node.NodeType <> Aspose.Words.NodeType.FieldEnd
        ' Use text only of Run nodes to avoid duplication.
        If node.NodeType = Aspose.Words.NodeType.Run Then
            builder.Append(node.GetText())
        End If
        node = node.NextPreOrder(node.Document)
    Loop
    Return builder.ToString()

End Function

''' <summary>
''' Removes the field from the document.
''' </summary>
''' <param name="fieldStart">The field start node of the field to remove.</param>
Private Shared Sub RemoveField(ByVal fieldStart As FieldStart)
    Dim currentNode As Node = fieldStart
    Dim isRemoving As Boolean = True
    Do While currentNode IsNot Nothing AndAlso isRemoving
        ' Stop after the FieldEnd node itself has been removed.
        If currentNode.NodeType = Aspose.Words.NodeType.FieldEnd Then
            isRemoving = False
        End If

        Dim nextNode As Node = currentNode.NextPreOrder(currentNode.Document)
        currentNode.Remove()
        currentNode = nextNode
    Loop

End Sub

I think I figured out why we get the out-of-memory error: the server we are using at the moment is 32-bit, while my development machine is 64-bit. So it makes sense that it breaks when it reaches 2 GB of RAM on the server, while it works fine on my development computer.

So, our plan to upgrade to a 64-bit server just got a boost...

BUT... why does it use so much RAM in the first place? The final document is only 1.1 MB!
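A minimal sketch, not part of the original posts, of one way to confirm the 32-bit versus 64-bit hypothesis. It assumes .NET Framework 4.0 or later (for Environment.Is64BitProcess); the module name BitnessCheck is hypothetical. It reports on the process it runs in, so to check the IIS worker process the same properties would have to be logged from inside the web application.

```vbnet
Imports System

Module BitnessCheck
    Sub Main()
        ' IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
        Console.WriteLine("Pointer size: {0} bytes", IntPtr.Size)
        ' True only when the current process itself runs as 64-bit.
        Console.WriteLine("64-bit process: {0}", Environment.Is64BitProcess)
        ' A 32-bit process can still run on a 64-bit operating system.
        Console.WriteLine("64-bit OS: {0}", Environment.Is64BitOperatingSystem)
    End Sub
End Module
```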

Hi Michael,

Thanks for your inquiry. What version of Aspose.Words are you currently using? Could you please upgrade to the latest version, Aspose.Words 14.5.0, and see how it goes on your end? If the problem still remains, then to ensure a timely and accurate response, please zip and attach the following resources here for testing:

  1. The input Word documents you're getting this problem with.
  2. A standalone (runnable) console application that helps us reproduce your problem on our end.

As soon as you get these pieces of information ready, we'll start investigating your issue and provide you with more information.

Best regards,

Hi Awais Hafeez,

I upgraded to the newest version, and still have the same problem.

Preparing all of that takes some time, and I have to check with my superior whether I can send those documents to you; I am pretty sure I am not allowed to.

But... I have done some investigating, I stepped through the code, and I found the problem.

The problem is in the ConvertNumPageFieldsToPageRef() function, more specifically in this line of code:

Dim newField As Field = builder.InsertField(String.Format(" {0} {1}{2} {3} ", pageRefFieldName, bookmarkPrefix, subDocumentCount, fieldSwitches))

Every time this line executes, it takes 1 to 3 seconds to process, and memory usage jumps by somewhere between 100 MB and 400 MB, until it reaches about 2.2 GB. After that, every insert takes about the same time to run, but the memory does not increase at all; in fact, sometimes it decreases.

So it seems to me that somewhere in the DocumentBuilder object there is a limit (some kind of cache) that holds up to 2 GB of memory and releases some when something new needs to be processed.

Is there a way to change that to maybe 1.5 GB (or even less) so that the total process does not go over 2 GB of memory?

Hi Michael,


Thanks for the additional information. There are the following two overloads of the DocumentBuilder.InsertField method:

1) Inserts a Word field into a document and updates the field result.
public Field InsertField(string);

2) Inserts a Word field into a document without updating the field result.
public Field InsertField(string,string);

So, instead of the first overload, you can try using the second overload and call the Document.UpdateFields method once, before calling the Document.Save method at the end. I hope this helps.

Best regards,

Thank you!

This fix cut the total processing time from minutes to seconds, and memory usage now peaks at only about 900 MB.

I just added an empty string as a second parameter to InsertField(), to make it use the second overload, and I changed the code in the main function to call doc.UpdateFields() before doc.UpdatePageLayout().
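For readers following along, the two changes described above can be sketched roughly as follows. This is a fragment of the functions posted earlier in the thread, not a complete program, and it assumes the variables from ConvertNumPageFieldsToPageRef() and cmd_CombineForms() are in scope; the local name pageRefCode is introduced here only for readability.

```vbnet
' In ConvertNumPageFieldsToPageRef(): build the field code as before, but
' pass an empty string as the second argument so that the overload which
' does not update the field result on every insert is used.
Dim pageRefCode As String = String.Format(" {0} {1}{2} {3} ", pageRefFieldName, bookmarkPrefix, subDocumentCount, fieldSwitches)
Dim newField As Field = builder.InsertField(pageRefCode, "")

' In cmd_CombineForms(): update all fields once, after the conversion
' and before the layout is rebuilt and the document is saved.
ConvertNumPageFieldsToPageRef(doc)
doc.UpdateFields()
doc.UpdatePageLayout()
doc.Save(Response.OutputStream, Aspose.Words.SaveFormat.Docx)
```

The idea is that the field results are computed in a single pass at the end instead of once per inserted field.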

You should modify your tutorial http://www.aspose.com/docs/display/wordsnet/Controlling+How+Page+Numbering+is+Handled so that other people don't get into the same problem.

Thanks again :)

Hi Michael,


Thanks for your suggestion. We will look into it and update that part of documentation shortly.

Best regards,