Extremely Slow Performance of PageCount and UpdatePageLayout

Hello,

I'm taking large documents generated by a Microsoft Word mail merge, in some cases 300+ pages long (containing up to 300+ letters each), and splitting them into one document per letter. Aspose.Words is doing a great job loading the large document, manipulating it in memory, and splitting it into individual documents, one per letter.

However, when I try to get the page count of the individual documents, the performance is killing me. I have tried calling Document.UpdatePageLayout() first and then accessing Document.PageCount, and it takes about 30 seconds. The documents I'm trying to get a page count for are 1-4 pages long; in fact, every document I've tested so far has been exactly 1 page!

I've also tried accessing Document.PageCount without calling Document.UpdatePageLayout() first, and it still takes about 30 seconds. Although I can't find this documented anywhere (yet), it seems to me that Document.PageCount may be calling Document.UpdatePageLayout() internally, since this is a newly created document that hasn't been laid out yet.

I'm using Aspose.Words for .NET version 13.4.0.0 with a full license (not a demo version). The computer I'm testing on is a Dell laptop with 8 GB of RAM, an Intel Core i7 @ 2.4 GHz, and Windows 7 Enterprise. Total memory usage never exceeded 4.5 GB at any point during processing, and CPU usage peaked in spikes of 65%.

Like I said, loading, manipulating, splitting and saving the documents are all very fast, literally about 1 second per document. Getting Document.PageCount is what's absolutely killing me. I also noticed that saving as XPS and PDF seems to be very fast. Is there any way to convert these documents to XPS or PDF in memory and quickly get the page count from them?

Update: I just tried converting the document to PDF in memory and running a regex on the output to count the pages. Performance appears to be about the same: a one-page document still takes about 30 seconds to convert to PDF, so presumably the bottleneck is the rendering engine itself.

Thanks

This is a critical issue for us and we’re on a very tight deadline. That’s why I posted late on a Sunday night, when we finally narrowed down the issue. I noticed you are addressing issues posted hours later, but so far I haven’t gotten a “hello” from Aspose. I understand you “answer issues in the order in which they are reported” and that you don’t “cherry pick” issues. In that spirit, please give me some sort of response so I can decide how best to proceed.

Hi,

Thanks for your inquiry and sorry for the delayed response.

The Document.PageCount property invokes the page layout process, which builds the document's page layout in memory, so with large documents this property can take time. However, once this property has been invoked, any subsequent rendering operation, e.g. rendering to PDF or to an image, will be instantaneous because the layout has already been built.

Since rebuilding the page layout is a time-consuming operation, calling the Document.PageCount property many times (e.g. after every modification of the Document) can slow down your process.
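A minimal C# sketch of that pattern, assuming the sub-document has already been fully built: lay it out once, read PageCount, then render to PDF, timing each step with Stopwatch so you can see where the time actually goes. The class and method names (LayoutTiming, CountPagesAndSavePdf) are hypothetical; Document.UpdatePageLayout(), Document.PageCount and Document.Save(string, SaveFormat) are existing Aspose.Words members, but whether the PageCount read and the PDF save are really fast after the first layout pass should be confirmed from the timings on your own documents.

```csharp
using System;
using System.Diagnostics;
using Aspose.Words;

public static class LayoutTiming
{
    // Hypothetical helper: builds the page layout once, then reuses it for the
    // page count and for PDF rendering, timing each step separately.
    public static int CountPagesAndSavePdf(Document doc, string pdfPath)
    {
        Stopwatch timer = Stopwatch.StartNew();
        doc.UpdatePageLayout();               // expected to be the expensive step
        Console.WriteLine("UpdatePageLayout: {0} ms", timer.ElapsedMilliseconds);

        timer.Restart();
        int pageCount = doc.PageCount;        // layout was already built above
        Console.WriteLine("PageCount: {0} ms", timer.ElapsedMilliseconds);

        timer.Restart();
        doc.Save(pdfPath, SaveFormat.Pdf);    // rendering after the layout pass
        Console.WriteLine("Save PDF: {0} ms", timer.ElapsedMilliseconds);

        return pageCount;
    }
}
```

The design choice is to finish all edits before the first layout pass, so the expensive layout step runs once per sub-document rather than after every modification.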

Please note that memory and CPU usage depend entirely on document size and complexity. When rebuilding the page layout, Aspose.Words lays out about 10 pages per second on average. So, in your case, the extra time Aspose.Words takes to rebuild the page layout depends on the number of pages in your Word documents.

Moreover, please create a simple application (for example, a Console Application project) that reproduces the same problem on our end and attach it here for testing. Please also attach a sample Word document that exhibits this performance issue. We will investigate the issue on our side and provide you with more information. Thanks for your cooperation.

Best regards,

I uploaded a fake document to the priority support forum. As you will see, it is very unremarkable. I tried to compensate for the extreme slowness by using parallel processing, which resulted in a 12.5-minute delay that I described in the priority support thread. I think it would be more useful for you to investigate the splitting process, which I believe may be more of an issue than the documents themselves. I’ll continue in my priority support thread.

Try this code. It’s based on code Aspose employees have posted on this board:

using System;
using System.Diagnostics;
using System.IO;
using Aspose.Words;

public class Splitter
{
    private readonly Document _sourceDocument;

    public Splitter(Document sourceDocument)
    {
        _sourceDocument = sourceDocument;

        // Loop through all sections; each section becomes its own document.
        for (int i = 0; i < _sourceDocument.Sections.Count; i++)
        {
            Stopwatch timer = Stopwatch.StartNew();
            Console.WriteLine("Begin: {0}", DateTime.Now);
            Section section = _sourceDocument.Sections[i];

            // Create an empty document and remove its default blank section.
            Document subDoc = new Document();
            subDoc.RemoveAllChildren();

            // Import the section into the new document and append it.
            subDoc.AppendChild(subDoc.ImportNode(section, true, ImportFormatMode.KeepSourceFormatting));

            // Accessing PageCount builds the page layout of subDoc (the slow step).
            int pgCount = subDoc.PageCount;

            timer.Stop();
            Console.WriteLine("End: {0}", DateTime.Now);
            Console.WriteLine("Elapsed time: {0} ms\n------------", timer.ElapsedMilliseconds);

            // Save the sub-document as DOCX, with its page count in the file name.
            subDoc.Save(Path.Combine(@"c:\temp\output", string.Format("DocToSplit{0}-{1}.docx", i, pgCount)));
        }
    }
}

This is a simple example that accurately reproduces the issue on my end.

Hi,

OK, thanks. I’ll check that as soon as possible (maybe not before September).