Asynchronous Processing

We are currently evaluating Aspose. When processing large Word docs (200+ pages), the time is at least 15 to 20 seconds, which is unacceptable to our boss. Is there a way to get the first page back quickly while the rest of the document is opened, similar to the way Word opens a document and gives you the first page right away but not the page count until a while later?

Hi Mike,


Thank you for your interest in Aspose products.

First of all, I would suggest using the latest version of Aspose.Words, i.e. 11.2.0, and letting us know how it goes on your side. You can download it from the following link:
http://www.aspose.com/community/files/51/.net-components/aspose.words-for-.net/default.aspx

Secondly, please note that when you open a document from disk using Aspose.Words, it is loaded into memory as a whole and stored in a DOM. I am afraid you cannot perform any operations on the document until it is completely loaded into memory. Also, there is no direct link between the loaded document and the file on disk.

Moreover, could you please attach here the input Word document you are getting this problem with, for testing? I will investigate the issue on my side and provide you with more information.

Best Regards,

Hi Awais,

Attached is the Word document we're testing. We are using version 11.2, and it takes about 17 seconds to load the document. Showing either a progress bar or the first page of a document while the rest of the document is processed would likely be acceptable.

There was a post on your site where someone asked for a callback regarding conversion to PDF, and the Aspose representative said it would be in a future release.

Do you have any update about that, and would it be available in Aspose.Words, Aspose.Cells, etc.?

Thanks!!

Hi Mike,


Thanks for your inquiry.

I suppose by loading you mean rendering the document to image? It does indeed take a bit of time to render your document as it is quite large. We will consider introducing a callback which is called when each page is rendered. We will keep you informed of any developments regarding this feature.

In the meantime, I can think of a way to achieve your second idea of showing the first page of the document. Please use the code below to extract the first page of the document so it can be rendered more quickly, without needing to render the entire document.

// Requires: using Aspose.Words;
Document doc = new Document("PaperVision_Capture_UserGuideR74.docx");
Document docPreview = GetFirstPageOfDocument(doc);
docPreview.Save("Document Out.pdf");

/// <summary>
/// Extracts the first page of a document based on section breaks, page breaks, or a set number of block-level nodes.
/// </summary>
public static Document GetFirstPageOfDocument(Document doc)
{
    // Number of paragraphs or tables in the document body to extract before stopping
    // if we do not encounter any page or section breaks.
    const int maxNumberOfBlockLevelNodes = 50;
    int currentCount = 0;

    // Shallow clone: keeps document-level settings but none of the content.
    Document previewDoc = (Document)doc.Clone(false);
    NodeImporter importer = new NodeImporter(doc, previewDoc, ImportFormatMode.UseDestinationStyles);

    foreach (Section section in doc.Sections)
    {
        // If this section starts on a new page then we know we have the first page.
        if (section != doc.FirstSection)
        {
            SectionStart sectionType = section.PageSetup.SectionStart;
            if (sectionType == SectionStart.EvenPage || sectionType == SectionStart.NewPage || sectionType == SectionStart.OddPage)
                break;
        }

        // Add the section to the document, then empty its body so nodes can be copied one by one.
        previewDoc.AppendChild(importer.ImportNode(section, true));
        previewDoc.LastSection.Body.RemoveAllChildren();

        foreach (CompositeNode composite in section.Body.ChildNodes)
        {
            // Copy the node to the preview document.
            previewDoc.LastSection.Body.AppendChild(importer.ImportNode(composite, true));
            currentCount++;

            // If the max number of nodes we predict are on the first page is reached, or if the current
            // paragraph contains a page break, then we have the first page, so return the document as is.
            if (currentCount > maxNumberOfBlockLevelNodes || (composite != section.Body.LastParagraph && composite.Range.Text.Contains(ControlChar.PageBreak)))
                return previewDoc;
        }
    }

    return previewDoc;
}


If we can help with anything else, please feel free to ask.

Thanks,

Hi Awais,

Thank you so much for this method, it is exactly what I needed. I provided my CTO with results showing significant improvement in processing time. I've included that spreadsheet in case you're interested.

I do have two additional questions for you please.

  1. We noticed that the first run of processing a document took significantly longer than processing the document again without closing the app. Is there an initialization routine the engine goes through on the first run?
  2. We would also need similar DocumentPreview functionality for Excel, PowerPoint, Visio, Outlook, (all Office formats you support). Would you be able to provide those methods as well? For now, if you could give us methods for Excel and PowerPoint, we could wait for the others until after purchase is made.

Thanks again so much.

Hi Mike,


Thanks for this additional information.

It’s great to hear the workaround helps. I have taken a look at your Excel file. There seems to be a slight confusion in terminology; please see the points below for clarification.

  • Loading a document into the Aspose.Words DOM only involves the Document constructor. As soon as the constructor of the Document class has returned, the file is fully loaded into the DOM. This should not take any longer than a second, even with large files. In your results you state that the document takes 15 seconds to load into the DOM, which sounds incorrect. Please let us know if this is really the case on your machine.
  • Calling Document.UpdatePageLayout, or saving to image, PDF, etc., is called rendering. This is what takes the bulk of the time, as the document layout needs to be built in memory. This is where the 15 seconds should be going.
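To check on your own machine how the time splits between the two steps above, something along these lines may help. This is only a rough sketch: the class name, method names and file paths are made up for illustration, and it assumes the Aspose.Words assembly is referenced. Only the Document constructor, Document.UpdatePageLayout and Document.Save are actual Aspose.Words calls.

```csharp
using System;
using System.Diagnostics;
using Aspose.Words;

// Hypothetical helper for illustration; not part of Aspose.Words.
public static class LoadVersusRenderTiming
{
    // Runs one step and returns the elapsed wall-clock time in milliseconds.
    public static long TimeMs(Action step)
    {
        Stopwatch watch = Stopwatch.StartNew();
        step();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    // Times loading, layout and saving separately for one document.
    public static void Report(string inputPath, string outputPath)
    {
        Document doc = null;

        // Loading: only the constructor, which builds the DOM in memory.
        long loadMs = TimeMs(() => doc = new Document(inputPath));

        // Rendering, part 1: building the page layout.
        long layoutMs = TimeMs(() => doc.UpdatePageLayout());

        // Rendering, part 2: writing the output (format follows the file extension).
        long saveMs = TimeMs(() => doc.Save(outputPath));

        Console.WriteLine("Load: {0} ms, layout: {1} ms, save: {2} ms", loadMs, layoutMs, saveMs);
    }
}
```

Calling LoadVersusRenderTiming.Report("input.docx", "output.pdf") with your own paths should show whether the bulk of the time falls in the constructor or in the layout and save steps.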

Also, are you able to post here the full code you are using for testing? I want to quickly double-check that the code will run as fast as possible.

Regarding your queries:

  1. Yes, the first rendering conversion may take longer, as Aspose.Words needs to precache fonts and other resources. You can choose to precache such resources at the start of your application by creating a new document and calling Document.UpdatePageLayout.
  2. I will move your forum thread to the Aspose.Total forum so the support developers can take a look and provide you with the equivalent code you require for each product.
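The precaching suggested in point 1 could look roughly like this. It is a minimal sketch, assuming the Aspose.Words assembly is referenced; the class and method names are made up for illustration, and how much of the first-run cost a blank document actually absorbs is something you would need to measure.

```csharp
using Aspose.Words;

// Hypothetical warm-up helper for illustration; not part of Aspose.Words.
public static class AsposeWarmUp
{
    // Creates a blank document and builds its layout once, so that fonts and
    // other resources are cached before the first real document is rendered.
    public static void Run()
    {
        Document blank = new Document();
        blank.UpdatePageLayout();
    }
}
```

You would call AsposeWarmUp.Run() once at application start, possibly on a background thread so it does not delay startup.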

Thanks,

Hi Awais,

I'm cleaning up my solution to upload so you can check that we're doing things optimally -- should have it to you tomorrow, or Wednesday for you.

Yes, it sounds like I have the terminology wrong. Anyway, the time we're interested in is the total time to save the first image to disk so it can be displayed to the client browser. So I think my numbers are valid, just the terminology is off.

We had our meeting today with the CTO and I believe things are looking good for us to purchase. He asked a couple of things that I need to follow up on with you.

(1) How accurate is the output from the DocumentPreview code? Your code has a const maxNumberOfBlockLevelNodes = 50. What is the probability that a document would exceed that number and should we increase it? What would be an example of a document that exceeded 50 nodes, etc? His concern is that the preview image would look different from the original.

(2) The CTO also wants performance numbers for Excel and PowerPoint. Any chance someone can provide us with a DocumentPreview method for those document types?

Thanks again Awais! Very much appreciated.

-Mike

Hi Mike,


Thanks, I will wait for your input.

It’s great that things are working as expected. Regarding the accuracy of the code, it should be accurate every time. The set number of nodes is only used as a fallback if there is no section or page break in the document. The value means that up to 50 block-level nodes are copied; these are nodes such as paragraphs or tables in the document body. It is extremely unlikely that a document will have more nodes than this on the first page. However, you can increase this value just to be sure.

BTW I’m Adam and not Awais, I took over the thread a little while back :)

Thanks,

Hi Adam,

Please find attached the VS 2010 solution that tests Aspose performance for converting a document. Let me know if you have any trouble. When the app first starts, it will default to the PaperVision Capture manual that we have been testing. The output image/SVG files will be dropped in a folder with a name similar to that of the source doc.

e.g.

PaperVision Capture R74.docx => \PaperVision Capture R74_docx

We have started testing Excel and PowerPoint. Can we get a PreviewDocument method for those types of documents?

Much Thanks!

P.S. I had to remove your DLLs to minimize the size of the upload. We're using version 11.2.

Hi Adam,

One other question that we were asked about is how Aspose determines the default page size. Is it always 8 1/2 x 11, or does it use A4 if running on a system in European culture?

Thanks

Hi Mike,

Thanks for your inquiry. Regarding determining the page size, I think, the following API links will be helpful to you.

Best Regards,

Hi Mike,


Thanks for this additional information.

I have taken a look at your code and it’s all correct and optimal. There are no problems there.

Regarding the default page size, that is a good question. I have a vague memory of a discussion about this; I think the final answer was that Aspose.Words always uses the same default page size (A4) regardless of locale. If this is a problem for you, please let me know and we will open up the discussion about this again.

Thanks,

Hi Mike,


I am a representative of the Aspose.Cells team.
Things may not work in quite the same way as for Word documents, because MS Word and MS Excel have different architectures and file formats. If you need to export image files for specific pages of an Excel worksheet, one by one, you can do so. Use the SheetRender.ToImage(int pageIndex) API, where pageIndex specifies the page you need. See the topic for your reference:
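The per-page export described above can be sketched as follows. This is only an illustration, assuming the Aspose.Cells assembly is referenced; the class name, method name and file paths are made up, and the ImageFormat property is used as it appears in the version discussed in this thread.

```csharp
using System.Drawing.Imaging;
using Aspose.Cells;
using Aspose.Cells.Rendering;

// Hypothetical helper for illustration; not part of Aspose.Cells.
public static class FirstSheetPagePreview
{
    // Renders only the first printed page of the first worksheet to a PNG file.
    public static void Save(string workbookPath, string imagePath)
    {
        Workbook workbook = new Workbook(workbookPath);
        Worksheet sheet = workbook.Worksheets[0];

        ImageOrPrintOptions options = new ImageOrPrintOptions();
        options.ImageFormat = ImageFormat.Png;

        // SheetRender splits the sheet into pages the way print preview does.
        SheetRender render = new SheetRender(sheet, options);

        // An empty sheet may produce no pages at all, so check before rendering.
        if (render.PageCount > 0)
        {
            render.ToImage(0, imagePath); // page index 0 is the first page
        }
    }
}
```

Note that the whole workbook is still loaded before the page is rendered, so this only saves rendering time, not loading time.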



Regarding the spreadsheet preview, you may try using LoadDataOption.SheetIndexes to specify that only certain worksheet(s) should be loaded when you open the file via Aspose.Cells. See the document for your reference:
http://www.aspose.com/docs/display/cellsnet/Load+only+Specific+Sheets+in+a+Workbook


Thank you.

Hi Mike,


I am representing Aspose.Slides.

Adam has already explained loading and rendering earlier. I would like to add that Aspose.Slides likewise loads the presentation completely into memory and fills a DOM (Document Object Model) for it. The amount of time taken depends on the size and contents of the presentation. Once the presentation is loaded in memory, Aspose.Slides can render the slides individually or collectively. I believe generating a thumbnail of an individual slide is one of your requirements. Naturally, that will take less time than rendering the whole presentation. Please visit this documentation link to see how to generate individual slide thumbnails.

I have also created an issue with ID SLIDESNET-33379 in our issue tracking system as a new feature request, to see whether it is possible to load only part of a presentation into the DOM to save further time.

Many Thanks,

Hi Mudassirv,

Thanks for the update. Is there a link to SLIDESNET-33379, or is that an internal-only tracking number?

If you're interested, I've attached our latest performance results with PowerPoint and Excel.

Thanks!

Hi Amjadv,

Attached is the workbook that takes about 5 or 6 minutes to save the first worksheet to disk. It has 11 sheets; each sheet has 882 rows and 37 columns. Originally, it was my understanding the long processing time was due to images in the workbook. But it turns out the real culprit is sheer amount of data. Hence the DocumentPreview code you referenced above did not provide any benefit.

I've also attached our updated solution which includes AsposeCellsOutput, just in case you see something wrong with our implementation.

So as I see it, we have two options to save the image to disk. (1) Save as one big image (FitImageToPage and OnePagePerSheet, although we're slightly confused on the meaning of each), or (2) split the worksheet across multiple pages and download the first page as the preview. I was hoping there was a third option: Saving to one image, but specifying the size of that image to limit the number of pixels. We couldn't figure a way to do that, so wondering if even possible.

Finally, can you clarify the difference between FitToPage and OnePagePerSheet, and how they interact with each other?

Your support is much appreciated.

Thanks!!

Hi,


Thanks for the template file and sharing the project here.

We found the issue in our initial testing. We need to evaluate and investigate it further, which we will do soon; please allow us some time. I have logged a ticket with the ID CELLSNET-40610 for your issue. Once we have any update on it, we will let you know here immediately.

Regarding PageSetup’s FitToPagesTall/Wide vs. OnePagePerSheet (ImageOrPrintOptions): OnePagePerSheet will normally render one complete image for the whole worksheet. The FitToPages options scale the sheet to the number of pages tall and wide that you specify for the sheet’s print preview; you may try the option in MS Excel to better understand it. Generally, SheetRender’s image rendering follows the print preview that MS Excel shows for the worksheet(s).



Thank you.

Hi Amjad,

Just to make sure I understand - when you say you found the issue, does that mean you think Aspose can improve the performance for handling this large worksheet? Any idea how much improvement (roughly) and when it would be ready?

Thanks.

Hi,


Yes, we are evaluating and working on improving the performance for handling larger Excel files. We will share an ETA for it soon.

Thank you.

Hi,

Please try our latest version or fix of Aspose.Cells for .NET v8.5.2.2
Sheet to image - superscript and subscript shifted up too much in .NET

With v8.5.2.2, converting the first sheet to a single page using the following C# code (with OnePagePerSheet = true) now takes only about 45 seconds.

Sample code:

Workbook wb = new Workbook("srcFile.xlsx");

Worksheet sheet = wb.Worksheets[0];

ImageOrPrintOptions imgOpts = new ImageOrPrintOptions();
imgOpts.ImageFormat = ImageFormat.Png;
imgOpts.OnePagePerSheet = true;
//imgOpts.SetDesiredSize(1000, 10000);

SheetRender render = new SheetRender(sheet, imgOpts);
render.ToImage(0, "destFile.png");

Also, with the SetDesiredSize(int desiredWidth, int desiredHeight) function, you can set the width and height of the output image (see the commented-out line in the code).