Hi Awais,
Attached is the Word document we're testing. We are using version 11.2, and it takes about 17 seconds to load the document. Showing either a progress bar or the first page of a document while the rest of the document is processed would likely be acceptable.
There was a post on your site where someone asked for a callback during conversion to PDF, and the Aspose representative said it would be in a future release.
Do you have any update about that, and would it be available in Aspose.Words, Aspose.Cells, etc.?
Thanks!!
Hi Mike,
Document doc = new Document("PaperVision_Capture_UserGuideR74.docx");
Document docPreview = GetFirstPageOfDocument(doc);
docPreview.Save("Document Out.pdf");
/// <summary>
/// Extracts the first page of a document based on section breaks, page breaks, or a set number of block-level nodes.
/// </summary>
public static Document GetFirstPageOfDocument(Document doc)
{
    // Number of paragraphs or tables in the document body to extract before stopping if we do not encounter any page or section breaks.
    const int maxNumberOfBlockLevelNodes = 50;
    int currentCount = 0;

    Document previewDoc = (Document)doc.Clone(false);
    NodeImporter importer = new NodeImporter(doc, previewDoc, ImportFormatMode.UseDestinationStyles);

    foreach (Section section in doc.Sections)
    {
        // If this section starts on a new page then we know we have the first page.
        if (section != doc.FirstSection)
        {
            SectionStart sectionType = section.PageSetup.SectionStart;
            if (sectionType == SectionStart.EvenPage || sectionType == SectionStart.NewPage || sectionType == SectionStart.OddPage)
                break;
        }

        // Add the section to the document.
        previewDoc.AppendChild(importer.ImportNode(section, true));
        previewDoc.LastSection.Body.RemoveAllChildren();

        foreach (CompositeNode composite in section.Body.ChildNodes)
        {
            // Copy the node to the empty document.
            previewDoc.LastSection.Body.AppendChild(importer.ImportNode(composite, true));
            currentCount++;

            // If the max number of nodes we predict are on the first page is reached, or if the current paragraph contains a page break,
            // then we know we have the first page, so return the document as is.
            if (currentCount > maxNumberOfBlockLevelNodes || (composite != section.Body.LastParagraph && composite.Range.Text.Contains(ControlChar.PageBreak)))
                return previewDoc;
        }
    }

    return previewDoc;
}
Hi Awais,
Thank you so much for this method; it is exactly what I needed. I provided my CTO with results showing a significant improvement in processing time. I've included that spreadsheet in case you're interested.
I do have two additional questions for you please.
- We noticed that the first run of processing a document took significantly longer than processing the document again without closing the app. Is there an initialization routine the engine goes through on the first run?
- We would also need similar DocumentPreview functionality for Excel, PowerPoint, Visio, Outlook, (all Office formats you support). Would you be able to provide those methods as well? For now, if you could give us methods for Excel and PowerPoint, we could wait for the others until after purchase is made.
Thanks again so much.
Hi Mike,
- Loading a document into the Aspose.Words DOM only involves the Document constructor. Once the constructor has returned, the file is fully loaded into the DOM. This should not take longer than a second, even with large files. In your results you state that the document takes 15 seconds to load into the DOM; this sounds incorrect. Please let us know if this is really the case on your machine.
- Calling Document.UpdatePageLayout, or saving to an image, PDF, etc., is called rendering. This is what takes the bulk of the time, as the document layout needs to be built in memory. This is most likely where the 15 seconds go.
- Yes, the first rendering conversion may take longer, as Aspose.Words needs to precache fonts and other resources. You can choose to precache such resources at the start of your application by creating a new document and calling Document.UpdatePageLayout.
- I will move your forum thread to the Aspose.Total forum so the support developers can take a look and provide you with the equivalent code you require for each product.
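To illustrate the points above, here is a minimal sketch of the warm-up call together with separate timings for loading and rendering. The class name RenderTiming is made up for this example, and the file names are simply the ones used earlier in this thread; treat it as a rough measurement aid under those assumptions, not a benchmark.

```csharp
using System;
using System.Diagnostics;
using Aspose.Words;

// Hypothetical helper class, named for this example only.
class RenderTiming
{
    static void Main()
    {
        // Warm-up: build the layout of an empty document once so that fonts
        // and other resources are cached before the real document is processed.
        new Document().UpdatePageLayout();

        // Time the load into the DOM (the Document constructor only).
        Stopwatch sw = Stopwatch.StartNew();
        Document doc = new Document("PaperVision_Capture_UserGuideR74.docx");
        Console.WriteLine("Load into DOM: {0} ms", sw.ElapsedMilliseconds);

        // Time the rendering step separately (layout is built during Save).
        sw.Restart();
        doc.Save("Document Out.pdf");
        Console.WriteLine("Render and save: {0} ms", sw.ElapsedMilliseconds);
    }
}
```

Comparing the two figures should show whether the time is really spent in the constructor or in rendering.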
Hi Awais,
I'm cleaning up my solution to upload so you can check that we're doing things optimally -- should have it to you tomorrow, or Wednesday for you.
Yes, it sounds like I have the terminology wrong. Anyway, the time we're interested in is the total time to save the first image to disk so it can be displayed to the client browser. So I think my numbers are valid, just the terminology is off.
We had our meeting today with the CTO and I believe things are looking good for us to purchase. He asked a couple of things that I need to follow up on with you.
(1) How accurate is the output from the DocumentPreview code? Your code has a const maxNumberOfBlockLevelNodes = 50. What is the probability that a document would exceed that number and should we increase it? What would be an example of a document that exceeded 50 nodes, etc? His concern is that the preview image would look different from the original.
(2) The CTO also wants performance numbers for Excel and PowerPoint. Any chance someone can provide us with a DocumentPreview method for those document types?
Thanks again Awais! Very much appreciated.
-Mike
Hi Adam,
Please find attached the VS 2010 solution that tests Aspose performance for converting a document. Let me know if you have any trouble. When the app first starts, it defaults to the PaperVision Capture manual that we have been testing. The output image/SVG files are dropped in a folder with a name similar to that of the source doc.
e.g.,
PaperVision Capture R74.docx => \PaperVision Capture R74_docx
We have started testing Excel and PowerPoint. Can we get a PreviewDocument method for those types of documents?
Much Thanks!
P.S. I had to remove your DLLs to minimize the size of the upload. We're using version 11.2.
Hi Adam,
One other question we were asked is how Aspose determines the default page size. Is it always 8 1/2 x 11, or does it use A4 if running on a system with a European culture setting?
Thanks
Hi Mike,
Thanks for your inquiry. Regarding determining the page size, I think the following API links will be helpful to you.
Best Regards,
Hi Mudassirv,
Thanks for the update. Is there a link to SLIDESNET-33379, or is that an internal-only tracking number?
If you're interested, I've attached our latest performance results with PowerPoint and Excel.
Thanks!
Hi Amjadv,
Attached is the workbook that takes about 5 or 6 minutes to save the first worksheet to disk. It has 11 sheets; each sheet has 882 rows and 37 columns. Originally, it was my understanding that the long processing time was due to images in the workbook, but it turns out the real culprit is the sheer amount of data. Hence the DocumentPreview code you referenced above did not provide any benefit.
I've also attached our updated solution which includes AsposeCellsOutput, just in case you see something wrong with our implementation.
So as I see it, we have two options to save the image to disk: (1) save as one big image (FitImageToPage and OnePagePerSheet, although we're slightly confused about the meaning of each), or (2) split the worksheet across multiple pages and download the first page as the preview. I was hoping there was a third option: saving to one image, but specifying the size of that image to limit the number of pixels. We couldn't figure out a way to do that, so we're wondering if it's even possible.
Finally, can you clarify the difference between FitToPage and OnePagePerSheet, and how they interact with each other?
Your support is much appreciated.
Thanks!!
Hi Amjad,
Just to make sure I understand - when you say you found the issue, does that mean you think Aspose can improve the performance for handling this large worksheet? Any idea how much improvement (roughly) and when it would be ready?
Thanks.
Hi,
Please try our latest version/fix: Aspose.Cells for .NET v8.5.2.2.
Sheet to image - superscript and subscript shifted up too much in .NET
With v8.5.2.2, converting the first sheet to a single page using the following C# code (with OnePagePerSheet = true) now takes only about 45 seconds.
Sample code:
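using System.Drawing.Imaging;
using Aspose.Cells;
using Aspose.Cells.Rendering;

// Load the workbook and pick the first worksheet.
Workbook wb = new Workbook("srcFile.xlsx");
Worksheet sheet = wb.Worksheets[0];
// Render the whole sheet as a single PNG page.
ImageOrPrintOptions imgOpts = new ImageOrPrintOptions();
imgOpts.ImageFormat = ImageFormat.Png;
imgOpts.OnePagePerSheet = true;
// Uncomment to set the width and height of the output image.
//imgOpts.SetDesiredSize(1000, 10000);
SheetRender render = new SheetRender(sheet, imgOpts);
render.ToImage(0, "destFile.png");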
Also, with the SetDesiredSize(int desiredWidth, int desiredHeight) method, you can set the width and height of the output image (see the commented-out line in the code).