CPU Limitation during conversion from Microsoft documents to PDF

We need to convert Microsoft Office documents (doc/docx, xls/xlsx, ppt/pptx) to PDF.
We found that when the page count is very large (e.g. 10,000 pages for Word or Excel), CPU utilization reaches 100% and the whole server becomes unresponsive. Is there any option to limit CPU utilization for the conversion process?

Also, I previously tried to enforce a page-count limit, but I found that the following logic for counting pages also consumes a lot of CPU. Is this the correct way to retrieve the page count?
Word:
Document doc = new Document(in);
if (doc.getPageCount() > pageLimit) { … }

Excel:
int pageCount = 0;
for (int i = 0; i < workbook.getWorksheets().getCount(); i++) {
    Worksheet worksheet = workbook.getWorksheets().get(i);
    ImageOrPrintOptions printoption = new ImageOrPrintOptions();
    printoption.setPrintingPage(PrintingPageType.DEFAULT);
    SheetRender sr = new SheetRender(worksheet, printoption);
    pageCount += sr.getPageCount();
}
if (pageCount > pageLimit) { … }

PowerPoint:
Presentation presentation = new Presentation(in);
if (presentation.getSlides().size() > pageLimit) { … }

@alanso

Please note that performance and memory usage depend on the complexity and size of the documents you are generating.

In terms of memory, Aspose products do not have any limitations. If you are loading huge documents, more memory would be required. This is because during processing, the document needs to be held wholly in memory.

Could you please share more details about your requirement, along with sample documents and the code you are using? We will then provide more information on your query.

sparse_doc.zip (116.4 KB)

Some of our users upload sparse documents, and we find that converting them consumes all of the server's resources. We want to enforce a limit, such as a page limit. However, we found that the Excel parsing does not work; it seems to consume all of the CPU.

I have tried running the conversion in a separate thread, but it still seems to consume all of the CPU. I also set a timeout on the thread, but the conversion does not stop after the timeout expires.

e.g.
@Async
public Future<ByteArrayOutputStream> convertExcel(ByteArrayInputStream in) {
    return CompletableFuture.completedFuture("start").thenApply(s -> {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // Load the workbook and render it to PDF in memory.
            Workbook workbook = new Workbook(in);
            workbook.save(out, com.aspose.cells.SaveFormat.PDF);
        } catch (Exception e) {
            ......
        }
        return out;
    });
}

The async method is called like this:
Future result = asposeFileConversionUtil.convertExcel(in);
result.get(5, TimeUnit.SECONDS);

After the timeout expires, the conversion is still running.
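
This is consistent with how `Future.get(timeout, unit)` behaves in general: the timeout only ends the caller's wait, and `cancel(true)` merely sets the worker thread's interrupt flag, which a task has to check for itself. The sketch below shows this with plain JDK classes. The busy loop is a hypothetical stand-in for a long conversion and is not Aspose code; whether the Aspose conversion reacts to interruption is not stated in this thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

final class TimeoutDemo {

    /**
     * Submits a CPU-bound stand-in task, lets Future.get() time out, cancels it,
     * and reports whether the task is still running shortly afterwards.
     */
    static boolean stillRunningAfterTimeout() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicBoolean stop = new AtomicBoolean(false);
        CountDownLatch finished = new CountDownLatch(1);
        try {
            // Hypothetical stand-in for a long conversion: it spins until told to
            // stop and never checks the thread's interrupt flag.
            Future<?> task = pool.submit(() -> {
                while (!stop.get()) {
                    // busy work
                }
                finished.countDown();
            });
            try {
                task.get(100, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Only the wait was abandoned. cancel(true) interrupts the worker,
                // but this task ignores interruption and keeps spinning.
                task.cancel(true);
            }
            // If the latch is not released within the grace period, the task is still running.
            return !finished.await(200, TimeUnit.MILLISECONDS);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            stop.set(true); // let the stand-in task end so the demo can exit
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Still running after timeout: " + stillRunningAfterTimeout());
    }
}
```

A task that cannot be interrupted can only be stopped reliably from outside, for example by running it in a separate process that can be killed.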

@alanso,

I evaluated your issue regarding Aspose.Cells rendering Excel to PDF. I checked your file and found that it has over 3 billion pages to be rendered (most of the pages are blank, because you put a single entry into the last cell, “XFD1048576”). Please note that Aspose.Cells renders all pages by default, even blank ones. Rendering over 3 billion pages is a huge task, so it will certainly consume a lot of resources (CPU, memory) and take a long time. In short, it is reasonable for the process to use more resources when rendering such a huge number of pages into a PDF file. Please remove the unnecessary blank pages before rendering to PDF; the conversion then completes almost instantly. See the sample code for your reference:
Sample code:

Workbook workbook = new Workbook("f:\\files\\sparse.xlsx");

// Evaluate the page count without rendering the pages.
SheetPrintingPreview preview = new SheetPrintingPreview(workbook.getWorksheets().get(0), new ImageOrPrintOptions());
int pageCount = preview.getEvaluatedPageCount();
System.out.println("Total number of pages: " + pageCount);

// Remove blank columns and rows so that blank pages are not rendered.
workbook.getWorksheets().get(0).getCells().deleteBlankColumns();
workbook.getWorksheets().get(0).getCells().deleteBlankRows();

workbook.save("f:\\files\\out1.pdf");
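
To turn an evaluated page count like the one above into a page limit across all worksheets, one option is to sum the per-sheet counts and stop as soon as the limit is exceeded. The following is a minimal sketch under stated assumptions: `PageLimitGuard` and `exceedsLimit` are hypothetical names, and each `IntSupplier` is assumed to wrap a per-sheet call such as `new SheetPrintingPreview(sheet, new ImageOrPrintOptions()).getEvaluatedPageCount()` from the sample. The class itself uses only the JDK.

```java
import java.util.List;
import java.util.function.IntSupplier;

final class PageLimitGuard {

    /**
     * Returns true if the summed page counts exceed pageLimit. The counters are
     * evaluated lazily and in order, and evaluation stops once the limit is passed,
     * so later sheets are never counted for a document that is already too large.
     */
    static boolean exceedsLimit(List<IntSupplier> perSheetPageCounts, int pageLimit) {
        long total = 0; // long, so that the running sum cannot overflow
        for (IntSupplier counter : perSheetPageCounts) {
            // Assumption: in real use this call wraps the per-sheet
            // getEvaluatedPageCount() shown in the sample code.
            total += counter.getAsInt();
            if (total > pageLimit) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Fixed numbers stand in for real per-sheet page counts.
        List<IntSupplier> counts = List.of(() -> 6, () -> 6, () -> 6);
        System.out.println(exceedsLimit(counts, 10));
    }
}
```

The same guard would apply to the Word and PowerPoint checks from the first post by passing a single supplier that wraps `doc.getPageCount()` or `presentation.getSlides().size()`.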

Regarding Docx to PDF, we will give our feedback soon.

@alanso

We suggest using the latest version of Aspose products. We converted the DOCX to PDF using the latest version of Aspose.Words for .NET 20.12, and it took around seven seconds on our end. However, if you open the document in MS Word and scroll down to the last page, that takes a long time.

Please note that it is quite difficult to answer such questions because CPU performance and memory usage all depend on complexity and size of the documents you are loading/generating.

It also depends heavily on the local environment. The results can be completely different for a server that generates thousands of documents 24/7 than for a local PC that generates a single document on demand.

Hope this answers your query. Please let us know if you have any more queries.

I understand that our example contains many blank pages and is not realistic. However, I want to know whether there is any way to prevent the Aspose PDF conversion from consuming all of the server's resources.
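
One general Java-side option, independent of Aspose, is to run all conversions on a small dedicated thread pool with a bounded queue, so that only a fixed number of conversions run at once and excess requests are rejected instead of piling up. The sketch below is a hedged illustration: `BoundedConversionPool` is a hypothetical class, and it assumes each conversion runs mostly on the thread that calls it, which this thread does not confirm. It caps how many conversions run concurrently, not how much CPU a single conversion uses; a hard CPU cap would need an OS-level control such as a container CPU quota.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedConversionPool {

    private final ThreadPoolExecutor executor;

    /**
     * @param workers       maximum number of conversions running at the same time
     * @param queueCapacity maximum number of conversions waiting for a free worker
     */
    BoundedConversionPool(int workers, int queueCapacity) {
        executor = new ThreadPoolExecutor(
                workers, workers,                         // fixed pool size
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),  // bounded backlog
                new ThreadPoolExecutor.AbortPolicy());    // reject when full
    }

    /** Submits a conversion; throws RejectedExecutionException when workers and queue are full. */
    <T> Future<T> submit(Callable<T> conversion) {
        return executor.submit(conversion);
    }

    /** Stops accepting work and interrupts the running workers. */
    void shutdown() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        BoundedConversionPool pool = new BoundedConversionPool(1, 1);
        // The string is a placeholder for a real conversion result.
        Future<String> result = pool.submit(() -> "converted");
        System.out.println(result.get(5, TimeUnit.SECONDS));
        pool.shutdown();
    }
}
```

With `workers` set below the number of cores, the remaining cores stay available to the rest of the server even when every worker is busy with a large document.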

I am using the Java version of Aspose. After upgrading to the latest library, the page count seems to work.

Your code seems to work; the preview function does not consume all of the CPU resources.

I want to add a page-count limit to the conversion for the other formats as well, beyond the Excel code you mentioned.
Could you suggest code for checking the page count for Word and PowerPoint? Thanks a lot.

It is nice to know that it works for your needs.

We will give details on Word documents to PDF and PowerPoint presentations to PDF soon.