Memory-optimise presentation and Excel file conversions

The code I use to convert a presentation to HTML:
        Presentation pres = new Presentation(new ByteArrayInputStream(content.getBytes()));
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        try {
            // setting html options
            HtmlOptions options = new HtmlOptions();
            options.setHtmlFormatter(
                    HtmlFormatter.createCustomFormatter(new CustomFormattingController()));
            INotesCommentsLayoutingOptions inotesOpts = options.getNotesCommentsLayouting();
            inotesOpts.setNotesPosition(NotesPositions.BottomFull);

            // save the slides to output stream
            pres.save(baos, SaveFormat.Html, options);
        } finally {
            pres.dispose();
            baos.close();
        }

        return ExtractedContent.builder().bytes(baos.toByteArray()).build();

The code I use to convert Excel to HTML:

        ByteArrayOutputStream baos = null;
        List<byte[]> list = new ArrayList<>();
        HtmlSaveOptions options;

        try {
            Workbook workbook = new Workbook(new ByteArrayInputStream(input.getBytes()));
            WorksheetCollection worksheetCollection = workbook.getWorksheets();
            worksheetCount = worksheetCollection.getCount();

            for (int i = 0; i < worksheetCount; i++) {
                baos = new ByteArrayOutputStream();
                worksheetCollection.setActiveSheetIndex(i);
                options = new HtmlSaveOptions();
                options.setExportHeadings(true);
                options.setPresentationPreference(true);
                options.setEncoding(Encoding.getUTF8());
                options.setExportActiveWorksheetOnly(true);
                options.setExportGridLines(true);
                // save the active worksheet as HTML into the byte array output stream
                workbook.save(baos, options);
                // add the converted bytes to the list
                list.add(baos.toByteArray());
                baos.close();
            }
        } catch (Exception ex) {
            log.debug(ex.getMessage());
            throw ex;
        }

        // this will be returned and each baos will be handled as a separate resource.
        return ExtractedContent.builder().byteList(list).build();

I want this code to be memory optimised.
Which configurations can I use to lower the memory footprint without compromising content quality?
Please suggest.
I can’t find much in the documentation.
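
One thing visible in both snippets above, independent of any Aspose setting: ByteArrayOutputStream.toByteArray() returns a copy, so each converted document briefly exists twice on the heap. A minimal JDK-only sketch of one way to avoid that copy follows. ExposedByteArrayOutputStream is a hypothetical helper, not an Aspose class, and it only helps if the consumer (here ExtractedContent, whose API is not shown in this thread) can accept a ByteBuffer or an InputStream instead of a byte[]; that part is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Aspose): exposes the internal buffer of
// ByteArrayOutputStream so the written bytes can be read without the extra
// copy that toByteArray() makes.
class ExposedByteArrayOutputStream extends ByteArrayOutputStream {

    public ExposedByteArrayOutputStream(int initialCapacity) {
        super(initialCapacity);
    }

    // Read-only view over the bytes written so far. It shares the backing
    // array, so take the view only after all writing has finished.
    public synchronized ByteBuffer asReadOnlyBuffer() {
        return ByteBuffer.wrap(buf, 0, count).asReadOnlyBuffer();
    }

    // Stream over the bytes written so far, also sharing the backing array.
    public synchronized ByteArrayInputStream asInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}

class ZeroCopyDemo {
    public static void main(String[] args) {
        ExposedByteArrayOutputStream out = new ExposedByteArrayOutputStream(64);
        byte[] html = "<html></html>".getBytes(StandardCharsets.UTF_8);
        // In the real code, the library's save(...) call would write here.
        out.write(html, 0, html.length);

        ByteBuffer view = out.asReadOnlyBuffer();
        System.out.println("bytes available without copying: " + view.remaining());
    }
}
```

This does not reduce what the conversion itself allocates; it only removes one full-size copy of the output per document or worksheet.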

@ankit11088,

Please zip and attach the Excel file (XLS/XLSX) and the presentation file (PPT/PPTX) here so that we can evaluate your issue on our end.

I am attaching some of the files I am using.
The converted HTML is a lot bigger in size.
I am looking for configurations that will lower the memory footprint and still preserve the quality of the content.
Heavy-documents.zip (9.6 MB)

@ankit11088,

Regarding Excel to HTML rendering with the Aspose.Cells for Java API, I tested your files with our latest version/fix, Aspose.Cells v21.7.6 (please try it), using the following sample code to convert each of your Excel files to HTML. It works fine, and I did not notice any memory spike.
Sample code:

        com.aspose.cells.License license = new com.aspose.cells.License();
        license.setLicense("Aspose.Cells.Java.lic");
        
        Workbook workbook = new Workbook("f:\\files\\MIS-600-RS-AdultIncomes.xlsx");
        //Workbook workbook = new Workbook("f:\\files\\MIS-600-RS-AdultIncomes (1).xlsx");
        //Workbook workbook = new Workbook("f:\\files\\MIS-620-RS-T3-FinancialDataSet.xlsx");
        WorksheetCollection worksheetCollection = workbook.getWorksheets();
        int worksheetCount = worksheetCollection.getCount();
        HtmlSaveOptions options = new HtmlSaveOptions();
        for (int i = 0; i < worksheetCount; i++) {
            
            worksheetCollection.setActiveSheetIndex(i);
            options = new HtmlSaveOptions();
            options.setExportHeadings(true);
            options.setPresentationPreference(true);
            options.setEncoding(Encoding.getUTF8());
            options.setExportActiveWorksheetOnly(true);
            options.setExportGridLines(true);
            
            workbook.save("f:\\files\\out1MIS-600-RS-AdultIncomes_" + i + ".html", options);
            //workbook.save("f:\\files\\outMIS-620-RS-T3-FinancialDataSet_" + i + ".html", options);
        }

Please try our latest fix/version (the Download link is shared above) and share your feedback.

Regarding PowerPoint to HTML, we will look into your issue and get back to you soon.

Thanks for the reply and suggestion.
This is pretty much what I am doing right now as well. The only difference is that I can’t write to disk; everything has to stay in RAM.
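
When the HTML has to stay in RAM, one option that does not depend on Aspose at all is to gzip each page's bytes before holding on to them, since HTML markup is repetitive and usually compresses well. This is a swapped-in general technique, not an Aspose configuration. It lowers the retained size of the results, not the peak memory used during the conversion, and it costs CPU time. InMemoryGzip is a hypothetical helper name; a minimal sketch using only java.util.zip:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: keeps converted HTML in memory in gzip form.
class InMemoryGzip {

    // Compresses the given bytes entirely in memory.
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(32, raw.length / 4));
        // Closing the GZIPOutputStream writes the gzip trailer into 'out'.
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Restores the original bytes when the HTML is actually needed.
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for one worksheet's HTML output.
        StringBuilder html = new StringBuilder("<table>");
        for (int i = 0; i < 1000; i++) {
            html.append("<tr><td class=\"x\">").append(i).append("</td></tr>");
        }
        html.append("</table>");

        byte[] raw = html.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println("raw bytes: " + raw.length + ", gzip bytes: " + packed.length);
    }
}
```

If the HTML is later served over HTTP, the gzip bytes could in principle be sent as-is with a gzip content encoding, which avoids decompressing at all; whether that fits depends on how the extracted content is consumed.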

What was the size of the HTML files generated on your side?

I will try the latest version, 21.7.6.

I am also eagerly awaiting your take on the PPT conversion part.

@ankit11088,

The size of the output HTMLs for “MIS-600-RS-AdultIncomes.xlsx” and “MIS-600-RS-AdultIncomes (1).xlsx” is around 1.20 MB to 1.37 MB. For “MIS-620-RS-T3-FinancialDataSet.xlsx” the size of the output HTML is 4.70 MB.

We will check the issue and get back to you soon.

@ankit11088

As far as the Aspose.Slides code is concerned, it is fine. You can use the following HtmlOptions parameters to tune the output quality and the resulting HTML size: JpegQuality, PicturesCompression, and DeletePicturesCroppedAreas. However, these parameters have almost no impact on the application's memory consumption.

Thanks for the reply.

Can we do something about the BottomFull and BottomTruncated approach?
Which one is less memory intensive?

What’s the difference between BottomFull and BottomTruncated?
I see that with this option set to None, the memory footprint comes down drastically.

Can you shed some light on how to use this option optimally?

Please refer to the following API reference guide for further details.

Hi,
Thanks for the reply. I got caught up in a few things and couldn’t come back sooner.
It seems we have to go with what we are currently using, since that is how Aspose works; the memory footprint is still the same.

We are considering another option: converting the PPTX and XLSX documents to PDF instead of HTML. The PDF conversions have a considerably smaller footprint.

Meanwhile, I wanted to know a few things from your side.

image (11).png (69.0 KB)

This is a snapshot of our GC analysis. You can see that GCLocker is the topmost cause of GC.
Does Aspose make any JNI calls that could result in this GC?
What could be causing it?
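
For background: on HotSpot, a collection reported with the cause "GCLocker Initiated GC" is generally one that was requested while some thread was inside a JNI critical region and ran once that region was left. Whether Aspose itself enters such regions is not established in this thread. To see from inside the application which GC causes occur around the conversion calls, a small JDK-only sketch is shown below. GcCauseCounter is a hypothetical name, and the code relies on the HotSpot-specific com.sun.management notification API, so treat it as an assumption that your JVM provides it.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

// Hypothetical helper: counts completed garbage collections per reported cause.
class GcCauseCounter {

    private final Map<String, LongAdder> countsByCause = new ConcurrentHashMap<>();

    // Registers a listener on every collector bean that emits notifications
    // and returns how many beans were registered.
    int install() {
        NotificationListener listener = (notification, handback) -> {
            if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                    .equals(notification.getType())) {
                return;
            }
            GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                    .from((CompositeData) notification.getUserData());
            countsByCause.computeIfAbsent(info.getGcCause(), k -> new LongAdder()).increment();
        };

        int registered = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc instanceof NotificationEmitter) {
                ((NotificationEmitter) gc).addNotificationListener(listener, null, null);
                registered++;
            }
        }
        return registered;
    }

    // Sorted copy of the counts seen so far, keyed by GC cause.
    Map<String, Long> snapshot() {
        Map<String, Long> copy = new TreeMap<>();
        countsByCause.forEach((cause, n) -> copy.put(cause, n.sum()));
        return copy;
    }

    public static void main(String[] args) throws InterruptedException {
        GcCauseCounter counter = new GcCauseCounter();
        counter.install();

        // Stand-in for the conversion work: allocate some short-lived garbage.
        for (int i = 0; i < 200; i++) {
            byte[] garbage = new byte[1 << 20];
            garbage[0] = 1;
        }

        // Notifications arrive asynchronously, so wait briefly before reading.
        Thread.sleep(500);
        System.out.println(counter.snapshot());
    }
}
```

Taking a snapshot before and after a batch of conversions, and comparing it with the same measurement on a run that does no conversions, would help show whether the GCLocker-initiated collections are tied to the conversion code path or to something else in the process.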

For this part of your question, please share complete details, including the scenario, so that I can discuss it with the team internally and get back to you with feedback.