Split large PPTX files (in parallel) into individual slides

paulrinckens · November 21, 2018, 12:30pm

Hi Aspose Support Team,

In our project we want to split large PPTX files (up to 80 MB and 600 slides) into individual slides. The solution is exposed as a stateless REST endpoint, so multiple split requests can be processed at the same time on the same server. This means the server has to parse several of these large PPTX files into Aspose.Presentation objects in parallel and split each of them. The splitting logic reduces to the following code snippet:

    presentation = new Presentation(inFilePath.toString(), loadOptions);

    // Remove master and layout slides that are not used by the source.
    presentation.getMasters().removeUnused(true);
    presentation.getLayoutSlides().removeUnused();

    for (int i = 0; i < presentation.getSlides().size(); i++) {

        logger.info("Splitting slide " + (i + 1) + ".");

        // One new presentation per slide, with the slide size copied from the source.
        newPresentation = new Presentation(loadOptions);

        float sourceWidth = (float) presentation.getSlideSize().getSize().getWidth();
        float sourceHeight = (float) presentation.getSlideSize().getSize().getHeight();
        int sourceSizeType = presentation.getSlideSize().getType();
        newPresentation.getSlideSize().setSize(sourceWidth, sourceHeight, sourceSizeType);

        // Remove the initial slide of the new presentation, then clone slide i into it.
        newPresentation.getSlides().get_Item(0).remove();
        newPresentation.getSlides().addClone(presentation.getSlides().get_Item(i));

        newPresentation.getMasters().removeUnused(true);
        newPresentation.getLayoutSlides().removeUnused();

        // Save under the 1-based slide number, then release the per-slide presentation.
        newPresentation.save(outDir.resolve((i + 1) + "." + FileType.PPTX.getFileEnding()).toString(),
                SaveFormat.Pptx);

        newPresentation.dispose();
    }

The issue we are facing is that when multiple requests are processed in parallel, i.e. several large documents are parsed and split at the same time, the server crashes because it runs out of heap space.

We run the service on machines with 4 GB of heap space.

As this is a large client project intended to go into production, we depend on a stable solution that can handle a predefined number of requests in parallel.
We would appreciate guidance on how to improve the performance of the code above and/or on the architectural requirements for handling a given workload.
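
To make "a predefined number of requests in parallel" concrete: one common way to cap how many splits run at the same time is a counting semaphore around the split call. The following is only a minimal sketch using the standard java.util.concurrent classes. SplitLimiter and SplitLimiterDemo are hypothetical names that are not part of Aspose.Slides, the limit of 2 is an arbitrary example value, and the sleeping job merely stands in for the real split logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not an Aspose class): caps how many split jobs run at once.
final class SplitLimiter {
    private final Semaphore permits;

    SplitLimiter(int maxConcurrentSplits) {
        this.permits = new Semaphore(maxConcurrentSplits);
    }

    // Blocks until a permit is free, runs the job, and always releases the permit.
    <T> T run(Callable<T> splitJob) throws Exception {
        permits.acquire();
        try {
            return splitJob.call();
        } finally {
            permits.release();
        }
    }
}

final class SplitLimiterDemo {
    // Runs `jobs` dummy tasks through the limiter and returns the
    // highest number of tasks observed running at the same time.
    static int peakConcurrency(int jobs, int limit) throws Exception {
        SplitLimiter limiter = new SplitLimiter(limit);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        // Stand-in for the real split work (assumption: replace with the split logic).
        Callable<Void> job = () -> {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            Thread.sleep(20);
            running.decrementAndGet();
            return null;
        };
        Callable<Void> limitedJob = () -> limiter.run(job);

        ExecutorService pool = Executors.newFixedThreadPool(jobs);
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (int i = 0; i < jobs; i++) {
                futures.add(pool.submit(limitedJob));
            }
            for (Future<Void> f : futures) {
                f.get(); // propagate any failure from a job
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peak concurrent jobs: " + peakConcurrency(8, 2));
    }
}
```

With such a limiter, requests beyond the limit wait for a permit instead of loading another large presentation into memory. What limit is safe for a given heap size would have to be determined by measurement.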

Unfortunately, I cannot share any example presentation files with you.

Best regards

Paul

mudassir.fayyaz · November 21, 2018, 5:15pm

@paulrinckens,

I have reviewed your requirements and sample code on my end. The sample code seems fine. You are working with very large presentations that contain a very large number of slides. To process them, you need to increase the Java heap size on your end, since you are processing multiple large documents and splitting them in parallel. The slide cloning procedure itself is also resource-intensive, as it involves copying the slide's data structures from one presentation to another.
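
The maximum heap is set with the standard JVM option -Xmx (for example -Xmx8g, where 8g is only an illustrative value). As a minimal sketch, not an official Aspose sample, the following uses the standard java.lang.Runtime calls to confirm which limit the running JVM actually received. HeapReport is a hypothetical class name chosen for this example.

```java
// Reports heap figures of the running JVM using only java.lang.Runtime.
final class HeapReport {
    static final long MIB = 1024L * 1024L;

    // Upper bound the JVM will try to use for the heap (influenced by -Xmx).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    // Heap currently in use: allocated heap minus the free part of it.
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
        System.out.println("used heap (MiB): " + usedHeapMiB());
    }
}
```

Logging these two values before and after a split can help estimate how much heap one large presentation needs, and therefore how many can be processed in parallel on a given machine.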