File size of split pages after converting PPT to PDF differs from the expected size

shiveringpan · November 7, 2023, 2:24am

Continuing the discussion from Different File Sizes after Converting PPTX to PDF Locally in Java and via the Cloud:

import com.aspose.pdf.facades.PdfFileEditor;

String sourceFile = "D:\\www\\aspose\\pj\\mass\\ssss_99_40.pdf";
PdfFileEditor pdfEditor = new PdfFileEditor();
// Extract pages 1 to 30, writing each page to its own PDF file
for (int i = 1; i <= 30; i++) {
    String targetFile = "D:\\www\\aspose\\pj\\mass\\ssss_99_40_-" + i + ".pdf";
    pdfEditor.extract(sourceFile, new int[] { i }, targetFile);
}

What I want is for each split file to be only as large as its own page requires, without carrying over unused resources from the source, such as unused streams or unused fonts. Perhaps I need to run the optimization (com.aspose.pdf.optimization.OptimizationOptions) again after splitting the pages?

Please review the file above. Thank you to the team.

asad.ali · November 7, 2023, 3:26pm

@shiveringpan

Would you kindly try the code snippet below and check whether it produces the expected output:

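import com.aspose.pdf.Document;
import com.aspose.pdf.MarginInfo;
import com.aspose.pdf.Page;
import java.io.File;

String sourcePdfPath = "input.pdf";
String outputFolderPath = "output/";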

// Open the source PDF document
Document pdfDocument = new Document(sourcePdfPath);

// Create the output folder if it doesn't exist
File outputFolder = new File(outputFolderPath);
outputFolder.mkdirs();

// Iterate through each page in the source document
for (int pageNumber = 1; pageNumber <= pdfDocument.getPages().size(); pageNumber++) {
    // Create a new Document for each page
    Document newDocument = new Document();
    newDocument.getPages().add(pdfDocument.getPages().get_Item(pageNumber));

    // Set page margins to zero
    for (Page page : newDocument.getPages()) {
        page.getPageInfo().setMargin(new MarginInfo(0, 0, 0, 0));
    }

    // Save the newly created document with zero margins
    String outputFilePath = outputFolderPath + "Page_" + pageNumber + ".pdf";
    newDocument.save(outputFilePath);
}

shiveringpan · November 9, 2023, 5:52am

Hello, the code above does not seem to achieve the expected effect. The more pages the file has and the higher the page number, the larger the generated file becomes. Please refer to Different File Sizes after Converting PPTX to PDF Locally in Java and via the Cloud - #8 by shiveringpan, where I uploaded the original PPT document and the PDF converted from it. Running the code above on the Aspose.Slides output gives a different result from exporting the PPT to PDF with Microsoft PowerPoint and then running the same code. Is the problem that the pages split from the PDF produced by Aspose.Slides contain some other content?
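To check the growth pattern described above on your own output, one option is a small pure-Java report that lists the split files in page order together with their sizes. This is a sketch that uses no Aspose API: it assumes the split files follow the `Page_<n>.pdf` naming from the snippet above and sit in a single folder, and the class and method names (`SplitSizeReport`, `sizesByPage`, `growsWithPageNumber`) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitSizeReport {
    // Assumed naming scheme: "Page_<n>.pdf", as produced by the splitting loop above
    private static final Pattern PAGE_FILE = Pattern.compile("Page_(\\d+)\\.pdf");

    /** Returns page number -> file size in bytes, ordered by page number. */
    public static Map<Integer, Long> sizesByPage(Path folder) throws IOException {
        Map<Integer, Long> sizes = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder, "Page_*.pdf")) {
            for (Path file : files) {
                Matcher m = PAGE_FILE.matcher(file.getFileName().toString());
                if (m.matches()) {
                    sizes.put(Integer.parseInt(m.group(1)), Files.size(file));
                }
            }
        }
        return sizes;
    }

    /** True if every page file is at least as large as the one before it. */
    public static boolean growsWithPageNumber(Map<Integer, Long> sizes) {
        long previous = -1;
        for (long size : sizes.values()) {
            if (size < previous) {
                return false;
            }
            previous = size;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Path.of(args.length > 0 ? args[0] : "output");
        Map<Integer, Long> sizes = sizesByPage(folder);
        sizes.forEach((page, bytes) -> System.out.println("Page " + page + ": " + bytes + " bytes"));
        System.out.println("Grows with page number: " + growsWithPageNumber(sizes));
    }
}
```

Running it on the folder split from the Aspose.Slides PDF and on the folder split from the PowerPoint PDF gives two size listings that can be compared page by page.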

asad.ali · November 9, 2023, 5:01pm

@shiveringpan

Just to understand better, could you please confirm whether you mean the file size when you say it differs after splitting the pages, or the page size (dimensions) in the split document(s)?

asad.ali · November 10, 2023, 1:31pm

@shiveringpan

Please try optimizing the PDF documents after they are generated, using the code example in the attached link. If the results are still not satisfactory, please let us know.

shiveringpan · November 13, 2023, 2:51am

Yes, applying the optimization from "Optimize, Compress or Reduce PDF Size in Java | Aspose.PDF for Java" after splitting achieves the expected results. However, I am not sure why this happens in the first place. It also means I have to run the optimization on every generated file.
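A minimal sketch of running that optimization over every split file, assuming Aspose.PDF for Java is on the classpath and the split files are `*.pdf` files in one folder. `OptimizationOptions` with `setRemoveUnusedObjects`/`setRemoveUnusedStreams` and `Document.optimizeResources` are the calls the Aspose.PDF for Java optimization guide describes, as we understand it; the class name `OptimizeSplitPages`, the `optimized` subfolder, and the choice of options are illustrative, not the guide's exact code.

```java
import com.aspose.pdf.Document;
import com.aspose.pdf.optimization.OptimizationOptions;

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class OptimizeSplitPages {

    /** Options applied to each split file (an illustrative selection). */
    static OptimizationOptions splitPageOptions() {
        OptimizationOptions options = new OptimizationOptions();
        options.setRemoveUnusedObjects(true); // drop objects that nothing references
        options.setRemoveUnusedStreams(true); // drop resource streams the page does not use
        return options;
    }

    /**
     * Optimizes every *.pdf in the folder and writes each result to
     * folder/optimized/<same name>. Returns the number of files processed.
     */
    public static int optimizeFolder(Path folder) throws IOException {
        // Hypothetical output location, kept separate so originals are not overwritten
        Path target = Files.createDirectories(folder.resolve("optimized"));
        int processed = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder, "*.pdf")) {
            for (Path file : files) {
                Document document = new Document(file.toString());
                try {
                    document.optimizeResources(splitPageOptions());
                    document.save(target.resolve(file.getFileName()).toString());
                    processed++;
                } finally {
                    document.close();
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Path.of(args.length > 0 ? args[0] : "output");
        System.out.println("Optimized " + optimizeFolder(folder) + " file(s)");
    }
}
```

Writing to a separate folder keeps the unoptimized files, so their sizes can be compared with the optimized ones.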

asad.ali · November 13, 2023, 1:28pm

@shiveringpan

The size increase occurs because a PDF keeps the resources needed to render the document in different environments. These resources can be images, streams, or fonts. Optimizing the PDF removes unnecessary resources and reduces its size. Do you want the PDFs to be generated at a reduced size in the first place, during conversion?

shiveringpan · November 14, 2023, 2:25am

Yes. In fact, I already apply "Optimize, Compress or Reduce PDF Size in Java | Aspose.PDF for Java" to the file after converting it to PDF, and then split the optimized PDF into individual pages. This behaves differently from a PDF exported by Microsoft PowerPoint: when that PDF is split, the content of each page does not change. It seems I need to optimize after splitting.

asad.ali · November 14, 2023, 12:33pm

@shiveringpan

Could you please share the complete routine you are running on your side, including the sample source file, the actual output, and the expected output? We can then log an investigation task to analyze this case in more depth.

shiveringpan · November 17, 2023, 3:12am

Sample File Link

Convert PPT to PDF and optimize

            import com.aspose.slides.PdfCompliance;
            import com.aspose.slides.PdfOptions;
            import com.aspose.slides.PdfTextCompression;
            import com.aspose.slides.Presentation;
            import java.io.FileOutputStream;

            String sourceFile = "sourceFile.pptx";   // input presentation
            String targetFile = "targetFile.pdf";    // converted PDF
            Presentation ppt = new Presentation(sourceFile);
            try (FileOutputStream os = new FileOutputStream(targetFile)) {
                PdfOptions pdfOptions = new PdfOptions();
                // Sets the JPEG quality (0-100)
                pdfOptions.setJpegQuality((byte) 40);
                // Saves metafiles as PNG
                pdfOptions.setSaveMetafilesAsPng(true);
                // Uses the best image compression ratio
                pdfOptions.setBestImagesCompressionRatio(true);

                // Sets the text compression level
                pdfOptions.setTextCompression(PdfTextCompression.Flate);
                // Defines the PDF standard
                pdfOptions.setCompliance(PdfCompliance.Pdf15);

                // Saves the presentation as a PDF
                ppt.save(os, com.aspose.slides.SaveFormat.Pdf, pdfOptions);
            } finally {
                ppt.dispose();
            }

Split the PDF into one file per page

            import com.aspose.pdf.Document;
            import com.aspose.pdf.MarginInfo;
            import com.aspose.pdf.Page;
            import java.io.File;

            String sourcePdfPath = "targetFile.pdf";   // PDF produced by the conversion step
            String outputFolderPath = "output/";       // a folder for the split pages, not a file name

            Document pdfDocument = new Document(sourcePdfPath);

            File outputFolder = new File(outputFolderPath);
            outputFolder.mkdirs();

            for (int pageNumber = 1; pageNumber <= pdfDocument.getPages().size(); pageNumber++) {
                // Create a new Document for each page
                Document newDocument = new Document();
                newDocument.getPages().add(pdfDocument.getPages().get_Item(pageNumber));

                // Set page margins to zero
                for (Page page : newDocument.getPages()) {
                    page.getPageInfo().setMargin(new MarginInfo(0, 0, 0, 0));
                }

                // Save the newly created document inside the output folder
                String outputFilePath = new File(outputFolder, "Page_" + pageNumber + ".pdf").getPath();
                newDocument.save(outputFilePath);
            }

The PDF exported directly from Microsoft PowerPoint and split with the same code gives the expected result, in which each page file has a different size.

Thank you to the team

asad.ali · November 17, 2023, 12:29pm

@shiveringpan

We have checked the files you shared, and it looks like they were generated using different versions of the APIs. We also noticed that the PDF files from PPTX were generated using an evaluation/trial version of Aspose.Slides for Java.

Furthermore, as we understand it, you want to generate optimized PDF files after splitting the PDF pages. Is that right? If so, you need to optimize the file size after splitting the PDF document into multiple pages. In general, we recommend optimizing the PDF file size as the last step, after all other operations on the files.
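Following that recommendation, the split loop from this thread could optimize each single-page document just before saving it, so no second pass over the files is needed. This is a sketch assuming Aspose.PDF for Java: `Document.optimizeResources` and the two `OptimizationOptions` setters are taken from the Aspose.PDF optimization guide as we understand it, while `SplitThenOptimize`, the `pageFile` helper, and the `output` folder are hypothetical names. It is not guaranteed to reproduce the per-page sizes of the PowerPoint-exported PDF.

```java
import com.aspose.pdf.Document;
import com.aspose.pdf.optimization.OptimizationOptions;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SplitThenOptimize {

    /** Hypothetical helper: builds "<folder>/Page_<n>.pdf" with a proper path separator. */
    static Path pageFile(Path folder, int pageNumber) {
        if (pageNumber < 1) {
            throw new IllegalArgumentException("Page numbers start at 1");
        }
        return folder.resolve("Page_" + pageNumber + ".pdf");
    }

    /** Splits the PDF into one file per page, optimizing each page document as the last step. */
    public static void splitAndOptimize(String sourcePdf, Path outputFolder) throws IOException {
        Files.createDirectories(outputFolder);
        Document source = new Document(sourcePdf);
        try {
            int pageCount = source.getPages().size();
            for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++) {
                Document single = new Document();
                try {
                    // Copy one page into an empty document
                    single.getPages().add(source.getPages().get_Item(pageNumber));
                    // Optimize last, just before saving
                    OptimizationOptions options = new OptimizationOptions();
                    options.setRemoveUnusedObjects(true);
                    options.setRemoveUnusedStreams(true);
                    single.optimizeResources(options);
                    single.save(pageFile(outputFolder, pageNumber).toString());
                } finally {
                    single.close();
                }
            }
        } finally {
            source.close();
        }
    }

    public static void main(String[] args) throws IOException {
        splitAndOptimize(args.length > 0 ? args[0] : "targetFile.pdf", Path.of("output"));
    }
}
```

Compared with optimizing a folder of already-split files, this avoids writing unoptimized intermediate files, at the cost of doing the optimization inside the splitting loop.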