How to properly generate n PDFs from one DOCX and concatenate them

Hello Aspose Team

We are building a program for a lightweight CCM proof of concept.

Inputs:

  • 1 DOCX layout document containing mail merge fields
  • 1 XML file with the input data used to fill the mail merge fields n times (we tested with n = 1250)

Desired output:

  • First, 1 PDF per record generated from the DOCX + XML data, for example with 4 pages
  • Then, 1 merged PDF with 4 × 1250 = 5000 pages

Several questions:

  1. Concatenating the PDFs into 5000 pages takes a long time (a couple of hours), and we often hit heap space memory errors on a PC with Eclipse and a 32-bit JVM 1.8, so we are looking for the best design, if one exists.

Currently we proceed as follows:

· Generate 1250 four-page PDF files on disk in a folder, from 1 DOCX (in XML format) + the XML data

· Concatenate the first 2000 pages with the next 2000 pages

· Concatenate the resulting 4000 pages with the last 1000 pages
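
For illustration, the grouping in these steps (1250 four-page files handled as chunks of 2000, 2000, and 1000 pages) can be planned with a small helper. This is only a sketch in plain Java 8: the class name `BatchPlanner`, the `partition` method, and the file names are hypothetical, and the merge itself is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchPlanner {

    /** Splits the input list into consecutive batches of at most batchSize items. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 1; i <= 1250; i++) {
            files.add(String.format("doc%04d.pdf", i)); // hypothetical file names
        }
        // 500 files of 4 pages each give the 2000-page chunks described above.
        for (List<String> batch : partition(files, 500)) {
            System.out.println(batch.size() + " files, " + (batch.size() * 4) + " pages");
        }
    }
}
```

With 1250 files and a batch size of 500, this yields batches of 500, 500, and 250 files, that is 2000, 2000, and 1000 pages.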

Would it be more efficient, for example:

  • To generate the n PDFs with the images extracted/removed, then parse the big PDF and reinsert the images at that point (see also question 2)?
  • To first concatenate the n DOCX files (in XML format), then generate one big PDF, and finally split it into 1250 PDF files?
  2. We have JPEG images in the layout and want to reduce the final PDF size in MB. Is it possible to embed the JPEG images only in the first 4 generated pages and use image references on all subsequent pages? If so, what is the best way to proceed?

These are important architectural decisions because we aim to generate PDF files of up to 100K pages.

Thanks for your help.

See you soon

Hi Nicolas,

Thanks for your interest in Aspose.

For concatenating PDF files with Aspose.Pdf, PdfFileEditor has a UseDiskBuffer property. If this property is set to true, the resulting document is re-saved during concatenation, so the concatenation proceeds with incremental updates. This saves memory because not all document data has to be held in memory. Please note that the number of documents concatenated before the next re-save is determined by ConcatenationPacketSize. Please check the following code snippet for details.

.....
.....
PdfFileEditor editor = new PdfFileEditor();
editor.setUseDiskBuffer(true);
editor.setConcatenationPacketSize(100);
editor.concatenate(fileArray,"C:\\Shiv Working\\Medigap\\Merged.pdf");
.....
.....
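
The snippet above leaves `fileArray` undefined. One possible way to build it from the folder that holds the generated PDFs is sketched below in plain Java 8. The class name `PdfFileLister` and the method `listPdfFiles` are hypothetical and not part of Aspose; the sketch assumes the files sit directly in one folder and that sorting by file name gives the desired page order (as with names like APIDOC1001.PDF, APIDOC1002.PDF).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.stream.Stream;

public class PdfFileLister {

    /** Returns the paths of all .pdf files directly inside dir, sorted by file name. */
    public static String[] listPdfFiles(Path dir) throws IOException {
        // Files.list must be closed, hence the try-with-resources.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted((a, b) -> a.getFileName().toString()
                            .compareTo(b.getFileName().toString()))
                    .map(Path::toString)
                    .toArray(String[]::new);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical folder: pass the real one as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String[] fileArray = listPdfFiles(dir);
        System.out.println(fileArray.length + " PDF files found");
    }
}
```

The resulting array can then be passed as the first argument of `editor.concatenate(...)`. Note that plain string sorting only matches numeric order when the numbers in the file names have the same number of digits.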

It is not recommended to generate the PDFs without images and add the images later. It will take more processing time: first to concatenate the PDF documents, and then to add the images. Similarly, Aspose.Words does not recommend converting a very large document to PDF, as performance may degrade.

I am afraid that adding images to the first four pages and referencing them in the rest of the document to reduce the file size is not supported at the moment. We have logged an enhancement ticket, PDFNET-40961, in our issue tracking system for further investigation and implementation. However, you may optimize the PDF file size by following the instructions on this documentation link.

So you can convert DOCX + XML to PDF using the mail merge feature of Aspose.Words as a first step, then concatenate the PDF documents and optimize the result using Aspose.PDF. Hopefully this will help you accomplish the task.

Please feel free to contact us for any further assistance.

Best Regards,

Hello Tilal,

Thanks for your reply.
At this point we use the following code to concatenate one PDF with another, but it takes hours and hours:

try 
{
    // Add the pages of the source document to the target document
    pdfDocument1.getPages().add(pdfDocument2.getPages());
    
    // Save the concatenated output file (the target document)
    pdfDocument1.save(pathfinale + TARGET_FILE);
}
finally
{
    pdfDocument1.dispose();
    pdfDocument1.close();
    pdfDocument2.dispose();
    pdfDocument2.close();
}

Unfortunately, your code snippet with PdfFileEditor is not working: the PDFs are merged, but the result has text overlay issues, so the PDF is unusable. Is this a known bug?

// Create the editor that performs the concatenation
PdfFileEditor editor = new PdfFileEditor();

//editor.setsetRemoveUnusedObjects
editor.setConvertTo(PdfFormat.PDF_A_1A);

editor.setUseDiskBuffer(true);
editor.setConcatenationPacketSize(3);

String[] fileArray = {path + "APIDOC1001.PDF", path + "APIDOC1002.PDF", path + "APIDOC1003.PDF"};

editor.concatenate(fileArray, "C:\\test\\Merged.pdf");

Thanks for your help with troubleshooting.
Nico

Hi Nico,


Thanks for your feedback. We would appreciate it if you could share your sample source documents here; we will look into them and provide information accordingly.

We are sorry for the inconvenience caused.

Best Regards,

Hi


I have attached an example PDF that shows this malfunction.

Thanks for your help.

Regards,

Nico

Hi Nico,


Thanks for sharing the resulting merged PDF document. As requested above, please also share the sample input PDF documents. We will test the scenario on our end and provide information accordingly.

We are sorry for the inconvenience caused.

Best Regards,

Hi Tilal Ahmad


We are trying another approach without Aspose because, in any case, the concatenation time seems too long for our requirements, so you can suspend this thread for now.

Regards

Nico

Hi Nico,


Thanks for your feedback. Sure, we can hold the investigation.

However, as suggested, if you share your sample source PDF documents here, we can investigate in the meantime and try to fix any issue, if one exists, while you test your other options. This will help us improve our API.

We are sorry for the inconvenience.

Best Regards,