Page dimensions are not preserved when copying pages to another document

everteamjmf · March 30, 2022, 12:58pm

Hello

Maybe it is a bug, or maybe I am not using your library correctly. I hope it’s the latter and that you can show me how to do it better.

While merging different document types into a single PDF document, we encountered something strange: depending on the documents used for the merge, the page dimensions are sometimes preserved and sometimes not.

The documents used are in the “resources” directory of the project:
(1) ppt2.pdf: a file produced by converting ppt2.pptx to PDF. Since its pages come from slides, it is expected that they are smaller than standard PDF pages and in landscape orientation.
(2) demo0002.tif: an A4 scanned TIFF image
(3) test1.pdf: a simple one-page A4 PDF
=> when merging (1) + (3): the page dimensions of ppt2.pdf are preserved in merged_result.pdf.
=> when merging (1) + (2) or (1) + (2) + (3): the ppt2.pdf pages do not have the correct width and height in merged_result.pdf.
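
The two outcomes above can also be checked mechanically rather than by opening merged_result.pdf. The following is a minimal sketch in plain Java, assuming the page sizes (width and height in points) have already been read from the source documents and from the merged result; the class `PageSizeCheck` and the method `changedPages` are hypothetical names and not part of Aspose.PDF.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: compares page sizes recorded before and after a merge. */
final class PageSizeCheck {
    private PageSizeCheck() {}

    /**
     * Each entry is {width, height} in points. Returns the 1-based numbers of
     * the expected pages whose size differs by more than the tolerance, or
     * that are missing from the merged result.
     */
    static List<Integer> changedPages(double[][] expected, double[][] actual, double tolerance) {
        List<Integer> changed = new ArrayList<>();
        for (int i = 0; i < expected.length; i++) {
            boolean missing = i >= actual.length;
            if (missing
                    || Math.abs(expected[i][0] - actual[i][0]) > tolerance
                    || Math.abs(expected[i][1] - actual[i][1]) > tolerance) {
                changed.add(i + 1);
            }
        }
        return changed;
    }
}
```

An empty result would correspond to the (1) + (3) case, while a result listing the ppt2.pdf pages would correspond to the failing cases described above.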

I have attached an Eclipse project with a small program and data files to reproduce the issue: TestAspose.zip (2.2 MB)

Before running the test, please :

in the main() method, replace “D:/src/workspace_20220316/TestAspose/” with your own project directory
add aspose-pdf-22.2.jar in the /lib directory. (too big to put in the zip)
add aspose-slides-22.3-jdk16.jar in the /lib directory. (too big to put in the zip)
add Aspose.Total.Product.Family.lic in the root directory

Sorry, I was able to download aspose-pdf-22.2.jar, but downloading the latest version, 22.3, does not work: your repository lists it but returns this error:
<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Key>repo/com/aspose/aspose-pdf/22.3/aspose-pdf-22.3.jar</Key>
<RequestId>A1NSB89WKWG9Q5TK</RequestId>
<HostId>gLReDPGnWzQV8ip/x03YFV+up+ap1VHvbY1Y21I1yzjaCyge12yhJ8qlvgUI5VhwnYV92ljpyjE=</HostId>
</Error>

In the source file TestAspose.java, if one comments out the block (lines 59-62) that adds the image (loadImage(…)), then this is case (1) + (3), and the dimensions of the first 2 pages are correct in merged_result.pdf.

But if the block that adds the image (lines 59-62) is compiled and executed, then this is case (1) + (2) + (3), and pages 1 and 2 have the wrong dimensions in merged_result.pdf.

My question: how can I have the dimensions preserved in all cases?

Thanks
jmf

asad.ali · March 30, 2022, 8:23pm

@everteamjmf

The issue is possibly caused by reusing the same document object without properly saving it. To test the case with a simple approach, we used the code snippet below and obtained the correct, expected output:

com.aspose.slides.Presentation pres = null;
Document doc = null;
Document destdoc = null;
try {
 // Create destination document, result of merge
 destdoc = new Document();

 // Convert .pptx to pdf
 pres = new com.aspose.slides.Presentation( dataDir + "ppt2.pptx" );
 pres.save( dataDir + "ppt2.pdf", com.aspose.slides.SaveFormat.Pdf );
 // Merge ppt pages to result
 doc = new Document( dataDir + "ppt2.pdf" );
 //merge( doc, destdoc );
 destdoc.getPages().add(doc.getPages());
 destdoc.save(dataDir + "merged.pdf");
 // !!! If this image block is commented out, then the page dimensions of ppt2.pdf are preserved.
 // !!! If this image block is executed, the ppt2.pdf pages do not have the right width and height in merged_result.pdf.
 // Load image to doc
 //doc = loadImage( dataDir + "demo0002.tif" );
 doc = new Document();
 Page page = doc.getPages().add();
 Image image = new Image();

 // Load the sample TIFF image file
 image.setFile(dataDir + "demo0002.tif" );
 page.getParagraphs().add(image);
 // Merge image doc to result
 destdoc = new Document(dataDir + "merged.pdf");
 destdoc.getPages().add(doc.getPages());
 destdoc.save(dataDir + "merged.pdf");
 //merge( doc, destdoc );

 // Add PDF to doc
 doc = new Document( dataDir + "test1.pdf" );
 destdoc = new Document(dataDir + "merged.pdf");
 destdoc.getPages().add(doc.getPages());
 destdoc.save(dataDir + "merged.pdf");
}
finally {
 // Release the presentation's resources
 if (pres != null) pres.dispose();
}

merged.pdf (906.5 KB)

You can run this code snippet in your environment and modify it as per your routine of process. Please feel free to let us know in case you need more information.

everteamjmf · March 31, 2022, 9:26am

Hello

Thank you very much for your quick response and for taking the time to examine the program and find a workaround.

I had already found that saving and reloading the document built from the image “demo0002.tif” solves the problem in this case.
Your idea of saving and reloading the merged result document also works.

I have discussed this with my technical manager, and we both think that when pages of different dimensions are added (they could even come from the same source document), the dimensions of the earlier pages should not be changed by adding subsequent pages, and that this should be reported as a bug in Aspose.PDF.

Moreover, our PDF merge is used in different applications on our server, and some of our customers have several thousand users simultaneously cutting, pasting and merging pages from several source PDFs.
So we are a bit afraid of losing speed and using too many CPU and disk resources if the server spends extra time saving and reloading the same document at each change of source document, which in the worst case means between each page.

Please either give us a more resource-efficient workaround or open a priority bug ticket on Aspose.PDF.
I’m sure we are not the only ones among your customers who use this feature to merge without saving and reloading all the time.

Thank you very much
jmf

asad.ali · March 31, 2022, 5:59pm

@everteamjmf

Please note that the API keeps all the resources in memory until a document is closed or saved. The pages in the destination document are affected when you keep changing the source document without properly saving it. Therefore, it is necessary to save the destination document in order to commit recent changes to it and to make sure that it is not affected by new changes.

You can simply adopt an approach of saving the PDF document incrementally where you would not need to save it to any local path or re-initialize it. Please check the below sample code snippet that would minimize the resource consumption and performance overhead:

com.aspose.slides.Presentation pres = null;
Document doc = null;
Document destdoc = null;
try {
 // Create destination document, result of merge
 destdoc = new Document();
            
 pres = new com.aspose.slides.Presentation( dataDir + "ppt2.pptx" );
 pres.save( dataDir + "ppt2.pdf", com.aspose.slides.SaveFormat.Pdf );
 doc = new Document( dataDir + "ppt2.pdf" );
 destdoc.getPages().add(doc.getPages());
 destdoc.save();

 doc = new Document();
 Page page = doc.getPages().add();
 Image image = new Image();

 // Load the sample TIFF image file
 image.setFile(dataDir + "demo0002.tif" );
 page.getParagraphs().add(image);
 // Merge image doc to result
 destdoc.getPages().add(doc.getPages());
 destdoc.save();

 // Add PDF to doc
 doc = new Document( dataDir + "test1.pdf" );
 destdoc.getPages().add(doc.getPages());
 destdoc.save(dataDir + "merged.pdf");
}
finally {
}

AlainRUSSIER · April 1, 2022, 10:39am

Hello
Thanks again for your support, this is very valuable for us.
Your code snippet prompts me to ask a question and to make a remark.

Question
I did not think that I could call save() without a path after new Document() without a path. You said that this saves incrementally, but where: in memory or in a temp file somewhere? Can you please elaborate on this?

Remark
I tried to enhance my merge() method with your “destdoc.save()” but this did not work, so I tried different things :

— 1 — I replaced the test method of the zipped project by exactly your code snippet
==> it works.

— 2 — Then I commented out the 2 lines “destdoc.save()”
==> it still works. So the missing incremental save does not seem to be the cause of my problem.

— 3 — Then in both 3 places I replaced:

destdoc.getPages().add(doc.getPages());

with:

for (Page p : doc.getPages()) {
Page destpage = destdoc.getPages().add( p );
}

==> it still works. So copying page by page was not the cause.

— 4 — then in the added “for” loops above I added:

destpage.setPageInfo( p.getPageInfo() );

==> and it does not work : the ppt2.pdf pages lost their dimensions.

— 5 — then I commented out the image copy (the second “for” loop)
==> and this makes it work again

— 6 — now uncomment the 2 lines “destdoc.save()” of step 2, and uncomment the image copy (the second “for” loop)
==> it does not work again. The incremental save has no effect.

Conclusion
When a PDF Page is added, IF a PageInfo is also copied AND a page containing an Image is added afterwards, THEN the dimensions of the previous pages are lost.

It does not work as you expected, and the current behavior can give an “unpredictable result” depending on the page content, the page order, and the information that is set.

Best regards
jmf

Below is your example code snippet modified with the above 6 steps:

    com.aspose.slides.Presentation pres = null;
    Document doc = null;
    Document destdoc = null;
    try {
     // Create destination document, result of merge
     destdoc = new Document();
                
     pres = new com.aspose.slides.Presentation( dataDir + "ppt2.pptx" );
     pres.save( dataDir + "ppt2.pdf", com.aspose.slides.SaveFormat.Pdf );
     doc = new Document( dataDir + "ppt2.pdf" );
     for (Page p : doc.getPages()) {
         Page destpage = destdoc.getPages().add( p );
         destpage.setPageInfo( p.getPageInfo() );
     }
     // destdoc.getPages().add(doc.getPages());
     destdoc.save();

     doc = new Document();
     Page page = doc.getPages().add();
     Image image = new Image();

     // Load the sample TIFF image file
     image.setFile(dataDir + "demo0002.tif" );
     page.getParagraphs().add(image);
     // Merge image doc to result
     for (Page p : doc.getPages()) {
         Page destpage = destdoc.getPages().add( p );
         destpage.setPageInfo( p.getPageInfo() );
     }
     // destdoc.getPages().add(doc.getPages());
     destdoc.save();

     // Add PDF to doc
     doc = new Document( dataDir + "test1.pdf" );
     for (Page p : doc.getPages()) {
         Page destpage = destdoc.getPages().add( p );
         destpage.setPageInfo( p.getPageInfo() );
     }
     // destdoc.getPages().add(doc.getPages());
     destdoc.save(dataDir + "merged.pdf");
    }
    finally {
    }

everteamjmf · April 1, 2022, 10:47am

I posted the reply above when connected with our paid account, sorry. But I’m the same human user everteamjmf. I just added this everteamjmf as a subaccount of our paid account.

asad.ali · April 1, 2022, 7:22pm

@everteamjmf

Thanks for your feedback and for sharing the test results from your side. The incremental approach to saving a document is used when you are building it from scratch, so that the changes can take effect. The document is kept in memory until you save it physically or to a memory stream. Furthermore, when you add a page to the destination document, it is added with all its settings/PageInfo. You do not need to add PageInfo separately when you are merging PDFs.

Can you please also share your feedback about using PageInfo? Is it standard in your process to copy it as well? We will need to perform an investigation on this particular use-case. We will log an investigation ticket in our issue tracking system and share the ID with you.

everteamjmf · April 4, 2022, 12:42pm

Hello

The incremental save in memory seems very interesting and it opens horizons. Thank you for teaching me about this.

As for why we use PageInfo when copying pages from one document to another: we included it in our standard merge function because we found it necessary whenever we set a special page size in the source document, so that the same page size is also set in the destination document.

For example, if a source file to merge is an image whose width is larger than its height (landscape) and also larger than the width of the current portrait page, we swap the width and height of the page (making it landscape) so that the image can be displayed larger for users.

Here is an example snippet in which I swap the A4 width and height for use with this image: landscape_1334x700.jpg (148.3 KB)

Document doc = null;
Document destdoc = null;
try {
    // Create the destination document, where to merge all others
    destdoc = new Document();
    
    // Merge an image: as with other file types, we convert it to PDF before merging
    // Load a landscape image
    Image image = new Image();
    image.setFile( dataDir + "landscape_1334x700.jpg" );
    // Create a document for the image 
    doc = new Document();
    Page page = doc.getPages().add();
    // Since image is landscape, set the page to landscape to display the image bigger
    PageSize ps = PageSize.getA4();
    page.getPageInfo().setWidth( ps.getHeight() );
    page.getPageInfo().setHeight( ps.getWidth() );
    // This temp doc could be multi-pages in case of multi-tiff
    page.getParagraphs().add( image );
    
    // Merge the pages containing images
    for (Page p : doc.getPages()) {
        Page destpage = destdoc.getPages().add( p );
        // To see the bug :
        //  comment out setPageInfo() below and the dest page becomes portrait mode instead of landscape  
        destpage.setPageInfo( p.getPageInfo() );        // <<<<<<<<<<<<<<
    }
    
    // Create the result file
    destdoc.save(  dataDir + "merged.pdf" );
}
finally {
    //...
}

It works as it is : the destination document has a landscape page as wanted.
But if you comment out the line having the <<<<<<<<<<<< comment, the destination document has a portrait page.
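
The orientation rule described in this post (swap the A4 width and height when the image is landscape and wider than the portrait page) can be sketched without any Aspose types. This is only an illustration: the class name `A4Orientation`, the method `pageSizeFor`, and the rounded A4 size of 595 x 842 points are assumptions, and the sketch assumes the image size is expressed in the same unit as the page.

```java
/** Hypothetical helper, not part of Aspose.PDF: picks an A4 orientation for an image. */
final class A4Orientation {
    // A4 in PDF points (1/72 inch), rounded to whole points (assumed values).
    static final double A4_WIDTH = 595.0;
    static final double A4_HEIGHT = 842.0;

    private A4Orientation() {}

    /**
     * Returns {pageWidth, pageHeight}. The page becomes landscape only when the
     * image is wider than it is tall and also wider than the portrait page.
     */
    static double[] pageSizeFor(double imageWidth, double imageHeight) {
        boolean landscapeImage = imageWidth > imageHeight;
        boolean widerThanPortraitPage = imageWidth > A4_WIDTH;
        if (landscapeImage && widerThanPortraitPage) {
            // Swap width and height to get a landscape A4 page.
            return new double[] { A4_HEIGHT, A4_WIDTH };
        }
        return new double[] { A4_WIDTH, A4_HEIGHT };
    }
}
```

Under these assumptions, `A4Orientation.pageSizeFor(1334, 700)` gives a landscape page of 842 x 595, and the two values could then be passed to page.getPageInfo().setWidth(...) and setHeight(...) as in the snippet above.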

I think this is not the same bug as reported in the posts above, but shows why we need to set PageInfo and it could have its own ticket.

Thanks and best regards
jmf

asad.ali · April 4, 2022, 8:51pm

@everteamjmf

Thanks for the feedback. Can you please share the sample documents which can be used to reproduce this issue in our environment using your code snippet? We will further check it and log a ticket accordingly.

everteamjmf · April 5, 2022, 10:06am

Hello
Only the file attached to the post above is needed to run the test. I am uploading it again: landscape_1334x700.jpg (148.3 KB)

To run the code, please use the zipped project (from the first post above) and just replace the content of the test() method with yesterday’s code snippet.

I spent a lot of time making different sample programs to reproduce bugs in Aspose.PDF, so may I now ask you to open the 2 separate bug tickets shown in this thread and let us know?

one for the bug shown yesterday by the snippet just above
one for the bug shown by the snippet posted on Friday, April 1

Waiting for your fixes
Thanks for your help
jmf

asad.ali · April 5, 2022, 7:57pm

@everteamjmf

We are checking it and will get back to you shortly.

asad.ali · April 6, 2022, 4:52am

@everteamjmf

We have logged two tickets in our issue management system:

PDFJAVA-41482 (For first scenario)
PDFJAVA-41483 (For second scenario)

We will definitely look into the details and keep you posted on the status of the fixes. Please be patient and give us some time. We apologize for the inconvenience.

everteamjmf · December 6, 2023, 10:05am

Hello

I see that the issue status of PDFJAVA-41482 and PDFJAVA-41483 is “Closed”.
Can you please tell us whether they are fixed and, if so, in which version of the Aspose.PDF library?

Then we could remove the workaround of saving and reloading the document after each page, which could speed up our process significantly.

Thanks in advance
Best regards
Jean-Michel

asad.ali · December 6, 2023, 7:13pm

@everteamjmf

Regarding the above tickets: the PDF specification has no such data as PageInfo, so it can’t be saved in a PDF document.

PageInfo is an object used for PDF generation only, so it is expected behavior that it is lost after the document is saved.

When a document is initialized, PageInfo is always in its default state and can be modified for the design of the new document.

This is described in the PageInfo class description:

Represents the page information for pdf generator.

The page dimensions are not preserved because the document has been created but its paragraphs have not yet been processed by doc.processParagraphs() or one of the document.save methods.
If we call doc.processParagraphs() before the loop, the page dimensions are preserved:

doc.processParagraphs();
// Merge the pages containing images
for (Page p : doc.getPages()) {
    Page destpage = destdoc.getPages().add( p );
    // setPageInfo() is left commented out here:
    //  after processParagraphs() the dest page stays landscape without it
    // destpage.setPageInfo( p.getPageInfo() );        // <<<<<<<<<<<<<<
}
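
One way to picture this explanation is a toy model in plain Java. It is only a sketch of the reasoning above, assuming that a page carries a stored size (what ends up in the PDF file) and a separate generator-only hint that plays the role of PageInfo; `ToyMergeModel`, `Page`, `process` and `copyForMerge` are hypothetical names and do not describe how Aspose.PDF is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: hypothetical classes that do not mirror Aspose.PDF internals. */
final class ToyMergeModel {
    private ToyMergeModel() {}

    static final class Page {
        // Size that would be written to the PDF file (portrait A4 by default).
        double storedWidth = 595, storedHeight = 842;
        // Generator-only hint, analogous in spirit to PageInfo; never written to the file.
        double hintWidth = 595, hintHeight = 842;

        /** Applies the generator hint to the stored size, as a layout pass would. */
        void process() {
            storedWidth = hintWidth;
            storedHeight = hintHeight;
        }

        /** Copies only what a file can hold; the copy's hint restarts at its default. */
        Page copyForMerge() {
            Page copy = new Page();
            copy.storedWidth = storedWidth;
            copy.storedHeight = storedHeight;
            return copy;
        }
    }

    /** Merges pages, optionally running the layout pass on each source page first. */
    static List<Page> merge(List<Page> source, boolean processFirst) {
        List<Page> dest = new ArrayList<>();
        for (Page p : source) {
            if (processFirst) {
                p.process();
            }
            dest.add(p.copyForMerge());
        }
        return dest;
    }
}
```

In this model a landscape hint reaches the merged page only when `process()` runs before the copy, which mirrors the effect reported above for calling doc.processParagraphs() before the loop.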

everteamjmf · December 7, 2023, 9:10am

Thank you very much for your reply, we’ll try the solution with doc.processParagraphs().
Best regards
Jean-Michel

everteamjmf · December 19, 2023, 3:28pm

Thank you very much: it works very well, so the issue is closed for us as well.
Best regards
Jean-Michel

asad.ali · December 19, 2023, 11:51pm

@everteamjmf

Nice to know that. Please feel free to create a new topic in case you need further assistance.