Page size issues when appending existing pdfs

Hi,


I am using the following simple piece of code to merge several PDFs into a single PDF. The last two PDFs each contain a single page that is A3 sized with landscape orientation. All pages of the other documents are regular portrait A4. If I look at the resultant PDF, all pages are there and all are displayed correctly. The last two pages of the merged document are indeed A3 landscape pages when viewed with Adobe Acrobat Reader.

However, if I look at the PageInfo properties of the pages in either the outputDocument.Pages or reopen.Pages collections, they are always 842.0 height and 595.0 width and non-landscaped. The media box, crop box, rectangle, etc. of the last two pages are 842 x 1190 as would be expected of a landscape A3 page, but the PageInfo structure is incorrect.

    var inputDocument2 = new Document(@"D:\append-2.pdf");
    var inputDocument3 = new Document(@"D:\append-3.pdf");
    var inputDocument4 = new Document(@"D:\append-4.pdf");
    var inputDocument5 = new Document(@"d:\ml-spread-report-hc01-abs0011-2010-11.pdf");

    var outputDocument = new Document();
    outputDocument.Pages.Add(inputDocument2.Pages);
    outputDocument.Pages.Add(inputDocument3.Pages);
    outputDocument.Pages.Add(inputDocument4.Pages);
    outputDocument.Pages.Add(inputDocument5.Pages);

    outputDocument.Save(@"D:\output.pdf");

    var pageEditor = new PdfPageEditor(outputDocument);
    var pageCount = pageEditor.GetPages();

    for (int pageNum = 1; pageNum <= pageCount; pageNum++)
    {
        var pageSize = pageEditor.GetPageSize(pageNum);
    }

    var reopen = new Document(@"d:\output.pdf");

Additionally, if I look at the pageSize value received from the PdfPageEditor class, the page sizes do come through correctly; it is only the page info within the Pages class received from the Document.Pages property that is incorrect.

Am I doing something wrong? I need to do some further operations, positioning some floating boxes to cover the existing page header and footer information before replacing them, so I need to know the sizes of the pages.

Lastly, can you recommend a better way of removing the headers and footers from the existing PDF files and then merging them into a new document that has a consistent (programmatically generated) header and footer across all pages?

Edit: Oh, and I am using the 9.2.1.0 version of the Aspose.Pdf library.

Boschy:
I need to do some further operations, positioning some floating boxes to cover the existing page header and footer information before replacing them, so I need to know the sizes of the pages.

Hi John,

Thanks for your inquiry. To get the page coordinates, please use the Rect property of the Page object; it represents the actual page coordinates stored in the page dictionary.

Moreover, the Rect property does not take the page rotation into account by default. To take rotation into account, use the Page.GetPageRect(bool considerRotation) method. If you pass true for the considerRotation parameter, it considers the rotation angle and returns the actual rectangle dimensions. Please check the following code snippet:

    foreach (Page page in reopen.Pages)
    {
        // Passing true asks GetPageRect to take the page rotation into account
        Aspose.Pdf.Rectangle rect = page.GetPageRect(true);

        Console.WriteLine(
            "Page {0} width is {1} and height is {2}, rotation: {3}, size considering rotation: width {4} : height {5}",
            page.Number, page.Rect.Width, page.Rect.Height, page.Rotate.ToString(),
            rect.Width, rect.Height);
    }
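To illustrate what "considering rotation" means for the reported dimensions, here is a minimal sketch in plain C#. It is not Aspose's implementation: PageGeometry and RotatedSize are hypothetical names, the rotation is passed as plain degrees (mapping from the library's own rotation value is left out), and the sketch simply assumes that a rotation of 90 or 270 degrees swaps width and height.

```csharp
using System;

// Hypothetical helper, for illustration only (not part of Aspose.Pdf).
public static class PageGeometry
{
    // Returns the width and height as they appear once the rotation is applied.
    // Rotations of 90 and 270 degrees swap the two dimensions;
    // 0 and 180 degrees leave them unchanged.
    public static (double Width, double Height) RotatedSize(
        double width, double height, int rotationDegrees)
    {
        // Normalize to the range 0..359 so that negative angles work too.
        int normalized = ((rotationDegrees % 360) + 360) % 360;

        if (normalized % 90 != 0)
        {
            throw new ArgumentException(
                "Rotation must be a multiple of 90 degrees.", nameof(rotationDegrees));
        }

        bool swap = normalized == 90 || normalized == 270;
        return swap ? (height, width) : (width, height);
    }
}
```

Under that assumption, a page stored as 842 x 1190 with a 90 degree rotation would be reported as 1190 wide and 842 high, which is the landscape size a viewer displays.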

Please feel free to contact us for any further assistance.


Best Regards,

Boschy:


Lastly, can you recommend a better way of removing the headers and footers from the existing PDF files and then merging them into a new document that has a consistent (programmatically generated) header and footer across all pages?


Hi John,

Thanks for your inquiry. I’m afraid that Aspose.Pdf can currently only delete headers/footers that were added using Aspose.Pdf.

To be able to remove a header/footer from a PDF file, you have to create the stamp (header, footer, page number) with an identifier using the PdfFileStamp.StampId property. Later, you can remove the stamp with a PdfContentEditor object. Please check the following code snippet. Hopefully it will help you accomplish the task.


    PdfFileStamp pfe = new PdfFileStamp("PdfWithSeveralPages.pdf", "34634.pdf");

    // 100 is the stampId for the footer
    pfe.StampId = 100;
    pfe.AddFooter(new FormattedText("Footer"), 10);

    // 200 is the stampId for the header
    pfe.StampId = 200;
    pfe.AddHeader(new FormattedText("Header"), 10);

    // 300 is the stampId for the page number
    pfe.StampId = 300;
    pfe.AddPageNumber(new FormattedText(" Page #", System.Drawing.Color.Red, System.Drawing.Color.Blue));

    pfe.Close();

    PdfContentEditor pce = new PdfContentEditor();
    pce.BindPdf("34634.pdf");

    StampInfo[] stamps = pce.GetStamps(1);
    Console.WriteLine(stamps.Length);
    Assert.AreEqual(3, stamps.Length);

    // Show the IDs of the stamps that were found
    foreach (StampInfo info in stamps)
    {
        Console.WriteLine(info.StampId);
    }

    // Remove header, footer and page number
    pce.DeleteStampById(100);
    pce.DeleteStampById(200);
    pce.DeleteStampById(300);

    pce.Save("34634-1.pdf");

    PdfContentEditor pce1 = new PdfContentEditor();
    pce1.BindPdf("34634-1.pdf");



Please feel free to contact us for any further assistance.

Best Regards,

Thank you for the reply and advice. I will implement and see how it goes.


Unfortunately, the existing pdfs aren’t made with Aspose and, as such, have hard coded headers and footers. My plan is to add a stamp that will cover the old header and footer, then add stamps with the new detail on top.
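As a rough sketch of the geometry behind that plan, the plain C# below computes the two bands that a covering stamp would need to occupy. It does not call Aspose.Pdf at all. Band, CoverBands, Header and Footer are hypothetical names, the band height is an arbitrary illustrative value, and the sketch assumes PDF-style coordinates with the origin at the bottom-left corner of the page and the page box starting at (0, 0).

```csharp
using System;

// Hypothetical value type: a rectangle given by its lower-left and upper-right corners.
public struct Band
{
    public readonly double Llx, Lly, Urx, Ury;

    public Band(double llx, double lly, double urx, double ury)
    {
        Llx = llx; Lly = lly; Urx = urx; Ury = ury;
    }
}

// Hypothetical helper, for illustration only (not part of Aspose.Pdf).
public static class CoverBands
{
    // Band across the full page width at the top of the page.
    public static Band Header(double pageWidth, double pageHeight, double bandHeight)
    {
        Validate(pageWidth, pageHeight, bandHeight);
        return new Band(0, pageHeight - bandHeight, pageWidth, pageHeight);
    }

    // Band across the full page width at the bottom of the page.
    public static Band Footer(double pageWidth, double pageHeight, double bandHeight)
    {
        Validate(pageWidth, pageHeight, bandHeight);
        return new Band(0, 0, pageWidth, bandHeight);
    }

    private static void Validate(double pageWidth, double pageHeight, double bandHeight)
    {
        if (pageWidth <= 0 || pageHeight <= 0)
            throw new ArgumentOutOfRangeException(nameof(pageWidth), "Page dimensions must be positive.");
        if (bandHeight <= 0 || bandHeight > pageHeight)
            throw new ArgumentOutOfRangeException(nameof(bandHeight), "Band must fit on the page.");
    }
}
```

The point of computing the bands per page is that the width and height passed in should be the rotation-aware size of that particular page, so an A3 landscape page gets a wider band than an A4 portrait one.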

Ok, the problems continue. I can now successfully read the proper rectangle for each page and am happily placing page numbers in the new footer.


However, I have run into another problem. While appending together all of the existing PDF documents that weren’t generated by Aspose, I am adding two floating boxes to those pages that had a header and footer in the original document. The intent of the floating boxes is to cover the original header and footer. Remember also that the documents being appended have a mix of page sizes and orientations. But when I now save the document, all pages are saved with the same size and orientation (all portrait A4), regardless of the actual page rectangle. This means A3 landscape pages only show half the original page in an A4 portrait page. If I comment out the code that adds the floating boxes, the document is saved with the correct page sizes and orientations and everything looks good.

Is there another way to “cover” the existing header and footer in the appended pdfs? Remember that these existing documents weren’t created with Aspose, so I can’t use the technique suggested above.

Boschy:
However, I have run into another problem. While appending together all of the existing PDF documents that weren’t generated by Aspose, I am adding two floating boxes to those pages that had a header and footer in the original document. The intent of the floating boxes is to cover the original header and footer. Remember also that the documents being appended have a mix of page sizes and orientations. But when I now save the document, all pages are saved with the same size and orientation (all portrait A4), regardless of the actual page rectangle. This means A3 landscape pages only show half the original page in an A4 portrait page. If I comment out the code that adds the floating boxes, the document is saved with the correct page sizes and orientations and everything looks good.

Hi John,

Thanks for your inquiry. Please share your sample code and documents so that we can look into the issue and provide you with more information.

We are sorry for the inconvenience caused.

Best Regards,

Boschy:
Is there another way to “cover” the existing header and footer in the appended pdfs? Remember that these existing documents weren’t created with Aspose, so I can’t use the technique suggested above.

Hi John,

Thanks for your feedback. I am afraid there is no other convenient way to remove a header/footer than the one suggested above. However, as a workaround, you can remove a header, text stamp or watermark by replacing the stamp contents with an empty text string, as follows.

    PdfContentEditor editor = new PdfContentEditor();
    editor.BindPdf(myDir + "watermark.pdf");

    // Replace the stamp text with an empty string to blank it out
    editor.ReplaceText("Neevia Document Converter P", "");

    editor.Save(myDir + "watermark_out.pdf");



We are sorry for the inconvenience caused.


Best Regards,

Hi Tilal,


First off, thanks for taking the time to look at this issue for me. The code I am using during testing is below. If the code is run as is, the foreach section of the PageType.Custom3 case in AppendDocument will add a floating box as a header and footer. If you look at the output you will see that all of the pages are of equal size and orientation. If you comment out the two lines within the foreach of the PageType.Custom3 case and regenerate the document you will see that the output pages are correct (document 3 is A3 landscape).

As I am sure you will be able to note, I have also made the two header/footer floating boxes have very obvious background colours during testing. The actual colour, however, doesn’t change the outcome.
    public void AppendPdf()
    {
        var pageInfoList = new List<PageInfo>();

        // Instantiate License class and call its SetLicense method to use the license
        var license = new License();
        license.SetLicense("Aspose.Pdf.lic");

        var pageInfos = new Dictionary<PageType, PageInfo>();
        var outputDocument = new Document();

        var inputDocument1 = new Document(@"D:\append-2.pdf");
        this.AppendDocument(inputDocument1, PageType.Custom1, outputDocument, pageInfos);

        var inputDocument2 = new Document(@"D:\append-3.pdf");
        this.AppendDocument(inputDocument2, PageType.Custom2, outputDocument, pageInfos);

        var inputDocument3 = new Document(@"D:\append-4.pdf");
        this.AppendDocument(inputDocument3, PageType.Custom3, outputDocument, pageInfos);

        AddHeaderAndFooter(outputDocument);

        outputDocument.Save(@"D:\output.pdf");
    }

    void AppendDocument(Document inputDocument, PageType pageType, Document outputDocument, IDictionary<PageType, PageInfo> pageInfos)
    {
        // Page numbers are 1-based, so the first appended page is Count + 1
        var startPage = outputDocument.Pages.Count + 1;

        outputDocument.Pages.Add(inputDocument.Pages);

        var endPage = outputDocument.Pages.Count;
        var pageRange = Enumerable.Range(startPage, endPage - startPage + 1).ToList();

        PageInfo pageInfo;
        if (!pageInfos.TryGetValue(pageType, out pageInfo))
        {
            pageInfo = new PageInfo { Type = pageType };
            pageInfos[pageType] = pageInfo;

            var pageRect = outputDocument.Pages[startPage].GetPageRect(true);
            var pageMargin = outputDocument.Pages[startPage].PageInfo.Margin;

            switch (pageType)
            {
                case PageType.Custom1:
                    {
                        // Add blanking text floats
                        var headerBlank = new Aspose.Pdf.FloatingBox((float)pageRect.Width, 50)
                        {
                            Top = -pageMargin.Top,
                            Left = -pageMargin.Left,
                            BackgroundColor = outputDocument.Pages[startPage].Background
                        };

                        var footerBlank = new Aspose.Pdf.FloatingBox((float)pageRect.Width, 50)
                        {
                            Top = pageRect.Height - 80 - pageMargin.Top,
                            Left = -pageMargin.Left,
                            BackgroundColor = outputDocument.Pages[startPage].Background
                        };

                        foreach (var pageNum in pageRange)
                        {
                            outputDocument.Pages[pageNum].Paragraphs.Add(headerBlank);
                            outputDocument.Pages[pageNum].Paragraphs.Add(footerBlank);
                        }

                        break;
                    }
                case PageType.Custom2:
                    {
                        // This document type doesn't have a header or footer
                        break;
                    }
                case PageType.Custom3:
                    {
                        // Add blanking text floats
                        var headerBlank = new Aspose.Pdf.FloatingBox((float)pageRect.Width, 50)
                        {
                            Top = -pageMargin.Top,
                            Left = -pageMargin.Left,
                            BackgroundColor = Aspose.Pdf.Color.DarkGreen
                            // BackgroundColor = outputDocument.Pages[startPage].Background
                        };

                        var footerBlank = new Aspose.Pdf.FloatingBox((float)pageRect.Width, 50)
                        {
                            Top = pageRect.Height - 80 - pageMargin.Top,
                            Left = -pageMargin.Left,
                            // BackgroundColor = outputDocument.Pages[startPage].Background
                            BackgroundColor = Aspose.Pdf.Color.DarkBlue
                        };

                        foreach (var pageNum in pageRange)
                        {
                            outputDocument.Pages[pageNum].Paragraphs.Add(headerBlank);
                            outputDocument.Pages[pageNum].Paragraphs.Add(footerBlank);
                        }

                        break;
                    }
                case PageType.Custom4:
                    {
                        break;
                    }
            }
        }

        pageInfo.Pages.AddRange(pageRange);
    }

    void AddHeaderAndFooter(Document document)
    {
        var pageCount = document.Pages.Count;

        // Add page number stamp
        var pageNumberStamp = new PageNumberStamp
        {
            Background = false,
            Format = "Page # of " + pageCount,
            BottomMargin = 10,
            RightMargin = 10,
            HorizontalAlignment = HorizontalAlignment.Right,
            StartingNumber = 1
        };
        pageNumberStamp.TextState.Font = FontRepository.FindFont("Arial");
        pageNumberStamp.TextState.FontSize = 10.0F;
        // Combine both styles; assigning them one after the other would keep only Italic
        pageNumberStamp.TextState.FontStyle = FontStyles.Bold | FontStyles.Italic;
        pageNumberStamp.TextState.ForegroundColor = Aspose.Pdf.Color.Black;

        foreach (var rawPage in document.Pages)
        {
            var page = rawPage as Page;

            if (page != null)
            {
                var pageRect = page.GetPageRect(true);

                page.AddStamp(pageNumberStamp);
            }
        }
    }

Along with the three input PDFs, I have also attached the two outputs I am getting at the moment: one with the floating box additions to Custom3 commented out and one with the floating boxes added.

Lastly, the three input files were generated from Microsoft Word documents as test documents. Testing on the live documents has the same effect.

Thanks again for your help.

regards,
John.

Hi John,


Thanks for sharing your sample code. We are looking into it and will update you soon.

Best Regards,