Adding a PDF header: performance issue and PDF size

Hi,


I am adding a header and footer to an existing PDF document and found that executing AddStamp(hStamp) takes almost 800 ms, whereas AddStamp(pdfStamp) is fine (it takes tens of milliseconds). Our PDF documents have tens of pages and, due to a customer requirement, each page has a different header, so the whole process takes too long. Here is the code:

byte[] headerImage;
byte[] footerImage;
byte[] bodyPDF = …;
using (MemoryStream inputStream = new MemoryStream(bodyPDF))
{
    Document doc = new Document(inputStream);
    Document newDoc = new Document();
    foreach (Aspose.Pdf.Page page in doc.Pages)
    {
        Aspose.Pdf.Page newPage = newDoc.Pages.Add();
        headerImage = …
        footerImage = …
        // Enlarge the page to make room for the header and footer bands.
        newPage.SetPageSize(page.Rect.Width, page.Rect.Height + hHeight + fHeight);
        using (MemoryStream imageStream1 = new MemoryStream(headerImage))
        {
            var hStamp = new ImageStamp(imageStream1);
            hStamp.HorizontalAlignment = HorizontalAlignment.Center;
            hStamp.VerticalAlignment = VerticalAlignment.Top;
            // hStamp.Background = false;
            int start = Environment.TickCount;
            newPage.AddStamp(hStamp);   // this call takes almost 800 ms
            int duration = Environment.TickCount - start;
        }
        // Stamp the original page below the header.
        var pdfStamp = new PdfPageStamp(page);
        pdfStamp.TopMargin = hHeight;
        pdfStamp.HorizontalAlignment = HorizontalAlignment.Center;
        pdfStamp.VerticalAlignment = VerticalAlignment.Top;
        newPage.AddStamp(pdfStamp);
    }
}
I also tried
hStamp.Background = false;
but it did not help. Did I miss something?

Another problem: the original bodyPDF is 193 KB (10 pages) and the header is 17 KB, but after adding the header the final PDF is about 2000 KB. How can that be?

Thank you in advance

Jack

Hi Jack,

Thank you for the details.

To investigate further, please share your template PDF file and stamp file with us so that we can reproduce both of the reported issues on our end. This will help us identify the cause of the issue sooner.

We are sorry for the inconvenience,

Hi Nausherwan,

Thank you for the prompt response!

Attached are the header PDF file and the body PDF itself. The code reads them like this:

byte[] headerImage = File.ReadAllBytes("Header.pdf");

byte[] bodyPDF = File.ReadAllBytes("BodyPDF.pdf");

...

Thanks,

Jack

Hi Jack,

Thank you for sharing the files.

Regarding the performance issue when adding the image stamp, could you please share the image files you use to create the ImageStamp? I used an image file of my own, and the whole process of stamping and generating the file took no more than about 300 ms.

JackCui:

Another problem is original bodyPDF is 193 KB (10 pages), and the Header is 17 KB, after add the Header, the final PDF is about: 2000 KB, how it can be?

I was able to reproduce this issue. It has been logged in our issue tracking system as PDFNEWNET-34578. We will notify you in this forum thread of any updates.

Sorry for the inconvenience,

Hi Nausherwan,

The following code and the attached PDF files are what I used for performance testing. I ran it in a console application, and it shows how long each AddStamp call takes. This test uses a low-resolution image; with a high-resolution image (attached as HighResolutionHeaderPDF.pdf), the result is even worse.

Note: neither HighResolutionHeaderPDF.pdf nor HeaderPDF.pdf can be opened in Adobe, because they are actually images; they can still be loaded and used with the code in this thread.
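Since the header files are named .pdf but are really images, one quick way to confirm what a file contains is to look at its leading bytes. The following is a minimal sketch; FileSignature and Detect are hypothetical names, and only a few common signatures are checked (conforming PDF files begin with %PDF-, while PNG, JPEG, GIF and BMP files start with their own fixed byte sequences).

```csharp
using System;

// Hypothetical helper: inspects the first bytes of a file to tell a real PDF
// apart from a raster image that was saved under a .pdf name.
public static class FileSignature
{
    public static string Detect(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        // PDF: ASCII "%PDF-"
        if (StartsWith(data, new byte[] { 0x25, 0x50, 0x44, 0x46, 0x2D })) return "pdf";
        // PNG: 89 50 4E 47 0D 0A 1A 0A
        if (StartsWith(data, new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A })) return "png";
        // JPEG: FF D8 FF
        if (StartsWith(data, new byte[] { 0xFF, 0xD8, 0xFF })) return "jpeg";
        // GIF: ASCII "GIF8" (covers GIF87a and GIF89a)
        if (StartsWith(data, new byte[] { 0x47, 0x49, 0x46, 0x38 })) return "gif";
        // BMP: ASCII "BM"
        if (StartsWith(data, new byte[] { 0x42, 0x4D })) return "bmp";
        return "unknown";
    }

    // True when data begins with every byte of prefix.
    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

For example, FileSignature.Detect(File.ReadAllBytes(@"d:\HF\HeaderPDF.pdf")) (which needs using System.IO;) should report an image type rather than "pdf" if the file is an image as described.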

Let me know if you need more information

Thanks,

Jack

// Requires: using System; using System.IO; using Aspose.Pdf;
double hHeight = 0;
byte[] headerImage = File.ReadAllBytes("d:\\HF\\HeaderPDF.pdf");
byte[] bodyPDF = File.ReadAllBytes("d:\\HF\\BodyPDF.pdf");
try
{
    hHeight = 128.4;
    using (MemoryStream inputStream = new MemoryStream(bodyPDF))
    {
        Document doc = new Document(inputStream);
        Document newDoc = new Document();
        foreach (Aspose.Pdf.Page page in doc.Pages)
        {
            Aspose.Pdf.Page newPage = newDoc.Pages.Add();
            // Header band of hHeight points on top, 42.5-point footer band at the bottom.
            newPage.SetPageSize(page.Rect.Width, page.Rect.Height + hHeight + 42.5);
            int start, duration;
            using (MemoryStream imageStream1 = new MemoryStream(headerImage))
            {
                var hStamp = new ImageStamp(imageStream1);
                hStamp.HorizontalAlignment = HorizontalAlignment.Center;
                hStamp.VerticalAlignment = VerticalAlignment.Top;
                start = Environment.TickCount;
                newPage.AddStamp(hStamp);
                duration = Environment.TickCount - start;
                Console.WriteLine("Add Header image took: " + duration + " ms");
            }
            start = Environment.TickCount;
            // Stamp the original page below the header.
            var pdfStamp = new PdfPageStamp(page);
            pdfStamp.TopMargin = hHeight;
            pdfStamp.HorizontalAlignment = HorizontalAlignment.Center;
            pdfStamp.VerticalAlignment = VerticalAlignment.Top;
            newPage.AddStamp(pdfStamp);
            duration = Environment.TickCount - start;
            Console.WriteLine("Add body pdf took: " + duration + " ms");
        }
        Console.ReadKey();
        using (MemoryStream outputStream = new MemoryStream())
        {
            newDoc.Save(outputStream);
            // return outputStream.ToArray();
        }
    }
}
catch (Exception ex)
{
    // Report failures instead of silently swallowing them.
    Console.WriteLine(ex);
}
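Environment.TickCount advances only at the resolution of the system timer (typically about 10 to 16 ms), so shorter calls are measured coarsely. A minimal alternative, assuming nothing beyond the .NET base library, is a small wrapper around System.Diagnostics.Stopwatch; the Timing class below is a hypothetical helper, not part of Aspose.Pdf.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper built on Stopwatch, which uses the
// high-resolution performance counter when one is available.
public static class Timing
{
    // Runs the action once and returns the elapsed wall-clock time in milliseconds.
    public static double Measure(Action action)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    // Same as Measure, but also prints the elapsed time with a label.
    public static double MeasureAndLog(string label, Action action)
    {
        double ms = Measure(action);
        Console.WriteLine($"{label} took: {ms:F1} ms");
        return ms;
    }
}
```

Inside the loop, the header measurement could then read Timing.MeasureAndLog("Add header image", () => newPage.AddStamp(hStamp));. Because the first call in a process may also include one-time costs such as JIT compilation, comparing the timings of several pages gives a fairer picture than the first page alone.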

Hi Jack,

Thank you for sharing the files.

I tested your code, and on my end it takes at most 234 ms to add the header to a page of the PDF file. Which version of Aspose.Pdf for .NET are you using? If it is an older version, please download the latest version, Aspose.Pdf for .NET 7.5, try it, and share the results with us.

Sorry for the inconvenience,

Hi Nausherwan,

I tried the latest version, 7.5, and there is no difference compared with the version we use (7.3). Our concern is that we have to use the higher-resolution header (the attached HighResolutionHeaderPDF.pdf); if we combine tens of pages of such PDF documents, the wait is far too long.

Your further help is appreciated.

Jack

Hi Jack,

Thank you for the details.

I have logged an issue in our issue tracking system as PDFNEWNET-34582 so that the development team can investigate possible performance improvements. We will notify you in this forum thread of any updates.

Sorry for the inconvenience,

Thanks, Nausherwan. Please note that this is fairly high priority for us: we are about to release a service pack, and this issue has already come to our customers' attention.

Again, thank you for your help.

Jack

Also, do you have a timeline for this, so that we can better manage our schedule?

Thanks,

Jack

Hi Jack,

As we have only just been able to replicate your problem, I am afraid we cannot provide an ETA until the development team has analyzed it in detail. They will need some time to analyze the issue and share their feedback, and we will update you as soon as we have any details. You may also check the following link for other support options in case you need to escalate your issue:

http://www.aspose.com/corporate/services/default.aspx

Sorry for the inconvenience,

Hi Jack,


Thanks for your patience.

We have investigated this issue further. The file you shared contains several fonts in the resources of every page. Each page has its own resource dictionary, but the fonts of different pages refer to the same objects. For example, both the 1st and 2nd pages have font FAAAI (Calibri Bold) in their resources, and in both cases this font refers to object 74 in the document. Thus, the source document contains only 4 unique font resources.

The same applies to images, patterns, etc. in the resources of the source document. Your code creates a new document with the same number of pages as the original and adds every page of the original document as a PDF page stamp to the corresponding page of the new document. When a page of the source document is used as a stamp on the destination document, all resources of the original page are copied into the resources of the Form XObject that is created for the destination page.

Because every PDF page stamp is a separate object, the links between the resources of different pages are lost; in other words, each page stamp gets its own set of resources. Every page's resources are therefore added to the document as separate objects, which multiplies the space occupied by resources by up to 10 (because the document has 10 pages). This explains why the document size increases.

To resolve the file size issue, please call the OptimizeResources(…) method before saving the PDF file; this "re-links" all equal resources again.

newDoc.OptimizeResources();
newDoc.Save(@"D:\pdftest\Header_Stamped.pdf");
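The size explanation above can be illustrated with a toy model. The sketch below is not Aspose's implementation and uses no Aspose API; ResourceModel and its methods are hypothetical. It copies a set of shared resources once per page (as happens when each page becomes a separate PdfPageStamp) and then keeps a single copy per distinct content, identified by a SHA-256 hash, which is roughly the effect that re-linking equal resources has on the file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Toy model of resource duplication and deduplication (not Aspose's code).
public static class ResourceModel
{
    // Each page stamp receives its own copy of every shared resource,
    // so every (page, resource) pair becomes a separately stored object.
    public static List<byte[]> CopyPerPage(IReadOnlyList<byte[]> sharedResources, int pageCount)
    {
        var stored = new List<byte[]>();
        for (int page = 0; page < pageCount; page++)
        {
            foreach (var res in sharedResources)
                stored.Add((byte[])res.Clone());
        }
        return stored;
    }

    // Keeps one object per distinct content, keyed by its SHA-256 hash.
    // A production version would also compare bytes when hashes match.
    public static List<byte[]> Deduplicate(IEnumerable<byte[]> objects)
    {
        var seen = new HashSet<string>();
        var unique = new List<byte[]>();
        using (var sha = SHA256.Create())
        {
            foreach (var obj in objects)
            {
                string key = Convert.ToBase64String(sha.ComputeHash(obj));
                if (seen.Add(key)) unique.Add(obj);
            }
        }
        return unique;
    }

    // Total size of all stored objects in bytes.
    public static long TotalBytes(IEnumerable<byte[]> objects) =>
        objects.Sum(o => (long)o.Length);
}
```

With 4 shared resources and 10 pages, CopyPerPage stores 40 objects and Deduplicate brings that back to 4, mirroring the roughly tenfold growth described above.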

In my tests with Aspose.Pdf for .NET 7.6.0, the output document size is 230 KB.

OR

Alternatively, you may use PdfPageEditor.MovePosition to change the position of the page content. This method does the same thing you are trying to do with stamps, but it operates "inside" the same document, so the links between objects are not lost.
A possible scenario is:

[C#]

PdfPageEditor ppe = new PdfPageEditor();

ppe.BindPdf("c:/pdftest/header.pdf");

// Change the page size (note: this works only if all pages have the same size)

Aspose.Pdf.PageSize ps = ppe.GetPageSize(1);

ps.Height += 400;

ppe.PageSize = ps;

// Move the position of the page content:

ppe.MovePosition(0, 200);

ppe.Save("34578-1.pdf");

PdfFileStamp pfs = new PdfFileStamp();

pfs.BindPdf("34578-1.pdf");

Aspose.Pdf.Facades.Stamp stamp = new Aspose.Pdf.Facades.Stamp();

stamp.BindImage("34578.jpg");

pfs.AddStamp(stamp);

pfs.Save("34578-1A.pdf");
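Both approaches in this thread enlarge the page and then offset the original content: the earlier test code adds hHeight + 42.5 to the page height, and the PdfPageEditor example adds 400 points and moves the content by 200. The geometry can be summarized as follows, as a sketch that assumes default PDF user space (origin at the lower-left corner, y increasing upward). ExpandedPage is a hypothetical type, and whether MovePosition's y argument maps exactly onto BodyBottom is an assumption, not something confirmed here.

```csharp
using System;

// Hypothetical type: geometry of a page enlarged to hold a header band above
// and a footer band below the original content.
// Assumes default PDF user space: origin at the lower-left corner, y up.
public sealed class ExpandedPage
{
    public double Width { get; }
    public double Height { get; }
    // y of the original content's lower edge (top of the footer band).
    public double BodyBottom { get; }
    // y of the original content's upper edge (bottom of the header band).
    public double BodyTop { get; }

    private ExpandedPage(double width, double height, double bodyBottom, double bodyTop)
    {
        Width = width;
        Height = height;
        BodyBottom = bodyBottom;
        BodyTop = bodyTop;
    }

    public static ExpandedPage Create(double bodyWidth, double bodyHeight,
                                      double headerHeight, double footerHeight)
    {
        if (bodyWidth <= 0) throw new ArgumentOutOfRangeException(nameof(bodyWidth));
        if (bodyHeight <= 0) throw new ArgumentOutOfRangeException(nameof(bodyHeight));
        if (headerHeight < 0) throw new ArgumentOutOfRangeException(nameof(headerHeight));
        if (footerHeight < 0) throw new ArgumentOutOfRangeException(nameof(footerHeight));

        // Same formula as SetPageSize(width, height + hHeight + fHeight).
        double height = bodyHeight + headerHeight + footerHeight;
        // The original content sits directly on top of the footer band.
        return new ExpandedPage(bodyWidth, height, footerHeight, footerHeight + bodyHeight);
    }
}
```

For a 612 x 792 page (US Letter) with a 128.4-point header and a 42.5-point footer, the enlarged page is 962.9 points tall and the original content spans y = 42.5 to 834.5. With 200-point bands at the top and bottom, as in the PdfPageEditor example, BodyBottom is 200.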

Thanks for the professional guidance; it helped solve both the file size and the link issue!

What about the performance issue? Do you have any suggestions? Each page has a different PDF header and footer, so we have to call AddStamp many times, which is time-consuming.

Thanks again,

Jack

Hi Jack,


Because you need to add a different header/footer on each page, you need to create a separate stamp object for each one. Please try this out, and if you face any performance-related issue, please share a sample project so that we can test the scenario on our end.