Dear ladies and gentlemen,
In our application we need to combine several (e.g. 100) PDF files.
To optimize the resulting file, we use this code:
pdfFileEditor.Concatenate(pdfStreams.ToArray(), packPdf);
packPdf.Seek(0, SeekOrigin.Begin);
var pdfDocument = new Document(packPdf);
foreach (Page page in pdfDocument.Pages)
{
    var idx = 1;
    foreach (XImage image in page.Resources.Images)
    {
        using (var imageStream = new MemoryStream())
        {
            image.Save(imageStream, ImageFormat.Jpeg);
            imageStream.Seek(0, SeekOrigin.Begin);
            page.Resources.Images.Replace(idx, imageStream);
        }
        idx = idx + 1;
    }
}
// optimize the file size
pdfDocument.Optimize();
pdfDocument.OptimizeSize = true;
pdfDocument.OptimizeResources(new Document.OptimizationOptions
{
    RemoveUnusedStreams = true,
    RemoveUnusedObjects = true,
    LinkDuplcateStreams = true
});
// save updated File
pdfDocument.Save(newPdfFileName);
After this optimization, the resulting PDF file is still too large. This is caused by the fonts used in the source files.
The font definitions (the /Font dictionary and its dependent objects) are copied into the target file for each original document.
If the /FontFile2 streams are identical, only one stream is saved. If the streams are not identical, no union of all required characters is formed; each stream is kept separately.
With an average stream size of 18.5 KB, the difference in file size is about 3 MB.
Is there a way to merge the fonts efficiently?
Is such an implementation planned?
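The behaviour described above (identical /FontFile2 streams are stored once, differing subsets are not merged) can be illustrated with a small, self-contained C# model. This is a conceptual sketch, not Aspose's implementation of `LinkDuplcateStreams`: the byte arrays are hypothetical stand-ins for embedded font programs, and detecting duplicates by SHA-256 content hash is an assumption made for illustration. It needs .NET 5 or later (top-level statements, `SHA256.HashData`, `Convert.ToHexString`).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Conceptual model only, NOT Aspose's LinkDuplcateStreams implementation:
// byte-identical streams can be shared, streams that differ in any byte are kept separately.

// Hypothetical stand-in for a subset font program: its bytes depend on the characters used.
byte[] FakeSubset(string fontName, string usedChars) =>
    Encoding.UTF8.GetBytes(fontName + ":" + new string(usedChars.Distinct().OrderBy(c => c).ToArray()));

// Assumption: duplicates are detected by hashing the full stream content.
string Hash(byte[] data) => Convert.ToHexString(SHA256.HashData(data));

var fontStreams = new List<byte[]>
{
    FakeSubset("Arial", "Bescheid"),  // document 1
    FakeSubset("Arial", "Bescheid"),  // exact copy of document 1
    FakeSubset("Arial", "Rechnung"),  // document 2: same font, different subset
};

// Keep one stream per distinct content hash.
var uniqueStreams = fontStreams
    .GroupBy(s => Hash(s))
    .Select(g => g.First())
    .ToList();

Console.WriteLine($"{fontStreams.Count} embedded streams -> {uniqueStreams.Count} after deduplication");
```

Running it prints `3 embedded streams -> 2 after deduplication`: the exact copy collapses into the first stream, while the second Arial subset survives because its bytes differ, even though it is the same font.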
Best regards
Oliver
Hi Oliver,
// array of input streams
FileStream[] pdfStreams = new FileStream[3];
pdfStreams[0] = new FileStream("c:/pdftest/Bescheid_1_to_59_neu.pdf", FileMode.Open);
pdfStreams[1] = new FileStream("c:/pdftest/Bescheid_1_to_59_neu - Copy.pdf", FileMode.Open);
pdfStreams[2] = new FileStream("c:/pdftest/Bescheid_1_to_59_neu - Copy (2).pdf", FileMode.Open);
MemoryStream packPdf = new MemoryStream();
Aspose.Pdf.Facades.PdfFileEditor pdfFileEditor = new PdfFileEditor();
pdfFileEditor.Concatenate(pdfStreams, packPdf);
// the input files are no longer needed once they have been concatenated
foreach (FileStream stream in pdfStreams)
{
    stream.Close();
}
packPdf.Seek(0, SeekOrigin.Begin);
var pdfDocument = new Document(packPdf);
foreach (Page page in pdfDocument.Pages)
{
    var idx = 1;
    foreach (XImage image in page.Resources.Images)
    {
        // re-encode each image as JPEG and put it back at the same (1-based) index
        using (var imageStream = new MemoryStream())
        {
            image.Save(imageStream, System.Drawing.Imaging.ImageFormat.Jpeg);
            imageStream.Seek(0, SeekOrigin.Begin);
            page.Resources.Images.Replace(idx, imageStream);
        }
        idx = idx + 1;
    }
}
// optimize the file size
pdfDocument.Optimize();
pdfDocument.OptimizeSize = true;
pdfDocument.OptimizeResources(new Document.OptimizationOptions
{
    RemoveUnusedStreams = true,
    RemoveUnusedObjects = true,
    LinkDuplcateStreams = true,  // spelled this way in the Aspose.Pdf API
    AllowReusePageContent = true
});
// save updated File
pdfDocument.Save("c:/pdftest/OptimizedFile.pdf");
Hi,
If you concatenate the three copies, the result is as expected. The font streams of each copy are identical, so the number of font streams in the target file equals the number of font streams in one of the copied files.
Our problem is that the font streams are not combined even when they belong to the same font. This increases the number of font streams and, as a result, the file size.
Perhaps the attached files help to explain what I mean. In the old approach, the single PDFs were extracted from one big file (oldway.pdf). The new approach generates single PDFs (generated1.pdf to generated5.pdf) and concatenates them into one file (newway.pdf). We need both the single and the combined PDFs.
The new single PDF files are larger because they are PDF/A files. The main problem seems to be the increased number of font streams. Even with Aspose.PDF v10.2.0 and the AllowReusePageContent property, the file size is too large. I think un-embedding custom fonts is not an option because of the PDF/A standard.
Best regards
Oliver
Hi Oliver,
I have tested the scenario and I am able to reproduce the same problem. So that it can be corrected, I have logged it in our issue tracking system as PDFNEWNET-38425. We will investigate this issue in detail and will keep you updated on the status of a correction.
We apologize for your inconvenience.
Hi,
Any news on this topic for me?
Kind regards,
Oliver
Hi Oliver,
Hi,
Any news for me on this topic? We are waiting for a solution because we need to help our customers. Could you please check and give me a date for the solution?
Kind regards,
Oliver
Hi Oliver,
Hi again,
Could you please check again with the developers for an ETA?
Thanks,
Kind regards,
Oliver
Hi Oliver,
PdfFileEditor pfe = new PdfFileEditor();
// pages of oldway.pdf to extract into five single documents
int[][] pagesToExtract = new int[][] { new int[] { 1, 3 }, new int[] { 5 }, new int[] { 7, 9 }, new int[] { 11 }, new int[] { 13, 15 } };
for (int i = 0; i < 5; i++)
{
    pfe.Extract("oldway.pdf", pagesToExtract[i], "38425-generated" + (i + 1) + ".pdf");
}
FileStream[] pdfStreams = new FileStream[5];
pdfStreams[0] = new FileStream("38425-generated1.pdf", FileMode.Open);
pdfStreams[1] = new FileStream("38425-generated2.pdf", FileMode.Open);
pdfStreams[2] = new FileStream("38425-generated3.pdf", FileMode.Open);
pdfStreams[3] = new FileStream("38425-generated4.pdf", FileMode.Open);
pdfStreams[4] = new FileStream("38425-generated5.pdf", FileMode.Open);
Aspose.Pdf.Facades.PdfFileEditor pdfFileEditor = new PdfFileEditor();
FileStream outStream = new FileStream("38425-concatenated.pdf", FileMode.Create, FileAccess.ReadWrite);
pdfFileEditor.Concatenate(pdfStreams, outStream);
outStream.Close();
// release the input files after concatenation
foreach (FileStream stream in pdfStreams)
{
    stream.Close();
}
Document doc = new Document("38425-concatenated.pdf");
doc.OptimizeResources(new Document.OptimizationOptions
{
    RemoveUnusedStreams = true,
    RemoveUnusedObjects = true,
    LinkDuplcateStreams = true,
    AllowReusePageContent = true
});
doc.Save("38425-optimized.pdf");
Dear ladies and gentlemen,
Thanks for your reply, but I think there was a misunderstanding. The files generated1.pdf to generated5.pdf were generated by our application using the new approach and then combined into newway.pdf.
The file oldway.pdf is the result of the old generation process; it shows how small the result was with the old approach.
As you can see, the difference in size is quite large.
What we need now is a way to combine the newly generated files (generated1.pdf to generated5.pdf) so that the size of newway.pdf is reduced (similar to oldway.pdf).
The attached source file contains part of the code we use.
Thanks in advance.
Best regards
Oliver
Hi Oliver,
Hi,
Do you have any information from your product team for me concerning this issue?
Thank you and kind regards,
Oliver
Hi Oliver,
In other words, the statement "The font streams of each copy are identical" does not hold for the generated documents: their fonts differ because they are subsets.
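The step that is missing when subsets differ, forming the union of the required characters per font (as described at the start of this thread), can be modelled conceptually in C#. This is only a sketch under stated assumptions: the document names, font names and character sets are hypothetical, and a real merge would also need the full font program to build the new subset and would have to remap character codes in every page's content stream, which this sketch does not attempt. It needs C# 9 or later (top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Conceptual model only: a real font merge would also rebuild the font program from the
// full font and remap character/glyph codes in every content stream (not done here).

// Hypothetical input: which characters each generated document uses, per font.
var usagePerDocument = new List<(string Document, string Font, string Chars)>
{
    ("generated1.pdf", "Arial", "Bescheid"),
    ("generated2.pdf", "Arial", "Rechnung"),
    ("generated3.pdf", "Arial-Bold", "Betreff"),
};

// Without merging, every document keeps its own subset.
int separateSubsets = usagePerDocument.Count;

// A merged file would need one subset per font covering the union of all used characters.
var mergedSubsets = usagePerDocument
    .GroupBy(u => u.Font)
    .ToDictionary(
        g => g.Key,
        g => new SortedSet<char>(g.SelectMany(u => u.Chars)));

foreach (var (font, chars) in mergedSubsets)
{
    Console.WriteLine($"{font}: {string.Concat(chars)}");
}
Console.WriteLine($"{separateSubsets} separate subsets -> {mergedSubsets.Count} merged subsets");
```

Embedding the full (non-subset) font in every generated file avoids the need for such a merge, because the font streams are then byte-identical and can be stored once.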
Hi,
<Use the full version of the same font (not a subset) in each of the generated files? This may make it possible to use the font only once in the concatenated file.>
We changed our program and were able to reduce the PDF size considerably. This is solved. Thanks for your help.
Kind regards,
Oliver
Hi Oliver,
The issues you have found earlier (filed as PDFNEWNET-38425) have been fixed in Aspose.Pdf for .NET 10.9.0.
This message was posted using Notification2Forum from Downloads module by Aspose Notifier.