PDF to PDF/A conversion with embedding fonts (subsets or all fonts) / Huge output file size

Hi, before buying a license we want to run some tests, and one of them is converting PDF to PDF/A. Our goal is to convert a PDF file (with or without embedded fonts) to a PDF/A-compliant document (with all fonts embedded, either fully or as subsets). We tried the code below and found that the file size increases many times over after conversion (from 286 KB to 35 MB). We expected 5-10% growth, depending on the size of the embedded fonts used. Are there any additional settings for the conversion process or for file compression that would bring the output file size down to an acceptable level?
[InputFile.pdf](https://github.com/aspose-free-consulting/projects/files/4493129/InputFile.pdf)
The output file cannot be attached because of its size (more than 10 MB).

```csharp
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(file);
pdfDocument.Convert(input + "validationLog.xml", PdfFormat.PDF_A_2B, ConvertErrorAction.Delete);

// Linearization
pdfDocument.Optimize();
// Save doc
pdfDocument.Save(output);
```

@dfasergey,

We have tested the scenario in our environment and were able to reproduce the issue. We have logged it as PDFNET-48040 in our issue tracking system. We will look further into the details of the issue and keep you posted on the status of its correction. Please be patient and spare us a little time.

We are sorry for the inconvenience.

Thank you. Please keep me posted on the status of the issue.

@dfasergey,

Sure, we will inform you.

@Adnan.Ahmad, is there any news on my request? We’re still waiting for a solution, because we consider Aspose.PDF as a potential product for use in our workflow.

@dfasergey,

I would like to inform you that this issue was added to our issue tracking system only recently. As per our company policy, first priority for investigation is given to Paid Support (i.e. Enterprise and Priority Support) on a first come, first served basis. After that, issues from the normal support forum are scheduled for investigation, also on a first come, first served basis. I request your patience, and we will share good news with you soon.

Dear support,

My colleague Robert Fokkema had some email correspondence with Sabina from sales on this matter with you in the CC.

Sabrina asked us to indicate in the forum that having this issue resolved will make a difference in whether we purchase your product or not.

We understand the priorities you set, but we would like a clear understanding of whether we can get a solution in a short time or whether we will have to wait months before this is addressed.

Please let us know if you can get this issue addressed urgently, so we can decide whether to continue investigating your product or not.

@dfasergey,

I would like to inform you that this issue will take time to resolve, so if you are entitled to priority support, then please visit the Paid Support helpdesk to get your issue resolved as soon as possible.

@dfasergey,

I have seen that email and I’m looking into this matter. Let me see how quickly we can resolve this issue and I’ll get back to you with appropriate information soon.

@dfasergey,

Can you please try the following sample code on your end? It should help you achieve your requirements. Please share your feedback with us if there is still an issue.

    // Convert to PDF/A-2b and save an intermediate file.
    Document pdfDocument = new Document(fileName);
    pdfDocument.Convert("log.xml", PdfFormat.PDF_A_2B, ConvertErrorAction.Delete);
    pdfDocument.Save("intermediate-result.pdf");

    // Reload the intermediate file and strip unused and duplicate data.
    Document doc = new Document("intermediate-result.pdf");
    doc.OptimizeResources(
        new OptimizationOptions()
        {
            RemoveUnusedObjects = true,
            RemoveUnusedStreams = true,
            LinkDuplcateStreams = true // spelled this way in the API
        });
    doc.Save("result.pdf");

Thanks for the sample code. I tried it, and the output file is much smaller than before but still too large: the initial file is 286 KB, the output before the new settings was 35,111 KB, and the output with the new settings is 6,857 KB. This is progress, but I would like to know whether there is any additional way to reduce the file size (maybe some additional compression parameters)?

@dfasergey,

We are looking into this further and will update you regarding this issue as soon as possible.

@dfasergey

We have further investigated the earlier logged ticket from the perspective of subsetting fonts in the PDF document. The OptimizeResources() method can subset fonts in the document, but unfortunately an exception occurred when the SubsetFonts flag was applied to the document together with RemoveUnusedStreams and RemoveUnusedObjects.

Nevertheless, we would like to suggest you use the following workaround (i.e. apply optimization in two steps) in order to optimize your source document and obtain better results:

    // Convert to PDF/A-2b; the conversion log is written to a throwaway stream.
    Document pdfDocument = new Document(dataDir + "InputFile.pdf");
    pdfDocument.Convert(new MemoryStream(), PdfFormat.PDF_A_2B, ConvertErrorAction.Delete);

    MemoryStream tmp1 = new MemoryStream();
    pdfDocument.Save(tmp1);

    // Optimization step 1: reload the document and subset fonts only.
    Document doc = new Document(tmp1);
    doc.OptimizeResources(
        new Aspose.Pdf.Optimization.OptimizationOptions()
        {
            SubsetFonts = true
        });
    MemoryStream tmp2 = new MemoryStream();
    doc.Save(tmp2);

    // Optimization step 2: reload again and remove unused and duplicate data.
    Document doc2 = new Document(tmp2);
    doc2.OptimizeResources(
        new Aspose.Pdf.Optimization.OptimizationOptions()
        {
            RemoveUnusedObjects = true,
            RemoveUnusedStreams = true,
            LinkDuplcateStreams = true // spelled this way in the API
        });
    doc2.Save(dataDir + "result2.pdf");

result2.pdf (436.5 KB)

You can notice that the output file size has been reduced to 436KB. We will surely investigate the ticket further to determine whether PDF file size can be reduced more or not and how the exception can be prevented. Also, we will work on simplifying the above-shared code snippet and share our feedback with you as soon as the ticket is resolved. Please spare us some time.

CN105503283B.pdf (6.8 MB)
CN105787103B.pdf (8.0 MB)
OriginalFiles.zip (4.4 MB)

Hi, I checked the suggested code on our samples (OriginalFiles.zip), and the output file is still more than double the size of the original. Can we expect a further reduction to an acceptable size when using Aspose.PDF? We expect an increase in file size within 20-50% of the original. A ~200 KB increase is OK for a file whose original size is 286 KB, but for a file whose original size is 1.3 MB (CN105787103B), an increase of more than 7 MB (the output file is 8.4 MB) does not look so good.

This came from Robert via email
We tested this and noticed that we get mixed results.

In our test case we ran 1,000 records, which started at 870 MB in total but came to 2,980 MB after processing. We were expecting an increase, but of maybe 50%, so an increase of around 435 MB to 1,305 MB. Following are 4 records as examples:

CN105269770B.pdf 292 KB becomes 447KB

CN104651516B.pdf 4 MB becomes 11 MB

CN105503283B.pdf 640 KB becomes 7 MB

CN105787103B.pdf 1.3 MB becomes 8.4 MB

CN105269770B is the PDF we supplied before that became huge. It is now acceptable (smaller is always better) with an increase of just 155 KB, so roughly a 50% increase due to the fonts, optimizing, etc., which would be acceptable.

The other 3 however still show huge increases of 2 to 10 times the original size.
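To make the comparison concrete, here is a minimal sketch that turns the four size pairs above into growth factors. The `SizeGrowth` class and `Factor` method are hypothetical names used only for illustration, and the conversion of 1 MB to 1024 KB is an assumption, since the figures above are rounded.

```csharp
using System;

public static class SizeGrowth
{
    // Ratio of output size to original size; both values must use the same unit.
    public static double Factor(double originalKb, double outputKb) => outputKb / originalKb;

    public static void Main()
    {
        // Sizes of the four records listed above, in KB (assuming 1 MB = 1024 KB).
        var records = new (string Name, double OriginalKb, double OutputKb)[]
        {
            ("CN105269770B.pdf", 292, 447),
            ("CN104651516B.pdf", 4 * 1024, 11 * 1024),
            ("CN105503283B.pdf", 640, 7 * 1024),
            ("CN105787103B.pdf", 1.3 * 1024, 8.4 * 1024),
        };

        foreach (var r in records)
        {
            double factor = Factor(r.OriginalKb, r.OutputKb);
            Console.WriteLine($"{r.Name}: {factor:F2}x ({(factor - 1) * 100:F0}% increase)");
        }
    }
}
```

A factor of 1.5 corresponds to the 50% increase we consider acceptable; anything well above that is a candidate for further investigation.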

I have asked our developer to supply these in the forum for testing on your side.

Time-wise, converting these 1,000 records took around 10 hours, so the speed is also not great. Would it be possible to run the conversion in parallel on the same system?
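As a starting point for the parallel question, below is a minimal sketch of a batch runner built on `Parallel.ForEach`. It is not confirmed by Aspose: it assumes that each file is handled by its own `Document` instance inside the callback and that the library tolerates separate instances being used concurrently, which would need to be verified. `BatchConverter`, `ConvertAll`, and the `convertOne` delegate are hypothetical names; the delegate stands in for the convert-and-optimize steps for a single file.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class BatchConverter
{
    // convertOne (hypothetical) receives an input path and an output path and performs
    // the conversion for that one file. It must not share state between calls.
    public static ConcurrentDictionary<string, Exception> ConvertAll(
        string inputDir, string outputDir, Action<string, string> convertOne, int maxParallel)
    {
        Directory.CreateDirectory(outputDir);
        var failures = new ConcurrentDictionary<string, Exception>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(Directory.EnumerateFiles(inputDir, "*.pdf"), options, inputPath =>
        {
            string outputPath = Path.Combine(outputDir, Path.GetFileName(inputPath));
            try
            {
                convertOne(inputPath, outputPath);
            }
            catch (Exception ex)
            {
                // Record the failure and continue so one bad file does not stop the batch.
                failures[inputPath] = ex;
            }
        });

        return failures;
    }
}
```

The degree of parallelism is capped explicitly because conversion is likely to be memory-heavy for large documents; a value near the number of CPU cores is a reasonable first guess, to be tuned by measurement.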


This Topic is created by shahzadlatif using Email to Topic tool.

@dfasergey

Thanks for your feedback and providing further details.

We have logged these details along with the ticket and will surely investigate. Please note that the issue seems related to the specific types of documents that you have shared, and it needs to be investigated further. We will definitely share our findings with you as soon as we have some results. Please spare us some time.

We are sorry for the inconvenience.

(the file is shared with the team privately)

This file ended up with 20 embedded fonts, where the initial file showed 13 fonts. This is not that big a problem, but the size did increase by 7 MB, which is not as expected.

We tested this on 1,000 records, and around 60% end up in the “acceptable” range of increase (maximum 50% increase); however, 15-20% show an increase that is not in line with expectations. For the remainder, it is debatable whether the increase is acceptable or not.

I am sure you will be able to find out why this happens and provide a solution, after which we want to rerun our batch to see if the problem is resolved. After that, I expect we will decide which license to get from you.



@dfasergey

Thanks for sharing document and elaborating the issue further.

We have logged the received file along with the ticket and will definitely investigate against provided details. We will inform you as soon as we have some updates in this regard. Please spare us some time.

Attached are the 15 files that score worst in our set. (files shared with the team privately)

| File | Original (MB) | Output (MB) |
|---|---|---|
| TW201836587A.pdf | 0.58 | 7.23 |
| CN108296026B.pdf | 0.56 | 7.27 |
| TW201836601A.pdf | 0.56 | 7.68 |
| TW201836482A.pdf | 0.73 | 10.49 |
| CN107976667B.pdf | 0.46 | 6.62 |
| TW201813759A.pdf | 0.49 | 7.13 |
| CN210277673U.pdf | 0.45 | 6.63 |
| TW201836477A.pdf | 0.47 | 7.11 |
| CN110993098A.pdf | 0.42 | 6.65 |
| TW201836569A.pdf | 0.43 | 7.05 |
| TW201836515A.pdf | 0.42 | 7.02 |
| CN110983825A.pdf | 0.41 | 7.10 |
| CN210294243U.pdf | 0.37 | 6.58 |
| TW201836589A.pdf | 0.36 | 6.50 |
| CN110981990A.pdf | 0.28 | 6.45 |

All are in Asian languages; however, we also have many Asian-language publications where the size increase is only up to 50%, as expected.



@dfasergey,

Thanks for sharing document and elaborating the issue further.

We have logged the received file along with the ticket and will definitely investigate against provided details. We will inform you as soon as we have some updates in this regard. Please spare us some time.