PDF to PDF/A conversion with embedding fonts (subsets or all fonts) / Huge output file size

@dfasergey

We have further investigated the earlier logged ticket and found that some fonts are embedded into the PDF/A document completely instead of as subsets, which increases the file size. This full font embedding happens with composite fonts, because the current PDF/A functionality in the API does not support creating subsets of composite fonts.

Furthermore, there are 2 common internal formats of composite fonts, i.e. TrueType and Compact Font Format (CFF). Most of your documents include fonts with the TrueType internal format, for which subsetting support still needs to be added to the API. Hence, another ticket, PDFNET-48259, has been logged in our issue tracking system for this purpose.

As you can see, the original ticket PDFNET-48040 is dependent upon the recently logged ticket, so we will return to its investigation after the required feature is implemented. We plan to complete the feature implementation at our earliest, and as soon as it is implemented, we will check the new optimized PDF/A file size, i.e. whether it suits you or needs another fix. Please spare us some time.

Hello, is there any news about PDFNET-48040 and PDFNET-48259? When can we expect an update that will fix the identified issues?
We would like to perform more tests on sets of documents containing non-TrueType fonts (as you mentioned, this may be the main reason for the increase in file size) and evaluate the results.

@dfasergey

We would like to share with you that both tickets have been resolved and their fixes will be included in the upcoming version, i.e. Aspose.PDF for .NET 20.7. The next version will be available in the first week of July 2020, and we will inform you as soon as it is published.

Furthermore, our investigation of both tickets showed that the files you provided (OriginalFiles.zip, PDF_Aspose_sample.zip) can be divided into 2 groups by conversion result:
1st group - 10 files whose names start with “CN”,
2nd group - 8 files whose names start with “TW”

Results for the 1st group are good - the file size increase ratio for the whole group is ~1.35.

The second group was not tested because PDF/A conversion problems were found for every file in this group. These problems are not related to the new functionality. Some of them are caused by deviations from the specification in the original documents, some are related to the special kind of CJK fonts used (these may not be observed on your side), and some were not investigated.

None of these problems can be investigated or fixed under the logged tickets, so if you report problems not related to file size, we will create new tickets for them to be investigated and resolved.

To get effective size compression, the following flag and strategy must be set on the conversion options object (Aspose.Pdf.PdfFormatConversionOptions):
OptimizeFileSize = true, ExcludeFontsStrategy = RemoveFontsStrategy.SubsetFonts | RemoveFontsStrategy.RemoveDuplicatedFonts.

Using these options decreases conversion performance, because additional time is required to process the fonts, but it also decreases file size significantly.

So, a code snippet to get size-optimized PDF/A-2B documents can look like this:

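using Aspose.Pdf;

// Load the source PDF.
Document pdfDocument = new Document(fileName);
// Target PDF/A-2B; ConvertErrorAction.Delete removes objects that cannot be converted.
PdfFormatConversionOptions opts = new PdfFormatConversionOptions(PdfFormat.PDF_A_2B, ConvertErrorAction.Delete);
// Optimize size: embed fonts as subsets and remove duplicated fonts.
opts.OptimizeFileSize = true;
opts.ExcludeFontsStrategy = RemoveFontsStrategy.SubsetFonts | RemoveFontsStrategy.RemoveDuplicatedFonts;
pdfDocument.Convert(opts);
pdfDocument.Save(newFileName);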
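
To evaluate results on your own document sets, a group-level size ratio like the ~1.35 figure mentioned above could be measured with a small helper. This is only a minimal sketch and not part of the Aspose.PDF API: the class name SizeRatio and its method names are hypothetical, and it assumes the ratio means total converted bytes divided by total original bytes, with originals and converted files kept in two separate folders.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper (not an Aspose.PDF type) for comparing folder sizes.
public static class SizeRatio
{
    // Total size in bytes of all *.pdf files directly inside a folder.
    public static long TotalPdfBytes(string folder) =>
        Directory.EnumerateFiles(folder, "*.pdf")
                 .Sum(path => new FileInfo(path).Length);

    // Assumed definition: converted bytes / original bytes for the whole group.
    public static double GroupRatio(string originalFolder, string convertedFolder)
    {
        long original = TotalPdfBytes(originalFolder);
        if (original == 0)
            throw new InvalidOperationException("No original PDF bytes found.");
        return (double)TotalPdfBytes(convertedFolder) / original;
    }
}
```

Calling SizeRatio.GroupRatio(originalFolder, convertedFolder) after a batch conversion returns a value above 1 when the PDF/A output is larger than the input, so runs with and without the font options can be compared directly.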

The issues you have found earlier (filed as PDFNET-48040) have been fixed in Aspose.PDF for .NET 20.7.

Great news! Thank you for the notification. @asad.ali, can we get a trial key to check the updates?

@dfasergey

Please consider applying for a free 30-day temporary license in order to evaluate the API without any limitations. In case of further inquiry, please feel free to let us know.