Aspose.PDF for .NET: remove duplicate embedded fonts

Hello,

I am concatenating multiple PDF files, up to 1500 separate PDFs, into one PDF. I need to embed the fonts, but only once for each font style.

My issue is that I cannot remove the duplicate embedded fonts in the concatenated PDF file. Is there a way to do this in Aspose.PDF for .NET?

I found this topic from 2007 that seems similar to my issue, but it has no solution:

I see that Java has an option for removing duplicate fonts, but as far as I can tell there is nothing equivalent for .NET:

I attached an image showing the issue: the same fonts are embedded multiple times in one PDF, which makes the file extremely large.

Thank you for any help.

image.png (24.1 KB)

@schamma

Please try optimizing the output PDF and see whether that removes the duplicate fonts. If the issue persists, please share a sample PDF for our reference and we will assist you further:

using Aspose.Pdf;

// dataDir is assumed to be defined elsewhere as the working folder path.
var oo = new Aspose.Pdf.Optimization.OptimizationOptions();

// Image compression settings
oo.ImageCompressionOptions.ImageQuality = 50;
oo.ImageCompressionOptions.MaxResolution = 300;
oo.ImageCompressionOptions.ResizeImages = true;
oo.ImageCompressionOptions.CompressImages = true;
oo.ImageCompressionOptions.Encoding = Aspose.Pdf.Optimization.ImageEncoding.Jpeg;
oo.ImageCompressionOptions.Version = Aspose.Pdf.Optimization.ImageCompressionVersion.Standard;

// Resource clean-up and de-duplication settings
oo.AllowReusePageContent = true;
oo.RemoveUnusedObjects = true;
oo.RemoveUnusedStreams = true;
oo.LinkDuplcateStreams = true; // property name is spelled this way in the API
oo.SubsetFonts = true;
oo.UnembedFonts = true; // removes the embedded fonts from the file

using (Document document = new Document(dataDir + "FromTXT.pdf"))
{
    document.OptimizeSize = true;
    document.Flatten();
    document.OptimizeResources(oo);
    document.Save(dataDir + "after compression_optimized_TXT.pdf");
}

@asad.ali

Thanks for the reply. Looking at your code, you are unembedding the fonts:

oo.UnembedFonts = true;

I want to embed the fonts, but only once. Yes, unembedding the fonts makes the PDF much smaller, but unfortunately I need at least one embedded copy of each font style. What I don't need is the same font embedded multiple times.

I tried your code. It removed all of the embedded fonts and the PDF is much smaller, but that doesn't solve the issue.

I also tried setting UnembedFonts = true for each individual PDF and then UnembedFonts = false after concatenating them into one large PDF, but that didn't work either.

You can reproduce this yourself: create multiple PDF files, concatenate them into a single PDF, and then check whether the final PDF embeds each font style once or multiple times, as in my case.

Another question: when I'm combining multiple PDFs, should I optimize each PDF before combining, or just optimize the final PDF after concatenating?

Thank you for any assistance.

image.png (16.4 KB)

@schamma

We are sorry that the suggested approach did not suit your needs. Please note that unembedding fonts is a complex process that typically involves modifying the PDF document and its resources. We could create a test PDF ourselves, but it may not match the kind of PDF documents you are working with. It would help our investigation if you could share a sample PDF document that contains two or more duplicate fonts. We will proceed with the analysis accordingly.

The recommended approach would be to optimize the resultant PDF that you obtain after concatenation.
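
For readers following along, here is a minimal sketch of that order of operations (concatenate first, then optimize once) with fonts left embedded. It is an illustration, not a confirmed fix from this thread: the class name, method name, and the inputPaths/outputPath parameters are hypothetical, and the option set is simply the earlier snippet with UnembedFonts turned off and the image and subsetting settings left out. LinkDuplcateStreams only merges streams whose contents are equal, so whether it collapses the duplicate fonts in a given file will depend on whether the embedded font data is actually identical across the source PDFs.

```csharp
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Optimization;

public static class MergeThenOptimize
{
    // inputPaths and outputPath are hypothetical parameters for illustration.
    public static void Run(IEnumerable<string> inputPaths, string outputPath)
    {
        // Source documents are kept open until the merged file is saved,
        // because the merged document still refers to their pages.
        var sources = new List<Document>();
        try
        {
            using (var merged = new Document())
            {
                // Step 1: concatenate all pages into one document.
                foreach (string path in inputPaths)
                {
                    var source = new Document(path);
                    sources.Add(source);
                    merged.Pages.Add(source.Pages);
                }

                // Step 2: optimize the merged result once, keeping fonts embedded.
                var options = new OptimizationOptions
                {
                    LinkDuplcateStreams = true,   // API spelling; merges equal streams
                    RemoveUnusedObjects = true,
                    RemoveUnusedStreams = true,
                    AllowReusePageContent = true,
                    UnembedFonts = false          // fonts stay embedded
                };
                merged.OptimizeResources(options);

                merged.Save(outputPath);
            }
        }
        finally
        {
            foreach (Document source in sources)
            {
                source.Dispose();
            }
        }
    }
}
```

With around 1500 inputs, keeping every source document open at once may use a lot of memory, so merging in batches could be worth testing.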