Aspose.Pdf (C#) Replace subsets of SymbolMT font with full embedded font

Our process stitches together hundreds of PDFs, which may or may not contain bullet points and a few other non-letter characters. Each PDF that does contain them includes a subset of the SymbolMT font, so the stitched file ends up bloated with hundreds of these subsets.
The printers we use for our project tend to run out of memory fairly quickly. We currently use iText to unembed these subsets and embed one full version of the font from a TTF file embedded in our application. We are trying to migrate to Aspose.Pdf, and I cannot find a way to do this there.

@jterroba

To replace subsets of the SymbolMT font with a full embedded version in your PDFs using Aspose.Pdf, you can utilize the TextFragmentAbsorber class along with TextEditOptions. This allows you to search for text fragments that use the SymbolMT font and replace them with the full font from a TTF file.

Here’s a step-by-step approach you can follow:

  1. Load the PDF Document: Open the PDF file that you want to modify.
  2. Create a TextFragmentAbsorber: This will help you find all text fragments that use the SymbolMT font.
  3. Set TextEditOptions: Use the FontReplace option to specify that you want to replace the font.
  4. Iterate Through Text Fragments: For each text fragment found, set the font to the full version of the SymbolMT font.
  5. Save the Document: Finally, save the modified PDF.

Here’s a sample code snippet to illustrate this:

using Aspose.Pdf;
using Aspose.Pdf.Text;

public void ReplaceSymbolMTFont(string inputFilePath, string outputFilePath, string fullFontPath)
{
    // Load the PDF document
    Document document = new Document(inputFilePath);

    // Create TextFragmentAbsorber to find all text fragments
    TextFragmentAbsorber absorber = new TextFragmentAbsorber();
    
    // Set text edit options to replace fonts
    TextEditOptions options = new TextEditOptions(TextEditOptions.FontReplace.RemoveUnusedFonts);
    absorber.TextEditOptions = options;

    // Accept the absorber for all pages
    document.Pages.Accept(absorber);

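    // Load the full font once from the TTF file. OpenFont takes a file path;
    // FindFont looks a font up by name and would not accept a path.
    var fullFont = FontRepository.OpenFont(fullFontPath);

    // Iterate through all text fragments
    foreach (TextFragment textFragment in absorber.TextFragments)
    {
        // Subset font names carry a prefix (e.g. "ABCDEF+SymbolMT"), so match with Contains
        if (textFragment.TextState.Font.FontName.Contains("SymbolMT"))
        {
            // Replace with the full font
            textFragment.TextState.Font = fullFont;
        }
    }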

    // Save the modified document
    document.Save(outputFilePath);
}

Make sure to replace fullFontPath with the actual path to your full SymbolMT TTF file. This approach should reduce the number of font subsets in your stitched PDFs, which in turn should lower the memory load on your printers.

If you have any further questions or need additional assistance, please let me know!

Thank you for the blazing fast response. Let me try the code you’ve provided, and I will reply with the results.

EDIT: It didn’t quite work out of the box because of how I store the font file (as an embedded resource in the assembly), so here’s my version, which worked*

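using System;
using System.Reflection;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Note: as an extension method, this must be declared inside a static class.
/// <summary>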
/// Replace existing font subsets with a provided font from an embedded resource file 
/// </summary>
/// <typeparam name="T">Type of the assembly that will have the font file embedded into</typeparam>
/// <param name="document">Aspose.Pdf <see cref="Document"/> to process</param>
/// <param name="fullFontPath">Full path to the embedded resource font file.</param>
public static void ReplaceFontSubsets<T>(this Document document, String fullFontPath)
{

    // Create TextFragmentAbsorber to find all text fragments
    TextFragmentAbsorber absorber = new TextFragmentAbsorber();

    // Set text edit options to replace fonts
    TextEditOptions options = new TextEditOptions(TextEditOptions.FontReplace.RemoveUnusedFonts);
    absorber.TextEditOptions = options;

    // Accept the absorber for all pages
    document.Pages.Accept(absorber);

    // Load the embedded resource font file from the assembly once, outside the loop
    var assembly = Assembly.GetAssembly(typeof(T));
    var fontStream = assembly?.GetManifestResourceStream(fullFontPath);
    if (fontStream == null)
    {
        throw new ArgumentException($"Embedded font resource '{fullFontPath}' was not found.", nameof(fullFontPath));
    }
    var fullFont = FontRepository.OpenFont(fontStream, FontTypes.TTF);

    // Iterate through all text fragments
    foreach (TextFragment textFragment in absorber.TextFragments)
    {
        // Subset names carry a prefix before "SymbolMT", hence Contains
        if (textFragment.TextState.Font.FontName.Contains("SymbolMT"))
        {
            // Replace with the full font
            textFragment.TextState.Font = fullFont;
        }
    }
}

* The result still had the Symbol font as an embedded subset, but only once, which isn’t what I originally asked for, but for all intents and purposes does what I wanted. It also now has one instance of an embedded subset of the Arial font, which I’m not sure where it came from. Again, not breaking, but it wasn’t there in the original.
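
One way to check which fonts ended up in the output, and to narrow down where the extra Arial subset comes from, is to list the document's fonts before and after the replacement. This is only a minimal sketch: it assumes Document.FontUtilities.GetAllFonts() and the Font.IsEmbedded / Font.IsSubset properties behave as described in the Aspose.Pdf API reference, and the FontDiagnostics class and DumpFonts helper are hypothetical names introduced for illustration.

```csharp
using System;
using Aspose.Pdf;

public static class FontDiagnostics
{
    // Hypothetical helper: prints every font reported for the document,
    // together with its embedded and subset flags.
    public static void DumpFonts(Document document)
    {
        foreach (Aspose.Pdf.Text.Font font in document.FontUtilities.GetAllFonts())
        {
            Console.WriteLine(
                $"{font.FontName}: embedded={font.IsEmbedded}, subset={font.IsSubset}");
        }
    }
}
```

Calling FontDiagnostics.DumpFonts(document) once before ReplaceFontSubsets and once after it should show whether the Arial entry appears only after the replacement step.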

@jterroba

It looks like you were able to sort out the issue that you were facing. Please feel free to create a new topic in case you need any kind of assistance.
