PDF Optimize Size Problem with Non-Embedded Fonts

mario.ferrante · March 9, 2018, 10:03am

Hi,

I’m using SharePoint and Aspose.PDF to perform a two-step PDF conversion.

Step 1: I convert a Word document with SharePoint Server, without any optimization.
SharePoint produces a correct and reliable PDF, but its size is very large even when I use its optimization settings (unembedding fonts, reducing image quality, etc.), so I decided not to use its optimization at all. This led me to
Step 2: I use Aspose to reduce the PDF file size. Here is the code snippet:

// Load PDF file
using (Aspose.Pdf.Document doc = new Aspose.Pdf.Document(tmpPdfFilePath))
{
    // Deduplicate and drop unused resources, recompress images, unembed fonts
    doc.OptimizeResources(new Aspose.Pdf.Document.OptimizationOptions()
    {
        LinkDuplcateStreams = true,   // spelled this way in the Aspose API
        RemoveUnusedObjects = true,
        RemoveUnusedStreams = true,
        CompressImages = true,
        UnembedFonts = true,
        ImageQuality = 90
    });
    // Linearize for fast web view and set the size-optimization flag
    doc.Optimize();
    doc.OptimizeSize = true;
    // Save updated document
    doc.Save(outputStream);
}

The file is optimized correctly and its size is greatly reduced, but I get the following error when I open it in Acrobat Reader:
“Cannot find or create the font ‘ABCDEE+Calibri’. Some characters may not display or print correctly”

Acrobat is able to open the document, but some parts of it are unreadable.

Most importantly, the font is the normal Calibri system font, and it is the same font used in the other parts of the document, where it is displayed correctly.
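The font name in the error, ‘ABCDEE+Calibri’, is not a plain family name. In PDF, a prefix of exactly six uppercase letters followed by `+` is a subset tag, marking an embedded font that contains only the glyphs the document uses. One plausible reading (an assumption; the thread does not confirm the cause) is that unembedding a subset leaves a font reference that viewers cannot resolve to the installed Calibri. The hypothetical helper below is a sketch, not part of any Aspose API; it detects the tag and recovers the base name:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose API): splits a PDF subset font name
// such as "ABCDEE+Calibri" into its base name. Per the PDF specification,
// a subset tag is exactly six uppercase letters followed by '+'.
public static class FontNameTools
{
    private static readonly Regex SubsetTag = new Regex(@"^[A-Z]{6}\+(.+)$");

    // Returns true when fontName carries a subset tag. baseName receives the
    // name without the tag, or fontName unchanged when there is no tag.
    public static bool TryGetBaseFontName(string fontName, out string baseName)
    {
        if (fontName == null) throw new ArgumentNullException(nameof(fontName));
        Match match = SubsetTag.Match(fontName);
        baseName = match.Success ? match.Groups[1].Value : fontName;
        return match.Success;
    }
}
```

A base name recovered this way could be passed to `FontRepository.FindFont` in place of a hard-coded "Calibri"; whether that avoids the display problem has not been tested.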

Attached is the file I used to test the case.
ASPOSE…PDF v17.11

Regards

Test PDF FONT.zip (139.7 KB)

Farhan.Raza · March 9, 2018, 8:58pm

@mario.ferrante

Thank you for contacting support.

I have worked with the data shared by you and have been able to reproduce the issue in our environment. A ticket with ID PDFNET-44348 has been logged in our issue management system for further investigation and resolution. The issue ID has been linked with this thread so that you will receive notification as soon as the issue is resolved.

However, you can use the code snippet below in your environment to avoid the issue. The PDF file generated with this snippet is larger than the problematic one, but it avoids the problem until the logged ticket is investigated and resolved.

        // Requires: using Aspose.Pdf; using Aspose.Pdf.Text;
        String inputFilePath = dataDir + "Step 1 - sharepoint converted.pdf";
        String outputFilePath = dataDir + "Aspose.PDF_18.2.pdf";
        // Open document
        Document document = new Document(inputFilePath);

        // Create a TextFragmentAbsorber to collect all text fragments
        TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber();

        // Accept the absorber for all pages
        document.Pages.Accept(textFragmentAbsorber);

        // Get the extracted text fragments
        TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

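        // Loop through the fragments and reassign each one's font from the
        // system font repository (hard-coded to Calibri for this test file)
        foreach (TextFragment textFragment in textFragmentCollection)
        {
            textFragment.TextState.Font = FontRepository.FindFont("Calibri");
        }
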
        document.OptimizeResources(new Document.OptimizationOptions()
        {
            LinkDuplcateStreams = true,
            RemoveUnusedObjects = true,
            RemoveUnusedStreams = true,
            CompressImages = true,
            UnembedFonts = true,
            ImageQuality = 90,
        });
        // Linearize for fast web view and set the size-optimization flag
        document.Optimize();
        document.OptimizeSize = true;
        // Save updated document
        document.Save(outputFilePath);

I hope this will be helpful. Please feel free to let us know if you need any further assistance.

mario.ferrante · March 13, 2018, 11:08am

Hi,

thanks for your reply. As far as I can see, the workaround is tied to the ‘Calibri’ font, but the font can differ from one document to another, and a document may contain many fonts.
Customers expect the PDF conversion to be as reliable as possible, so I can’t use your snippet above.

Is there at least a way to detect that the conversion of a document will generate (or has generated) a broken PDF? If so, I could decide to skip that document and optimize the others.
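A heuristic pre-flight check could support the “skip that document” approach: before calling `OptimizeResources` with `UnembedFonts = true`, collect the fonts that are both embedded and subset-tagged, and skip unembedding for that document if any are found. Treating subset fonts as unsafe to unembed is an assumption drawn from the error above, not a documented Aspose rule, and `PdfFontInfo` and `UnembedPreflight` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical descriptor for one font resource found in a PDF.
public sealed record PdfFontInfo(string Name, bool IsEmbedded);

public static class UnembedPreflight
{
    // Six uppercase letters followed by '+' marks a subset font name.
    private static readonly Regex SubsetTag = new Regex(@"^[A-Z]{6}\+");

    // Names of embedded subset fonts, without duplicates. Flagging these as
    // risky to unembed is an assumption, not a documented Aspose rule.
    public static IReadOnlyList<string> RiskyFonts(IEnumerable<PdfFontInfo> fonts) =>
        fonts.Where(f => f.IsEmbedded && SubsetTag.IsMatch(f.Name))
             .Select(f => f.Name)
             .Distinct()
             .ToList();

    // True when no embedded subset font was found, i.e. when this heuristic
    // does not object to running the optimization with UnembedFonts = true.
    public static bool SafeToUnembed(IEnumerable<PdfFontInfo> fonts) =>
        RiskyFonts(fonts).Count == 0;
}
```

In Aspose.PDF the inputs could probably be gathered by iterating `doc.Pages` and each page's `Resources.Fonts`, reading `Font.FontName` and `Font.IsEmbedded`; check these against the API reference for your version, and note that fonts referenced only from form XObjects would not appear in page-level resources.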

I hope the issue you opened will be fixed as soon as possible. My company purchased a full license of Aspose.PDF; is it possible to get higher support priority in this case?

Farhan.Raza · March 13, 2018, 6:22pm

@mario.ferrante

I would like to share with you that the problem apparently pertains to displaying and printing characters, because the characters themselves are present in the PDF file. This can be verified by exporting the generated PDF file to a DOC or TXT file with Adobe Acrobat. Everything is fine except the display of some characters, so the PDF is not invalid, and the problem therefore cannot be detected.

Moreover, the issue you reported was logged in our issue management system a few days ago and is pending analysis. Our product team is busy with previously logged issues, and your ticket will be scheduled in its turn. We appreciate your patience and understanding.

However, we also offer Paid Support, where issues are investigated with higher priority. Customers with a paid support subscription report their issues there, and those issues are investigated urgently. If the reported issue is a blocker for you, please consider subscribing to Paid Support. For further information, please visit the Paid Support FAQ.

aspose.notifier · May 13, 2018, 8:55pm

The issues you have found earlier (filed as PDFNET-44348) have been fixed in Aspose.PDF for .NET 18.5. This message was posted using BugNotificationTool by asad.ali