Aspose.PDF for .NET Problem with TextFragmentAbsorber

Hello.
I am creating a “Document” from an input stream that I know is a valid PDF document. I then create a “TextFragmentAbsorber” with a regular expression and call “Accept” on the “Pages” of the document with that absorber. At that point the following exception is thrown:

Object reference not set to an instance of an object.
System.NullReferenceException: Object reference not set to an instance of an object.
   at    .(Operator )
   at    ()
   at   .(BaseOperatorCollection , Resources , Page )
   at   .(BaseOperatorCollection , Resources )
   at   .()
   at Aspose.Pdf.Text.TextFragmentAbsorber.Visit(Page page)

The problem happens when I run on .NET Core on Linux x64. The process described above works fine in a Windows environment and doesn’t throw an exception.

@nikitamarchenko

Thank you for contacting support.

Would you please create a narrowed-down sample application that reproduces this issue, so that we can try to reproduce and investigate it in our environment? Please mention whether the problem occurs with every PDF file or only with specific PDF files, and share the details of the Linux version you are using. Before sharing the requested data, please make sure that you are using Aspose.PDF for .NET 18.9.1 in your environment.

@Farhan.Raza

The problem occurs with every PDF.
I am using Amazon Linux 2 (Linux 4.14.67-71.56.amzn2.x86_64).
Yes, I am using the latest version of Aspose.PDF (18.9.1).

    using System.Text;
    using Aspose.Pdf;
    using Aspose.Pdf.Text;

    class Program
    {
        private const char DefaultReplacementChar = '█';

        static void Main(string[] args)
        {
            var outputPath = "./PDF32000_2008_2.pdf";
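            // Load the source PDF from the working directory.
            var pdfDocument = new Document(@"./PDF32000_2008.pdf");
            // Passing true to TextSearchOptions treats the search phrase as a regular expression.
            var textFragmentAbsorber = new TextFragmentAbsorber(@"\w+", new TextSearchOptions(true));
            // Page numbering in Aspose.PDF starts at 1, so this searches the first page.
            pdfDocument.Pages[1].Accept(textFragmentAbsorber);
            var textFragmentCollection = textFragmentAbsorber.TextFragments;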
            foreach (var textFragment in textFragmentCollection)
            {
                textFragment.Text = CreateReplacementString(DefaultReplacementChar, textFragment.Text.Length);
            }

            pdfDocument.Save(outputPath);
        }

        private static string CreateReplacementString(char c, int length)
        {
            // Build a string consisting of the character c repeated length times.
            return new string(c, length);
        }
    }

The document, in case it is needed: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf

@nikitamarchenko

Thank you for sharing the requested information.

We are testing the scenario in our environment and will share our findings with you soon.

@nikitamarchenko

Thank you for being patient.

We have not been able to reproduce the issue in a Linux environment. Please make sure that the libgdiplus package and the MS Core Fonts are installed in your environment, and then share your feedback with us.

@Farhan.Raza

I installed libgdiplus. For the MS Core Fonts, since I am using an RPM-based Linux distribution, I found this package, http://mscorefonts2.sourceforge.net/, and installed it. It does not install to “/usr/share/fonts/truetype/msttcorefonts” but to “/usr/share/fonts/msttcore”, so I specified that path in code:

    FontRepository.Sources.Add(new FolderFontSource("/usr/share/fonts/msttcore"));

However, the only result of adding this line of code is an exception:

Unhandled Exception:  ​ : Unexpected font parsing exception ---> System.IO.DirectoryNotFoundException: Could not find a part of the path '/usr/share/fonts/truetype/msttcorefonts'.
   at System.IO.Enumeration.FileSystemEnumerator`1.CreateDirectoryHandle(String path, Boolean ignoreNotFound)
   at System.IO.Enumeration.FileSystemEnumerator`1..ctor(String directory, EnumerationOptions options)
   at System.IO.Enumeration.FileSystemEnumerable`1..ctor(String directory, FindTransform transform, EnumerationOptions options)
   at System.IO.Enumeration.FileSystemEnumerableFactory.UserFiles(String directory, String expression, EnumerationOptions options)
   at System.IO.Directory.InternalEnumeratePaths(String path, String searchPattern, SearchTarget searchTarget, EnumerationOptions options)
   at System.IO.Directory.GetFiles(String path)
   at    .       ()
   --- End of inner exception stack trace ---
   at    .       ()
   at   .(FontSource )
   at   .()
   at   .()
   at TextFragmentAbsorberProblem.Program.Main(String[] args) in C:\Users\nikim\source\repos\TextFragmentAbsorberProblem\TextFragmentAbsorberProblem\Program.cs:line 13
Aborted

Could you please provide a guide for the RPM-based (Red Hat) family of Linux distributions, such as Red Hat Enterprise Linux, Fedora, SUSE Linux, and CentOS?

@nikitamarchenko

We are checking this on an RPM-based Linux distribution and will get back to you with our findings soon.

@nikitamarchenko

We have managed to reproduce the issue on CentOS 7 x64 with .NET Core 2.1, as shown in the attached screenshot CentOS7x64Core2.1.png. A ticket with ID PDFNET-45504 has been logged in our issue management system for further investigation and resolution. The ticket ID has been linked with this thread, so you will receive a notification as soon as the ticket is resolved.

We are sorry for the inconvenience.

Was this resolved? I have a similar issue on Windows.

@MikeOtown

Can you please confirm that all Windows fonts are installed properly on the system? Also, please make sure that you are using version 22.12 of Aspose.PDF for .NET. If the issue still persists, please share your sample PDF and a complete code snippet with us. We will test the scenario in our environment and address it accordingly.