Multiple intermittent issues while using TextDevice and TextAbsorber

We are using Aspose.PDF 23.12.0.

Our application extracts text from PDF pages and from specific locations on a PDF page.
When multiple threads or calls perform either of these operations at the same time, we see several different exceptions. We were able to reproduce them, but they are intermittent.
Please use the code below to reproduce the problem. Because it happens intermittently, you may need to run the test several times.

        // Needs System.IO, System.Text, System.Threading.Tasks, Aspose.Pdf.Devices and MSTest.
        [TestMethod]
        public async Task FailureTest()
        {
            byte[] document1 = File.ReadAllBytes("any.pdf");
            Stream documentStream1 = new MemoryStream(document1);

            // Each task works on its own independent copy of the PDF bytes.
            var d1 = documentStream1.CloneStream();
            var d2 = documentStream1.CloneStream();
            var d3 = documentStream1.CloneStream();
            var d4 = documentStream1.CloneStream();

            // Extract text from the four copies in parallel.
            var t1 = Task.Run(() => GetTextFromAllPages(d1));
            var t2 = Task.Run(() => GetTextFromAllPages(d2));
            var t3 = Task.Run(() => GetTextFromAllPages(d3));
            var t4 = Task.Run(() => GetTextFromAllPages(d4));

            await Task.WhenAll(t1, t2, t3, t4).ConfigureAwait(false);
        }

        IEnumerable<string> GetTextFromAllPages(Stream stream)
        {
            Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(stream);

            List<string> pdfTextPages = [];

            int pagesCount = pdfDocument.Pages.Count;

            // Aspose.PDF page numbers are 1-based.
            for (int pageNo = 1; pageNo <= pagesCount; pageNo++)
            {
                string pageContent = GetTextFromPage(pdfDocument, pageNo);
                pdfTextPages.Add(pageContent);
            }
            return pdfTextPages;
        }

        string GetTextFromPage(Aspose.Pdf.Document pdfDocument, int pageNumber)
        {
            // The exceptions below are thrown from TextDevice.Process.
            TextDevice textDevice = new TextDevice();
            using MemoryStream memoryStream = new MemoryStream();
            textDevice.Process(pdfDocument.Pages[pageNumber], memoryStream);
            return Encoding.Unicode.GetString(memoryStream.ToArray());
        }

These are the exceptions that we see with TextDevice. We see similar issues with TextAbsorber as well.

Message
The given key 'F182' was not present in the dictionary.

Stack
System.Collections.Generic.KeyNotFoundException:
   at System.ThrowHelper.ThrowKeyNotFoundException (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at #=zJdMgRdgt7hHT99I6mofkJ4Ka0BKS.#=zC96MGVX_Nm4j (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zJdMgRdgt7hHT99I6mofkJ4Ka0BKS.#=zC96MGVX_Nm4j (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=z3AXh6KtNIg7GyS66nJhaeAtBVrFyLOstuZb$15TlByUnbvLiIWwczUs=.#=zRDfXBFM= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zxClYwO_U2PJlynuaEcNw5652YA8c0Rf2O$$aXrjpp3xUMyC9rg==.#=znmJrq5_VTFgO (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zxClYwO_U2PJlynuaEcNw5652YA8c0Rf2O$$aXrjpp3xUMyC9rg==..ctor (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=z3AXh6KtNIg7GyS66nJhaeAtBVrFyLOstuZb$15TlByUnbvLiIWwczUs=.#=zaPSsevQ9HaUG (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zKgNmlTRvtujzeIt4Ydrv4yDm6YtRp0$N_bi1hQnb$IjGdumeHPw7O$4=.#=ze$9nGDdz$bKx (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zKgNmlTRvtujzeIt4Ydrv4yDm6YtRp0$N_bi1hQnb$IjGdumeHPw7O$4=.#=zA26DcZQ= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zKgNmlTRvtujzeIt4Ydrv4yDm6YtRp0$N_bi1hQnb$IjGdumeHPw7O$4=.#=zgLgkm1Q= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF.#=zbq4mz$RiuRix (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF.#=zbq4mz$RiuRix (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF.#=z2C5DQ9o= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF..ctor (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Text.TextAbsorber.Visit (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Devices.TextDevice.Process (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at PdfExtractor.AsposePDF.Document.Extraction.PDFAsposeToTextExtraction.GetTextFromPage

Message
Unable to cast object of type '#=zrvz0M4saau7$S$szxPz4ZBwtvaz$lQoDFw==' to type '#=zKDi6weuPF0KFdCxX7iW747kgoqM9uYCliA=='.

Stack
System.InvalidCastException:
   at System.Runtime.CompilerServices.CastHelpers.ChkCast_Helper (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at #=z4FTteMyuNTbEeQOt3Yq8dXzAtmW8Rwcxuy6dGsb$K9nNG7OUzw==.#=z$FnbquzaX7Mt (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Operators.SelectFont.#=zktsDd9U= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Operator..ctor (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Operators.TextOperator..ctor (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Operators.TextStateOperator..ctor (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Operators.SelectFont..ctor (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=z4FTteMyuNTbEeQOt3Yq8dXzAtmW8Rwcxuy6dGsb$K9nNG7OUzw==.#=zoUnE0zs= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.OperatorCollection.#=zBjqwizIAYq8o (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.OperatorCollection.get_Count (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zKgNmlTRvtujzeIt4Ydrv4yDm6YtRp0$N_bi1hQnb$IjGdumeHPw7O$4=.#=zgLgkm1Q= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF.#=zbq4mz$RiuRix (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF.#=zbq4mz$RiuRix (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF.#=z2C5DQ9o= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF..ctor (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Text.TextAbsorber.Visit (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Devices.TextDevice.Process (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at PdfExtractor.AsposePDF.Document.Extraction.PDFAsposeToTextExtraction.GetTextFromPage

Message
Index was out of range. Must be non-negative and less than the size of the collection. (Parameter 'index')

Stack
System.ArgumentOutOfRangeException:
   at System.ThrowHelper.ThrowArgumentOutOfRange_IndexException (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at #=z4FTteMyuNTbEeQOt3Yq8dXzAtmW8Rwcxuy6dGsb$K9nNG7OUzw==.#=z$FnbquzaX7Mt (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Operators.SelectFont.#=zktsDd9U= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Operator..ctor (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Operators.TextOperator..ctor (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Operators.TextStateOperator..ctor (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Operators.SelectFont..ctor (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=z4FTteMyuNTbEeQOt3Yq8dXzAtmW8Rwcxuy6dGsb$K9nNG7OUzw==.#=zoUnE0zs= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.OperatorCollection.#=zBjqwizIAYq8o (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.OperatorCollection.get_Count (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zKgNmlTRvtujzeIt4Ydrv4yDm6YtRp0$N_bi1hQnb$IjGdumeHPw7O$4=.#=zgLgkm1Q= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF.#=zbq4mz$RiuRix (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF.#=zbq4mz$RiuRix (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF.#=z2C5DQ9o= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF..ctor (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Text.TextAbsorber.Visit (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Devices.TextDevice.Process (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at PdfExtractor.AsposePDF.Document.Extraction.PDFAsposeToTextExtraction.GetTextFromPage

Message
An item with the same key has already been added. Key: F171

Stack
System.ArgumentException:
   at System.ThrowHelper.ThrowAddingDuplicateWithKeyArgumentException (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Collections.Generic.Dictionary`2.TryInsert (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at #=zJdMgRdgt7hHT99I6mofkJ4Ka0BKS.#=zC96MGVX_Nm4j (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zJdMgRdgt7hHT99I6mofkJ4Ka0BKS.#=zC96MGVX_Nm4j (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=z3AXh6KtNIg7GyS66nJhaeAtBVrFyLOstuZb$15TlByUnbvLiIWwczUs=.#=zRDfXBFM= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zxClYwO_U2PJlynuaEcNw5652YA8c0Rf2O$$aXrjpp3xUMyC9rg==.#=znmJrq5_VTFgO (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zxClYwO_U2PJlynuaEcNw5652YA8c0Rf2O$$aXrjpp3xUMyC9rg==..ctor (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=z3AXh6KtNIg7GyS66nJhaeAtBVrFyLOstuZb$15TlByUnbvLiIWwczUs=.#=zaPSsevQ9HaUG (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zKgNmlTRvtujzeIt4Ydrv4yDm6YtRp0$N_bi1hQnb$IjGdumeHPw7O$4=.#=ze$9nGDdz$bKx (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zKgNmlTRvtujzeIt4Ydrv4yDm6YtRp0$N_bi1hQnb$IjGdumeHPw7O$4=.#=zA26DcZQ= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zKgNmlTRvtujzeIt4Ydrv4yDm6YtRp0$N_bi1hQnb$IjGdumeHPw7O$4=.#=zgLgkm1Q= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF.#=zbq4mz$RiuRix (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF.#=zbq4mz$RiuRix (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF.#=z2C5DQ9o= (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at #=zivHMpuGKu7N7gWkY4coAwuGuV0Ego6tGbh_LVgpJLNpUCdh9cqaHsCUWhBCF..ctor (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Text.TextAbsorber.Visit (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at Aspose.Pdf.Devices.TextDevice.Process (Aspose.PDF, Version=23.12.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56)
   at PdfExtractor.AsposePDF.Document.Extraction.PDFAsposeToTextExtraction.GetTextFromPage

@cpaperless
Is CloneStream an extension method?
If so, please provide its code too.

This is simple stream-cloning code. Here you go:

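public static class Extension
{
    // Copies the whole source stream into a new, independent MemoryStream.
    public static Stream CloneStream(this Stream stream)
    {
        stream.Position = 0;        // read the source from the start
        MemoryStream newStream = new MemoryStream();
        stream.CopyTo(newStream);
        newStream.Position = 0;     // return the copy rewound
        stream.Position = 0;        // leave the source rewound for the next call
        return newStream;
    }
}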

@sergei.shibanov

@cpaperless
Thank you.
I will look into the issue and get back to you on Monday.

@cpaperless
In my environment, with library version 24.06, the given code ran against the attached document without exceptions. (I also set it up as a test and ran it until failure: 709 test runs passed, after which I stopped the check.)
InternalOrPublic.pdf (463.2 KB)

Please run the code against this document with library version 24.06 in your environment and let me know the result.

@sergei.shibanov
I am able to reproduce the issue with the version that you mentioned.
This is the file that I am using.

Please run the same test with this file.

1120SCAO19_2019_ArchiveTaxReturn.pdf (479.6 KB)

@cpaperless
In a .NET 6, Windows 10 environment, with the given code implemented as an NUnit test, I did not get an exception for this document (500 test runs).
What do you use (SDK, OS, test framework)?

@sergei.shibanov
We are using net8.0 for the test project.
MSTest is the test framework.
The machine runs Windows 11 with 32 GB of RAM and 8 CPUs.

I am able to reproduce this within 1-2 runs.

@sergei.shibanov
Based on the stack trace, can we diagnose when this exception could be thrown?

@cpaperless

This is related to the font key. Unfortunately, the parameter value is not shown and it is unclear what value is causing this.

This is probably related to the different sets of fonts in our environments. To check, please run the attached console application (after setting your own paths to the license and to the document being processed), let us know whether you get an exception, and send us the list of fonts it prints.

(Google Drive is not working for me, so I put the archive on another file-sharing service.)
Do you get the error if you do not use multithreading and run only one thread?

@sergei.shibanov
Running the same test that I shared earlier without the tasks works fine.
But in our application, multiple requests can be made by multiple users at the same time.
That is where we face this issue. It looks like the underlying collection these fonts are stored in is not thread-safe?

I will try the given app and let you know my findings. Thank you.
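
Until the cause is confirmed, one possible stopgap is to let only one extraction run at a time. The sketch below is a guess at a workaround, not something Aspose has recommended: it assumes the shared state is process-wide and is only touched while extraction runs, which nobody in this thread has verified. SerializedExtraction and Gate are made-up names.

```csharp
using System;
using System.Threading.Tasks;

public static class SerializedExtraction
{
    // One process-wide lock (hypothetical name). Every caller goes through it,
    // so the wrapped work never runs on two threads at once.
    private static readonly object Gate = new object();

    // Runs the supplied delegate while holding the lock and returns its result.
    public static T Run<T>(Func<T> extract)
    {
        lock (Gate)
        {
            return extract();
        }
    }
}
```

In the test above this would look like Task.Run(() => SerializedExtraction.Run(() => GetTextFromAllPages(d1))). The cost is that the extraction itself no longer runs in parallel, so it only makes sense as a temporary measure.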

@cpaperless

Yes, this is a very likely reason.
Formally, we do not guarantee correct operation on a single document in multi-threaded mode. However, we are trying to move away from this limitation, so I would like to reproduce this error and create a task for the development team.
I will wait for information from you.

@sergei.shibanov
Looks like you aren’t awaiting the Task.WhenAll and that’s why it is completing without any issue.

Please make your Main method async and await the Task.WhenAll.

I am able to repro this with your console app.
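
For anyone following along, here is a small sketch of why the await matters. It does not use Aspose at all: Fail is a made-up stand-in for an extraction call that throws, and AwaitSketch is a made-up class name. A task that faults only reports its exception to code that awaits it (or otherwise observes the task), so without the await the program can finish as if nothing went wrong.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitSketch
{
    // Hypothetical stand-in for one extraction call; it always throws so the
    // effect of awaiting is easy to see.
    private static void Fail() => throw new InvalidOperationException("simulated extraction failure");

    // Returns true when the failure inside the tasks reaches the caller.
    public static async Task<bool> FailureSurfacesWhenAwaitedAsync()
    {
        Task t1 = Task.Run(() => { Fail(); });
        Task t2 = Task.Run(() => { Fail(); });
        try
        {
            // Awaiting rethrows the first exception captured by the tasks.
            await Task.WhenAll(t1, t2).ConfigureAwait(false);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

In a console app, the same idea means declaring the entry point as static async Task Main() and awaiting Task.WhenAll inside it.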

@cpaperless
Thank you for writing to us and for the tips on how to reproduce the problem. I reproduced it and created a task for the development team.

@cpaperless
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-57536

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@sergei.shibanov Can you let us know whether the underlying collection is static or bound to an object instance?

Also, is there any way we can fill this collection (possibly in a thread-safe way) before any of the operations take place (in this case, text extraction)?

@cpaperless

I agree that this is one of the possible reasons.

We need to determine which data structures are being accessed when the violations occur, and it is unknown whether they are publicly accessible.
These are still tasks for the development team, and we will wait for information from them. But I may also take a look myself, in case these really are accessible collections, because the work itself is essentially carried out on copies of the document.
Thank you for the tips.

@cpaperless

I looked, and that was the case: I changed Dictionary to ConcurrentDictionary. The change will be available in version 24.07, which will be released in the next week or two. Since I am not sure that this is the only problem, I won't close the task yet. If you still have problems, write to paid support or here. I will close it later if there are no further comments.
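
To illustrate the kind of change described above, here is a minimal sketch of a cache that several threads fill at once. It is not Aspose's internal code: FontCacheSketch, FillConcurrently and the "F" + i keys are made up for the example and only loosely mirror the font keys in the stack traces (F182, F171). With a plain Dictionary, unsynchronized concurrent writes are unsafe and can produce errors like the ones reported; ConcurrentDictionary.GetOrAdd is safe to call from many threads.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class FontCacheSketch
{
    // Fills one shared cache from several workers at once and returns the
    // number of distinct keys that ended up in it.
    public static int FillConcurrently(int workers, int keysPerWorker)
    {
        var cache = new ConcurrentDictionary<string, int>();

        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < keysPerWorker; i++)
            {
                // Every worker asks for the same keys; GetOrAdd stores each
                // key once and returns the stored value to all callers.
                cache.GetOrAdd("F" + i, key => key.Length);
            }
        });

        return cache.Count;
    }
}
```

Note that GetOrAdd may invoke the value factory more than once under contention, so the factory should be cheap and free of side effects.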

@cpaperless
The release of Aspose.PDF for .NET 24.7 has been published.
I will be waiting for comments from you.

The issues you have found earlier (filed as PDFNET-57536) have been fixed in Aspose.PDF for .NET 24.8.