An exception is thrown when trying to save a scanned PDF to a MemoryStream using SaveAsync

Hi!

Software (incl. versions) where issue can be reproduced:
We’re using the NuGet package “Aspose.PDF.Drawing” version 25.9.0 in a .NET 8 project to copy a scanned PDF to a MemoryStream asynchronously with Document.SaveAsync, using the following code:

         public static Stream GetSavedDocumentStream(Document document)
        {
            Stream ms = new MemoryStream();

            document.Optimize();
            
            int documentSaveTimeout = 5; 
            using (CancellationTokenSource cts = new CancellationTokenSource(TimeSpan.FromMinutes(documentSaveTimeout)))
            {
                try
                {
                    document.SaveAsync(ms, SaveFormat.Pdf, cts.Token).Wait(cts.Token);
                }
                catch (OperationCanceledException)
                {
                    var errorMessage = $"Failed to save file within {documentSaveTimeout} minutes.";
                    throw new ArgumentException(errorMessage);
                }
            }

            ms.Seek(0, SeekOrigin.Begin);
            return ms;
        }

        static void Main(string[] args)
        {
            // ... Code that attaches Aspose licence

            using (FileStream iFile = new FileStream(@"Data\scanned_pdf_sample.pdf", FileMode.Open, FileAccess.Read))
            {
                var doc = new Document(iFile);

                Stream stream = GetSavedDocumentStream(doc);

                using (FileStream file = new FileStream(@"Data\out.pdf", FileMode.Create, FileAccess.Write))
                {
                    stream.CopyTo(file);
                }
            }
        }

We have found that, for some PDFs created as a scan from a camera, an exception is thrown when calling Document.SaveAsync:

System.AggregateException: One or more errors occurred. (Object reference not set to an instance of an object.)
 ---> #=zBxE_$C2240IBr0_b3DA93v_jNYQMHAP3Yhdxxk8=: Object reference not set to an instance of an object.
   at #=zrKTGF7rDX5fnfllNNHkXCQESOjZDUSXrDA==.#=zSB1dLm4=.#=zWKABNYBZfwx4wE3VwQ==.MoveNext()

The problem can be reproduced:
With the latest Aspose.PDF.Drawing version 25.9.0 using the provided code and this PDF
scanned_pdf_sample.pdf (107,0 KB)

Based on some experimentation, it seems that this problem was introduced with NuGet package Aspose.PDF.Drawing version 25.7.0, as earlier versions do not exhibit the same exception. The Document.Optimize call does not seem to make any difference to whether the exception is thrown. The synchronous method Document.Save does not throw this exception in the latest version; however, we must use the asynchronous method SaveAsync because the save can occasionally hang, and we need to be able to give up if it takes too long.
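For illustration, the give-up-on-timeout requirement described above could also be written with await instead of Task.Wait, so that a failure surfaces as the original exception rather than an AggregateException. This is only a minimal sketch, assuming .NET 6 or later for Task.WaitAsync. The names TimedSave and SaveWithTimeoutAsync are hypothetical, and the save delegate stands in for a call such as (s, ct) => document.SaveAsync(s, SaveFormat.Pdf, ct). It does not avoid the exception reported here; it only changes how the timeout and the failure are observed.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class TimedSave
{
    // Hypothetical helper, not part of Aspose.PDF. 'save' is whatever
    // asynchronous save call should be bounded by the timeout.
    public static async Task<MemoryStream> SaveWithTimeoutAsync(
        Func<Stream, CancellationToken, Task> save, TimeSpan timeout)
    {
        var ms = new MemoryStream();
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            // WaitAsync stops waiting once the token is cancelled, even if
            // the save operation itself never observes the token.
            await save(ms, cts.Token).WaitAsync(cts.Token).ConfigureAwait(false);
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested)
        {
            // The abandoned save may still be running in the background,
            // so the stream is deliberately left undisposed here.
            throw new TimeoutException($"Failed to save file within {timeout}.");
        }

        ms.Seek(0, SeekOrigin.Begin);
        return ms;
    }
}
```

A caller would use something like await TimedSave.SaveWithTimeoutAsync((s, ct) => document.SaveAsync(s, SaveFormat.Pdf, ct), TimeSpan.FromMinutes(5)). Giving up on the wait does not stop work that ignores the token, so a hung save could still occupy resources after the timeout.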

Thanks in advance!

@troubledog

It appears you are encountering an exception when attempting to save a scanned PDF to a MemoryStream using the SaveAsync method in the Aspose.PDF library. The error message suggests a potential null reference issue, which may be related to the specific content or structure of the scanned PDF.

Since you noted that this problem began with version 25.7.0 of the Aspose.PDF.Drawing package, it could indicate a regression or bug introduced in that version. The fact that the synchronous Save method works without issues suggests that the problem is specific to the asynchronous implementation.

Here are some steps you can take to troubleshoot and potentially resolve the issue:

  1. Check for Updates: Ensure you are using the latest version of the Aspose.PDF library. Newer releases may contain fixes for known issues.

  2. Use Synchronous Save as a Workaround: If the asynchronous method is not critical for your application, consider using the synchronous Save method as a temporary workaround.

  3. Simplify the Document: If feasible, try to simplify the scanned PDF (e.g., reduce its complexity or size) to see if that allows the SaveAsync method to complete successfully.

  4. Enhance Error Handling: Improve your error handling to capture more details about the exception. This may provide additional insights into the underlying issue.

  5. Provide Sample Files: Since this appears to be a potential SDK defect, I will share your findings with the product team for further investigation. If you have a sample file that consistently reproduces the issue, please provide that for analysis.
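As a rough illustration of step 4, the AggregateException produced by Task.Wait can be unwrapped to list each underlying exception. The sketch below uses only the .NET base class library. SaveDiagnostics, DescribeFailure and Demo are hypothetical names, and the faulted task merely simulates a failing save.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SaveDiagnostics
{
    // Hypothetical helper: returns one line per underlying exception.
    public static List<string> DescribeFailure(Exception ex)
    {
        var lines = new List<string>();
        if (ex is AggregateException agg)
        {
            // Task.Wait wraps failures in AggregateException;
            // Flatten() collapses any nested aggregates into one level.
            foreach (Exception inner in agg.Flatten().InnerExceptions)
            {
                lines.Add($"{inner.GetType().FullName}: {inner.Message}");
            }
        }
        else
        {
            lines.Add($"{ex.GetType().FullName}: {ex.Message}");
        }
        return lines;
    }

    public static void Demo()
    {
        // Stand-in for a failing asynchronous save: a task that has already faulted.
        Task failing = Task.FromException(new NullReferenceException("simulated"));
        try
        {
            failing.Wait();
        }
        catch (AggregateException ex)
        {
            foreach (string line in DescribeFailure(ex))
            {
                Console.WriteLine(line);
            }
        }
    }
}
```

In the reported case the inner exception type and stack frames are obfuscated, so this mainly helps confirm how many distinct failures are wrapped and what their messages are.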

Here’s a modified version of your code that uses the synchronous Save method as a fallback:

public static Stream GetSavedDocumentStream(Document document)
{
    Stream ms = new MemoryStream();
    document.Optimize();

    int documentSaveTimeout = 5; 
    using (CancellationTokenSource cts = new CancellationTokenSource(TimeSpan.FromMinutes(documentSaveTimeout)))
    {
        try
        {
            document.SaveAsync(ms, SaveFormat.Pdf, cts.Token).Wait(cts.Token);
        }
        catch (OperationCanceledException)
        {
            throw new ArgumentException($"Failed to save file within {documentSaveTimeout} minutes.");
        }
        catch (Exception)
        {
            // Fallback to synchronous save if async fails.
            // Discard anything the failed async save may already have written.
            ms.SetLength(0);
            document.Save(ms, SaveFormat.Pdf);
        }
    }

    ms.Seek(0, SeekOrigin.Begin);
    return ms;
}

This code attempts to save asynchronously and falls back to synchronous saving if an exception occurs.

If you continue to experience issues, please let me know, and I can assist further.

The provided solutions are not adequate for our application.

The asynchronous method is critical for our application, so we cannot use the synchronous method as a fallback: on some of our clients' files it has previously blocked threads in the application completely.

The PDF files are sent in by our clients, so we cannot make any changes to them.

The sample file that can be used to reproduce this issue in the latest version of Aspose.PDF.Drawing library is this:
scanned_pdf_sample.pdf (107,0 KB)

@troubledog

Would you kindly share the complete stack trace that you are seeing in your environment? Please also share which OS you are using (name, version, etc.). We will log an investigation ticket and share the ID with you.

The issue can be reproduced when running the .NET application both on Windows 11 Pro version 23H2 and on Alpine Linux version 3.20.0.

The complete stack trace:

Unhandled exception. System.AggregateException: One or more errors occurred. (Object reference not set to an instance of an object.)
 ---> #=zBxE_$C2240IBr0_b3DA93v_jNYQMHAP3Yhdxxk8=: Object reference not set to an instance of an object.
   at #=zrKTGF7rDX5fnfllNNHkXCQESOjZDUSXrDA==.#=zSB1dLm4=.#=zWKABNYBZfwx4wE3VwQ==.MoveNext()
--- End of stack trace from previous location ---
   at #=zYuTRjC5haReAzPc8N2LDliB0zNAHyB_2iTsRKdo=.#=zxUCpEcvR2jylg5S7AQ==.MoveNext()
--- End of stack trace from previous location ---
   at #=z6xIjDrqEpoNmaqQGgPp_RYWR18TVXlp7Sw==.#=zSB1dLm4=.#=zkjPSlU0kHdoDIJmojg==.MoveNext()
--- End of stack trace from previous location ---
   at #=z6xIjDrqEpoNmaqQGgPp_RYWR18TVXlp7Sw==.#=zSB1dLm4=.#=zkjPSlU0kHdoDIJmojg==.MoveNext()
--- End of stack trace from previous location ---
   at #=zYuTRjC5haReAzPc8N2LDliB0zNAHyB_2iTsRKdo=.#=zxUCpEcvR2jylg5S7AQ==.MoveNext()
--- End of stack trace from previous location ---
   at #=zvZGaZgJi1EHWDf$YxQzELrazex938bZoHn9Un70lHu5H2eadIU5ANtz8ZrI02xe53g==.#=zh_brsb3gEvRaryCx4g==.MoveNext()
--- End of stack trace from previous location ---
   at #=zvZGaZgJi1EHWDf$YxQzELrazex938bZoHn9Un70lHu5H2eadIU5ANtz8ZrI02xe53g==.#=zQdiVFiCLCRZ_5R5Ep6$UP5H_bstE.MoveNext()
--- End of stack trace from previous location ---
   at #=zgEtn9w7Ein38ugMGL6hgakekflCLXd5qrA==.#=zzgZ6pHBsP05EGI9VEA==.MoveNext()
--- End of stack trace from previous location ---
   at #=ztwQljWiHgCGZzTGZVjCPbqa8iyCo.#=z70lCVlAuUoGEfeyVhw==.MoveNext()
--- End of stack trace from previous location ---
   at Aspose.Pdf.Document.#=zFRuLVawp6YW_pkbAVxygxmk=.MoveNext()
--- End of stack trace from previous location ---
   at Aspose.Pdf.Document.#=zgNtx4VLInFVd_AV9rmLV$ME=.MoveNext()
--- End of stack trace from previous location ---
   at Aspose.Pdf.Document.#=z4jPOALTDEBuDk8qRBTCTQIQ=.MoveNext()
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Task.Wait(CancellationToken cancellationToken)
   at AsposeTest.Program.GetSavedDocumentStream(Document document) in aspose-test\AsposeTest\AsposeTest\Program.cs:line 24
   at AsposeTest.Program.Main(String[] args) in aspose-test\AsposeTest\AsposeTest\Program.cs:line 45

@troubledog

We have opened the following new ticket(s) in our internal issue tracking system and will deliver the fixes according to the terms mentioned in our Free Support Policies.

Issue ID(s): PDFNET-61169

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.