Converting a PDF to a bitmap is extremely slow

Hi,

We are loading a specific PDF and then converting it to a bitmap. Until a recent release of Aspose (v24), this was taking 1 min 45 sec in the PDFConverter.HasNextImage() call. After that release it’s down to 29 sec. This, however, is still way too slow for what we would expect. Is there a better way to do what we are trying to do? Can this performance be improved? We can go to a simple web site and have this file converted in seconds!

Thanks.

I’ve linked a sample project with the test file that has the problems.

@bsant

We checked in our environment using 24.4 and the DOM approach for conversion with the code snippet below:

using (Aspose.Pdf.Document thePdfDocument = new Aspose.Pdf.Document(dataDir + "FDFTDA(11981919)_-_TICKETS_-_DocID_17016051.pdf"))
{
    System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
    sw.Start();
    BmpDevice bmpDevice = new BmpDevice();
    using (MemoryStream pageBitmapMemoryStream = new MemoryStream())
    {
        bmpDevice.Process(thePdfDocument.Pages[1], pageBitmapMemoryStream);
    }
    sw.Stop();
    Console.WriteLine("Total seconds taken : " + sw.Elapsed.TotalSeconds);
}

On the first execution it took 2 seconds. However, on every subsequent run it took 1.29-1.35 seconds in our environment. Could you please try the above approach and let us know if it still takes longer for you?

Hi,

I’m a dev in the same org working on this issue.

I tried the alternate approach, but I had to edit it to loop over all the pages, since we’re interested in converting all the pages at once:

    Public Sub ConvertPDFtoImageDo()
        Using thePdfDocument As New Aspose.Pdf.Document("FDFTDA(11981919)_-_TICKETS_-_DocID_17016051.pdf")
            Dim sw As New Stopwatch()
            sw.Start()
            Dim BmpDevice As New BmpDevice()
            For Each page As Aspose.Pdf.Page In thePdfDocument.Pages
                Using pageBitmapMemoryStream As New MemoryStream()
                    BmpDevice.Process(page, pageBitmapMemoryStream)
                End Using
            Next
            sw.Stop()
            Console.WriteLine("Total seconds taken : " & sw.Elapsed.TotalSeconds)
        End Using
    End Sub

And I compared the time with our current approach for three runs:
#1
GetNextImage total time: 32.315
Convert DOM based Total seconds taken : 28.8354831

#2
GetNextImage total time: 30.619
Convert DOM based Total seconds taken : 28.908888

#3
GetNextImage total time: 31.004
Convert DOM based Total seconds taken : 28.7635003

So it seems to be only about 2.5 seconds faster on average.

@Abdelmageed_Mostafa

Would you kindly share the results you used to get with the older version of the API that you were using? Please share the information below so that we can proceed accordingly:

  • API version producing expected results
  • System information e.g. OS Name and Version, RAM, Processor, etc.

I compared the two versions of Aspose.PDF again, using both our approach and the one you proposed. These are the results on my machine, averaged over 3 runs:

Aspose 20.10
GetNextImage (our current approach) total time: 35.6372941
Dom Based Convert Time: 33.6049738

Aspose 24.4
GetNextImage (our current approach) total time: 30.8609131
Dom Based Convert Time: 28.6527697

Machine specs:

Device name XXXX
Processor 12th Gen Intel(R) Core™ i5-12400 2.50 GHz
Installed RAM 32.0 GB (31.8 GB usable)
Device ID XXXX
Product ID XXXX
System type 64-bit operating system, x64-based processor

OS:

Edition Windows 11 Pro
Version 22H2
Installed on 7/16/2023
OS build 22621.3447
Experience Windows Feature Experience Pack 1000.22688.1000.0

We are using VB.NET with .NET Framework version 4.8 for this project.

@Abdelmageed_Mostafa

It looks like the latest version is giving better performance in both the Facades and DOM approaches. However, the results are not similar to what we observed in our environment, which has the specifications below:

  • Windows 11 22H3 Pro x64-bit
  • 16 GB RAM
  • Console Application - C# - .NET 4.8

Are you sure that no other code routine is being executed during your testing? Are you testing in a separate console application?

Yes, there was no other running code at the time of testing, and it was a separate console application.

Please note that I edited the code you provided for the DOM approach to loop over all the pages in the file (see my earlier post, #3 in this thread).
We are converting all the pages in the PDF, not just the first one.

There is a big improvement on the latest version compared to our current version 20.10, but still not quite what we’re after.
And there is only a marginal improvement when using the DOM approach as I mentioned earlier.

@Abdelmageed_Mostafa

Sorry for the delayed response. One last thing before we log an investigation ticket for this case: can you please share the conversion time you expect to get from the API?

We would like to see a similar performance to this in-browser tool.

This PDF file has only one image component per page (and nothing else, I presume), so we think it shouldn’t take that much time to just extract the already existing image for the conversion.
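
For reference, here is a rough sketch of what pulling out the embedded images directly, instead of rendering each page, might look like with the PdfExtractor class from Aspose.Pdf.Facades. It assumes each page really does hold a single embedded image, which has not been verified for this file. The path input.pdf is a placeholder, and whether this is actually faster on this document is untested.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using Aspose.Pdf.Facades;

class ExtractEmbeddedImages
{
    static void Main()
    {
        // Placeholder path (assumption): substitute the PDF under test.
        const string inputPath = "input.pdf";

        Stopwatch sw = Stopwatch.StartNew();
        PdfExtractor extractor = new PdfExtractor();
        try
        {
            extractor.BindPdf(inputPath);

            // Collect the images stored in the file; pages are not rendered.
            extractor.ExtractImage();

            int count = 0;
            while (extractor.HasNextImage())
            {
                using (MemoryStream imageStream = new MemoryStream())
                {
                    // Write the next embedded image into the stream.
                    extractor.GetNextImage(imageStream);
                    count++;
                }
            }

            sw.Stop();
            Console.WriteLine("Images extracted: " + count);
            Console.WriteLine("Total seconds taken : " + sw.Elapsed.TotalSeconds);
        }
        finally
        {
            // Release the bound document.
            extractor.Close();
        }
    }
}
```

Note that this returns the images as they are stored in the PDF, so size and resolution may differ from what BmpDevice produces when it renders a page, and any page content other than the image would be lost.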

@bsant

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-57152

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.