Insufficient memory when converting to searchable PDF

Hi team,

We noticed that memory usage becomes uncontrollable when converting PDFs to searchable PDFs. We have a 20-page PDF file that is 5.43 MB in size. To reduce memory usage, we split the PDF into single pages and then convert each page to a searchable PDF. One of the pages is a floor plan of a house, similar to this image: 11.png (35.3 KB). When converting this page, memory usage is unusually high.

The following is the sample code:

        Document.CallBackGetHocr cbgh = bufferedImage -> {
            try {
                Logger.Log("return empty HOCR XML");
                return EMPTY_HOCR_XML;
            } catch (Exception e) {
                Logger.Error(e);
                return EMPTY_HOCR_XML;
            } finally {
                bufferedImage.flush();
                bufferedImage = null;
                System.gc();
            }
        };

        for (int i = 0; i < singlePageFilePathList.size(); i++) {
            String singlePageFilePath = singlePageFilePathList.get(i);
            try (Document doc = new Document(singlePageFilePath)){
                doc.convert(cbgh);
                doc.save(singlePageFilePath);
            } catch (Exception e) {
                Logger.Error(e);
            }
        }
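To see which single-page file drives the growth, heap usage could be logged before and after each `doc.convert(cbgh)` call in the loop above. The following is only a minimal sketch: `HeapProbe` and its methods are hypothetical helper names, not part of any library, and the figures cover the JVM heap only, so native allocations made outside the heap would not show up.

```java
// Hypothetical helper for logging JVM heap usage around each page conversion.
final class HeapProbe {
    private static final long MIB = 1024L * 1024L;

    // Bytes of heap currently in use (allocated minus free), as seen by the JVM.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // One log line such as "page 7 before: used=123 MiB, max=2048 MiB".
    static String describe(String label) {
        long used = usedBytes() / MIB;
        long max = Runtime.getRuntime().maxMemory() / MIB;
        return label + ": used=" + used + " MiB, max=" + max + " MiB";
    }
}
```

For example, `Logger.Log(HeapProbe.describe(singlePageFilePath + " before"))` could be placed right before `doc.convert(cbgh)` and a matching "after" line right after `doc.save(...)`.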

We have a machine with 2 GB of free memory, but all 2 GB is used up before the log line Logger.Log("return empty HOCR XML"); ever appears. When we switch to a machine with more free memory, that log line does appear, but it is printed more than 5,000 times, so we interrupted the process. We saved each BufferedImage as a TIFF file, and the images in the more than 5,000 TIFF files are all the same:
tiff.png (18.8 KB)
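For anyone reproducing this, the repeated images can also be confirmed without writing thousands of TIFF files by hashing each image's pixels inside the callback and counting the distinct digests. This is a rough sketch using only standard JDK classes: `CallbackImageStats` is a hypothetical helper name, and it only diagnoses the repetition; it does not by itself reduce the memory the library uses while rendering the page.

```java
import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts how often each distinct image reaches the callback.
final class CallbackImageStats {
    private final Map<String, Integer> countsByDigest = new HashMap<>();

    // Hex SHA-256 digest over the image dimensions and its ARGB pixel values.
    static String digest(BufferedImage image) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            int w = image.getWidth();
            int h = image.getHeight();
            md.update(ByteBuffer.allocate(8).putInt(w).putInt(h).array());
            int[] row = new int[w];
            ByteBuffer buf = ByteBuffer.allocate(w * 4);
            for (int y = 0; y < h; y++) {
                // Read one row at a time to avoid copying the whole image at once.
                image.getRGB(0, y, w, 1, row, 0, w);
                buf.clear();
                buf.asIntBuffer().put(row);
                md.update(buf.array());
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Records the image and returns how many times this exact content has been seen.
    int record(BufferedImage image) {
        return countsByDigest.merge(digest(image), 1, Integer::sum);
    }

    // Number of distinct images seen so far.
    int distinctImages() {
        return countsByDigest.size();
    }
}
```

Calling `record(bufferedImage)` at the top of the callback and logging `distinctImages()` at the end of the page would show whether thousands of callbacks really map to a handful of distinct images. If they do and the callback performs real OCR, the HOCR result could be cached per digest, although that only saves work inside the callback.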

Maybe that page really does contain that many identical borders, but is there any way to reduce the memory usage?

Thanks

@Rich_Yu

We need to investigate this case in order to further determine the issue. Can you please confirm if you are using the latest version of the API? Also, please share your sample PDF document for our reference so that we can test the scenario in our environment and address it accordingly.

Yes, we tested it with aspose-pdf-23.10 but it still ran out of memory.
Attached is the sample PDF:
sample.pdf (5.4 MB)

@Rich_Yu

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-55874

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Hi Team,

Since we are using the Java version, could you please file a ticket for the Java library as well?

Thanks!

@Rich_Yu
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFJAVA-43285

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.