Memory issues with Java Aspose.PDF

eric.w.sachse · June 26, 2017, 6:28pm

We are running Aspose.PDF version 17.4 and are experiencing a memory leak in Aspose. Our application runs under JBoss EAP 6.3, and Aspose appears to manage its memory in ThreadLocal structures. JBoss creates threads on demand based on the volume of inbound web requests. Sometimes Aspose cleans up after itself, but most of the time its internal structures are left bound to a JBoss web thread that JBoss let run to completion.

We call the Document.dispose() method after processing each PDF document, but it does not immediately free memory. We tried adding calls to MemoryCleaner.clear() at the end of each transaction, but that caused a spike in CPU activity that took down our servers during testing. Limiting the calls to MemoryCleaner.clear() does not work either, since the call has to be made in the same thread that allocated the memory. If JBoss has already ended the web thread, the memory is held by Aspose and there is no way to free it.
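
For readers unfamiliar with the same-thread constraint described above, here is a minimal sketch in plain Java. It does not use Aspose at all: `CACHE` and every method name are hypothetical stand-ins for whatever per-thread state a library might keep. It only illustrates that clearing a ThreadLocal from one thread leaves the copy bound to another thread untouched.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalLeakDemo {
    // Hypothetical stand-in for per-thread state cached by a library.
    private static final ThreadLocal<byte[]> CACHE = new ThreadLocal<>();

    private ThreadLocalLeakDemo() {}

    /** Runs a task on the pool and waits for its result. */
    static <T> T onWorker(ExecutorService pool, Callable<T> task) {
        try {
            return pool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /** Simulates a request that leaves state bound to the worker thread. */
    static void allocateOnWorker(ExecutorService pool) {
        onWorker(pool, () -> { CACHE.set(new byte[1024]); return null; });
    }

    /** Clears the ThreadLocal on the calling thread only. */
    static void clearOnCallerThread() {
        CACHE.remove();
    }

    /** Clears the ThreadLocal on the worker thread itself. */
    static void clearOnWorker(ExecutorService pool) {
        onWorker(pool, () -> { CACHE.remove(); return null; });
    }

    /** Reports whether the worker thread still holds a value. */
    static boolean workerHoldsState(ExecutorService pool) {
        return onWorker(pool, () -> CACHE.get() != null);
    }

    public static void main(String[] args) {
        // One worker thread, so every task runs on the same thread.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            allocateOnWorker(pool);
            clearOnCallerThread();
            System.out.println("after clear on caller thread: " + workerHoldsState(pool)); // true
            clearOnWorker(pool);
            System.out.println("after clear on worker thread: " + workerHoldsState(pool)); // false
        } finally {
            pool.shutdown();
        }
    }
}
```

In a container that creates and retires web threads on demand, the second kind of cleanup is only possible while the request is still running on its thread.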

I have some statistics from the Dynatrace tool that we use to monitor our application in production. I will attach screenshots from the tool.

codewarior · June 27, 2017, 6:42am

@eric.w.sachse,

Thanks for contacting support.

Can you please share some details regarding the scenario in which you are using the API? Please also let us know whether the problem occurs with a specific document or with all files.

Furthermore, please share the code snippet and any additional details, so that we can try replicating the issue in our environment. We are sorry for this inconvenience.

PS: From the description above, I gather that memory utilisation increases with every subsequent call to the system.

eric.w.sachse · June 27, 2017, 5:26pm

This is a screenshot of a memory snapshot from the Dynatrace monitoring tool when the JBoss server hit an Out of Memory Exception. It is filtered on classes that start with “com.aspose”. I will upload code that demonstrates how we are using the API.

This is occurring on all documents. We use the Aspose PDF API to look for URL links in PDF files, and if we find a URL, we use Aspose to remove those links.

june_19_2017_snapshot.jpg (520.2 KB)

eric.w.sachse · June 27, 2017, 6:36pm

This method is the code we use to search for URLs in a PDF file. I simplified the code and removed things like logger statements.

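/**
 * Checks whether a PDF contains links to the filtered media sites,
 * either as link annotations or as plain-text URLs.
 * Unqualified types (Page, Annotation, LinkAnnotation, GoToURIAction,
 * TextExtractionOptions) are assumed to be imported from com.aspose.pdf.
 *
 * @param inputFile path of the PDF file to inspect
 * @return true if at least one matching URL is found
 */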
public boolean containsMediaLinks(String inputFile)
{       
    boolean mediaLinks = false;

    String regexFilter = "(?i)(https?://|www\\.)(([-a-zA-Z0-9]+\\.){0,}(youtube\\.com|dropbox\\.com))[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]?";

    com.aspose.pdf.Document pdfDocument = null;
    try
    {
        //open document
        pdfDocument = new com.aspose.pdf.Document(inputFile);
        
        // Step 1: Loop over the annotations
        // Go through each page
        for (int i = 1; i <= pdfDocument.getPages().size(); i++)
        {
            try
            {
                Page currentPage = pdfDocument.getPages().get_Item(i);
                // Get the annotations
                for (int j = 1; j <= currentPage.getAnnotations().size(); j++)
                {
                    try
                    {
                        Annotation annotation = currentPage.getAnnotations().get_Item(j);
                        if (annotation instanceof LinkAnnotation)
                        {
                            LinkAnnotation linkAnnotation = (LinkAnnotation) annotation;
                            if (linkAnnotation.getAction() instanceof GoToURIAction)
                            {
                                GoToURIAction myAction = (GoToURIAction) linkAnnotation.getAction();
                                if (myAction.getURI() != null && myAction.getURI().matches(regexFilter))
                                {
                                    mediaLinks = true;
                                    break;
                                }
                            }
                        }
                    }
                    catch (Exception annotEx)
                    {
                        annotEx.printStackTrace();
                    }
                }
            }
            catch (Exception pageEx)
            {
                pageEx.printStackTrace();
            }

            // If we found mediaLinks, do not need to continue checking the remaining pages in the PDF.
            if (mediaLinks)
            {
                break;
            }
        }


        // Step 2: Look for Text that is a URL.
        //
        if (!mediaLinks)
        {
            com.aspose.pdf.TextFragmentAbsorber textFragmentAbsorber = new com.aspose.pdf.TextFragmentAbsorber(regexFilter);
            TextExtractionOptions textExtractionOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw);
            textFragmentAbsorber.setExtractionOptions(textExtractionOptions);
            //set text search option to specify regular expression usage
            com.aspose.pdf.TextSearchOptions textSearchOptions = new com.aspose.pdf.TextSearchOptions(true);
            textFragmentAbsorber.setTextSearchOptions(textSearchOptions);

            //run the absorber over all pages of the document
            pdfDocument.getPages().accept(textFragmentAbsorber);
            //get the extracted text fragments into collection
            com.aspose.pdf.TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();

            //loop through the fragments
            for(com.aspose.pdf.TextFragment textFragment : (Iterable<com.aspose.pdf.TextFragment>)textFragmentCollection)
            {
                String foundText = textFragment.getText();
                mediaLinks = true;
                break;
            }
        }
    }
    catch (Exception ex)
    {
        ex.printStackTrace();
    }
    finally
    {
        if (pdfDocument != null)
        {
            try
            {
                pdfDocument.close();
            }
            catch (Exception ex)
            {
                ex.printStackTrace();
            }
            try
            {
                pdfDocument.dispose();
            }
            catch (Exception ex)
            {
                ex.printStackTrace();
            }
        }
    }
    return mediaLinks;
}
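
The URL filter can be checked on its own, without Aspose, which may help anyone trying to reproduce the scenario. The following is only a sketch: the class name `MediaLinkFilterCheck` and the sample URLs are made up for illustration, and the expression is copied from `regexFilter` above. It compiles the pattern once instead of calling `String.matches`, which recompiles the expression on every call.

```java
import java.util.regex.Pattern;

final class MediaLinkFilterCheck {
    // Same expression as regexFilter in the method above. In a Java string
    // literal, every regex backslash has to be written as "\\".
    static final Pattern MEDIA_LINK = Pattern.compile(
        "(?i)(https?://|www\\.)(([-a-zA-Z0-9]+\\.){0,}(youtube\\.com|dropbox\\.com))"
        + "[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]?");

    private MediaLinkFilterCheck() {}

    /** Returns true if the whole string matches the media-link filter. */
    static boolean isMediaLink(String uri) {
        return uri != null && MEDIA_LINK.matcher(uri).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample URLs, not taken from any real document.
        String[] samples = {
            "https://www.youtube.com/watch?v=abc",
            "www.dropbox.com",
            "https://example.com/page",
            "https://notyoutube.com"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isMediaLink(sample));
        }
    }
}
```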

codewarior · June 27, 2017, 9:33pm

@eric.w.sachse,

Thanks for sharing the details.

I have tried executing the code, but I am afraid I am getting the error “Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )” on the following line. Can you please take a look and fix the issue, so that we can further test the scenario in our environment?

String regexFilter = "(?i)(https?://|www\\.)(([-a-zA-Z0-9]+\\.){0,}(youtube\.com|dropbox\.com))[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]?"

Invalid_Escape_Sequence.png (4.3 KB)

eric.w.sachse · June 28, 2017, 6:19pm

I fixed the code in the post that contains the code.

codewarior · June 29, 2017, 9:56am

@eric.w.sachse,

Thanks for the update. We are testing the scenario with the updated code and will get back to you soon.

codewarior · June 29, 2017, 2:03pm

@eric.w.sachse,

Thanks for your patience.

I have tested the scenario using one of my sample PDF files and, as per my observations, memory utilization increased by 1 GB. For further investigation, I have logged this as PDFJAVA-36865 in our issue tracking system. We will look into the details of this problem and keep you updated on the status of the correction. Please be patient and spare us a little time. We are sorry for this inconvenience.

eric.w.sachse · October 16, 2017, 2:36pm

Any update on this issue?

asad.ali · October 16, 2017, 7:04pm

@eric.w.sachse

Thanks for contacting support.

I am afraid that the earlier logged issue has not yet been resolved, due to other pending issues in the queue. Our product team will definitely investigate the issue as per their development schedule, and as soon as we have some definite updates regarding resolution progress, we will let you know. Please spare us a little time.

We are sorry for the inconvenience.

eric.w.sachse · January 29, 2018, 6:10pm

Checking in on this issue.

Do you have a status update on this issue?

asad.ali · January 29, 2018, 9:55pm

@eric.w.sachse

Thanks for contacting support.

We have checked the status of the earlier logged ticket and we are afraid that it is not resolved yet. However, investigation of this ticket has started, and as soon as the respective team shares their feedback, we will inform you. Please be patient and spare us a little time.

We are sorry for the inconvenience.

eric.w.sachse · April 3, 2018, 1:00pm

Checking in again on this issue.

What is the status for a resolution to this issue?
We are in the process of evaluating products that we license, and it does not make sense to continue paying for a product if you do not fix open issues that are causing us problems in production.

asad.ali · April 3, 2018, 6:26pm

@eric.w.sachse

Thanks for your inquiry.

I am afraid that the earlier logged issue has not been resolved yet. Please note that your issue has been logged under the normal/free support model, where issues are resolved on a first come, first served basis. As shared earlier, there are other pending issues in the queue which were logged prior to your issue. We will definitely provide a resolution to your issue after resolving previously logged issues, as each reported issue receives equal attention and significance from us.

Furthermore, we also offer paid support to customers who want their issues resolved on an urgent basis, as issues logged under the paid support model have precedence over issues logged under the free/normal support model. Please check the Paid Support FAQ for more information.

We are sorry for the inconvenience.

rcooksey · April 3, 2018, 8:24pm

Memory issues as discussed here are symptomatic of a fundamental product flaw. I’m not sure why we would need to log this issue under a paid support model to have it rectified.

I’m looking for a better answer as to if/when this will be addressed.

asad.ali · April 3, 2018, 9:24pm

@rcooksey

Thanks for contacting support.

Please note that we realize the severity of the issue, which is why we have already escalated its priority to the next level. However, the issue is not yet resolved because other issues were logged before it and are meant to be resolved first. We suggested the Paid Support option only in case the issue is a blocker and needs to be resolved on an urgent basis.

Moreover, I am afraid that we are not in a position to share a reliable ETA for now. However, the investigation of your issue has started and is in progress. As you know, memory-related issues are complex in nature and depend on multiple components of the API, so rectifying them may take some time, particularly when there are other parallel queues of issues as well.

Nevertheless, we have taken notice of your concerns and will definitely provide you with an update as soon as some progress towards resolution of the issue has been made. We highly appreciate your cooperation in this regard. Please give us a little time.

We are sorry for the inconvenience.

eric.w.sachse · August 24, 2018, 7:37pm

Checking in again on this issue.

What is the status for a resolution to this issue?

This issue was submitted in June 2017, and we have been waiting over a year for a fix.

asad.ali · August 24, 2018, 9:04pm

@eric.w.sachse

Thanks for your inquiry.

We are sorry for the delay and inconvenience faced. Please note that due to its low priority and a long queue of pending issues, your issue could not be resolved yet. Please also note that issues related to memory consumption and performance of the API are complex in nature and depend upon many internal components of the API, which is why they need more time to be investigated and resolved.

In reference to the earlier logged issue, we have already made some investigations and improved the performance of the API in the latest version, i.e. Aspose.PDF for Java 18.7. However, I am afraid that the issue has not been closed yet, pending some further integration and performance tests. As investigation and work on resolving the logged ticket is already in progress, we will surely inform you as soon as it is completed. We greatly appreciate your patience in this regard. Please spare us a little time.

We are sorry for the inconvenience.

curmas · September 18, 2018, 8:27am

Hello support team,

We have a similar memory problem with PDF conversion under JBoss 7.0.4.
At the moment we are forced to restart the customer’s application server daily. This is not a permanent solution.
Calling com.aspose.pdf.MemoryCleaner.clear() does not have any effect.
You have provided a new method, com.aspose.pdf.MemoryCleaner.clearCurrentThreadLocals(), since Aspose.PDF for Java 18.3.
Is this a possible solution for the problem?
Is there any other news on solving this bug? Thanks a lot.

Kind regards
Matthias
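
Whether clearCurrentThreadLocals() resolves the leak is not confirmed anywhere in this thread. If it is tried, the call would presumably have to run on the request thread itself, at the end of each unit of work. The sketch below shows that placement in plain Java. `PerThreadCleanup` and `runWithCleanup` are hypothetical names, and the cleanup hook is passed in as a `Runnable` so that the example runs without Aspose on the classpath.

```java
import java.util.function.Supplier;

final class PerThreadCleanup {
    private PerThreadCleanup() {}

    /**
     * Runs the work and then the cleanup hook on the calling thread.
     * The finally block guarantees the hook runs even if the work throws.
     */
    static <T> T runWithCleanup(Supplier<T> work, Runnable cleanup) {
        try {
            return work.get();
        } finally {
            cleanup.run();
        }
    }

    public static void main(String[] args) {
        // Assumption: in a real application the hook would be the call named
        // in the post above, com.aspose.pdf.MemoryCleaner.clearCurrentThreadLocals().
        // A print statement stands in for it here.
        String result = runWithCleanup(
            () -> "processed on " + Thread.currentThread().getName(),
            () -> System.out.println("cleanup on " + Thread.currentThread().getName()));
        System.out.println(result);
    }
}
```

Passing the hook as a parameter keeps the PDF processing code unchanged and makes it easy to measure the CPU cost of the cleanup call separately.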

eric.w.sachse · September 18, 2018, 2:05pm

I see that other people are experiencing the same issue that we reported over a year ago. Can you give an update on this issue?