High CPU utilization on Google Cloud Load Balancer (back end) during multiple PDF conversions

Hi,
I am using Aspose.Total with a Developer OEM license.
I am using the latest version of Aspose.Words (21.12).
It is deployed in a Java application behind a Google Cloud Load Balancer (back end).

When I convert a 100-page DOCX file to PDF, there is no issue the first time.

But if I process the same file 5 times concurrently, or 5 such file conversions take place at the same time, the CPU utilization of my back-end server exceeds 60%.

1.) Why is the CPU utilization this high for just 5 files?

2.) If the CPU utilization of the Google Cloud server instance is more than 40%
when we access the Java back-end service for File Preview (which includes PDF conversion),
the following error is thrown:

502 Exception
The server encountered a temporary error and could not complete your request.


Is there a default timeout in the Aspose library between one PDF conversion and the next?
(i.e., do we need to add a particular time interval, or can we run any number of PDF conversions at a time?)

Could you check this on your side and let us know the possible fixes?

Hi @dev.raz,
thank you for your request. Could you please provide examples of files for which this situation occurs?

Hi @alexey.maslov,
Thank you for your reply.

DOCX_100_Pages.docx (7.1 MB)
Converting this DOCX file to PDF multiple times results in high CPU utilization.

@dev.raz
Thank you for providing the document. We will analyze this scenario on our side and inform you about the results.

@dev.raz,

Thank you for your patience. We analyzed your request and found similar behavior on our side.
We managed to reduce CPU usage by adding a time interval between conversions. There is no default timeout between conversions in the Aspose library; you need to specify one yourself to reduce CPU usage.

Please note that performance and memory usage depend on the complexity and size of the documents you are generating. While rendering a document to a fixed-page format (e.g. PDF), Aspose.Words needs to build two models in memory – one for the document and one for the rendered document.

The process of building the layout model is not linear: for some documents it may take a minute to render one page and only a few seconds to render 100 pages, or vice versa. We are always working on improving performance, but rendering will always be slower than simply saving to a flow format (e.g. DOC/DOCX).
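
For illustration, the difference between the two save paths is easy to observe by timing them (a minimal sketch, not from the product documentation; file names are placeholders):

import com.aspose.words.Document;

public class RenderTimingDemo
{
    public static void main(String[] args) throws Exception
    {
        Document doc = new Document("DOCX_100_Pages.docx"); // placeholder path

        long start = System.nanoTime();
        doc.save("out.docx"); // flow format: a simple re-save, no page layout is built
        System.out.println("DOCX save: " + (System.nanoTime() - start) / 1_000_000 + " ms");

        start = System.nanoTime();
        doc.save("out.pdf"); // fixed-page format: the full layout model is built first
        System.out.println("PDF save:  " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}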

Hope this answers your query. Please let us know if you have any further questions.

Hi @alexey.maslov,
Thank you for your continued support.

We managed to reduce CPU usage by adding a time interval between conversions.

1) Could you share where you added this time interval?

There is no default timeout between conversions in the Aspose library; you need to specify one yourself to reduce CPU usage.

2) Could you share sample code for our reference?

Hi @alexey.maslov,

In the Tomcat server (Google Cloud Load Balancer back end), during this high-CPU-utilization issue,
we found the following error in Tomcat's catalina.out:

SEVERE [main] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [appname] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@38eb3f40]) and a value of type [com.aspose.pdf.internal.l59y.l1v] (value [com.aspose.pdf.internal.l59y.l1v@7c9ffc73]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

Could you check this on your side and let us know.

@dev.raz,

Thank you for your patience. This is a simplified source code snippet:

// Requires: import com.aspose.words.Document;

// Starts five conversions, sleeping 3 seconds before launching each thread
// so the rendering load is spread over time instead of hitting the CPU all at once.
public void docxToPdfMultithreading() throws Exception
{
    for (int i = 0; i < 5; i++)
    {
        Thread.sleep(3000); // insert/tune the timeout here
        MultiThreadRun runner = new MultiThreadRun(mFilePathInMain, mFilePathOutMain);
        runner.start();
    }
}

// Each thread loads the source document and renders it to PDF.
private class MultiThreadRun extends Thread
{
    private final String mFilePathIn;
    private final String mFilePathOut;

    MultiThreadRun(String filePathIn, String filePathOut)
    {
        mFilePathIn = filePathIn;
        mFilePathOut = filePathOut;
    }

    @Override
    public void run()
    {
        try
        {
            Document doc = new Document(mFilePathIn);
            // getName() returns the thread name, keeping each output file unique.
            doc.save(mFilePathOut + getName() + ".pdf");
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

You need to choose the timeout yourself to find the right balance between CPU load and execution time for your particular case.
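
If a fixed sleep is hard to tune, an alternative is to cap how many conversions run at once with a fixed-size thread pool (a sketch of an alternative approach, not part of the snippet above; the pool size and file paths are placeholders):

import com.aspose.words.Document;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThrottledConversion
{
    public static void main(String[] args) throws Exception
    {
        // At most 2 conversions run concurrently; the rest wait in the queue.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 5; i++)
        {
            final int n = i;
            pool.submit(() -> {
                try
                {
                    Document doc = new Document("in.docx"); // placeholder path
                    doc.save("out-" + n + ".pdf");
                }
                catch (Exception e)
                {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}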

Now for your next question:

We are working on a solution to this problem. This issue has been logged as WORDSJAVA-2685. We will keep you updated and will let you know as soon as the issue is resolved.

As a temporary solution, you can paste the following line into your …\conf\context.xml file:
<Context antiResourceLocking="true">

Sorry for the inconvenience.

@alexey.maslov
Thank You for the continuous support.

You need to choose the timeout yourself to find the right balance between CPU load and execution time for your particular case.

In our scenario, we cannot predict the time interval between two file conversions,
because we are processing files of different types and sizes:

  • 1 MB file with 10 pages
  • 50 MB file with 100 pages
  • 200 MB file with 500 pages

Hence we are not able to set an exact timeout on our side.


---

As a temporary solution, you can paste the following line into your …\conf\context.xml file:
<Context antiResourceLocking="true">

If we add the above line to context.xml, the Tomcat server (OS: Red Hat) fails to start.

We found the following error in Tomcat's catalina.out:

SEVERE [main] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [applicationwarname] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@5d809a71]) and a value of type [io.grpc.netty.shaded.io.netty.util.internal.InternalThreadLocalMap] (value [io.grpc.netty.shaded.io.netty.util.internal.InternalThreadLocalMap@25afa0f1]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

Could you check this scenario on your side and let us know.

@dev.raz,

Unfortunately, we cannot reproduce this scenario. Please create a sample Java application (source code without compilation errors) that reproduces your problem on our end and attach it here for testing. Also, please provide the Java version, OS, and Tomcat version you are using.

A post was split to a new topic: Problem with using tomcat when converting PDF multiple times

@dev.raz,

I’ve moved your last post to the Aspose.PDF forum so my colleagues on the other team can reply to you.

@alexey.maslov

When I convert a 100-page DOCX, PPTX, or XLSX file to PDF, there is no issue the first time.
But if I process the same files 5 times concurrently, or 5 such file conversions take place at the same time, the CPU utilization of my back-end server exceeds 60%.

[We need an urgent solution.]
Could you verify this and provide us a temporary solution to avoid the high CPU utilization in this scenario?

@dev.raz,

You can use, for example, a fixed 3-second timeout as a workaround for this issue, as I posted earlier:

This is a simplified source code snippet:

...
// Requires: import com.aspose.words.Document;

// Starts five conversions, sleeping 3 seconds before launching each thread
// so the rendering load is spread over time instead of hitting the CPU all at once.
public void docxToPdfMultithreading() throws Exception
{
    for (int i = 0; i < 5; i++)
    {
        Thread.sleep(3000); // insert/tune the timeout here
        MultiThreadRun runner = new MultiThreadRun(mFilePathInMain, mFilePathOutMain);
        runner.start();
    }
}

// Each thread loads the source document and renders it to PDF.
private class MultiThreadRun extends Thread
{
    private final String mFilePathIn;
    private final String mFilePathOut;

    MultiThreadRun(String filePathIn, String filePathOut)
    {
        mFilePathIn = filePathIn;
        mFilePathOut = filePathOut;
    }

    @Override
    public void run()
    {
        try
        {
            Document doc = new Document(mFilePathIn);
            // getName() returns the thread name, keeping each output file unique.
            doc.save(mFilePathOut + getName() + ".pdf");
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
...

In addition, we suggest you use the SaveOptions.MemoryOptimization property to optimize memory performance. Setting this option to true can significantly decrease memory consumption while saving large documents, at the cost of slower saving time.
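
For example, assuming the Java binding's setMemoryOptimization setter, a minimal sketch (file names are placeholders):

import com.aspose.words.Document;
import com.aspose.words.PdfSaveOptions;

public class MemoryOptimizedSave
{
    public static void main(String[] args) throws Exception
    {
        Document doc = new Document("in.docx"); // placeholder path

        PdfSaveOptions options = new PdfSaveOptions();
        // Trades slower saving for significantly lower memory use on large documents.
        options.setMemoryOptimization(true);

        doc.save("out.pdf", options);
    }
}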

Hope this helps you. Please let us know if you have any further questions.