Performance Issue with Aspose PDF for Java

ramesh.lingala.citi · August 29, 2017, 3:13am

Hi,

We are using Aspose.PDF for Java 10.3.
It is deployed on a WebSphere server along with our application. The server is a Linux-based machine.

We are seeing a performance issue (CPU utilization climbs to almost 99-100%) whenever a user uses the functionality that generates a PDF.

I am attaching the stack trace from the log file. We would appreciate it if you could check this and let us know whether a solution already exists for such situations.

This always happens at below lines in code - (Whenever we call processParagraphs method)
com.aspose.pdf.ADocument.processParagraphs(Unknown Source)
com.aspose.pdf.Document.processParagraphs(Unknown Source)

Stack Trace from the log file -

Thread Name:WebContainer : 2 ID:371 Time:Fri Aug 25 06:05:31 EDT 2017 State:RUNNABLE Priority:5
com.aspose.pdf.internal.p346.z6$z1.m1(Unknown Source)
com.aspose.pdf.internal.p346.z2.tryGetValue(Unknown Source)
com.aspose.pdf.internal.p536.z9.m4(Unknown Source)
com.aspose.pdf.internal.p536.z9.m1(Unknown Source)
com.aspose.pdf.internal.p536.z9.m1(Unknown Source)
com.aspose.pdf.internal.p587.z29.aS_(Unknown Source)
com.aspose.pdf.internal.p587.z29.m37(Unknown Source)
com.aspose.pdf.internal.p540.z23.m2(Unknown Source)
com.aspose.pdf.internal.p540.z23.m4(Unknown Source)
com.aspose.pdf.Page.getRect(Unknown Source)
com.aspose.pdf.internal.p581.z13.m6(Unknown Source)
com.aspose.pdf.internal.p581.z13.<init>(Unknown Source)
com.aspose.pdf.internal.p581.z13.<init>(Unknown Source)
com.aspose.pdf.internal.p581.z13.<init>(Unknown Source)
com.aspose.pdf.TextBuilder.<init>(Unknown Source)
com.aspose.pdf.z59.m1(Unknown Source)
com.aspose.pdf.z59.m5(Unknown Source)
com.aspose.pdf.Cell.m1(Unknown Source)
com.aspose.pdf.Row.m1(Unknown Source)
com.aspose.pdf.Row.m1(Unknown Source)
com.aspose.pdf.Table.m1(Unknown Source)
com.aspose.pdf.z59.m5(Unknown Source)
com.aspose.pdf.Cell.m1(Unknown Source)
com.aspose.pdf.Row.m1(Unknown Source)
com.aspose.pdf.Row.m1(Unknown Source)
com.aspose.pdf.Table.m1(Unknown Source)
com.aspose.pdf.z59.m5(Unknown Source)
com.aspose.pdf.Page.m2(Unknown Source)
com.aspose.pdf.Page.processParagraphs(Unknown Source)
com.aspose.pdf.ADocument.processParagraphs(Unknown Source)
com.aspose.pdf.Document.processParagraphs(Unknown Source)
com.citi.ewr.mail.EWRGeneratePDF.reviewCompletePdf(EWRGeneratePDF.java:275)
com.citi.ewr.command.EwrExportToPdfCommand.process(EwrExportToPdfCommand.java:73)
com.citi.ewr.servlet.FrontServlet.doGet(FrontServlet.java:75)
javax.servlet.http.HttpServlet.service(HttpServlet.java:575)
javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1232)
com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:781)
com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:480)
com.ibm.ws.webcontainer.servlet.ServletWrapperImpl.handleRequest(ServletWrapperImpl.java:178)
com.ibm.ws.webcontainer.filter.WebAppFilterChain.invokeTarget(WebAppFilterChain.java:136)
com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:97)
com.citi.auth.AuthenticationFilter.doFilter(AuthenticationFilter.java:119)
com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:195)
com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:91)
com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:967)
com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1107)
com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:87)
com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:940)
com.ibm.ws.webcontainer.WSWebContainer.handleRequest(WSWebContainer.java:1817)
com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java:200)
com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:463)
com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleNewRequest(HttpInboundLink.java:530)
com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.processRequest(HttpInboundLink.java:316)
com.ibm.ws.http.channel.inbound.impl.HttpICLReadCallback.complete(HttpICLReadCallback.java:88)
com.ibm.ws.ssl.channel.impl.SSLReadServiceContext$SSLReadCompletedCallback.complete(SSLReadServiceContext.java:1820)
com.ibm.ws.tcp.channel.impl.AioReadCompletionListener.futureCompleted(AioReadCompletionListener.java:175)
com.ibm.io.async.AbstractAsyncFuture.invokeCallback(AbstractAsyncFuture.java:217)
com.ibm.io.async.AsyncChannelFuture.fireCompletionActions(AsyncChannelFuture.java:161)
com.ibm.io.async.AsyncFuture.completed(AsyncFuture.java:138)
com.ibm.io.async.ResultHandler.complete(ResultHandler.java:204)
com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:775)
com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:905)
com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1881)

Please let me know if you need anything else.
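
A stack trace like the one above shows where one thread was at one moment. To see which RUNNABLE threads are accumulating CPU time, the standard java.lang.management API can be queried from inside the JVM. The following is a minimal sketch, not code from our application: the class name HotThreadReport is hypothetical, and on some JVMs per-thread CPU timing may be unsupported or disabled, in which case the sketch reports -1.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: lists RUNNABLE threads with CPU time and their top stack frames. */
final class HotThreadReport {

    /** Builds a plain-text report; maxFrames limits the frames captured per thread. */
    static String build(int maxFrames) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id, maxFrames);
            if (info == null || info.getThreadState() != Thread.State.RUNNABLE) {
                continue; // thread has ended, or it is waiting/blocked rather than running
            }
            // Nanoseconds of CPU used by the thread; -1 if timing is unsupported or disabled.
            long cpuNanos = mx.isThreadCpuTimeSupported() ? mx.getThreadCpuTime(id) : -1L;
            out.append(info.getThreadName())
               .append(" id=").append(id)
               .append(" cpuMs=").append(cpuNanos < 0 ? -1L : cpuNanos / 1000000L)
               .append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(build(10));
    }
}
```

Calling build(...) twice a few seconds apart and comparing the cpuMs values would indicate which thread is actually burning CPU, rather than merely being in the RUNNABLE state.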

asad.ali · August 29, 2017, 12:17pm

@ramesh.lingala.citi

Thanks for contacting support.

I have tested the scenario in our environment with Aspose.Pdf for Java 17.7 in a WebSphere application and was unable to reproduce the issue. It looks like the issue you have mentioned is related to your specific execution routine as well as the PDF document being processed. Would you please share a sample code snippet that demonstrates your routine, along with the sample PDF document? We will test the scenario again in our environment and address it accordingly.

ramesh.lingala.citi · August 29, 2017, 6:56pm

Hi,
I have a few questions related to this. Please check them.

I will not be able to provide the PDF, as it is not generated when the issue occurs. However, I am providing sample code that shows exactly what we are trying to do in the PDF. It doesn’t contain the test methods where we actually add the data; I can’t provide those because they are confidential.

PDF_Sample_Code.zip (1.4 KB)

Please check the attached file and let us know if you can find something.
Environment details are below (maybe there is something specific about the versions in this environment). Could you please check and let me know whether the Aspose version we are using is compatible with the JDK, WebSphere, and Linux versions listed below?

Aspose 10.3
JDK 1.7.0_111
WebSphere Server - 8.5
Linux machine OS version - RHEL 6.x (where WebSphere Server is installed)
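
To double-check these values against what the server JVM itself reports, the standard system properties can be logged at startup. This is only a small sketch; the class name RuntimeInfo is hypothetical, and it reports the JVM and OS, not the Aspose or WebSphere versions.

```java
import java.nio.charset.Charset;

/** Hypothetical helper: summarizes the JVM and OS details the running process sees. */
final class RuntimeInfo {

    static String describe() {
        return "java.version=" + System.getProperty("java.version")
            + ", java.vendor=" + System.getProperty("java.vendor")
            + ", os.name=" + System.getProperty("os.name")
            + ", os.version=" + System.getProperty("os.version")
            + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```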

Also, there are a few additional points I want to check:

While searching the forum, I found two similar issues that others had reported. The links are below.
Hang issue in Aspose.Pdf for Java 17.2.0
Java Tool Hung

Our issue might be similar to these. In one of the threads, it is mentioned that there was an issue and it was fixed.

We are currently using Aspose 10.3, but you mentioned that you tested with Aspose 17.7.
Could you please check and let us know whether you can provide us with Aspose version 17.7, and whether we can use it without changing our code? Is it fully backward compatible?
Do we need a separate license for the new version, or will our existing license cover it?

Thanks

asad.ali · August 29, 2017, 7:45pm

@ramesh.lingala.citi

Thanks for providing details about the scenario.

We are testing the scenario in our environment and gathering the related information. We will get back to you as soon as we complete testing. Please be patient and spare us a little time.

ramesh.lingala.citi · August 29, 2017, 8:24pm

Hi Asad,

I would also like to understand the lines below, which we use in our code; you can check them in the file I attached earlier.

pdfDocument = new Document();
.
.
.
pdfDocument.processParagraphs();
pdfDocument.optimize();
pdfDocument.setPageLayout(PageLayout.OneColumn);
pdfDocument.save(pdfFile);

Could you please help us understand the specific purpose of processParagraphs() and optimize()?

Why do we need to use them?

Thanks

codewarior · August 30, 2017, 8:46am

@ramesh.lingala.citi,

The processParagraphs() method calculates and renders all the elements referenced inside the PDF file before it is saved. The optimize() method performs optimization, or linearization, which is the process of making a PDF file suitable for online browsing in a web browser. When you use this method, the API optimizes the file for web display.

ramesh.lingala.citi · August 31, 2017, 2:40am

Hi Asad,
Is there any update on this issue? Were you able to test this in your environment?

Thanks

asad.ali · August 31, 2017, 12:00pm

@ramesh.lingala.citi

Thanks for writing back.

We are working on setting up the specified environment as well as investigating other issues, and we will definitely try our best to share results with you by the end of this week. Please be patient and spare us a little time.

We are sorry for this inconvenience.

ramesh.lingala.citi · August 31, 2017, 5:48pm

Hi Asad,
Thanks for replying. While your team is analyzing the issue, I would like to ask the following related question.
While analyzing the issue on our end, we found that it always gets stuck at the point where we call the processParagraphs() method. You can check the logs I provided in the original post.

In our code we use the methods as shown below. For the analysis, we tried generating the PDF without calling processParagraphs() and optimize(), and the PDF was generated properly.
So we just wanted to confirm whether these methods are necessary, or whether we can generate the PDF without them. Will omitting these methods make the generated PDF different in any way?

pdfDocument = new Document();
.
.
.
pdfDocument.processParagraphs();
pdfDocument.optimize();
pdfDocument.setPageLayout(PageLayout.OneColumn);
pdfDocument.save(pdfFile);

Appreciate your help.
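
Independently of whether the calls above are kept, a request thread can be protected from a call that never returns by running that call on a separate worker with a time limit. The following is a minimal sketch using only java.util.concurrent; the class TimedTask and the thread name "pdf-worker" are hypothetical, and nothing here is an Aspose API. It is written to compile on Java 7, which is the JDK mentioned earlier in this thread. Note that cancellation only requests an interrupt: if the stuck code never checks the interrupt flag, the worker keeps consuming CPU, so this limits how long the caller waits rather than fixing the underlying hang.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a task on a worker thread and stops waiting after a time limit. */
final class TimedTask {

    /** Returns true if the task finished within timeoutMillis, false if it timed out. */
    static boolean runWithin(Runnable task, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor(new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "pdf-worker");
                t.setDaemon(true); // a stuck worker must not block JVM shutdown
                return t;
            }
        });
        Future<?> future = worker.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("Task failed", e.getCause());
        } finally {
            // Both calls only request interruption; code that never checks the
            // interrupt flag will keep running on the worker thread.
            future.cancel(true);
            worker.shutdownNow();
        }
    }
}
```

In our case the Runnable would wrap the pdfDocument.processParagraphs() call, with a timeout value chosen for the application. Application servers generally prefer container-managed threads, so the plain executor here should be treated as a placeholder for whatever thread facility the server provides.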

asad.ali · August 31, 2017, 8:17pm

@ramesh.lingala.citi

Thanks for writing to us.

As shared earlier, the processParagraphs() method is used to calculate and render all elements referenced inside the PDF file before saving it. For example, if you need to determine the total page count of the PDF file before saving it, you can do so after calling this method.

The Document.optimize() method is used for optimization, or linearization, for fast web view in a browser.

So, if you do not need either of the above-mentioned functionalities, you may skip those methods; doing so will not change the way you generate the PDF, and the PDF will be generated properly. If you need any further assistance, please feel free to let us know.

ramesh.lingala.citi · September 1, 2017, 1:19pm

Hi Asad,

Regarding the performance issue, while analyzing on our end, we found that it may be caused by some special UTF-8 character.

We found the warning message below in the logs, which comes from Aspose:
Warning: Could not find any font that contains the needed symbol: “?” . Used standard font.

So, whenever there is a UTF-8 character for which it cannot find a proper font, it hangs and CPU utilization goes up.

Could you please check from this angle as well, in case it helps to find the issue?

Is there any way, once all the content has been added to the PDF, to find the non-printable or special UTF-8 characters that may cause the issue and remove them from the PDF document?

Appreciate your help.
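
One option that does not depend on any Aspose API is to filter the text before it is added to the document, instead of scanning the document afterwards. The following is a minimal sketch under stated assumptions: the class TextSanitizer is hypothetical, and the choice of what counts as "suspect" (control characters, unassigned and private-use code points, unpaired surrogates, and the U+FFFD replacement character) is a guess, since the log does not reveal which character actually triggered the warning.

```java
/** Hypothetical helper: replaces code points that are unlikely to have a glyph in any font. */
final class TextSanitizer {

    /** True for controls, unassigned, private-use, lone surrogates, and U+FFFD. */
    static boolean isSuspect(int cp) {
        if (cp == '\n' || cp == '\r' || cp == '\t') {
            return false; // ordinary whitespace is kept
        }
        switch (Character.getType(cp)) {
            case Character.CONTROL:
            case Character.UNASSIGNED:
            case Character.PRIVATE_USE:
            case Character.SURROGATE:
                return true;
            default:
                return cp == 0xFFFD; // replacement character left by a failed decode
        }
    }

    /** Returns text with every suspect code point replaced by the given string. */
    static String clean(String text, String replacement) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i); // handles surrogate pairs as one code point
            if (isSuspect(cp)) {
                out.append(replacement);
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Logging the code points that isSuspect flags, before replacing them, would also help narrow down which input produces the font warning.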

asad.ali · September 1, 2017, 7:28pm

@ramesh.lingala.citi

Thanks for sharing more details regarding the scenario.

We will definitely test the scenario from the perspectives you shared and let you know the results. However, it seems that the issue is related to the specific environment. Furthermore, would you please run a test with the latest version of the API, which is Aspose.Pdf for Java 17.8, and share the performance differences with us?

Each new version of the API is released with new enhancements as well as updates to existing method calls and classes; the details can be found in the Release Notes of the specific API version.

Furthermore, I have also checked the code file you shared, and it seems that most of your code will be supported by the latest version of the API. Please note that we always recommend using the latest version of the API, as it contains more fixes and performance improvements.

You can upgrade to any release version of the API published before your license subscription expiry. Please note that the license file can be used for as long as you want, provided you use it with API versions published before the license expiry. The license expiry date can be checked in the <SubscriptionExpiry>20170825</SubscriptionExpiry> tag, where 2017 is the year, 08 is the month, and 25 is the day of the month.
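
The date check described above can be sketched in a few lines of plain Java. This is an illustration only: the class SubscriptionExpiry is hypothetical, only the tag name and the yyyyMMdd layout come from this post, and the release dates passed in must be looked up separately (the ones in the usage note are made-up placeholders, not actual Aspose release dates).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads the SubscriptionExpiry value and compares it with a release date. */
final class SubscriptionExpiry {

    private static final Pattern TAG =
        Pattern.compile("<SubscriptionExpiry>(\\d{8})</SubscriptionExpiry>");

    /** Returns the eight-digit expiry value from the license text, or null if the tag is absent. */
    static String fromLicenseXml(String xml) {
        Matcher m = TAG.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    /** Parses a yyyyMMdd value; impossible dates such as month 13 are rejected. */
    static Date parse(String yyyymmdd) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        fmt.setLenient(false);
        try {
            return fmt.parse(yyyymmdd);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a yyyyMMdd date: " + yyyymmdd, e);
        }
    }

    /** True if a version released on releaseDate was published on or before the expiry date. */
    static boolean covers(String expiry, String releaseDate) {
        return !parse(releaseDate).after(parse(expiry));
    }
}
```

For example, with the expiry value 20170825, covers("20170825", "20170731") is true and covers("20170825", "20170901") is false.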