Convert PDF to Text and HTML in CDAP Application using Aspose.PDF - High memory consumption

Hello, I’m trying to convert PDF documents into plain text and HTML in a CDAP application (http://cask.co/products/cdap/) and it’s consuming too much memory. The code is placed inside a flowlet, which is run by a flow.

Do you have any idea what might be causing this?

public class ProcessingFlow extends AbstractFlow {
    @Override
    public void configure() {
        setName("ProcessingFlow");
        addFlowlet("rawToTextFlowlet", new RawToTextFlowlet());
    }
}

public class RawToTextFlowlet extends AbstractFlowlet {
    public void process(Document document) {
        Document pdfDocument = new Document("C:\\Users\\myUser\\workspace\\Input\\10208345-225-6.pdf");
        TextExtractionOptions options = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
        TextAbsorber absorber = new TextAbsorber(options);
        pdfDocument.getPages().accept(absorber);
        String text = absorber.getText();
    }
}

Thanks in advance

P.S. We are using Aspose.PDF 11.7.0.
Example 9832594-114-10.pdf (267.4 KB)

@edaran

Thank you for contacting support.

Please note that we always recommend using the latest version of the API because it includes more features and bug fixes. Support is likewise provided against the latest available version, and other resources such as example projects, documentation, and API references are kept up to date with it. Therefore, please upgrade to Aspose.PDF for Java 19.9 and then share your feedback with us.

Hello, since we renewed the licence we have been using one of the latest versions (19.12), and we haven’t seen any improvement in memory usage.

Any ideas? Could you please take a look at this issue within CDAP?

Thanks in advance

@edaran,

I have reviewed your comments and would like to inform you that the latest version of Aspose.PDF is 20.2. Please also share a complete working project so that we can reproduce the issue.

OK, as I can’t share my private project, I created a simple CDAP project to reproduce the slow performance of the Aspose library and its memory usage within the CDAP framework.

https://github.com/ivanpatos/cdap-aspose-example

There you will find a short guide for deploying the app (Windows 10, jdk1.8.0_144, Aspose 20.2).

We need to have this issue resolved; otherwise we will not renew the licence, as we don’t see any improvement. If you have any questions, please let me know.

@edaran,

We are looking into this and will get back to you with feedback soon.

Hi, any update? Our licence is about to expire and we’re afraid we will lose support.

Regards!

@edaran,

Thanks for sharing further details.

We have logged an investigation ticket as PDFJAVA-39268 in our issue tracking system. We will look into the details further and keep you posted on the status of its resolution. Please be patient and spare us some time.

We are sorry for the inconvenience.

We have found that the jars you provide are signed. Loading them in our local environment is really slow: it takes several minutes (roughly 8 to 10), because the JVM used by CDAP spends this time verifying the signed jars.

As a workaround, in our local environment, we have modified the original files to remove the signature information. Can you please investigate this issue? We think it is also related to the memory issue.
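For reference, here is a minimal sketch of how such a workaround could look. It is not the exact procedure we followed and not anything provided by Aspose. It assumes that copying a jar while skipping the signature files directly under META-INF (.SF, .RSA, .DSA, .EC, SIG-*) is enough for the JVM to treat it as unsigned. The class name `JarSignatureStripper` and the command-line arguments are hypothetical, and whether modifying the jars is permitted under the licence is a separate question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: copies a jar and leaves out its signature files.
final class JarSignatureStripper {

    /** True for signature-related files located directly under META-INF. */
    public static boolean isSignatureEntry(String name) {
        String upper = name.toUpperCase(Locale.ROOT);
        if (!upper.startsWith("META-INF/")) {
            return false;
        }
        String rest = upper.substring("META-INF/".length());
        if (rest.contains("/")) {
            return false; // nested paths are not signature files
        }
        return rest.endsWith(".SF") || rest.endsWith(".RSA")
                || rest.endsWith(".DSA") || rest.endsWith(".EC")
                || rest.startsWith("SIG-");
    }

    /** Copies every entry except signature files; returns how many were skipped. */
    public static int strip(Path signedJar, Path unsignedJar) throws IOException {
        int removed = 0;
        byte[] buffer = new byte[8192];
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(signedJar));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(unsignedJar))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (isSignatureEntry(entry.getName())) {
                    removed++;
                    continue;
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                out.closeEntry();
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the signed jar, args[1]: path for the unsigned copy
        int removed = strip(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Signature files removed: " + removed);
    }
}
```

The per-entry digests stay in MANIFEST.MF, but without a matching .SF file the JVM has nothing to verify them against, so the copy is loaded as an ordinary unsigned jar.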

Regards

@edaran,

I would like to inform you that this issue was added to our issue tracking system only recently. As per our company policy, first priority for investigation is given to Paid Support (i.e. Enterprise and Priority Support) on a first-come, first-served basis. After that, issues from the normal support forum are scheduled for investigation on a first-come, first-served basis. I request your patience; we will keep you updated on the issue status and will resolve this as soon as possible.

Hi, thanks for the support. Our licence is expiring tomorrow. If you have any news please let me know ASAP and we will reconsider renewing the licence.

Regards,

@edaran

I would like to inform you that we have started working on this issue and will share good news with you soon. I request your patience.

Any news?

Thanks!

@edaran,

I would like to inform you that we have worked on this issue and need more information for further investigation. It is very important for us to know the exact amount of memory used in your environment and the maximum memory available for execution during the Aspose.PDF conversion. We would also like to know what level of memory usage would be acceptable for your use case. That way we can determine whether it is possible to satisfy the request and find possible solutions.
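One way these figures could be collected is sketched below, assuming the measurement is taken inside the same JVM that runs the conversion (for example, logged from the flowlet before and after the text extraction). It uses only the standard `java.lang.Runtime` API. The class name `HeapProbe` is hypothetical, and the byte array in `main` is just a stand-in for the actual conversion call. Note that these numbers cover the Java heap only, not native memory.

```java
// Hypothetical helper for reporting JVM heap usage around a conversion.
final class HeapProbe {

    private static final long MB = 1024L * 1024L;

    /** Heap currently in use: allocated heap minus the free space inside it. */
    public static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Maximum heap the JVM will attempt to use (typically set via -Xmx). */
    public static long maxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Formats a one-line report that can be written to the application log. */
    public static String report(String label) {
        return String.format("%s: used=%d MB, max=%d MB",
                label, usedBytes() / MB, maxBytes() / MB);
    }

    public static void main(String[] args) {
        System.out.println(report("before conversion"));

        // Placeholder for the real work (the Aspose.PDF conversion would go here).
        byte[] placeholder = new byte[16 * 1024 * 1024];

        System.out.println(report("after conversion"));
        System.out.println("placeholder bytes held: " + placeholder.length);
    }
}
```

Because garbage collection can run at any time, a single reading is only approximate; taking several readings over a batch of documents would give a more reliable picture of peak usage.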