Conversion in CDAP framework

Hi, I’m trying to convert PDF documents into plain text and HTML in a CDAP application (http://cask.co/products/cdap/), and it’s taking far too long. Inside a Spring Boot app the conversion takes about 20 seconds, while inside the CDAP platform it takes around 8 minutes. The code is placed inside a custom action (https://docs.cask.co/cdap/4.0.1/en/developers-manual/building-blocks/workflows.html#workflow-custom-actions).

Do you have any idea why it takes so long? Could it be a threading problem? Maybe a dependency issue?

Thanks in advance

@martin.scovotti,

Kindly send us the complete details of the use case, including the source PDF documents and your code, and let us know which version of the Aspose.Pdf for Java API you are using. We will investigate and share our findings with you.

Thanks for the reply.

The code is as simple as this:

public class PdfToPlainTextAction extends AbstractCustomAction {

	@Override
	public void run() throws Exception {
		getContext().execute(new TxRunnable() {
			@Override
			public void run(DatasetContext context) throws Exception {
				
				Document pdfDocument = new Document("C:\\Users\\myUser\\workspace\\Input\\10208345-225-6.pdf");
				TextExtractionOptions options = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
				TextAbsorber absorber = new TextAbsorber(options);
				pdfDocument.getPages().accept(absorber);
				String text = absorber.getText();

			}
		});
	}

}

It takes too long on line:
Document pdfDocument = new Document("C:\\Users\\myUser\\workspace\\Input\\10208345-225-6.pdf");

and when doing the conversion:
String text = absorber.getText();
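
To confirm how the time splits between loading the document and extracting the text, each step can be timed separately and the numbers compared between the Spring Boot run and the CDAP run. The following is only a minimal sketch: `StepTimer` and `timed` are hypothetical names that are not part of Aspose or CDAP, and the commented usage assumes the wrapped Aspose calls throw no checked exceptions.

```java
import java.util.function.Supplier;

final class StepTimer {

    /** Runs one step, prints how long it took in milliseconds, and returns the step's result. */
    static <T> T timed(String label, Supplier<T> step) {
        long start = System.nanoTime();
        try {
            return step.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(label + " took " + elapsedMs + " ms");
        }
    }

    public static void main(String[] args) {
        // Stand-in workload so the sketch runs without Aspose on the classpath.
        String result = timed("dummy step", () -> String.valueOf(6 * 7));
        System.out.println("result = " + result);

        // Inside the custom action, the steps from the post could be wrapped like this
        // (assumption: none of these calls throws a checked exception):
        //   Document pdfDocument = timed("load", () -> new Document(path));
        //   timed("accept", () -> { pdfDocument.getPages().accept(absorber); return null; });
        //   String text = timed("getText", () -> absorber.getText());
    }
}
```

Running the same wrapper in both environments shows whether one step accounts for most of the extra minutes or whether every step is slower across the board.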

We are using Aspose.Pdf for Java 11.7.0.

10208345-225-6.pdf (166.0 KB)

Thanks!

@martin.scovotti,

You are using version 11.7.0 of the Aspose.Pdf for Java API, which is quite old. Kindly download and try the latest version, 17.9, and let us know how it performs in your environment. We tested your source PDF and code with version 17.9, and the conversion takes 4 seconds.

Thanks for your fast response. I’ve tried the latest version as you requested.

We are still facing the same problem: it takes around 9 minutes to convert a single document.

Aspose conversion works great if we run it as a standalone Java app, but that is not our case. We are deploying the app in CDAP. Have you tried it in a similar environment? I believe it’s not a code issue, but something about the CDAP ecosystem and its interaction with Aspose. Any idea or suggestion is appreciated.

Thanks in advance :)
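
Since the same code is fast as a standalone app and slow under CDAP, one way to make the two environments comparable is to log the JVM settings from inside each of them and compare the output. This is a sketch under stated assumptions, not a diagnosis: `JvmSnapshot` and its keys are hypothetical names, and the chosen settings (heap limit, processor count, JVM arguments, context class loader) are only common candidates for a difference, not known causes of this slowdown.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

final class JvmSnapshot {

    /** Collects JVM settings that may differ between a standalone run and a hosted run. */
    static Map<String, String> capture() {
        Runtime rt = Runtime.getRuntime();
        Map<String, String> info = new LinkedHashMap<>();
        info.put("java.version", System.getProperty("java.version"));
        info.put("java.vm.name", System.getProperty("java.vm.name"));
        info.put("maxHeapMb", String.valueOf(rt.maxMemory() / (1024 * 1024)));
        info.put("availableProcessors", String.valueOf(rt.availableProcessors()));
        // Command-line flags the JVM was started with (e.g. -Xmx settings).
        info.put("jvmArgs", String.valueOf(ManagementFactory.getRuntimeMXBean().getInputArguments()));
        // The class loader in effect for the calling thread; a hosting platform may install its own.
        info.put("contextClassLoader", String.valueOf(Thread.currentThread().getContextClassLoader()));
        return info;
    }

    public static void main(String[] args) {
        capture().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Calling `JvmSnapshot.capture()` at the start of the custom action and again in the standalone app gives two listings that can be attached to a support request for either product.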

@martin.scovotti,

CDAP is a third-party product, and we are not certain about the cause of the issue. We recommend raising it with the CDAP support team. Kindly also list all the steps needed to replicate your environment, so that we can investigate and share our findings with you.