Hi,
In two of our production environments, for two different customers, PDF text extraction is taking a long time: typically 5 to 20 minutes, possibly longer in some cases, and about 46 minutes in one instance.
The time appears to be consumed by a call to the extractText method of the com.aspose.pdf.facades.PdfExtractor class.
I currently have 8 example PDFs:
CON2148000.pdf: PDF document, version 1.7, 417762 bytes
CON2163292.pdf: PDF document, version 1.6, 329696 bytes, 431 seconds
Credit.pdf: PDF document, version 1.6, 99827 bytes, 2791826 ms (about 46 minutes)
BOR-GC-Checklist-CON2065161.pdf: PDF document, version 1.6, 1827922 bytes
Checklist-2-CON2147021.pdf: PDF document, version 1.7, 653901 bytes
Checklist-CON2147021.pdf: PDF document, version 1.7, 65300 bytes
CON2163292.pdf: PDF document, version 1.6, 329696 bytes
LBK-Contract.pdf: PDF document, version 1.7, 510885 bytes
I’m currently getting permission from one or both customers to send Aspose some of these samples.
One customer’s system is a multi-user, multi-client system and has logging for timings.
The other customer’s system is a multi-user, single-client system, but its logging for timings is a little different (we may need to alter this to ease investigation).
We have been evaluating all sorts of things to try to understand what is occurring on the system at the time these issues arise, but have not found anything suspicious to date.
I believe we are running jdk1.7.0_80 but will need to confirm this. The Aspose JAR is aspose.pdf-17.9.jar.
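To confirm the JDK and which JAR is actually loaded, something like the following could be logged at startup. This is only a sketch: RuntimeInfo is a hypothetical helper name, and whether the Aspose JAR's manifest carries an Implementation-Version is an assumption I have not checked. In our application the class passed in would be PdfExtractor.class; String.class is used here only so the example runs without the Aspose JAR.

```java
import java.security.CodeSource;
import java.security.ProtectionDomain;

public class RuntimeInfo {

    /** Builds a one-line description of the running JVM from system properties. */
    public static String describeJvm() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.vendor=" + System.getProperty("java.vendor")
                + ", java.home=" + System.getProperty("java.home");
    }

    /** Returns the Implementation-Version from the manifest of the class's package, or "unknown". */
    public static String implementationVersion(Class<?> type) {
        Package pkg = type.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return (version == null) ? "unknown" : version;
    }

    /** Returns the location (typically the JAR path) the class was loaded from, or "unknown". */
    public static String codeSource(Class<?> type) {
        ProtectionDomain domain = type.getProtectionDomain();
        CodeSource source = (domain == null) ? null : domain.getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "unknown"; // e.g. classes loaded by the bootstrap loader
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(describeJvm());
        // Replace String.class with PdfExtractor.class inside the application.
        System.out.println("implementation version: " + implementationVersion(String.class));
        System.out.println("loaded from: " + codeSource(String.class));
    }
}
```

The code-source location is probably the more reliable of the two checks, since it does not depend on what the manifest contains.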
A code sample:
    PdfExtractor extractor = new PdfExtractor();
    if (!NonNullString.isEmpty(_password)) {
        // Decrypt password-protected input in memory before binding
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        PdfFileSecurity security = new PdfFileSecurity(new ByteArrayInputStream(_input), output);
        security.decryptFile(_password);
        _input = output.toByteArray();
        output.close();
    }

    // Time each stage separately: bindPdf, extractText, getText
    getStart = System.currentTimeMillis();
    extractor.bindPdf(new ByteArrayInputStream(_input));
    getCompleted = System.currentTimeMillis();
    LOG.info("execute: bindPdf completed. " + "Elapsed time: " + ((getCompleted - getStart) / 1000) + " seconds.");

    getStart = System.currentTimeMillis();
    extractor.extractText();
    getCompleted = System.currentTimeMillis();
    LOG.info("execute: extractText completed. " + "Elapsed time: " + ((getCompleted - getStart) / 1000) + " seconds.");

    getStart = System.currentTimeMillis();
    _outputText = getText(extractor);
    getCompleted = System.currentTimeMillis();
    LOG.info("execute: getText completed. " + "Elapsed time: " + ((getCompleted - getStart) / 1000) + " seconds.");

    extractor.close();
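One thing I could try on our side is sampling the stack of the thread while extractText is running, to see where inside the library the time goes. Below is a minimal sketch using only the JDK. SlowCallSampler and runWithSampling are hypothetical names, and the sleeping task in main is a stand-in for the real extractor.extractText() call; it is written to compile on Java 7.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class SlowCallSampler {

    /** Formats a thread's current stack as a multi-line string. */
    public static String formatStack(Thread thread) {
        StringBuilder sb = new StringBuilder("Stack of " + thread.getName() + ":\n");
        for (StackTraceElement frame : thread.getStackTrace()) {
            sb.append("    at ").append(frame).append('\n');
        }
        return sb.toString();
    }

    /**
     * Runs the task on a worker thread and prints the worker's stack to stderr
     * every sampleMillis until the task finishes, then returns its result.
     */
    public static <T> T runWithSampling(final Callable<T> task, long sampleMillis) throws Exception {
        final AtomicReference<Thread> worker = new AtomicReference<Thread>();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = pool.submit(new Callable<T>() {
                public T call() throws Exception {
                    worker.set(Thread.currentThread());
                    return task.call();
                }
            });
            while (true) {
                try {
                    // Returns as soon as the task completes; rethrows its failure if it threw.
                    return future.get(sampleMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException stillRunning) {
                    Thread thread = worker.get();
                    if (thread != null) {
                        System.err.println(formatStack(thread));
                    }
                }
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the slow call; in the application this would wrap extractor.extractText().
        String result = runWithSampling(new Callable<String>() {
            public String call() throws Exception {
                Thread.sleep(250);
                return "done";
            }
        }, 100);
        System.out.println(result);
    }
}
```

In production the sample interval would be much longer (for example 30 seconds), so that only the slow documents produce any output. If the same frames show up in every sample, that should narrow down where the extraction is spending its time.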
Have you any suggestions as to what I can try to do to further the investigation?
Clayton