Slow performance

jlnebril · November 29, 2017, 12:14pm

Hi,
We are testing the Aspose OCR Java solution with a view to including it in our ERP product.
We use a simple program to extract the text from a PDF file.
The performance is very bad.
Is the poor performance because I am using the trial version?
My test computer has 8 GB of RAM and an i7 processor, and I am using Java 1.7.
Best regards
Jose

ikram.haq · November 29, 2017, 5:21pm

@jlnebril,

The trial version only limits the results that are displayed. It has no effect on the performance of the API. Please share the sample PDF file that you are using. We will evaluate it and update you with our findings.

jlnebril · November 29, 2017, 5:47pm

Hi Ikram,
Thanks.
I have attached the PDF file; it is very simple.
I have also included the source code and the execution times:
1.- com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(FileNamePdf); takes many seconds.
2.- jpegDevice.process( pdfDocument.getPages().get_Item(1), imageStream); takes many minutes.
3.- OcrEngine.setImage(ImageStream.fromFile(FileJPG)); takes many minutes.
4.- OcrEngine.getText(); takes many, many minutes.

Thanks Ikram

 public String GetText( String FileNamePdf, String Idioma )  {
    // Note: Idioma (language) is currently unused
    com.aspose.ocr.OcrEngine OcrEngine = new com.aspose.ocr.OcrEngine();
    com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(FileNamePdf);
    String Buffer = "";
    String FileJPG = FileNamePdf + ".jpg";
    try {
        com.aspose.pdf.devices.Resolution resolution = new com.aspose.pdf.devices.Resolution(300);
        com.aspose.pdf.devices.JpegDevice jpegDevice = new com.aspose.pdf.devices.JpegDevice(resolution, 100);
        java.io.OutputStream imageStream = new java.io.FileOutputStream(FileJPG);
        try {
            // Render the first page of the PDF to a JPEG file
            jpegDevice.process(pdfDocument.getPages().get_Item(1), imageStream);
        } finally {
            // Close the stream so the JPEG is fully written before OCR reads it
            imageStream.close();
        }
        // Perform OCR on the rendered page image
        OcrEngine.setImage(com.aspose.ocr.ImageStream.fromFile(FileJPG));
        if (OcrEngine.process()) {
            Buffer = "" + OcrEngine.getText();
        }
        System.out.println("Result: " + Buffer);
    } catch (Exception e) {
        // Report the failure instead of silently swallowing it
        e.printStackTrace();
    }
    return Buffer;
 }
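
For reference, the per-step times listed above could be collected with a small timing helper along these lines. This is only a sketch: `StageTimer` and its methods are hypothetical names and not part of any Aspose API, and the stage labels in `main` are placeholders for the real calls (loading the PDF, rendering the JPEG, setting the image, running OCR).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not an Aspose class) that records wall-clock time per named stage. */
public class StageTimer {
    private final Map<String, Long> elapsedNanos = new LinkedHashMap<String, Long>();
    private String currentStage;
    private long startedAt;

    /** Begin timing a stage; any stage still running is finished first. */
    public void start(String stage) {
        stop();
        currentStage = stage;
        startedAt = System.nanoTime();
    }

    /** Finish the running stage, if any, and add its duration to the totals. */
    public void stop() {
        if (currentStage != null) {
            long delta = System.nanoTime() - startedAt;
            Long previous = elapsedNanos.get(currentStage);
            elapsedNanos.put(currentStage, (previous == null ? 0L : previous) + delta);
            currentStage = null;
        }
    }

    /** Total recorded time for a stage in milliseconds, or 0 if it was never timed. */
    public long millis(String stage) {
        Long nanos = elapsedNanos.get(stage);
        return nanos == null ? 0L : nanos / 1000000L;
    }

    /** Recorded stages in the order they were first started (values in nanoseconds). */
    public Map<String, Long> stages() {
        return elapsedNanos;
    }

    /** One line per stage: "name: N ms". */
    public String report() {
        StringBuilder sb = new StringBuilder();
        for (String stage : elapsedNanos.keySet()) {
            sb.append(stage).append(": ").append(millis(stage)).append(" ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        timer.start("load PDF");      // placeholder for: new com.aspose.pdf.Document(...)
        timer.start("render JPEG");   // placeholder for: jpegDevice.process(...)
        timer.start("OCR");           // placeholder for: OcrEngine.process() / getText()
        timer.stop();
        System.out.print(timer.report());
    }
}
```

Calling `timer.start(...)` immediately before each of the four steps in `GetText` and `timer.stop()` after the last one would give one figure per step, which makes it easier to see whether the time goes into rendering the page or into the OCR itself.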

JL2017112915119584315440_2017-FA-01-0003.PDF (21.4 KB)
2017112915119584315440_2017-FA-01-0003.PDF (21.4 KB)

ikram.haq · November 30, 2017, 7:44am

@jlnebril,

We have investigated the issue and found that the data in the PDF is in tabular format. Please note that the current implementation does not support extracting data from tables. An issue for reading data in tabular format has been logged in our system with ID OCRNET-2941 and linked to this thread. You will be notified automatically in this forum thread once an update is available.