Free Support Forum - aspose.com

Text extraction is only partial

Hi Team,

I have just started using Aspose.PDF for Java on Windows. I am simply reading a PDF file and trying to extract its text, but for some reason only the first few characters are returned. Am I doing something wrong? See the code below.

// At the top of the source file:
import com.aspose.pdf.Document;
import com.aspose.pdf.TextAbsorber;

public void ReadTextFromPdf() {
    // Backslashes inside Java string literals must be escaped.
    String _dataDir = "C:\\Development\\";
    String filePath = _dataDir + "test.pdf";

    // Load the PDF document.
    Document pdfDocument = new Document(filePath);

    // The absorber collects the text of every page it visits.
    TextAbsorber textAbsorber = new TextAbsorber();
    pdfDocument.getPages().accept(textAbsorber);

    String extractedText = textAbsorber.getText();
    System.out.println("extractedText = " + extractedText);
}

My input file (test.pdf) contains the following lines of text:
“This is a test paragraph. Here we are testing ASPOSE pdf functionality.
We are trying to read all the texts.”

Output is:
extractedText = Evaluation Only. Created with Aspose.PDF. Copyright 2002-2021 Aspose Pty Ltd.
This is a test parag

@ruban.prakash

You are using Aspose.PDF in evaluation mode. Please get a 30-day temporary license and apply it before extracting text to avoid this issue.

Hey Tahir,

I will get the 30-day temporary license, but is that really the cause of the issue I am facing? I mean that I am able to read only a few characters. Please confirm.

Thanks,
Ruban

@ruban.prakash

Please attach your input PDF file here for testing. We will investigate the issue and share more information.

aspose-pdf-sample.pdf (41.4 KB)

This is the sample file.

@ruban.prakash

We could not reproduce the issue on our side. Please check the attached image for details.
image.png (20.4 KB)

Please make sure that you are using the latest version, Aspose.PDF for Java 22.5, and that the license is applied correctly.

Tahir,

I am still facing the same issue. Could you please try without loading the Aspose license and see what output you get?

Thanks,
Ruban

@ruban.prakash

Without a license set, the text output will not be correct. Please run the following code example and share the output with us.

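// licPath and MyDir are placeholders for your license file path and data directory.
// License, BuildVersionInfo, Document and TextAbsorber come from the com.aspose.pdf package.
License lic = new License();
lic.setLicense(licPath); // apply the license before loading the document
System.out.println(BuildVersionInfo.ASSEMBLY_VERSION); // print the library version in use

Document pdfDocument = new Document(MyDir + "aspose-pdf-sample.pdf");

TextAbsorber textAbsorber = new TextAbsorber();
pdfDocument.getPages().accept(textAbsorber);

String extractedText = textAbsorber.getText();
System.out.println("extractedText = " + extractedText);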
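When sharing the output, it may help to check whether it still looks like evaluation-mode output. The following is only a rough sketch in plain Java, with no Aspose calls. The class and method names are hypothetical. The banner prefix is copied from the output posted earlier in this thread, and it is an assumption that this wording is the same in other versions.

```java
final class EvaluationOutputCheck {
    // Assumption: banner text taken from the output posted in this thread.
    // Only the prefix is matched because the copyright years will vary.
    static final String EVALUATION_BANNER_PREFIX = "Evaluation Only. Created with Aspose.PDF.";

    /** True if the extracted text contains the evaluation banner seen in this thread. */
    static boolean hasEvaluationBanner(String extractedText) {
        return extractedText != null && extractedText.contains(EVALUATION_BANNER_PREFIX);
    }

    /** True if every expected phrase appears in the extracted text, ignoring whitespace differences. */
    static boolean containsAll(String extractedText, String... expectedPhrases) {
        String haystack = normalize(extractedText);
        for (String phrase : expectedPhrases) {
            if (!haystack.contains(normalize(phrase))) {
                return false;
            }
        }
        return true;
    }

    // Collapse runs of whitespace (including line breaks) into single spaces.
    private static String normalize(String s) {
        return s == null ? "" : s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // The output reported in the first post of this thread.
        String reported = "Evaluation Only. Created with Aspose.PDF. Copyright 2002-2021 Aspose Pty Ltd.\n"
                + "This is a test parag";
        System.out.println("evaluation banner present: " + hasEvaluationBanner(reported));
        System.out.println("full text extracted: " + containsAll(reported,
                "This is a test paragraph.",
                "We are trying to read all the texts."));
    }
}
```

In real use, the string returned by textAbsorber.getText() would be passed in place of the hard-coded sample. If the banner is still present after calling setLicense, the license was probably not picked up.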