Only partial text extracted from image

phadkemilind · December 21, 2020, 1:20am

Hi,
When extracting text from a JPEG, only a few lines are extracted. Please refer to the attached document (JPEG).
image1.jpeg (334.4 KB)

I would appreciate your advice.

Thanks

asad.ali · December 21, 2020, 9:00pm

@phadkemilind

Aspose.PDF does not offer functionality to extract text from images. Instead, it allows you to extract text from PDF documents. Would you please confirm which API you are using and share a sample code snippet so that we can give you feedback accordingly?

phadkemilind · December 21, 2020, 10:09pm

using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Open the source PDF
Document pdfDocument = new Document("TestDoc3.pdf");

// Create TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();

// Accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);

// Get the extracted text
string extractedText = textAbsorber.Text;

// Create a writer and open the file
TextWriter tw = new StreamWriter("TestDoc3_out.txt");
// Write the extracted text to the file
tw.WriteLine(extractedText);
// Close the stream
tw.Close();

phadkemilind · December 21, 2020, 10:12pm

TestDoc3.pdf (1.3 MB)

asad.ali · December 22, 2020, 7:03pm

@phadkemilind

The document you shared does not contain any text. It consists of images, and Aspose.PDF cannot extract text from images. However, you can extract the images from the PDF and use Aspose.OCR to extract text from them.
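
For the first half of that suggestion, the following is a minimal sketch of how the images might be pulled out of the PDF with Aspose.PDF for .NET so that they can be passed to an OCR engine afterwards. It assumes the `TestDoc3.pdf` file from this thread; the output file names and the `.jpg` extension are illustrative assumptions, and the OCR step is left as a comment because the exact Aspose.OCR call depends on the version in use.

```csharp
using System.IO;
using Aspose.Pdf;

class ExtractImagesSketch
{
    static void Main()
    {
        // Open the scanned PDF (file name taken from this thread)
        using (Document pdfDocument = new Document("TestDoc3.pdf"))
        {
            int imageIndex = 0;

            // Walk every page and every image resource on that page
            foreach (Page page in pdfDocument.Pages)
            {
                foreach (XImage image in page.Resources.Images)
                {
                    imageIndex++;

                    // Hypothetical output name; adjust the extension to the format you need
                    string outputPath = $"TestDoc3_page{page.Number}_image{imageIndex}.jpg";

                    using (FileStream output = new FileStream(outputPath, FileMode.Create))
                    {
                        // Write the image data to the file
                        image.Save(output);
                    }

                    // Next step (not shown): pass outputPath to Aspose.OCR
                    // or another OCR engine to recognize the text.
                }
            }
        }
    }
}
```

Each saved image can then be fed to the OCR library one at a time. Recognition quality will depend on the resolution and clarity of the scanned pages.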