Free Support Forum - aspose.com

Extract Text Method

I previously asked how to extract text from a PDF using PDF.Kit.

I was instructed to review the following link:

http://www.aspose.com/documentation/file-format-components/aspose.pdf.kit-for-.net-and-java/extract-text-from-pdf-document.html

The method described there uses ExtractText and GetText, but GetText has to save the text to a file. Is there a way to get the text without saving it to a file and then opening and reading that file? Can't there just be a method that returns the text in a string variable?

Hi,

You can save the text to a MemoryStream and then convert it to a string.
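The same idea in Java can be sketched as a small helper that runs any "write your result to a stream" call against an in-memory buffer and decodes the bytes into a String. This is a minimal sketch, not Aspose code: `StreamProducer` and `captureToString` are hypothetical names, the Aspose call is replaced by a simulated producer so the example runs on its own, and UTF-8 is an assumed encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CaptureToString {

    // Hypothetical functional interface standing in for any API that
    // writes its output to an OutputStream (e.g. a getText(stream) call).
    @FunctionalInterface
    interface StreamProducer {
        void writeTo(OutputStream out) throws IOException;
    }

    // Runs the producer against an in-memory buffer and decodes the bytes
    // with the given charset, so no temporary file is needed.
    static String captureToString(StreamProducer producer, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            producer.writeTo(buffer);
        } catch (IOException e) {
            // Writes to a ByteArrayOutputStream do not fail with IOException;
            // this only surfaces errors raised by the producer itself.
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // Simulated producer: writes UTF-8 bytes, as a text extractor might
        String text = captureToString(
                out -> out.write("Page 1 text\nPage 2 text".getBytes(StandardCharsets.UTF_8)),
                StandardCharsets.UTF_8);
        System.out.println(text);
    }
}
```

With a real extractor, the lambda body would be the library's own "write text to this stream" call.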

Thanks.

Thank you!

Hello Nicolas,

Thanks for considering Aspose.

Please refer to the following code snippet, in which text is extracted from a PDF file, written to a memory stream (a ByteArrayOutputStream in Java), and finally printed to the command prompt.

[Java]

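import java.io.ByteArrayOutputStream;
// Package name as used by Aspose.Pdf.Kit for Java; verify against your version
import com.aspose.pdf.kit.PdfExtractor;

public class ExtractTextToString {
    public static void main(String[] args) throws Exception {
        // In-memory buffer that receives the extracted text instead of a file
        ByteArrayOutputStream ms = new ByteArrayOutputStream();

        PdfExtractor extractor = new PdfExtractor();
        extractor.bindPdf("C:/pdftest/HTMLtest.pdf");
        extractor.extractText();
        extractor.getText(ms);

        // Convert the buffered bytes to a String (no-argument toString()
        // decodes with the platform default charset)
        String strInput = ms.toString();
        System.out.println("Extracted text: " + strInput);
    }
}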
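One caveat on `ms.toString()`: without arguments it decodes the bytes with the platform default charset, so non-ASCII text can come out garbled if the extractor wrote a different encoding. Which encoding Aspose.Pdf.Kit writes to the stream is not stated in this thread; UTF-8 below is an assumption, so check the Aspose documentation for your version. The sketch uses a hypothetical `decode` helper and simulated extractor output to show the same bytes decoded with a matching and a mismatched charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeBufferExplicitly {

    // Decodes the buffer with an explicit charset instead of the platform
    // default used by the no-argument ByteArrayOutputStream.toString().
    static String decode(ByteArrayOutputStream ms, Charset charset) {
        return new String(ms.toByteArray(), charset);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream ms = new ByteArrayOutputStream();
        // Simulated extractor output: UTF-8 bytes for "Résumé page 1"
        byte[] bytes = "R\u00e9sum\u00e9 page 1".getBytes(StandardCharsets.UTF_8);
        ms.write(bytes, 0, bytes.length);

        // Matching charset: text round-trips correctly
        System.out.println("UTF-8:      " + decode(ms, StandardCharsets.UTF_8));
        // Mismatched charset: each accented character becomes two wrong ones
        System.out.println("ISO-8859-1: " + decode(ms, StandardCharsets.ISO_8859_1));
    }
}
```

On Java 10 and later, `ms.toString(StandardCharsets.UTF_8)` does the same as `decode` in one call; on older versions `ms.toString("UTF-8")` works but throws the checked `UnsupportedEncodingException`.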