Extract text from PDF using VB.NET

Hello All,

I used this code to extract text from a PDF, but it extracts only one line of the text. Can you tell me why?

Dim dlg As OpenFileDialog = New OpenFileDialog

If (dlg.ShowDialog = DialogResult.OK) Then
    'open document
    Dim pdfDocument As New Aspose.Pdf.Document(dlg.FileName)
    Dim extractedText As String

    For Each pdfPage As Page In pdfDocument.Pages
        Using textStream As New MemoryStream()
            'create text device
            Dim textDevice As New Devices.TextDevice()
            'set text extraction options - set text extraction mode (Raw or Pure)
            Dim textExtOptions As New TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure)
            textDevice.ExtractionOptions = textExtOptions
            'convert a particular page and save text to the stream
            textDevice.Process(pdfPage, textStream)
            'close memory stream
            textStream.Close()
            'get text from memory stream
            extractedText = Encoding.Unicode.GetString(textStream.ToArray())
        End Using
    Next pdfPage

    TextBox1.Text = extractedText
End If

Thanks.

Hi,


Thanks for using our products.

Can you please share the source PDF file so that we can test the scenario on our end? We are sorry for the inconvenience.

Thanks for your answer. The PDF file contains Arabic and English words; it is just for testing the product and seeing how it deals with two languages.

Hi there,

Thanks for considering Aspose. I've tested your sample code with the source document, and it is working fine. It seems you are evaluating Aspose.Pdf without a license. The evaluation version has two limitations: an evaluation watermark is added, and at most four elements of any collection can be viewed. Please request a 30-day temporary license to evaluate our product without any limitations. Hopefully that will resolve your issue.
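
One more thing worth checking, independent of licensing: the loop in the posted code assigns to extractedText on every pass, so after the loop it holds only the text of the last page processed. Below is a minimal sketch of accumulating the text from all pages instead. ExtractAllText and its licensePath parameter are names made up for illustration, and the fully qualified name Aspose.Pdf.Text.TextExtractionOptions is an assumption about the installed version (the class has lived in a different namespace in some releases), so adjust it to match your references.

```vbnet
Imports System.IO
Imports System.Text

Module PdfTextSketch
    ' Hypothetical helper: returns the text of every page, concatenated.
    ' licensePath is optional; pass an empty string to run in evaluation mode.
    Function ExtractAllText(pdfPath As String, licensePath As String) As String
        If Not String.IsNullOrEmpty(licensePath) Then
            ' Apply the (temporary) license before opening any document.
            Dim pdfLicense As New Aspose.Pdf.License()
            pdfLicense.SetLicense(licensePath)
        End If

        Dim pdfDocument As New Aspose.Pdf.Document(pdfPath)
        Dim allText As New StringBuilder()

        For Each pdfPage As Aspose.Pdf.Page In pdfDocument.Pages
            Using textStream As New MemoryStream()
                Dim textDevice As New Aspose.Pdf.Devices.TextDevice()
                ' Assumed namespace for TextExtractionOptions; see the note above.
                textDevice.ExtractionOptions = New Aspose.Pdf.Text.TextExtractionOptions(
                    Aspose.Pdf.Text.TextExtractionOptions.TextFormattingMode.Pure)

                ' Write this page's text into the stream.
                textDevice.Process(pdfPage, textStream)

                ' Append rather than assign, so earlier pages are not lost.
                allText.Append(Encoding.Unicode.GetString(textStream.ToArray()))
            End Using
        Next

        Return allText.ToString()
    End Function
End Module
```

With this in place, TextBox1.Text = ExtractAllText(dlg.FileName, "") would show the text of every page the evaluation limit allows; passing the path of a valid license file as the second argument should lift that limit.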

Please feel free to contact us for any further assistance.

Best Regards,