Is there a function to extract the text of a PDF, or of a single PDF page, to a string?

Von · March 25, 2010, 12:11pm

Hello,

Aspose.Pdf.Kit allows you to extract the text of a PDF document to a file.

Von

shahzadlatif · March 26, 2010, 12:32am

Hi Von,

Thank you very much for considering Aspose.

You can extract the text to a MemoryStream using the GetText method of the PdfExtractor class in Aspose.Pdf.Kit for .NET, and then convert the data in the MemoryStream to a String.
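A minimal sketch of the "convert the MemoryStream to a String" step, using only System.IO and System.Text. The Aspose call itself is not reproduced; a plain byte write stands in for the extractor filling the stream. The helper name MemoryStreamToString is hypothetical, and UTF-8 is an assumption: this thread does not say which encoding the extractor writes, so pass whichever Encoding matches it.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module MemoryStreamToStringDemo
    ' Hypothetical helper: decodes everything written to a MemoryStream.
    ' ToArray() copies the whole buffer regardless of the stream's Position,
    ' so no rewind is needed before decoding.
    Function MemoryStreamToString(mem As MemoryStream, enc As Encoding) As String
        Return enc.GetString(mem.ToArray())
    End Function

    Sub Main()
        Using mem As New MemoryStream()
            ' Stand-in for the extractor writing text into the stream (UTF-8 assumed).
            Dim data As Byte() = Encoding.UTF8.GetBytes("Text of the document")
            mem.Write(data, 0, data.Length)
            Console.WriteLine(MemoryStreamToString(mem, Encoding.UTF8))
        End Using
    End Sub
End Module
```

Because ToArray() ignores the current Position, this variant avoids the rewind that a StreamReader-based read would need.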

I hope this helps. If you have any further questions, please do let us know.
Regards,

Von · March 26, 2010, 5:31pm

The code below only returns blank lines. What am I doing wrong?

Dim pageCount As Integer = 1
Dim currentpage As String
Dim extractor As PdfExtractor = New PdfExtractor()
extractor.BindPdf(PDFfilename)
extractor.ExtractText()

While extractor.HasNextPageText()
    Dim mem As New MemoryStream()
    extractor.GetNextPageText(mem)
    Dim sr As New StreamReader(mem)
    currentpage = Trim(sr.ReadToEnd())
    sr.Close()
    mem.Close()
    currentpage.Replace(vbCrLf, """" & ", " & """")
    writetodebugfile(currentpage & vbCrLf, 0)
    pageCount = pageCount + 1
End While

End Function
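A likely cause of the blank output, offered as a hedged suggestion rather than a confirmed diagnosis: GetNextPageText writes into mem, which typically leaves the stream's Position at the end of the data, so a StreamReader created afterwards starts reading at the end and ReadToEnd() returns an empty string. Setting mem.Position = 0 before creating the reader should address that. Separately, currentpage.Replace(...) returns a new string and its result is discarded, so that line in the loop above has no effect. The standalone sketch below shows both fixes. WritePage is a hypothetical stand-in for the extractor (writing UTF-8 is an assumption), and ReadPage mirrors the read step of the loop.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module PageLoopDemo
    ' Hypothetical stand-in for GetNextPageText: writes one page of text
    ' into the stream, leaving Position at the end of what was written.
    Sub WritePage(mem As MemoryStream, text As String)
        Dim data As Byte() = Encoding.UTF8.GetBytes(text)
        mem.Write(data, 0, data.Length)
    End Sub

    ' Reads a page back the way the original loop does, with both fixes applied.
    Function ReadPage(mem As MemoryStream) As String
        ' Fix 1: rewind; without this, ReadToEnd() starts at the end and returns "".
        mem.Position = 0
        Using sr As New StreamReader(mem, Encoding.UTF8)
            Dim currentpage As String = sr.ReadToEnd().Trim()
            ' Fix 2: Replace returns a new string, so keep its result.
            currentpage = currentpage.Replace(vbCrLf, """" & ", " & """")
            Return currentpage
        End Using
    End Function

    Sub Main()
        Dim pages As New List(Of String) From {"first line" & vbCrLf & "second line", "page two"}
        For Each page As String In pages
            Dim mem As New MemoryStream()
            WritePage(mem, page)
            ' Disposing the StreamReader inside ReadPage also closes mem.
            Console.WriteLine(ReadPage(mem))
        Next
    End Sub
End Module
```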

shahzadlatif · March 28, 2010, 12:21pm

Hi Von,

Your code looks fine; please share the sample PDF file you're working with so we can test the issue at our end.

We’re sorry for the inconvenience and appreciate your cooperation.
Regards,

vonwallace · March 29, 2010, 9:37am

How do I share the PDF with you?

What is your email address?

Von

shahzadlatif · March 30, 2010, 1:10am

Hi Von,

You can either mark this post as private and attach the file to it, or send it using the 'Contact -> Send shahzad.latif an email' option at the top of this post.

I hope this helps.
Regards,