I am sure I just don’t know where to look, but how do I read PDF files? Specifically, I want to read the tables inside PDFs. If you could just point me to the documentation, I should be OK; I only seem to be able to find documentation on creating PDFs. Examples would be a plus! Thank you.
Hi Stan,
You can read, edit and manipulate an existing PDF file using Aspose.Pdf.Kit. You can find the details in the [documentation of Aspose.Pdf.Kit](http://www.aspose.com/documentation/file-format-components/aspose.pdf.kit-for-.net-and-java/index.html).
For your current requirement, please try FormDataConverter, which helps import and export data between data tables and a database, or AutoFiller, which reads data from a data table and fills the form fields so the data appears in the PDF file.
I hope this helps. If it doesn’t meet your requirement, please elaborate a little so we can help you better.
Regards,
I am having trouble finding out how to upload a PDF file and read the text and tables in it. These PDFs were not made with Aspose; they come from an outside source. I need to pull the tables out of the PDF and get the values. Any idea where I should start? AutoFiller looks like it converts from DataTables/DB to PDF, and FormDataConverter also does not look like it can read PDFs.
Bottom line: where do I start if I want to read pre-existing PDF files?
I followed the code almost exactly from here: http://www.aspose.com/documentation/file-format-components/aspose.pdf.kit-for-.net-and-java/aspose.pdf.kit.pdfextractor.extracttext.html
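```csharp
using System;
using Aspose.Pdf.Kit;

// Members of the page's code-behind class
string FILENAME;
string OUTPUT;

protected void Page_Load(object sender, EventArgs e)
{
    // Physical paths for the uploaded PDF and the extracted text
    FILENAME = Server.MapPath("~/aspose/files/temp.pdf");
    OUTPUT = Server.MapPath("~/aspose/files/temp.txt");
}

protected void btn_Click(object sender, EventArgs e)
{
    // Save the uploaded file (fu is the FileUpload control) to disk
    fu.SaveAs(FILENAME);

    PdfExtractor pe = new PdfExtractor();
    pe.Password = "";      // the PDF has no password
    pe.BindPdf(FILENAME);  // open the saved PDF
    pe.ExtractText();      // extract the text of the whole document
    pe.GetText(OUTPUT);    // write the extracted text to the .txt file
}
```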
All I get is a blank .txt file and this error:
Exception Details: System.NullReferenceException: Object reference not set to an instance of an object.
Any idea what I am doing wrong?
Hi Stan,
I understand your requirement now. I think the following Programmer’s Guide topic will help you: Extract Text from PDF Document.
Also, the Programmer’s Guide and Knowledge Base are good starting points, because these sections explain things in detail.
As for the exception, your code looks fine; however, there might be a problem with the contents of the PDF file. Could you please share the PDF file with us so we can investigate the problem on our end?
We look forward to helping you.
Regards,
You were correct: I was trying to parse a random PDF that was a form, not a plain data PDF like the one I actually need to use.
I did get it parsed into text, but it gave me basically the same results as PDFBox: http://naspinski.net/post/ParsingReading-a-PDF-file-with-C-and-AspNet-to-text.aspx
That is, a series of text lines that is very hard to understand and parse and does not follow the order of the table. I am really hoping your product makes this easier than what I have managed in the past.
I truly wish I could post the PDF so you could try it, but I cannot, so let me try to explain: this is a simple 2-page document with 2 tables of data, and I just want the first table and the information inside it. That table is 12 columns by 8 rows, pretty simple, but the data comes out in a pain-in-the-butt order like this (these are column numbers):
1, 3, 4, 6, 8, 11, 12, column headers, 9, 10, and so on.
Is there a way to pull a table of data from a PDF with a logical organization, or is it always in this haphazard arrangement? I have been able to pull the required information in the past, but, like I said, it is a pain in the butt. Thank you.
Hi Stan,
I’m sorry to inform you that the PdfExtractor.ExtractText method can only extract text in raw format. I have logged a new feature request to extract text in a properly ordered form, and our team will look into this requirement. However, I’m afraid this feature may not be available any time soon.
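In the meantime, if (and only if) every row of the table comes out of the raw text in the same fixed column order, the rows could be reassembled after extraction. The sketch below is a hypothetical helper, not part of Aspose.Pdf.Kit: `TableRebuilder`, `Rebuild` and the `extractionOrder` map are illustrative names, and it assumes the raw text has already been split into one token per cell with the column headers removed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Aspose.Pdf.Kit): rebuilds table rows from
// cell tokens that were extracted in a fixed, non-left-to-right column order.
public static class TableRebuilder
{
    // tokens: one entry per table cell, in the order the extractor emitted them.
    // extractionOrder[i]: the 0-based table column of the i-th token in each row.
    // Assumes every row is emitted in the same order; verify this on your own PDF.
    public static List<string[]> Rebuild(IList<string> tokens, int[] extractionOrder)
    {
        int columns = extractionOrder.Length;
        if (columns == 0 || tokens.Count % columns != 0)
            throw new ArgumentException("Token count must be a non-zero multiple of the column count.");

        var rows = new List<string[]>();
        for (int start = 0; start < tokens.Count; start += columns)
        {
            var row = new string[columns];
            for (int i = 0; i < columns; i++)
                row[extractionOrder[i]] = tokens[start + i]; // put each token back in its real column
            rows.Add(row);
        }
        return rows;
    }
}
```

For example, `TableRebuilder.Rebuild(new[] { "a", "c", "b" }, new[] { 0, 2, 1 })` returns one row containing `a`, `b`, `c`. For a 12-column table, `extractionOrder` would have to be filled in from the order actually observed in the output; if headers or cells from other rows are interleaved, this simple mapping will not be enough.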
We’re sorry for the inconvenience. If you have any other questions, please do let us know.
Regards,