I'm trying to extract all text from a PDF, including information entered into form fields (text boxes) and notes, but that content doesn't seem to be extracted.
Any ideas?
Hi Scot,
Thank you very much for considering Aspose.
Can you please share the PDF and the code snippet you’re trying to use at your end? We’ll test the issue at our end and will update you accordingly.
We’re sorry for the inconvenience.
Regards,
Attached are two PDF documents.
One w/ notes. One w/ forms.
Hi Scot,
I'm testing the issue at my end; however, I have noticed that no text is being extracted from the files at all. Can you please confirm whether you're able to extract the other text but not the form contents, or whether you're unable to extract any text from these files?
Please share your thoughts on this so that we can continue investigating the issue.
We’re sorry for the inconvenience.
Regards,
I am able to extract the form field names, but not the populated text.
As for the sheet with the sticky note, I can extract everything but the sticky note text.
code:
PdfExtractor pd = new PdfExtractor();
pd.BindPdf(@"c:\test.pdf");
pd.ExtractText();
pd.GetText(@"C:\test.txt");
Are you perhaps using the unlicensed copy? When I run those files with the unlicensed (trial) DLL, no text is extracted at all, but with the licensed DLL the text is extracted as I described above.
Hi Scot,
In order to extract form field values and the annotations of any kind you can use the following code snippets.
Code to extract form field values:
// First, assign the input PDF file
Aspose.Pdf.Kit.Form form = new Aspose.Pdf.Kit.Form("form.pdf");
// Get all field names
String[] allfields = form.FieldsNames;
for (int i = 0; i < allfields.Length; i++)
{
    // Get the value of each field in turn
    string value = form.GetField(allfields[i]);
}
form.Save();
Code to extract annotations:
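// Assumption: editor is an Aspose.Pdf.Kit.PdfAnnotationEditor (its declaration was missing from the snippet)
PdfAnnotationEditor editor = new PdfAnnotationEditor();
editor.BindPdf("test.pdf");
// Extract only text (sticky note) annotations, from page 1 to page 1
Enum[] annotType = { AnnotationType.Text };
ArrayList annotList = editor.ExtractAnnotations(1, 1, annotType);
for (int i = 0; i < annotList.Count; i++)
{
    Hashtable currentNode = (Hashtable)annotList[i];
    Aspose.Pdf.Kit.Annotation annot = new Annotation(currentNode);
    // Show the note text of the annotation
    MessageBox.Show(annot.Contents);
}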
I hope this works for you. If you're still not satisfied then please do let us know.
Regards,