Extracting Form Fields using Aspose.PDF for .NET

chmxmpdf · October 27, 2011, 10:25am

We're evaluating Aspose.Total for a different purpose, but a question came up about extracting form fields using Aspose.PDF for .NET.

Do you have a sample with the full references and "using" statements in C# that you can share with me to run through on my end to see if it can extract the fields from this particular document?

The forms in question are encoded as “PDF 1.7, XFA 2.5, Dynamic Layout”. Thanks

rashid.ali · October 27, 2011, 11:21am

Hi Chmxmpdf,

Thanks for your inquiry. You can easily extract form fields using Aspose.Pdf for .NET.
You can use the code below to get the value of a particular field.

//open document
Document pdfDocument = new Document("input.pdf");

//get a field (the cast yields null if "textbox1" is missing or is not a text box)
TextBoxField textBoxField = pdfDocument.Form["textbox1"] as TextBoxField;

//get field value
if (textBoxField != null)
{
    Console.WriteLine("PartialName : {0} ", textBoxField.PartialName);
    Console.WriteLine("Value : {0} ", textBoxField.Value);
}

You can use the code below to get the values of all the fields in a PDF document.

//open document
Document pdfDocument = new Document("input.pdf");

//get values from all fields
foreach (Field formField in pdfDocument.Form)
{
    Console.WriteLine("Field Name : {0} ", formField.PartialName);
    Console.WriteLine("Value : {0} ", formField.Value);
}

Apart from System (needed for Console), the code above requires only the two namespaces below.
using System;
using Aspose.Pdf;
using Aspose.Pdf.InteractiveFeatures.Forms;

Please see the link below for more details about working with forms in Aspose.Pdf for .NET.

http://www.aspose.com/documentation/.net-components/aspose.pdf-for-.net/working-with-forms.html

Thanks & Regards,

chmxmpdf · October 27, 2011, 11:40am

Thanks. That's not working on this particular document.

I've attached the document. It's “PDF 1.7, XFA 2.5, Dynamic Layout”.

Do you have an example using Aspose.PDF for .NET that will iterate through the form controls on this document?

Thanks,

Chris Hartman

codewarior · October 27, 2011, 1:27pm

Hi Chris,

Thanks for your interest in our products. Before I comment further, please share some details about your requirement:

Do you need to extract the form fields themselves?
Or do you need to extract the values inside the form fields?
Or do you need to get information about the fields inside the PDF form?

chmxmpdf · October 27, 2011, 1:36pm

I need to be able to extract the values inside the form fields. <= this is the goal.

Also, it'd be good if I could loop through all the form fields, get the field names, and also access the values.

Thanks

nausherwan.aslam · October 28, 2011, 2:11am

Hi Chris,

Thank you for providing the details about your requirement. You may check the following documentation links for details and code snippets as per your requirement.

Please let us know if you need any further assistance.

Thank You & Best Regards,

codewarior · October 28, 2011, 2:22am

Hi Chris,

Adding to Nausherwan’s comments, you may also check the following link for instructions on identifying form field names.

chmxmpdf · October 28, 2011, 7:32am

Thanks. I've tried all the suggestions above on the attached PDF, which is encoded as “PDF 1.7, XFA 2.5, Dynamic Layout”.

For example, the first form control has "Joe's Bar and Grill" in it.

In all cases (so far), the "foreach (Field formField in pdfDocument.Form)" and "foreach (string str in pdfDocument.FieldNames)" collections are empty after opening the attached PDF.

Were you able to get different results? Thank you

I've attached the PDF in question.

nausherwan.aslam · October 28, 2011, 8:08am

Hi Chris,

Thank you for sharing the template file.

We were able to reproduce the issue with the PDF file you shared. It has been registered in our issue tracking system with issue ID PDFNEWNET-31582. You will be notified in this forum thread of any updates on the issue.

Sorry for the inconvenience,

chmxmpdf · November 2, 2011, 12:12pm

Thanks,

My manager asked me to request a status update. I know I'll be notified when you resolve the issue, but he asked because there's a pressing need to find a solution for extracting the XFA forms data from Acord e-documents.

If/when you get a chance. . . .

chmxmpdf · November 10, 2011, 3:11pm

I was wondering if there was a rough ETA on when a fix for this might be available? Thanks

Issue ID => PDFNEWNET-31582

andrey.nekrasov · November 11, 2011, 1:42am

This document contains a hierarchical XFA form, which is why you can't access its form fields via Document.Form.

You can use the Document.Form.XFA property to access XFA fields.

Document.Form.XFA.Datasets contains the field data, and Document.Form.XFA.Template contains the field templates (which describe field appearance etc.).

Below is an example that iterates over the XFA form fields:

//required namespaces
using System;
using System.Xml;
using Aspose.Pdf;

//recursive function to enumerate fields
private void enumFields(XmlNode node, string path)
{
    //if this node has subnodes then call this routine recursively
    if (node.NodeType == XmlNodeType.Element && node.HasChildNodes)
    {
        string subPath;
        //path for the subfield
        if (path == "")
        {
            subPath = node.Name;
        }
        else
        {
            subPath = path + "/" + node.Name;
        }
        foreach (XmlNode subNode in node.ChildNodes)
        {
            enumFields(subNode, subPath);
        }
    }
    //if this is a text node then show field information
    else if (node.NodeType == XmlNodeType.Text)
    {
        Console.WriteLine("Field name : {0}", path);
        Console.WriteLine("Value : {0}", node.Value);
    }
}

public void main()
{
    Document pdfDocument = new Document("inFile.pdf");
    //get values from all fields
    if (pdfDocument.Form.XFA != null)
    {
        //get field data
        XmlNode data = pdfDocument.Form.XFA.Datasets;
        //enumerate fields
        enumFields(data, "");
    }
}

You can also get or set the value of a particular field using the following code:

pdfDocument.Form.XFA["F[0].P1[0].Form_EditionIdentifier_A[0]"] = "NEW VALUE";

Console.WriteLine(pdfDocument.Form.XFA["F[0].P1[0].Form_EditionIdentifier_A[0]"]);
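
If it helps to try the recursive traversal without a PDF or the Aspose library at hand, here is a minimal sketch that uses only System.Xml. The class name XfaDataWalker, the CollectFields helper, and the sample XML are all made up for illustration. The real contents of Document.Form.XFA.Datasets will look different for each document, so please treat this only as a way to see how the element path is built up and where the values are read.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: not part of Aspose.Pdf, it only mirrors the traversal idea.
public static class XfaDataWalker
{
    // Returns one (path, value) pair for every text node found under 'node'.
    public static List<KeyValuePair<string, string>> CollectFields(XmlNode node)
    {
        var result = new List<KeyValuePair<string, string>>();
        Walk(node, "", result);
        return result;
    }

    private static void Walk(XmlNode node, string path,
                             List<KeyValuePair<string, string>> result)
    {
        if (node.NodeType == XmlNodeType.Text)
        {
            // A text node holds the value; 'path' names the enclosing elements.
            result.Add(new KeyValuePair<string, string>(path, node.Value));
        }
        else if (node.NodeType == XmlNodeType.Element)
        {
            // Extend the path with this element's name and visit its children.
            string subPath = path == "" ? node.Name : path + "/" + node.Name;
            foreach (XmlNode child in node.ChildNodes)
            {
                Walk(child, subPath, result);
            }
        }
    }

    public static void Main()
    {
        // Made-up sample standing in for the XFA datasets XML.
        var doc = new XmlDocument();
        doc.LoadXml("<F><P1><InsuredName>Joe's Bar and Grill</InsuredName>" +
                    "<PolicyNumber>12345</PolicyNumber></P1></F>");

        foreach (var field in CollectFields(doc.DocumentElement))
        {
            Console.WriteLine("Field name : {0}", field.Key);
            Console.WriteLine("Value : {0}", field.Value);
        }
    }
}
```

Collecting the pairs into a list instead of printing them directly makes it easier to filter or export the values afterwards. With a real document you would pass pdfDocument.Form.XFA.Datasets to the helper instead of the sample element.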

chmxmpdf · November 11, 2011, 8:05am

Excellent. Thank you. That resolves the issue.

aspose.notifier · December 12, 2011, 2:17am

The issues you have found earlier (filed as 31582) have been fixed in this update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.