Extracting Form Fields using Aspose.PDF for .NET

We're evaluating Aspose.Total for a different purpose, but a question came up about extracting form fields using Aspose.PDF for .NET.

Do you have a sample with the full references and "using" statements in C# that you can share with me to run through on my end to see if it can extract the fields from this particular document?

The forms in question are encoded as "PDF 1.7, XFA 2.5, Dynamic Layout". Thanks

Hi Chmxmpdf,

Thanks for your inquiry. You can easily extract form fields using Aspose.Pdf for .NET.
You can use the code below to get the value of a particular field.

//open document
Document pdfDocument = new Document("input.pdf");

//get a field by name; "as" yields null if the field is not a text box
TextBoxField textBoxField = pdfDocument.Form["textbox1"] as TextBoxField;

if (textBoxField != null)
{
    //get field name and value
    Console.WriteLine("PartialName : {0} ", textBoxField.PartialName);
    Console.WriteLine("Value : {0} ", textBoxField.Value);
}

You can use the code below to get the values of all the fields in a PDF document.

//open document
Document pdfDocument = new Document("input.pdf");

//get values from all fields
foreach (Field formField in pdfDocument.Form)
{
    Console.WriteLine("Field Name : {0} ", formField.PartialName);
    Console.WriteLine("Value : {0} ", formField.Value);
}

The code above requires only two Aspose namespaces, plus System for Console:
using System;
using Aspose.Pdf;
using Aspose.Pdf.InteractiveFeatures.Forms;

Please visit the link below for more details about working with forms in Aspose.Pdf for .NET.

http://www.aspose.com/documentation/.net-components/aspose.pdf-for-.net/working-with-forms.html

Thanks & Regards,

Thanks. That's not working on this particular document.

I've attached the document. It's "PDF 1.7, XFA 2.5, Dynamic Layout".

Do you have an example using Aspose.PDF for .NET that will iterate through the form controls on this document?

Thanks,

Chris Hartman

Hi Chris,

Thanks for your interest in our products. Before I comment further, please share some details about your requirement. Do you need to:

  • extract the form fields,
  • extract the values inside the form fields, or
  • get information about the fields in the PDF form?

I need to be able to extract the values inside the form fields. This is the goal.

Also, it'd be good if I could loop through all the form fields, get the field names, and also access the values.

Thanks

Hi Chris,

Thank you for providing the details about your requirement.

You may check the following documentation links for details and code snippets that match your requirement.

Get Value from an Individual Field of PDF Document

Get Values from all the Fields of PDF Document

Please do let us know if you need any further assistance.

Thank You & Best Regards,

Hi Chris,

Adding to Nausherwan's comments, you may also check the following link for instructions on identifying form field names.

Thanks. I've tried all the suggestions above on the attached PDF, which is encoded as "PDF 1.7, XFA 2.5, Dynamic Layout".

For example, the first form control has "Joe's Bar and Grill" in it.

In all cases (so far), the "foreach (Field formField in pdfDocument.Form)" and "foreach (string str in pdfDocument.FieldNames)" collections are empty after opening the attached PDF.

Were you able to get different results? Thank you

I've attached the PDF in question.

Hi Chris,

Thank you for sharing the template file.

We have reproduced the issue you mentioned using the template PDF file you shared. It has been registered in our issue tracking system with issue ID PDFNEWNET-31582. You will be notified via this forum thread of any updates on this issue.

Sorry for the inconvenience,

Thanks,

My manager asked me to request a status update. I know I'll be notified when you resolve the issue; however, he asked because there's a pressing need to find a solution for extracting XFA form data from Acord e-documents.

If/when you get a chance. . . .

I was wondering if there was a rough ETA on when a fix for this might be available? Thanks

Issue ID => PDFNEWNET-31582

This document contains a hierarchical XFA form, which is why you can't access its fields via Document.Form.
Instead, you can use the Document.Form.XFA property to access the XFA fields.
Document.Form.XFA.Datasets contains the field data, and Document.Form.XFA.Template contains the field templates (which describe field appearance, etc.).
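The enumeration of Datasets only reports fields that already hold data, because an empty data element has no text node. If you also want the names of the fields declared in the form design, a minimal sketch like the following walks an XML node and collects the name attribute of every field element. It assumes that Document.Form.XFA.Template is exposed as an XmlNode (as Datasets is) and that fields are declared as <field name="..."> elements; the class XfaTemplateFields and the sample template are hypothetical. Matching on LocalName rather than on a namespace is a deliberate choice, since the template namespace URI changes with the XFA version.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class XfaTemplateFields
{
    // Returns the name attribute of every <field> element under templateRoot,
    // in document order. Matching on LocalName ignores the template namespace URI,
    // which carries the XFA version (e.g. .../xfa-template/2.5/).
    public static List<string> FieldNames(XmlNode templateRoot)
    {
        var names = new List<string>();
        Collect(templateRoot, names);
        return names;
    }

    static void Collect(XmlNode node, List<string> names)
    {
        if (node.NodeType == XmlNodeType.Element && node.LocalName == "field")
        {
            XmlAttribute nameAttr = node.Attributes["name"];
            if (nameAttr != null)
            {
                names.Add(nameAttr.Value);
            }
        }
        // Depth-first walk so fields nested in subforms are found too.
        foreach (XmlNode child in node.ChildNodes)
        {
            Collect(child, names);
        }
    }
}

class TemplateDemo
{
    static void Main()
    {
        // Hypothetical, heavily trimmed template; on a real document the node would
        // come from pdfDocument.Form.XFA.Template (assumed to be an XmlNode).
        var doc = new XmlDocument();
        doc.LoadXml(
            "<template xmlns=\"http://www.xfa.org/schema/xfa-template/2.5/\">" +
            "<subform name=\"F\"><subform name=\"P1\">" +
            "<field name=\"Form_EditionIdentifier_A\"><ui><textEdit/></ui></field>" +
            "<field name=\"Example_Field_B\"><ui><checkButton/></ui></field>" +
            "</subform></subform></template>");

        foreach (string name in XfaTemplateFields.FieldNames(doc.DocumentElement))
        {
            Console.WriteLine(name);
        }
    }
}
```

Template names are local to their subform, so two fields in different subforms can share a name; keep the enclosing subform names if you need to tell them apart.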
Below is an example that iterates over the XFA form fields:

using System;
using System.Xml;
using Aspose.Pdf;

class Program
{
    //recursive function to enumerate fields
    static void enumFields(XmlNode node, string path)
    {
        //if this node has subnodes then call this routine recursively
        if (node.NodeType == XmlNodeType.Element && node.HasChildNodes)
        {
            //path for the subfield
            string subPath = (path == "") ? node.Name : path + "/" + node.Name;
            foreach (XmlNode subNode in node.ChildNodes)
            {
                enumFields(subNode, subPath);
            }
        }
        //if this is a text node then show the field information
        //(empty elements have no text node, so empty fields are not printed)
        else if (node.NodeType == XmlNodeType.Text)
        {
            Console.WriteLine("Field name : {0}", path);
            Console.WriteLine("Value : {0}", node.Value);
        }
    }

    static void Main()
    {
        Document pdfDocument = new Document("inFile.pdf");
        //XFA is null when the document has no XFA form
        if (pdfDocument.Form.XFA != null)
        {
            //get field data
            XmlNode data = pdfDocument.Form.XFA.Datasets;
            //enumerate fields
            enumFields(data, "");
        }
    }
}
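The routine above prints slash-separated paths, while the XFA indexer takes names such as F[0].P1[0].Form_EditionIdentifier_A[0]. A minimal sketch that builds names in that indexed form straight from the data tree might look like this. It assumes normal data binding, i.e. that each data element under xfa:data has the same name as the subform or field it fills; forms with explicit bindings or unnamed subforms will not follow this. The class XfaSomNames and its methods are hypothetical, and the [n] suffix counts earlier siblings with the same element name.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class XfaSomNames
{
    // Namespace URI of the <xfa:datasets> and <xfa:data> elements.
    const string XfaDataNs = "http://www.xfa.org/schema/xfa-data/1.0/";

    // Finds the <xfa:data> element below a datasets node (or a whole document).
    public static XmlNode FindDataNode(XmlNode datasets)
    {
        XmlDocument owner = datasets as XmlDocument ?? datasets.OwnerDocument;
        var ns = new XmlNamespaceManager(owner.NameTable);
        ns.AddNamespace("xfa", XfaDataNs);
        return datasets.SelectSingleNode(".//xfa:data", ns);
    }

    // "name[index]", where index counts earlier siblings with the same element name.
    static string Step(XmlNode element)
    {
        int index = 0;
        for (XmlNode sib = element.PreviousSibling; sib != null; sib = sib.PreviousSibling)
        {
            if (sib.NodeType == XmlNodeType.Element && sib.Name == element.Name)
            {
                index++;
            }
        }
        return element.Name + "[" + index + "]";
    }

    // Records (dotted name, value) for every leaf element, in document order.
    // Leaf elements without text are included with an empty value.
    public static void Collect(XmlNode node, string path, List<KeyValuePair<string, string>> result)
    {
        bool hasChildElements = false;
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType != XmlNodeType.Element)
            {
                continue;
            }
            hasChildElements = true;
            string childPath = path.Length == 0 ? Step(child) : path + "." + Step(child);
            Collect(child, childPath, result);
        }
        if (!hasChildElements && path.Length > 0)
        {
            result.Add(new KeyValuePair<string, string>(path, node.InnerText));
        }
    }
}

class SomDemo
{
    static void Main()
    {
        // Hypothetical datasets XML; on a real document pass pdfDocument.Form.XFA.Datasets.
        var doc = new XmlDocument();
        doc.LoadXml(
            "<xfa:datasets xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\">" +
            "<xfa:data><F><P1>" +
            "<Form_EditionIdentifier_A>SAMPLE</Form_EditionIdentifier_A>" +
            "<Item>first</Item><Item>second</Item>" +
            "</P1></F></xfa:data></xfa:datasets>");

        var fields = new List<KeyValuePair<string, string>>();
        XfaSomNames.Collect(XfaSomNames.FindDataNode(doc.DocumentElement), "", fields);

        // Prints:
        // F[0].P1[0].Form_EditionIdentifier_A[0] = SAMPLE
        // F[0].P1[0].Item[0] = first
        // F[0].P1[0].Item[1] = second
        foreach (var pair in fields)
        {
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
        }
    }
}
```

Unlike the recursive routine, this version also lists empty fields, which is useful when you want every name to try against pdfDocument.Form.XFA[...]; whether each name resolves still depends on the binding assumption above.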

You can also get or set the value of a particular field using the following code:
pdfDocument.Form.XFA["F[0].P1[0].Form_EditionIdentifier_A[0]"] = "NEW VALUE";
Console.WriteLine(pdfDocument.Form.XFA["F[0].P1[0].Form_EditionIdentifier_A[0]"]);
//save the document to keep the new value
pdfDocument.Save("outFile.pdf");
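If you only need to read one value, another option is to query the Datasets XML directly with XPath. The sketch below assumes the data elements under xfa:data are not in a namespace (the xfa:data element itself uses http://www.xfa.org/schema/xfa-data/1.0/); the helper XfaDataLookup and the sample data are hypothetical. Because it reads the raw data node, the path follows the data hierarchy rather than the field names used by the indexer.

```csharp
using System;
using System.Xml;

static class XfaDataLookup
{
    // Namespace URI of the <xfa:datasets> and <xfa:data> elements.
    const string XfaDataNs = "http://www.xfa.org/schema/xfa-data/1.0/";

    // Returns the text of the node at relativePath below <xfa:data>
    // (for example "F/P1/Form_EditionIdentifier_A"), or null if there is no such node.
    // Assumes the data elements themselves are not in a namespace.
    public static string GetValue(XmlNode datasets, string relativePath)
    {
        XmlDocument owner = datasets as XmlDocument ?? datasets.OwnerDocument;
        var ns = new XmlNamespaceManager(owner.NameTable);
        ns.AddNamespace("xfa", XfaDataNs);
        XmlNode node = datasets.SelectSingleNode(".//xfa:data/" + relativePath, ns);
        return node?.InnerText;
    }
}

class LookupDemo
{
    static void Main()
    {
        // Hypothetical datasets XML; on a real document pass pdfDocument.Form.XFA.Datasets.
        var doc = new XmlDocument();
        doc.LoadXml(
            "<xfa:datasets xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\">" +
            "<xfa:data><F><P1>" +
            "<Form_EditionIdentifier_A>SAMPLE</Form_EditionIdentifier_A>" +
            "</P1></F></xfa:data></xfa:datasets>");

        // prints SAMPLE
        Console.WriteLine(XfaDataLookup.GetValue(doc.DocumentElement, "F/P1/Form_EditionIdentifier_A"));
    }
}
```

Note that XPath positions start at 1, so the second of two repeated elements is Item[2] in an XPath path but Item[1] in an indexer name.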

Excellent. Thank you. That resolves the issue.


The issues you have found earlier (filed as 31582) have been fixed in this update.

