Accessing fields with duplicate field names

I’m currently using Aspose PDF Kit to process and modify PDF documents provided by a 3rd party. I’ve come across a situation where two text fields on different pages have identical field names, which prevents me from processing all the fields in the PDF file as required. Attached is a file that demonstrates the issue, and a code snippet is below.


using System;
using System.Linq;

Aspose.Pdf.Kit.Form form = new Aspose.Pdf.Kit.Form(@"D:\Temp\test.doc.pdf");
string[] fieldNames = form.FieldNames;
fieldNames.ToList().ForEach(x => Console.WriteLine(x));

I realize a workaround would be to give each field a unique field name, but in my situation the input is more than 50k PDF files, so that is not practical.

Is there another workaround, or a way to identify which PDF forms have duplicated field names?
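A minimal sketch of such a check, assuming you have already collected one (page, full field name) pair for every widget annotation in a file (for example with a page-by-page loop over each page's annotations): group the pairs by name and report any name that appears on more than one widget. `DuplicateFieldFinder` and `FindSharedNames` are hypothetical names, not part of Aspose.Pdf.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Aspose.Pdf): flags field names that are
// shared by more than one widget annotation within a single PDF.
public static class DuplicateFieldFinder
{
    // widgets: one (Page, FullName) pair per widget annotation found in the file.
    // Returns each name used by two or more widgets, mapped to the sorted list
    // of pages on which those widgets appear.
    public static Dictionary<string, List<int>> FindSharedNames(
        IEnumerable<(int Page, string FullName)> widgets)
    {
        return widgets
            .Where(w => w.FullName != null)           // ignore unnamed annotations
            .GroupBy(w => w.FullName, StringComparer.Ordinal)
            .Where(g => g.Count() > 1)                // keep names used more than once
            .ToDictionary(
                g => g.Key,
                g => g.Select(w => w.Page).OrderBy(p => p).ToList(),
                StringComparer.Ordinal);
    }
}
```

In a 50k-file batch, any file for which `FindSharedNames` returns a non-empty dictionary is a candidate for closer inspection.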

Thanks in advance for any help,
Anthony

Hi Anthony,

Thanks for using our products.

First of all, let me inform you that Aspose.Pdf for .NET and Aspose.Pdf.Kit for .NET have been merged into a single product, and Aspose.Pdf.Kit for .NET has been discontinued as a separate product. All the features of Aspose.Pdf.Kit for .NET are available under the Aspose.Pdf.Facades namespace of Aspose.Pdf for .NET v6.x. You can check the following documentation link to upgrade your code from Aspose.Pdf.Kit for .NET to the new merged Aspose.Pdf for .NET.

http://www.aspose.com/blogs/aspose-blogs/shahzad-latif/archive/2011/06/11/migrating-from-legacy-code-to-merged-aspose.pdf-for-.net.html

Please use the latest version, Aspose.Pdf for .NET v6.7. However, it may not be possible to process fields with the same name using the Aspose.Pdf APIs. I have logged this problem as PDFNEWNET-33309 in our issue tracking system. We will look further into the details and keep you updated on the status of the fix via this forum thread.

We apologize for your inconvenience.

Thanks & Regards,

Has there been any update on this?

As a workaround until the issue is fixed, I did more digging and discovered that the WidgetAnnotations maintain each field's "Widget" (UI) representation. It appears that in a PDF a field can have multiple user entry points, but the backing field is the same. When a user types in one of the fields, the text automatically appears in the other, because the two widgets are just two different visual representations of the same field. With that in mind, I looked at the WidgetAnnotations and can see that each "Widget" is accessible with Aspose. The one thing I still can't figure out is how to get the font from a WidgetAnnotation.
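To make that distinction concrete, here is a small illustrative model in plain C#. `FormField` and `Widget` are hypothetical classes, not the Aspose.Pdf types; they mirror the structure the PDF specification describes, where a terminal field can have several widget annotations as kids. The value is stored once on the field, and each widget only displays it.

```csharp
using System;
using System.Collections.Generic;

// Illustrative model only (not the Aspose.Pdf API): the value lives on the
// field, and each widget is just a place on a page where that value is drawn.
public sealed class FormField
{
    public string Name { get; }
    public string Value { get; set; } = "";
    public List<Widget> Widgets { get; } = new List<Widget>();

    public FormField(string name) => Name = name;

    // Attach another visual representation of this same field.
    public Widget AddWidget(int page)
    {
        var widget = new Widget(this, page);
        Widgets.Add(widget);
        return widget;
    }
}

public sealed class Widget
{
    public FormField Parent { get; }
    public int Page { get; }

    public Widget(FormField parent, int page)
    {
        Parent = parent;
        Page = page;
    }

    // Typing into any widget writes to the single shared parent field.
    public void Type(string text) => Parent.Value = text;

    // Every widget renders the parent's one value, so all widgets stay in sync.
    public string DisplayedText => Parent.Value;
}
```

Typing into the widget on page 1 changes what the widget on page 2 displays, because both read the same `FormField.Value`. This is why the two "fields" in the PDF behave as one.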

Any thoughts on how to get font information from a widget annotation?

I've tried the following code, but can't quite get there. In the code below, 'form' is an instance of Aspose.Pdf.Facades.Form.

string s = "";
for (int p = 1; p <= form.Document.Pages.Count; p++)
{
    Aspose.Pdf.InteractiveFeatures.Annotations.AnnotationCollection ann = form.Document.Pages[p].Annotations;
    Aspose.Pdf.InteractiveFeatures.Annotations.Annotation annotation;
    for (int i = 1; i <= ann.Count; i++)
    {
        annotation = ann[i];
        s = s + p + "," +
            annotation.FullName + "," +
            annotation.Rect.ToString().Replace(',', '|') + "\r\n";
        Aspose.Pdf.InteractiveFeatures.Annotations.AppearanceDictionary ad = annotation.Appearance;
        foreach (System.Collections.DictionaryEntry field in ad)
        {
            //((Aspose.Pdf.XForm)(field.Value)).Resources.Fonts[1]
        }
    }
}

Hi,

Thanks for using our products.

jmoericke:

Has there been any update on this?

Thanks for your patience. Our development team is working hard to get PDFNEWNET-33309 fixed, but I am afraid it's not yet completely resolved. I have requested the development team to share an ETA for its resolution. As soon as we get the required information, we will be more than happy to update you on the status of the fix. Please be patient and spare us a little time.

We apologize for your inconvenience.

jmoericke:

Any thoughts on how to get font information from a widget annotation?

I am sorry to inform you that getting font information from a widget annotation is currently not supported in Aspose.Pdf for .NET. However, I have logged a new feature request as PDFNEWNET-33507 in our issue tracking system. Our development team is looking into this feature, and you will be updated via this forum thread once it is supported.

We apologize for your inconvenience.

Thanks & Regards,

Hi Jason,


Thanks for your patience.

Our development team has made significant progress on implementing the feature to get font information from a widget annotation. Could you please share a sample PDF document that you are using, so that we can test this scenario at our end before releasing the new version? We really appreciate your cooperation in this regard.

Hi Anthony,

Thanks for your patience.

I am pleased to share that the issue PDFNEWNET-33507 reported earlier has been resolved, and the fix will be included in the upcoming release, Aspose.Pdf for .NET 7.1.0. Please try the following code snippet to get font information from a widget annotation.

[C#]

using System;
using Aspose.Pdf.Facades;
using Aspose.Pdf.InteractiveFeatures.Annotations;

// inFile and outFile are the input and output PDF paths.
Aspose.Pdf.Facades.Form form = new Aspose.Pdf.Facades.Form(inFile, outFile);
for (int p = 1; p <= form.Document.Pages.Count; p++)
{
    AnnotationCollection ann = form.Document.Pages[p].Annotations;
    for (int i = 1; i <= ann.Count; i++)
    {
        // Skip annotations that are not form widgets (links, comments, etc.);
        // a direct cast would throw an InvalidCastException on them.
        WidgetAnnotation annotation = ann[i] as WidgetAnnotation;
        if (annotation == null)
            continue;

        string fontName = annotation.DefaultAppearance.FontName;
        double fontSize = annotation.DefaultAppearance.FontSize;
        AppearanceDictionary ad = annotation.Appearance;
        Console.Out.WriteLine("annotation name:" + annotation.FullName);
        foreach (System.Collections.DictionaryEntry field in ad)
        {
            // Look up the font resource named in the default appearance.
            Console.Out.WriteLine(" font name:" + ((Aspose.Pdf.XForm)(field.Value)).Resources.Fonts[fontName].FontName);
            Console.Out.WriteLine(" font size:" + fontSize);
        }
    }
}
form.Close();
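The snippet above reads DefaultAppearance.FontName and then looks that name up in the appearance's Resources.Fonts. That two-step lookup reflects how PDF stores field fonts: per the PDF specification, a widget's /DA (default appearance) string contains a Tf operator whose operands are a font resource name and a size (a size of 0 means auto-size), and the name is a key into a font resource dictionary rather than the font's own name. As a library-independent sketch, assuming you already have the raw /DA string (how to obtain it is not covered here), the name and size can be extracted like this. `DefaultAppearanceParser` is a hypothetical helper, and its whitespace tokenizer ignores PDF string literals, which a /DA string normally does not need.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of Aspose.Pdf): extracts the font resource
// name and size from a PDF /DA string such as "/Helv 12 Tf 0 g".
public static class DefaultAppearanceParser
{
    public static bool TryParseFont(string da, out string fontResource, out double fontSize)
    {
        fontResource = null;
        fontSize = 0;
        if (string.IsNullOrEmpty(da))
            return false;

        string[] tokens = da.Split(new[] { ' ', '\t', '\r', '\n' },
                                   StringSplitOptions.RemoveEmptyEntries);

        // Tf takes two operands, a name and a size: "/Helv 12 Tf".
        // Scan backwards so the last Tf (the one in effect) wins.
        for (int i = tokens.Length - 1; i >= 2; i--)
        {
            if (tokens[i] == "Tf" && tokens[i - 2][0] == '/' &&
                double.TryParse(tokens[i - 1], NumberStyles.Float,
                                CultureInfo.InvariantCulture, out double size))
            {
                fontResource = tokens[i - 2].Substring(1); // drop the leading '/'
                fontSize = size;                           // 0 means auto-size
                return true;
            }
        }
        return false;
    }
}
```

For "/Helv 12 Tf 0 g" this yields the resource name "Helv" and size 12; "Helv" would then be looked up in the font resources, as the snippet above does with Resources.Fonts.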

The issue you found earlier (filed as PDFNEWNET-33507) has been fixed in this update.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.

Hi Anthony,


Thanks for your patience.

We have further investigated the issue PDFNEWNET-33309 reported earlier, and we found that the PDF file you shared contains two fields with different names: TextField1 and TextField3. There are no duplicated fields; instead, field TextField1 has child annotations associated with it. Use the Field object as a collection to access the field's child annotations, and its Count property to get the number of child annotations.

[C#]

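using System;
using Aspose.Pdf;
using Aspose.Pdf.InteractiveFeatures.Annotations;
using Aspose.Pdf.InteractiveFeatures.Forms;

// TestSettings.GetInputFile appears to be a test-suite helper;
// replace it with the path to your own PDF.
Document doc = new Document(TestSettings.GetInputFile("33309.pdf"));
foreach (Field field in doc.Form)
{
    Console.WriteLine(field.PartialName + " " + field.Count);
    if (field.Count > 0)
    {
        Console.WriteLine("Field has child annotations : ");
        // Enumerate the child (widget) annotations and show their position on the page.
        foreach (WidgetAnnotation annot in field)
        {
            Console.WriteLine(" annotation position: " + annot.Rect);
        }
    }
}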