Knowing the X-Y location of fields after converting them from Word to PDF?

I have a situation where I want to create a template in Word and fill it in using Aspose.Words, except that I want to leave some fields blank. These fields are used by a later process.


At this point I want to convert the Word document to PDF for insertion into a downstream system, but I also need to be able to tell that system about the unfilled fields either by name or by X,Y location. Is there an elegant way in Aspose.Total to have fields in a Word form and “convert” them to fields in a PDF form?

I appreciate any advice!

Hi John,

Thanks for your inquiry. Aspose.Words uses its own rendering engine to lay out documents into pages. The Aspose.Words.Layout namespace provides classes that let you find out on which page, and where on that page, a particular document element is positioned once the document has been formatted into pages. Please read about LayoutCollector and LayoutEnumerator in the Aspose.Words documentation.

jfraser:

I also need to be able to tell that system about the unfilled fields either by name or by X,Y location.

Please use the following code example to get the X,Y location of a field.

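using System;
using System.Drawing;
using Aspose.Words;
using Aspose.Words.Layout;

Document doc = new Document(MyDir + "in.docx");

// Helpers from Aspose.Words.Layout for querying rendered positions.
LayoutCollector layoutCollector = new LayoutCollector(doc);
LayoutEnumerator layoutEnumerator = new LayoutEnumerator(doc);

// Every field in the document begins with a FieldStart node.
var collection = doc.GetChildNodes(NodeType.FieldStart, true);
foreach (FieldStart fStart in collection)
{
    // Point the enumerator at the layout entity that renders this node.
    var renderObject = layoutCollector.GetEntity(fStart);
    layoutEnumerator.Current = renderObject;

    // Bounding rectangle of that entity on its page.
    RectangleF location = layoutEnumerator.Rectangle;
    Console.WriteLine(location.X);
    Console.WriteLine(location.Y);
}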

jfraser:

Is there an elegant way in Aspose.Total to have fields in a Word form and “convert” them to fields in a PDF form?

Could you please share some more detail about this query, along with your input Word document and the expected output PDF? We will then provide you with more information.

Tahir, excellent reply, thank you for that information.


I believe I might have found the answer to the last part of my question using:

Aspose.Words.Saving.PdfSaveOptions.PreserveFormFields

To clarify what I want to do for that second part: I want to be able to have form fields created in my Word template, populate SOME of them, convert to PDF, and then be able to extract the X,Y locations of the fields.

So is there a “layoutEnumerator.Rectangle” equivalent in Aspose.Pdf, and will the option I mention above (“preserve form fields”) do what I think it will, in terms of converting the Word document to PDF but keeping the form fields available for referencing in the PDF?

Assuming everything above is true, my final question would be:

"What is the best way to name them in word and then make that naming schema available on the PDF side?"

Example:

I have a field in a Word template that I want to call “name.first”, representing the first name of someone. If I save to PDF and preserve the form fields, can I access it by the same name in the PDF? What property is that on the field in both Word and PDF?

Thanks again for your help, the example code you provide is EXTREMELY helpful!

Hi John,


Thanks for sharing the details.

I would like to answer from the Aspose.Pdf perspective. This API lets you create PDF files as well as manipulate existing ones. It can also add form fields, fill form fields, get information about form fields, and remove form fields from a PDF document. As per your requirements, you can use this API to get information about all form fields inside the PDF document. For further details, please see the Aspose.Pdf documentation on working with forms.


PS: you can get form field information by using the field's name or index. When adding form fields inside Word file, please ensure you provide appropriate name to each field so that you can use the same name while accessing field information from PDF document. However, if you are not certain about the form field names in a particular PDF file, please follow the instructions given in "Identifying form fields names".
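As a rough illustration of the point above, the following sketch lists every form field in a PDF with its name, page, and rectangle, and then looks one field up by name. It is a minimal sketch rather than a definitive implementation: the file name "out.pdf" and the field name "name.first" are placeholders, and it assumes the PDF actually contains a field with that name.

```csharp
using System;
using Aspose.Pdf.Forms;

// "out.pdf" is a placeholder path for a PDF that contains form fields.
var pdf = new Aspose.Pdf.Document("out.pdf");

// Enumerate all fields and print name, page, and position.
foreach (Field field in pdf.Form.Fields)
{
    // Rect is expressed in PDF coordinates: lower-left and upper-right corners.
    Aspose.Pdf.Rectangle r = field.Rect;
    Console.WriteLine(
        $"{field.PartialName}: page {field.PageIndex}, " +
        $"LLX={r.LLX}, LLY={r.LLY}, URX={r.URX}, URY={r.URY}");
}

// Look a single field up by name ("name.first" is a hypothetical name).
var byName = pdf.Form["name.first"] as Field;
if (byName != null)
{
    Console.WriteLine($"Found {byName.PartialName} on page {byName.PageIndex}");
}
```

Both Aspose.Words and Aspose.Pdf define a Document class, so the sketch uses the fully qualified Aspose.Pdf.Document name to avoid ambiguity when the two libraries are referenced together.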

Nayyer, thanks for your response.

You said "When adding form fields inside Word file, please ensure you provide appropriate name to each field so that you can use the same name while accessing field information from PDF document."


This is my exact issue. You see, I have a Word document with content controls in it. I give those controls a specific title and tag and bind some data into them.

At this point I save to PDF with the flag "PreserveFormFields". It does indeed preserve the fields; however, it does not preserve the naming, and instead it appears to give each field a seemingly random number as its name.

Example: a simple Word document with a plain text control whose title and tag are both "signer/firstname".

After saving to PDF, the field's name is "1389394364", and I can't find any reference back to the original field name, which is vitally important for me.

Any thoughts, or is this a bug in the code? Is there a property on the Word field that I need to specifically use to get the name to carry over to the PDF? See the attached documents.

Hi John,

The control used in your sample document is a content control (structured document tag). The PreserveFormFields option supports legacy form fields, not content controls.

You can use legacy form fields in your case, or let us know if you need to use content controls regardless. In that case we will log a new feature request, because preserving content controls is not supported at the moment.

Best Regards,

Muhammad, thank you for the reply. Yes, in my case the older form fields are less than ideal so please put in that feature request.


In the meantime, is there a way that I could fake it? For example, would the X, Y, width, and height be the same, and/or the order? If I maintained a list from the Word document, is there any way to associate its entries with the fields in the new document? It would be ugly and slow, but it would be a good short-term step. I don’t plan to have any fields that overlap, etc.

I appreciate the help!

Hi John,

This issue has been logged into our issue tracking system as WORDSNET-11148. We will keep you updated on this issue in this thread.

As far as the order and the X, Y, width, and height are concerned, these should remain the same because content controls are rendered like shapes. Still, there is no guarantee that this is a reliable solution until the preserve content controls feature is available.
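One detail to keep in mind if you compare positions this way: LayoutEnumerator.Rectangle is measured in points from the top-left corner of the page, whereas PDF rectangles are normally expressed from the bottom-left corner. The sketch below shows the conversion under that assumption. The helper name ToPdfRect and the sample numbers are made up for illustration, and it assumes the PDF page origin is at (0, 0) and the page has the same height as the Word page.

```csharp
using System;

// Convert a top-left-origin rectangle (Y grows downward) into
// PDF-style lower-left / upper-right corners (Y grows upward).
// All values are in points. ToPdfRect is a hypothetical helper name.
static (double Llx, double Lly, double Urx, double Ury) ToPdfRect(
    double x, double y, double width, double height, double pageHeight)
{
    double llx = x;
    double lly = pageHeight - (y + height); // bottom edge, measured from page bottom
    double urx = x + width;
    double ury = pageHeight - y;            // top edge, measured from page bottom
    return (llx, lly, urx, ury);
}

// Sample: a 200 x 20 pt box, 72 pt from the left and 100 pt from the top
// of a Letter page (792 pt high).
var rect = ToPdfRect(72, 100, 200, 20, 792);

// Llx=72, Lly=672, Urx=272, Ury=692
Console.WriteLine($"Llx={rect.Llx}, Lly={rect.Lly}, Urx={rect.Urx}, Ury={rect.Ury}");
```

When matching fields by position, comparing with a small tolerance rather than exact equality is probably safer, since rounding may differ between the two libraries.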

Best Regards,

Hi John,

Thanks for being patient.

Regarding WORDSNET-11148, our development team has completed its work on your issue and has concluded that we won't be able to fix it in Aspose.Words. Please take the following steps to work around this issue:

  1. Set PdfSaveOptions.PreserveFormFields to true when saving to PDF, to save the SDT nodes as AcroForm fields in the PDF.
  2. Use StructuredDocumentTag.Id to specify the AcroForm name.
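The two steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the file names are placeholders, and it assumes that the AcroForm field name written to the PDF is the decimal string form of StructuredDocumentTag.Id, which matches the numeric name reported earlier in this thread but should be verified against your own output.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Markup;
using Aspose.Words.Saving;

// "template.docx" and "out.pdf" are placeholder paths.
var doc = new Document("template.docx");

// Build a lookup from the content control Id to its tag before saving,
// so the PDF field names can be translated back afterwards.
var tagById = new Dictionary<string, string>();
foreach (StructuredDocumentTag sdt in
         doc.GetChildNodes(NodeType.StructuredDocumentTag, true))
{
    // Assumption: the PDF field name equals sdt.Id rendered as a string.
    tagById[sdt.Id.ToString()] = sdt.Tag;
}

// Step 1: keep the content controls as AcroForm fields in the PDF.
var options = new PdfSaveOptions { PreserveFormFields = true };
doc.Save("out.pdf", options);

// Step 2: pass the lookup to the downstream process.
foreach (KeyValuePair<string, string> pair in tagById)
{
    Console.WriteLine($"PDF field '{pair.Key}' corresponds to tag '{pair.Value}'");
}
```

The dictionary can be stored alongside the PDF (for example as JSON) so that whatever reads the PDF fields later can map each numeric name back to a tag such as "signer/firstname".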

I hope this helps.

Best regards,

The issues you have found earlier (filed as ) have been fixed in this update. This message was posted using BugNotificationTool from Downloads module by MuzammilKhan