Content Controls not Preserved during Converting XML to PDF

The logged issue is titled “Content Controls not Preserved during Converting XML to PDF”.
Does Aspose provide a way to preserve content controls in the PDF as well?

@mhtsharma9,

Thanks for your inquiry. Please ZIP and upload your input Word document and Aspose.Words generated PDF file showing the undesired behavior here for testing. We will investigate the issue on our end and provide you more information.

My question is: does Aspose provide any way to see the content controls of the Word document in the converted PDF?

In our system, we have a Word document with content controls. We convert that Word document to PDF for the review process, and an authorised person reviews the document by annotating the PDF. We then need to match those annotations to the content controls.

The challenge is that a Word document works with ranges while a PDF works with coordinates. If Aspose can preserve content controls in the PDF document as well, we can easily determine which content control an annotation belongs to.

@mhtsharma9,

Please set PdfSaveOptions.PreserveFormFields to true when saving to PDF to save the StructuredDocumentTag nodes (content controls) as AcroForm fields in the PDF.

Document doc = new Document("D:\\Temp\\input.docx");

PdfSaveOptions opts = new PdfSaveOptions();
opts.PreserveFormFields = true;

doc.Save("D:\\Temp\\18.7.pdf", opts);

You can also use StructuredDocumentTag.Id to specify the AcroForm name. Hope this helps.

If you are looking for something else, please ZIP and attach the following resources here for testing:

  • Your simplified input Word document
  • Aspose.Words generated output PDF file showing the undesired behavior
  • Your expected PDF file showing the correct output. Please create this document by using MS Word.

As soon as you get these pieces of information ready, we will start further investigation into your above issue and provide you more information. Thanks for your cooperation.

Thanks for the quick response.
I have a few more questions:
Can we block the content controls from being edited in the PDF? No content control should be editable.
And how can we provide names for these form fields?

There are also some issues (or perhaps expected behaviors) when converting content controls:

  1. Even if the content of a content control spans more than one line, the converted PDF shows it on a single line. (SimpleTextControl SDT in the provided zip file)
  2. Block-level SDTs are not converted to AcroForm fields. This is similar to WORDSNET-16962. (BlockLevelContentControl in the provided zip file)

UserFiles.zip (63.9 KB)

@mhtsharma9,

Please try using the following code:

Document doc = new Document("D:\\Temp\\UserFiles\\SimpleTestDoc2.docx");

int i = 0;
foreach (StructuredDocumentTag sdt in doc.GetChildNodes(NodeType.StructuredDocumentTag, true))
{
    sdt.Title = "title_" + i;
    // Set any other content control properties here
    i++;
}

PdfSaveOptions opts = new PdfSaveOptions();
opts.PreserveFormFields = true;

// The user password is empty; "password" is the owner password.
PdfEncryptionDetails encryptionDetails = new PdfEncryptionDetails(string.Empty, "password", PdfEncryptionAlgorithm.RC4_128);
// Grant only copying, assembly and printing. ModifyContents, ModifyAnnotations and FillIn
// are not granted, so the form fields should not be editable in the PDF.
encryptionDetails.Permissions = PdfPermissions.ContentCopy | PdfPermissions.ContentCopyForAccessibility | PdfPermissions.DocumentAssembly |
                                PdfPermissions.HighResolutionPrinting |
                                PdfPermissions.Printing;

opts.EncryptionDetails = encryptionDetails;

doc.Save("D:\\Temp\\UserFiles\\18.7.pdf", opts);

To address these problems, we have logged the following issues:
WORDSNET-17180: Multi-Line Content Control renders as a Single Line control in PDF
WORDSNET-17181: Block level SDT not converting to Editable AcroForm field

Your thread has also been linked to these issues and you will be notified via this thread as soon as these issues are resolved. Sorry for the inconvenience.

The code shared above does not set the Name of the form field in the PDF.
Can you please verify?
I have checked, and the SDT’s Title and the properties (Name, Partial Name, Full Name) of the PDF form field do not match.

@mhtsharma9,

Please share your expected PDF document containing the form field control with Name attribute set. Please also share a screenshot showing the Name attribute that you want to set in PDF. Thanks for your cooperation.

Any progress on WORDSNET-17180 & WORDSNET-17181?

@mhtsharma9,

Unfortunately, these issues are not resolved yet. Please check below the current status of these issues:

WORDSNET-17180: The implementation of this issue has been postponed till a later date and there are no estimates available at the moment.

WORDSNET-17181: We are currently doing analysis of this issue to determine the root cause.

We will inform you via this thread as soon as these issues are resolved. We apologize for your inconvenience.

I’ve attached the DOCX document and the PDF generated from it using the code given above.
test.zip (203.2 KB)

I tried to fetch the form field information as:

void testPDF() {
    // Aspose.Pdf.Document; the second argument is the password set during conversion
    Document pdfDocument = new Document("./test2.pdf", "password");
    foreach (Aspose.Pdf.Forms.Field formField in pdfDocument.Form)
    {
        string name = formField.PartialName;
        string value = formField.Value;
        string content = formField.Contents;
        Console.WriteLine("name");
        Console.WriteLine(name);

        Console.WriteLine("value");
        Console.WriteLine(value);

        Console.WriteLine("content");
        Console.WriteLine(content);
    }
}

And I get the following output:

name
-1035726730
value
Enter Sponsor Name
formfield

name
-1133400152
value
<<Compound Number>>
formfield

name
-1133400152_121
value
<<Compound Number>>
formfield
...

@mhtsharma9,

I am afraid your query is not clear enough. Are you getting new problems with the Aspose.PDF API, or is this still related to Aspose.Words? Please provide more details on the issue you mentioned in your previous post. We will then investigate the issue on our end and provide you more information.

My previous post was in response to your request above.

@mhtsharma9,

We are working on your query and will get back to you soon.

@mhtsharma9,

The Name field that you see on the Text Field Properties screen corresponds to the StructuredDocumentTag.Id property. This property is currently read-only in Aspose.Words and is ‘system generated’. We have logged your requirement, i.e. to allow updating the StructuredDocumentTag.Id property, in our issue tracking system. Your ticket number is WORDSNET-17810. We will further look into the details of this problem and will keep you updated on the status of the linked issues. We apologize for your inconvenience.

Can you tell us how we can retrieve, in the PDF, the Title property that we set at the time of conversion?

@mhtsharma9

We have tested the scenario from the perspective of Aspose.PDF and found that the Aspose.Words API converts form fields into annotations while generating the PDF from the input DOC file. Also, while converting form fields into annotations, the Aspose.Words API does not keep the original names (see Annotation_Name.png (5.1 KB)). As soon as the conversion issue (i.e. DOC to PDF) is resolved, you may use the following code snippet with Aspose.PDF to retrieve the names of the annotations:

Document doc = new Document(dataDir + "test2.pdf");
// Loop through all the annotations
foreach (Annotation annotation in doc.Pages[1].Annotations)
{
  var name = annotation.Name;
}

@mhtsharma9,

Regarding WORDSNET-17810, we have completed the analysis of this issue and concluded that we will not be able to implement a fix. Your issue (WORDSNET-17810) has now been closed with a ‘Won’t Fix’ resolution. Please see the following analysis details:

According to specification ISO_IEC_29500-1_2011

17.5.2.38 sdtPr (Structured Document Tag Properties)
17.5.2.18 id (Unique ID)

This element specifies a unique numerical ID for the parent structured document tag. This ID shall be persisted through multiple sessions (i.e. shall not be changed once specified). If multiple structured document tags specify the same decimal number value for the id attribute, then the first structured document tag in the document shall maintain this original ID, and all subsequent structured document tags shall have new identifiers assigned to them when the document is opened. If this element is omitted, then the parent structured document tag shall have a new unique identifier assigned to it when the document is opened.

The issue you reported earlier (filed as WORDSNET-17181) has been fixed in the Aspose.Words for .NET 21.11 update, which is also available on NuGet.