Missing data from Java XFA extraction

Hi, we are using Aspose Total with Java to extract raw text from a PDF XFA form, and the extracted text is incomplete.
Some of the XFA field values are missing: every value that is not at the first array index (index 0) of the extracted data is dropped. For example, these values are extracted:
Form123[0].page][0].person[0].lastname[0]=Smith
Form123[0].page][0].person[0].firtname[0]=John
But if I have a value at
Form123[0].page][0].person[0].lastname[1]=Parker
Form123[0].page][0].person[0].firtname[1]=Peter
That value is not extracted. It is extracted when looping through the XFA fields but not from the “text” version of the XFA PDF.
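
To make the indexing in the paths above concrete, here is a minimal sketch of how repeated fields end up with [0], [1], ... suffixes. It does not use Aspose at all. It assumes the form data is available as plain XML (roughly the shape of an XFA datasets packet) and that same-named siblings are numbered by occurrence. The class name XfaPathSketch, the method indexedPaths, and the sample XML are hypothetical and only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any Aspose API.
final class XfaPathSketch {

    // Returns one "path=value" entry per leaf element, in document order.
    // Same-named siblings get an occurrence index: name[0], name[1], ...
    static List<String> indexedPaths(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            Element root = dom.getDocumentElement();
            walk(root, root.getTagName() + "[0]", out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse form data XML", e);
        }
    }

    private static void walk(Element element, String path, List<String> out) {
        // Counts how many children with each tag name have been seen so far.
        Map<String, Integer> seen = new HashMap<>();
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text and comment nodes
            }
            hasChildElement = true;
            String name = ((Element) child).getTagName();
            int index = seen.merge(name, 1, Integer::sum) - 1;
            walk((Element) child, path + "." + name + "[" + index + "]", out);
        }
        if (!hasChildElement) {
            // Leaf element: record its text as the field value.
            out.add(path + "=" + element.getTextContent().trim());
        }
    }

    public static void main(String[] args) {
        // Made-up sample data: two lastname/firstname pairs under one person.
        String xml = "<Form123><person>"
                + "<lastname>Smith</lastname><firstname>John</firstname>"
                + "<lastname>Parker</lastname><firstname>Peter</firstname>"
                + "</person></Form123>";
        // Prints:
        // Form123[0].person[0].lastname[0]=Smith
        // Form123[0].person[0].firstname[0]=John
        // Form123[0].person[0].lastname[1]=Parker
        // Form123[0].person[0].firstname[1]=Peter
        indexedPaths(xml).forEach(System.out::println);
    }
}
```

In this sketch the [1] entries are simply the second occurrence of a repeated element, so an extraction that only reports the [0] entries is dropping every repeat after the first.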

@brissonp

Would you please share your sample PDF along with the sample code snippet that you are using? We will test the scenario in our environment and address it accordingly.

Attached is a sample XFA form I found online, as I can’t share our documents.

The code to extract the text is as follows. Note that none of the form values are extracted or printed by System.out.println.

import com.aspose.pdf.Document;
import com.aspose.pdf.FormType;
import com.aspose.pdf.TextAbsorber;

// Load the dynamic XFA form
Document document = new Document("sampleXFA.pdf");

// Convert the form to a standard AcroForm, because extracting text from the
// XFA form directly only returns the common PDF message "Please wait…"
document.getForm().setType(FormType.Standard);

// Save the resulting PDF (the standard AcroForm document is created correctly)
document.save("Standard_AcroForm.pdf");

// Load the converted standard AcroForm document
Document doc = new Document("Standard_AcroForm.pdf");

// Create a TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();

// Accept the absorber for all the pages
doc.getPages().accept(textAbsorber);

// Get the extracted text
String extractedText = textAbsorber.getText();

doc.close();

System.out.println(extractedText);

Pascal

sampleXFA.pdf (144 KB)

@brissonp

Would you please share your actual requirements? Do you simply want to read the values from the form fields, or do you want to convert the XFA form into a standard AcroForm? The TextAbsorber class is used to extract text from a PDF, whereas form field values are extracted using different methods.