Unable to parse form XML from PDF file

We are attempting to pull the XML (or even just the form XML) from the following file, but we get an error on the screen.

Could not execute the test pdf parser activity due to one or more errors:
AsposeWrapper.Asposepdf.reflectedInvokeMethod: AsposeWrapper.Asposewords.acceptAllRevisions() Exception occurred System.IO.IOException: Read pdf error:Trailer is not found. at xeb116a323308e2f7.x7759a935a2782a02.readPdf() at xeb116a323308e2f7.x7759a935a2782a02..ctor(String filename, SByte[] ownerPassword) at Aspose.Pdf.Kit.Form.xedff4d4fd296f454() at Aspose.Pdf.Kit.Form..ctor(String srcFileName) at AsposeWrapper.Asposepdf.extractXML(EntityType context, String sourceFilePathNew)

Hi,

Thank you for considering Aspose.

Can you please provide your code and let us check it?

Aspose.Pdf.Pdf asposePDF = new Aspose.Pdf.Pdf();
String sourceFilePath = "c:\\installers\\test.pdf";
Aspose.Pdf.Kit.Form frm = new Aspose.Pdf.Kit.Form(sourceFilePath);

// Manipulate the paths and files.
String sourceFileName = Path.GetFileName(sourceFilePath);
....
String tempFilePath = Path.Combine(tempDirectory, outputFileName);

// Create a new XML file to hold the content of the PDF.
System.IO.FileStream xmlOutputStream = new FileStream(tempFilePath, FileMode.Create);
// Export all the PDF field values into the XML file.
frm.ExportXfdf(xmlOutputStream);
xmlOutputStream.Close();

Hi,

I have tested with Aspose.Pdf.Kit v 2.6.4.4 and was not able to reproduce the error.

Thanks.

After upgrading both PDF and PDF Kit, the export now returns XML for this file:

<?xml version="1.0" encoding="utf-8" ?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
  <fields />
</xfdf>

However, Adobe Acrobat produces a different file when you export the form data. The output of the menu item Document:Forms:Export Data looks like the following.

How can I extract this XML?

<?xml version="1.0" encoding="UTF-8" ?>
<xfa:data xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
  <grantwrapper:GrantApplicationWrapper xmlns:glob="http://apply.grants.gov/system/Global-V1.0" xmlns:grant="http://apply.grants.gov/system/MetaGrantApplication" xmlns:grantwrapper="http://apply.grants.gov/system/MetaGrantApplicationWrapper" xmlns:header="http://apply.grants.gov/system/Header-V1.0">
    <grant:GrantApplication xmlns:globLib="http://apply.grants.gov/system/GlobalLibrary-V1.0">
      <header:GrantSubmissionHeader glob:schemaVersion="1.0">
        <glob:HashValue glob:hashAlgorithm="SHA-1">UjBsR09EbGhjZ0dTQUxNQUFBUUNBRU1tQ1p0dU1GUXhEUzhi</glob:HashValue>
        <header:AgencyName>NAVAL FACILITIES ENGINEERING COMMAND</header:AgencyName>
        <header:CFDANumber>12.300</header:CFDANumber>
        <header:ActivityTitle>Basic and Applied Scientific Research</header:ActivityTitle>
        <header:OpportunityID>N62473-08-R-RELEASE</header:OpportunityID>
        <header:OpportunityTitle>Release of Captive Bred San Clemente Loggerhead Shrike on San Clemente Island, California</header:OpportunityTitle>
        <header:OpeningDate>2008-01-07</header:OpeningDate>
        <header:ClosingDate>2008-01-17</header:ClosingDate>
        <header:SubmissionTitle>dsdsd</header:SubmissionTitle>
      </header:GrantSubmissionHeader>
      <grant:Forms>
        <RR_PerformanceSite:RR_PerformanceSite xmlns:RR_PerformanceSite="http://apply.grants.gov/forms/RR_PerformanceSite-V1.1" RR_PerformanceSite:FormVersion="1.1">
          <RR_PerformanceSite:PrimarySite>

Dear djgilles

Thanks for considering our products!

Please try Form.ExportXml() instead of Form.ExportXfdf(). Please note that XFDF and XML are two different data formats.

Best regards.
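
For readers following along, the suggested change can be sketched roughly as below. This is a minimal sketch, not the poster's actual code: it uses only the `Form(String)` constructor and the `ExportXml`/`ExportXfdf` calls named in this thread, and it assumes `ExportXml` accepts an output stream the same way `ExportXfdf` does. The class name `FormDataExport` and the method name `ExportFormXml` are hypothetical and exist only for illustration.

```csharp
using System.IO;
using Aspose.Pdf.Kit;

static class FormDataExport
{
    // Hypothetical helper: writes the form data of pdfPath to xmlPath as XML.
    public static void ExportFormXml(string pdfPath, string xmlPath)
    {
        // Open the source PDF with the same constructor used in the original post.
        Form form = new Form(pdfPath);

        // The using block closes the stream even if the export throws.
        using (FileStream output = new FileStream(xmlPath, FileMode.Create))
        {
            // ExportXml instead of ExportXfdf: XFDF and XML are different formats.
            // Assumption: ExportXml takes a Stream, mirroring ExportXfdf.
            form.ExportXml(output);
        }
    }
}
```

Wrapping the stream in a using block is a design choice over the explicit Close() call, so the file handle is released on failure as well.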

Changing the call to ExportXml gave me the correct XML, as you suggested. Thanks!

However, there is another issue. The XML extracted with the Adobe PDF viewer is complete, but the version exported by Aspose produces the following error.

<RR_SF424:EstimatedProjectFunding>
  <RR_SF424:TotalEstimatedAmount />
  <RR_SF424:TotalfedNonfedrequested />

The XML page cannot be displayed

Cannot view XML input using XSL style sheet. Please correct the error and then click the Refresh button, or try again later.

End tag 'SFLLL:FederalProgramName' does not match the start tag 'SFLLL:LobbyingActivitiesDisclosure'. Error processing reso...

Can you look into this issue? Thanks!
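
A mismatched end tag like the one reported above can also be caught without a browser by running the exported text through an XML parser. The following is a minimal sketch using .NET's `System.Xml.XmlReader`; the class name `XmlCheck` and the method name `FindWellFormednessError` are hypothetical, and the check covers well-formedness only, not schema validity.

```csharp
using System.IO;
using System.Xml;

static class XmlCheck
{
    // Returns null when xmlText is well-formed XML; otherwise returns the
    // parser's error message, which includes the line and position.
    public static string FindWellFormednessError(string xmlText)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xmlText)))
            {
                // Reading to the end forces the parser to check every tag pair.
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }
}
```

To check an exported file rather than a string, the file's contents could be loaded with `File.ReadAllText` and passed to the same method.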

Hi,

I have checked the PDF and was able to reproduce this error. I have logged it as PDFKITNET-4330 in our issue tracking system. We will try our best to resolve this issue as soon as possible.

Thanks.

Would you like some additional PDFs to take a look at? Let me know if I can provide more files to help you.

Hi,

If you have any more files that show the same issue, please post them here.

Thanks.

I’ve attached some additional files. Is there any update on this issue?

Dear djgilles,

The samples you provided are quite different from ours. We need some time to solve this problem; the ETA is one week. Thanks for your patience.

Best regards.

Dear djgilles,

Good news: this problem has been solved. The fix will be included in the upcoming release.

Best regards.

Hi,

Please try the new version, 3.0.0.0.