Save to Excel produces invalid XML

Using

new Document("test.pdf").Save(draftPath, SaveFormat.Excel);

produces a .xls file that is actually an XML file. However, string values in <Data> elements do not appear to be encoded properly: if the PDF contains the text <= and is saved this way, the result is an invalid XML file. It breaks the .NET XmlReader, and Aspose.Cells fails to open it as well.

The issue is observed with Aspose.Pdf version 16.11.0.0. Was it fixed in later versions?

@dzmitry.martynau,
We have tested your scenario with the latest version 17.8 of Aspose.Pdf for .NET API and could not replicate the problem in our environment. Kindly download and try version 17.8, and then let us know how it goes in your environment. If you are still able to replicate the problem, please share the source PDF with us. We will investigate and share our findings with you.

Best Regards,
Imran Rafique

Did you try a PDF file with <= in the text? Using the latest nuget 17.8.0:

sample2.pdf (3.0 KB)

Produces:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Worksheet ss:Name="page 1">
<ss:Table>
<ss:Row> <ss:Cell><ss:Data ss:Type="String"></ss:Data></ss:Cell>
 <ss:Cell><ss:Data ss:Type="String">This symbol breaks:</ss:Data></ss:Cell>
 <ss:Cell><ss:Data ss:Type="String"><=</ss:Data></ss:Cell>
</ss:Row>
</ss:Table>
</ss:Worksheet>
</ss:Workbook>

The text in the last //Cell/Data, <=, is not encoded properly. Instead of

<ss:Cell><ss:Data ss:Type="String"><=</ss:Data></ss:Cell>

should be

<ss:Cell><ss:Data ss:Type="String">&lt;=</ss:Data></ss:Cell>

As a dev, I assume the file is generated by custom string-building code (a serious error that should not pass code review). A more correct and safer way is to use a dedicated library, e.g. XmlWriter from the BCL.
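
To illustrate the XmlWriter suggestion, here is a minimal sketch. It is not Aspose's actual code: the class name SpreadsheetXmlSketch and the method WriteRow are hypothetical, and the document structure is reduced to the single row from the sample above. The point is only that XmlWriter.WriteString escapes markup characters, so <= is written as &lt;=.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class SpreadsheetXmlSketch
{
    private const string Ss = "urn:schemas-microsoft-com:office:spreadsheet";

    // Writes one worksheet with one row; each argument becomes a String cell.
    public static string WriteRow(params string[] cells)
    {
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        var output = new StringBuilder();

        using (var writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("ss", "Workbook", Ss);
            writer.WriteStartElement("ss", "Worksheet", Ss);
            writer.WriteAttributeString("ss", "Name", Ss, "page 1");
            writer.WriteStartElement("ss", "Table", Ss);
            writer.WriteStartElement("ss", "Row", Ss);

            foreach (var text in cells)
            {
                writer.WriteStartElement("ss", "Cell", Ss);
                writer.WriteStartElement("ss", "Data", Ss);
                writer.WriteAttributeString("ss", "Type", Ss, "String");
                writer.WriteString(text); // escapes <, > and & in the cell text
                writer.WriteEndElement(); // ss:Data
                writer.WriteEndElement(); // ss:Cell
            }

            writer.WriteEndElement(); // ss:Row
            writer.WriteEndElement(); // ss:Table
            writer.WriteEndElement(); // ss:Worksheet
            writer.WriteEndElement(); // ss:Workbook
        }

        return output.ToString();
    }
}
```

Calling SpreadsheetXmlSketch.WriteRow("This symbol breaks:", "<=") yields a string that XmlReader can load, because the second cell is emitted as &lt;=.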

I had to implement a custom parser to extract the data, but it will probably break once you fix your code. So I had to implement the extraction twice, with an XML parser taking over for correctly encoded files :(
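
Roughly, the two-step extraction can be sketched as follows. This is a simplified illustration, not the actual parser: the names CellTextExtractor and Extract are made up for this post, and the regex fallback assumes that Data elements are never nested and that the broken file contains the cell text raw (unescaped).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class CellTextExtractor
{
    private static readonly XNamespace Ss = "urn:schemas-microsoft-com:office:spreadsheet";

    // Fallback pattern: everything between an ss:Data start tag and its end tag.
    private static readonly Regex DataPattern = new Regex(
        @"<ss:Data\b[^>]*>(.*?)</ss:Data>", RegexOptions.Singleline);

    public static List<string> Extract(string content)
    {
        try
        {
            // Preferred path: the file is well-formed XML, so use a real parser.
            return XDocument.Parse(content)
                .Descendants(Ss + "Data")
                .Select(d => d.Value)
                .ToList();
        }
        catch (XmlException)
        {
            // Fallback path: the file is not well-formed (e.g. a raw "<=" in a cell),
            // so take the text between the Data tags as-is.
            return DataPattern.Matches(content)
                .Cast<Match>()
                .Select(m => m.Groups[1].Value)
                .ToList();
        }
    }
}
```

With this ordering, files produced by a fixed version go through the XML parser, and only files that fail to parse reach the regex fallback.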

@dzmitry.martynau,
We managed to replicate the problem of invalid XML in our environment. It has been logged under the ticket ID PDFNET-43225 in our bug tracking system. We have linked your post to this ticket and will keep you informed regarding any available updates. We are sorry for the inconvenience caused.

Best Regards,
Imran Rafique

The issues you have found earlier (filed as PDFNET-43225) have been fixed in Aspose.PDF for .NET 18.4. This message was posted using BugNotificationTool from Downloads module by asad.ali