Save XLSX to PDF results in missing data

When we open an XLSX file with Aspose.Cells and save it as a PDF the saved PDF does not match the XLSX file it came from. Numerous data points and charts have different or missing data.

I attached an Example XLSX file and code for you to reproduce.

SaveToPdfProblemExample.zip (974.5 KB)

You will see that the source Excel file (Test1.xlsx) has all the data correctly displayed.

When we open the file using Aspose and save it as a PDF file, numerous data points display incorrectly.

We can save the source Excel file to a new Excel file and the new Excel file’s data is correct.

Something is happening in Aspose code that is causing the PDF to get corrupted/messed up.

Compare the following between the Excel (source or newly saved) and the PDF to see the data differences:

Page: Chart or Table Name | Explanation

1: Minimum Interest Penalty Summary | The Late Payments line has different values.
1: Minimum Interest | No data in the PDF version.
1: ARP | No data in the PDF version.
2: ARP | The Proj ARP column has all zeroes.
2: Monthly Payment Trends | Missing the Actual Payments and Projected Payments lines.
2: Borrower Payment Detail (Annual) | Projected and % of Projected columns have different values.
3: Borrower Monthly Revenue Detail (TTM) | All columns except Actual have different values.
3: Monthly Revenue Trends | Not lines on the PDF chart.
3: Borrower Revenue Detail (Annual) | No data in the PDF table.

@jcapson
By testing with sample files and code on the latest version v24.7, we can reproduce the issue. Discovered data loss when converting file to PDF.

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): CELLSNET-56200

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Great, thanks!

@jcapson,

You are welcome. Once we figure it out or have some other updates on your issue, we will let you know.

@jcapson ,

The source file uses Dynamic Array Formulas. Please refresh dynamic array formulas before calling formula calculation.

Code:

//Open an excel file
Workbook workbookForPdf = new Workbook(fileName);

// Set the file name.
string pdfFileName = @"C:\Junk\TestPDFFormat.pdf";
string excelFileName = @"C:\Junk\TestExcelFormat.xlsx";

// Refresh dynamic array formulas before calling formula calculation.
workbookForPdf.RefreshDynamicArrayFormulas(true);

// Calculate the formula's in the workbook to get the correct values.
workbookForPdf.CalculateFormula(false);

// Save the workbook in PDF format.
workbookForPdf.Save(pdfFileName, SaveFormat.Pdf);

// Save the workbook in Xlsx format.
workbookForPdf.Save(excelFileName, SaveFormat.Xlsx);

Thank you for the solution, calling RefreshDynamicArrayFormulas does solve the problem.

I am curious to know:

(a) Why when saving to PDF there is any need to at all to call functions to calculate or refresh formulas? I can’t think of reason to not do it, so I would think it should be done by default.

(b) Why aren’t dynamic array formulas also refreshed by the CalculateFormula call?

Thanks!

@jcapson,

Good to know that calling RefreshDynamicArrayFormulas resolves your issue.

a) It is best to calculate formulas before rendering to PDF because sometimes cached (calculated) values stored in the Excel file’s source data are not updated. We do not calculate formulas by default when rendering to PDF, as formula calculations might be time-consuming for complex spreadsheets.

b) The CalculateFormula method covers/calculates Excel’s standard and advanced formulas but does not calculate dynamic array formulas, so you have to call the RefreshDynamicArrayFormulas() method explicitly if your spreadsheet has those formulas to be calculated. Moreover, see the document for further details on dynamic array formulas for your reference.
https://docs.aspose.com/cells/net/calculation-of-dynamic-array-formulas/

@jcapson
To put it briefly, it is for performance consideration that we do not do those operations automatically. For most of users, they have done the needed operations before saving, or the template file was saved by ms excel with the correctly calculated values already. So there is no need for them to do such kind of time-consuming operations again.

For dynamic array formulas, it is more complicated than common formulas and generally it requires more times to process. For such kind of formulas, sometimes the change of the data they reference may cause the formula’s applied range change, so those formulas need to be refreshed(spilled) before calculation.

However, it is not the requirement for all situations. For example, user may set some dynamic array formulas in program and then calculate the workbook. Because those dynamic array formulas have been spilled correctly when setting, so it is sure the program has no need to refresh them again. Or another example, user just creates those formulas and data in ms excel manually and saves the template file, then converts the template file to pdf in program. For such situation there is no need to refresh and calculate them again in the program either. So, we do not perform the refresh operation automatically either.

When calling CalculateFormula(), those dynamic array formulas will be calculated too, but we do not change their applied range according to the change of referenced data. If user is not sure whether they need to be refreshed, it is better for them to call RefreshDynamicArrayFormulas() explicitly to make sure they can get the expected result.

Thank you Shi and Sahi for the detailed information!

@jcapson
You are welcome. If you have any questions, please feel free to contact us at any time.