DataLoss when converting excel to pdf

Hi,
I am converting excel file to pdf with option as to fit all the sheet content in single page. After generating pdf, there is some data loss.
TestTrunk.zip (395.0 KB)

As you can see that, in columns A,C,F etc… instead of “ooooo”, there’s only “ooo”.
Expected behavior is if column cannot accommodate content in the allotted width, I would expected the data to decrease its font but not delete some part of the text.
FYI, The styles I am applying on the sheet

var tempFileName = System.IO.Path.GetTempPath() + "TestTrunk.pdf";
var worksheet = workbook.Worksheets[0];
var saveOptions = new Aspose.Cells.PdfSaveOptions
{
    GridlineType = GridlineType.Hair,
    OptimizationType = PdfOptimizationType.MinimumSize,
};
Style style = workbook.CreateStyle();
style.IsTextWrapped = true;
StyleFlag flag = new StyleFlag();
flag.WrapText = true;
if (range != null)
{
    range.ApplyStyle(style, flag);
}
var pageSetup = worksheet.PageSetup;
 pageSetup.PrintHeadings = true;
pageSetup.Order = PrintOrderType.DownThenOver;
pageSetup.Orientation = PageOrientationType.Portrait;
pageSetup.PaperSize = PaperSizeType.PaperLetter;
pageSetup.PrintComments = PrintCommentsType.PrintSheetEnd;
pageSetup.FitToPagesTall = 1;
pageSetup.FitToPagesWide = 1;
pageSetup.BlackAndWhite = false;
pageSetup.HeaderMargin =0.3;
pageSetup.FooterMargin = 0.3;
pageSetup.PrintGridlines = true;
workbook.Save(tempFileName, saveOptions);

@shreyap,

Since you are applying re-wrap text for the data range and you don’t auto-fit rows in code, so some columns’ cells will show truncated contents. You should not re-wrap text in code. You may call auto fit rows and columns functions to show the data properly. See the sample code that will work better.
e.g.,
Sample code:

var tempFileName = "e:\\test2\\TestTrunk1.pdf";
Workbook workbook = new Workbook("e:\\test2\\TestTrunk.xlsx");
var worksheet = workbook.Worksheets[0];
worksheet.AutoFitRows();
worksheet.AutoFitColumns();
var saveOptions = new Aspose.Cells.PdfSaveOptions
{
    GridlineType = GridlineType.Hair,
    OptimizationType = PdfOptimizationType.MinimumSize, 
         
};
var pageSetup = worksheet.PageSetup;
pageSetup.PrintHeadings = true;
pageSetup.Order = PrintOrderType.DownThenOver;
pageSetup.Orientation = PageOrientationType.Portrait;
pageSetup.PaperSize = PaperSizeType.PaperLetter;
pageSetup.PrintComments = PrintCommentsType.PrintSheetEnd;
pageSetup.FitToPagesTall = 1;
pageSetup.FitToPagesWide = 1;
pageSetup.BlackAndWhite = false;
pageSetup.HeaderMargin =0.3;
pageSetup.FooterMargin = 0.3;
pageSetup.PrintGridlines = true;

workbook.Save(tempFileName, saveOptions);

Please find attached the output PDF for your reference.
TestTrunk1.pdf (264.8 KB)

image.png (553 Bytes)
@amjad.sahi, I am sorry for the above wrong reference. I am applying autofit on rows and columns in data.

var tempFileName = System.IO.Path.GetTempPath() + "TestTrunk.pdf";
var worksheet = workbook.Worksheets[0];
var saveOptions = new Aspose.Cells.PdfSaveOptions
{
    GridlineType = GridlineType.Hair,
    OptimizationType = PdfOptimizationType.MinimumSize,
};
Style style = workbook.CreateStyle();
style.IsTextWrapped = true;
StyleFlag flag = new StyleFlag();
flag.WrapText = true;
if (range != null)
{
    range.ApplyStyle(style, flag);
}
worksheet.AutoFitColumns();
 worksheet.AutoFitRows();
 var pageSetup = worksheet.PageSetup;
 pageSetup.PrintHeadings = true;
pageSetup.Order = PrintOrderType.DownThenOver;
pageSetup.Orientation = PageOrientationType.Portrait;
pageSetup.PaperSize = PaperSizeType.PaperLetter;
pageSetup.PrintComments = PrintCommentsType.PrintSheetEnd;
pageSetup.FitToPagesTall = 1;
pageSetup.FitToPagesWide = 1;
pageSetup.BlackAndWhite = false;
pageSetup.HeaderMargin =0.3;
pageSetup.FooterMargin = 0.3;
pageSetup.PrintGridlines = true;
workbook.Save(tempFileName, saveOptions);

In the pdf you have shared, L6 is still truncated. It is supposed to be ‘jkj’, I can only see ‘jk’. And also some of the content of column C and A is shifted towards right. Complete content is not visible.

@shreyap,

Yes, I noticed the issue when converting your template Excel file to PDF using my pasted sample code. I found some data in columns is still truncated or is not displayed fully in the cells when converting your Excel file to PDF.

We require thorough evaluation of the issue. We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): CELLSNET-56759

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@shreyap
There are many rows and columns in the worksheet. If FitToPagesTall and FitToPagesWide are set as 1, the print scale becomes 10% ( the min limit value). Everything should be scaled to 10%, so the width of column and height of row becomes very small. In order to ensure the minimum font size, cropping is necessary. And it’s hard to read the pdf.
If you only want to print all columns to one page, you can only set pageSetup.FitToPagesWide and remove pageSetup.FitToPagesTall = 1;

And you can set PdfSaveOptions.AllColumnsInOnePagePerSheet property as the following codes:

 // pageSetup.FitToPagesTall = 1;
 // pageSetup.FitToPagesWide = 1;
  saveOptions.AllColumnsInOnePagePerSheet = true;

@simon.zhao ,
I would like to have all rows and columns in single sheet

@shreyap,

In that case, you should use OnePagePerSheet of PdfSaveOptions instead. I tested using the following sample code (please try it) and it works fine and the output PDF is fine tuned (there is no truncated data (data loss) or other issue).
e.g.,
Sample code:

var tempFileName = "e:\\test2\\TestTrunk_new1.pdf";
Workbook workbook = new Workbook("e:\\test2\\TestTrunk.xlsx");
var worksheet = workbook.Worksheets[0];
worksheet.AutoFitRows();
worksheet.AutoFitColumns();
var saveOptions = new Aspose.Cells.PdfSaveOptions
{
    GridlineType = GridlineType.Hair,
    OptimizationType = PdfOptimizationType.MinimumSize,
    OnePagePerSheet = true 
         
};
var pageSetup = worksheet.PageSetup;
pageSetup.PrintHeadings = true;
pageSetup.Order = PrintOrderType.DownThenOver;
pageSetup.Orientation = PageOrientationType.Portrait;
pageSetup.PaperSize = PaperSizeType.PaperLetter;
pageSetup.PrintComments = PrintCommentsType.PrintSheetEnd;
pageSetup.BlackAndWhite = false;
pageSetup.HeaderMargin =0.3;
pageSetup.FooterMargin = 0.3;
pageSetup.PrintGridlines = true;

workbook.Save(tempFileName, saveOptions);

Please find attached the output PDF file for your reference.
TestTrunk_new1.pdf (264.1 KB)

Let us know if you find any issue with the output PDF?

@amjad.sahi
Thank you for the help. It solves the problem of truncating but I see that the page is scaled up to an unusually large size. It is not honoring the paperSize value which I have set to letter.

@shreyap
Thank you for your feedback. We will further investigate your issue. In addition, if you want to put all the data on the worksheet on one page, it will inevitably cause a crowded situation, and you can only see the clear data by zooming in when viewing the PDF.

@John.He , I understand that it is difficult to add so much content in a single sheet. We are ready to zoom in and look at it but we do not want the data to be truncated.

@shreyap ,

It is the feature of OnePagePerSheet, the paper size set in page setup is ignored when OnePagePerSheet is set to true.

When the minimum zoom 10% is set to the sheet, there may be some issues here and there. If you try it in Excel, and manually save the file to pdf in Excel. You will find that the pdf generated by Excel loses lots of data.
Anyway, we will investigate it further to see whether we can improve the output pdf.