ASPOSE.PDF IsBlank() method taking too much time and got stuck for heavy documents

Hi Team,

We reported couple of issues related 21.3 ASPOSE, as suggested we migrated to 22.2 version , we observed few empty page issues during our conversion process using aspose, , to remove those empty pages we are using ASPOSE.PDF IsBlank() after merging , but during this process it is taking too much time , for Ex:605 pages to merge it took 6 sec but to remove blank or empty pages it is taking 2+ mins, do we have any alternative for this?

foreach (Aspose.Pdf.Page page in pdfDocument.Pages)
{
i = i + 1;
bool value = page.IsBlank(0.01d);
if (value)
{
pdfDocument.Pages.Delete(i);
}
}

@Haranath

Could you please attach your input, problematic and expected output PDF files here for testing? We will investigate the issue and provide you more information on it.

input_output_merged.zip (2.5 MB)

attached input excels and merged pdf

@Haranath

It seems that you are using Aspose.Cells to convert Excel file to PDF. Could you please share the complete source code that you are using here for further testing?

Aspose.Cells.License license = new Aspose.Cells.License();
license.SetLicense(AppDomain.CurrentDomain.BaseDirectory + “\Aspose.Total.lic”);
FileFormatInfo fileFormatInfo = FileFormatUtil.DetectFileFormat(pInFile);
Aspose.Cells.LoadOptions loadOptions;
if (fileFormatInfo.FileFormatType == FileFormatType.Html)
{
Aspose.Cells.HtmlLoadOptions htmlLoadOptions = new Aspose.Cells.HtmlLoadOptions
{
AutoFitColsAndRows = true
};
loadOptions = htmlLoadOptions;
}
else
{
loadOptions = new Aspose.Cells.LoadOptions();
}
loadOptions.CheckExcelRestriction = false;
Workbook workbook = new Workbook(pInFile, loadOptions);
workbook.Settings.CheckExcelRestriction = false;
Aspose.Cells.PdfSaveOptions pdfSaveOptions = new Aspose.Cells.PdfSaveOptions();
// option to set all the columns of excel in one page.
pdfSaveOptions.AllColumnsInOnePagePerSheet = true;
pdfSaveOptions.MergeAreas = true;
/* Retain the structure of original excel /
pdfSaveOptions.ExportDocumentStructure = true;
/
Formula calculation for any digit formulla applied in excels /
/it is best to call Workbook.CalculateFormula() just before rendering the spreadsheet to PDF.
* This ensures that the formula dependent values are recalculated, and the correct
* values are rendered in the PDF.
/
workbook.CalculateFormula();
/
For custom named range, need below code to display number instead of ## */
Range[] range = workbook.Worksheets?.GetNamedRanges();
// workbook.Worksheets.get
if (range != null && range.Length > 0)
{
foreach (var r in range)
{
if (!r.Address.Contains(":"))
{
r.Worksheet.Cells[r.Address].PutValue(r.Value?.ToString());
}

                }
            }
            /* For some sheet, print area is applied, to display all the data in pdf, 
             * below setting is needed */
            foreach (Worksheet wks in workbook.Worksheets)
            {
                foreach (Cell cell in wks.Cells)
                {
                    Style style = cell.GetStyle();
                    /* if cell is numeric, consider its category type as numeric and update the format */
                    if ((cell.DisplayStringValue.Contains("#") ||
                            (cell.DisplayStringValue.ToLower().Contains("e") &&
                            cell.Type == CellValueType.IsNumeric)))
                    {
                        style.HorizontalAlignment = TextAlignmentType.Right;
                        cell.PutValue(cell.Value.ToString());
                    }
                    /* If cell is numeric minus value and display having space between value
                        and minus sign, use below condition to remove the space in between */
                    else if (cell.Type == CellValueType.IsNumeric && cell.Value.ToString().Contains("-")
                            && cell.DisplayStringValue.Contains(" "))
                    {
                        cell.PutValue(Regex.Replace(cell.DisplayStringValue, @"\s+", ""));
                        style.HorizontalAlignment = TextAlignmentType.Right;
                    }
                    style.VerticalAlignment = TextAlignmentType.Center;
                    cell.SetStyle(style);
                }
                wks.PageSetup.PrintArea = "";
                wks.PageSetup.BottomMargin = 1;
                wks.PageSetup.LeftMargin = 1;
                wks.PageSetup.RightMargin = 1;
                wks.PageSetup.TopMargin = 1;
            }
            workbook.Save(pOutFile, pdfSaveOptions);

once PDF is created , we need to remove blacnk pages for that i am using below code

Aspose.Pdf.License licenseKey = new Aspose.Pdf.License();
licenseKey.SetLicense(“Aspose.Total.lic”);
PdfFileEditor pfe = new PdfFileEditor();
pfe.Concatenate(inputFiles, outputFile);

            //Deleting blank pages
            using (Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(outputFile))
            {
                int i = 0;

                foreach (Aspose.Pdf.Page page in pdfDocument.Pages)
                {
                    i = i + 1;
                    bool value = page.IsBlank(0.01d);
                    if (value)
                    {
                        pdfDocument.Pages.Delete(i);
                    }
                }

                pdfDocument.Save(outputFile);
            }

@Haranath,

I used your code segments to convert both Excel files to PDF via Aspose.Cells for .NET v22.4 (latest version). The output PDFs are attached here. We will now evaluate your original issue regarding Aspose.PDF API where IsBlank method is taking more time as you pointed out. We will get back to you with further updates after testing the scenario.
files1.zip (644.9 KB)

@Haranath

We have logged this problem in our issue tracking system as PDFNET-51697. You will be notified via this forum thread once this issue is resolved.

We apologize for your inconvenience.