Convert PDF to template based Excel Format using Aspose.PDF for Java

asad.ali · December 5, 2022, 8:44pm

We are afraid that the earlier logged tickets have not yet been resolved. We will update you via this forum thread once we have some updates to share in this regard. Please spare us some time.

We are sorry for the inconvenience.

nishitha · December 12, 2022, 1:24pm

Hi Asad,
The status of issue PDFJAVA-40869 shows as resolved. Could you please share the resolution for this issue?

Thank you…

asad.ali · December 12, 2022, 6:35pm

@nishitha

This issue has been fixed in version 22.12 of the API, which is an upcoming update. It will be released at the end of this month. We will notify you as soon as the new API release is published and available for use.

nishitha · December 14, 2022, 5:45pm

Hi @asad.ali

Thank you for the update.
Could you please also try to resolve issue PDFJAVA-41859 as soon as possible? We are waiting for this fix in order to implement the feature in our product.

Thank you…

asad.ali · December 14, 2022, 7:41pm

@nishitha

As this is a performance-related issue, we are afraid that it may take some time to be fully investigated and fixed. However, we will certainly consider your concerns and let you know as soon as some progress is made towards its resolution.

We are sorry for the inconvenience.

asad.ali · February 20, 2023, 8:23pm

@nishitha

This is not a bug. For the most accurate recognition of table structure, we need to keep the objects from all pages in memory at the same time. The attached document needs ~9G of heap. With a larger heap, the conversion is faster. Here are our results:

-Xmx12G - 9 min
-Xmx9G - 14 min
-Xmx8G - OutOfMemory
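
As a rough illustration of the numbers above, the following is a minimal sketch of checking the JVM's maximum heap before attempting a whole-document conversion. The class name `HeapCheck`, the helper `hasEnoughHeap`, and the 9 GiB threshold are assumptions for this example only (the threshold is taken from the result for the attached document and will differ for other files); only the standard `Runtime.maxMemory()` call is used, and no Aspose API is involved.

```java
public class HeapCheck {
    // One gibibyte in bytes.
    public static final long GIB = 1024L * 1024L * 1024L;

    // Hypothetical helper: true when the maximum heap is at least the required size.
    public static boolean hasEnoughHeap(long maxHeapBytes, long requiredBytes) {
        return maxHeapBytes >= requiredBytes;
    }

    public static void main(String[] args) {
        // Assumption: ~9G, the figure reported above for the attached document.
        long requiredBytes = 9L * GIB;

        // Maximum heap the JVM will attempt to use (controlled by -Xmx).
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %.1f GiB%n", maxHeapBytes / (double) GIB);

        if (hasEnoughHeap(maxHeapBytes, requiredBytes)) {
            System.out.println("Heap looks large enough for a single-pass conversion.");
        } else {
            System.out.println("Heap is below the threshold; consider a larger -Xmx or converting page by page.");
        }
    }
}
```

Note that, depending on the garbage collector, `maxMemory()` can report slightly less than the `-Xmx` value, so the comparison should be treated as approximate.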

Alternatively, you could split the PDF file into smaller files, convert them separately to Excel, and merge them via Aspose.Cells. Please see the code snippet below. 1G of heap should be enough for it.

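import com.aspose.cells.Workbook;
import com.aspose.pdf.Document;
import com.aspose.pdf.ExcelSaveOptions;
import com.aspose.pdf.Page;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Note: the Aspose.Cells calls below throw checked exceptions,
// so the enclosing method needs to declare "throws Exception".
Document doc = new Document("pdf_template_file_50k.pdf");
ExcelSaveOptions options = new ExcelSaveOptions();
int index = 0;

// Target workbook; remove the default empty sheet so that only converted sheets remain.
Workbook workbook = new Workbook();
workbook.getWorksheets().removeAt(0);

for (Page pdfPage : doc.getPages())
{
    // Convert a single page to Excel in memory.
    Document newDocument = new Document();
    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    newDocument.getPages().add(pdfPage);
    newDocument.save(stream, options);

    // Load the converted page with Aspose.Cells and append it to the target workbook.
    Workbook tempWorkbook = new Workbook(new ByteArrayInputStream(stream.toByteArray()));
    workbook.combine(tempWorkbook);
    System.out.println("Page " + (index + 1) + " of " + doc.getPages().size());
    workbook.getWorksheets().get(index).setName("Sheet" + (index + 1));

    index++;
}

workbook.save("result.xlsx");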

nishitha · February 23, 2023, 1:37pm

Hi @asad.ali,
Thank you for the update and sample code.
We will try this approach.

Thank you…