OutOfMemoryError During XLS to PDF Conversion

Hi,

I'm trying to convert a file from XLS to PDF and get a java.lang.OutOfMemoryError on a 3 MB file.

Here is a link to the GitHub project (the Java jar file is attached): https://github.com/ximagination80/TEST1

Could you please clarify whether we can use your library on a Linux instance without a graphical interface? What instance characteristics are required for the library to work correctly?

Current Amazon EC2 instance configuration:

    [ec2-user@ip-172-30-0-62 ~]$ lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                1
    On-line CPU(s) list:   0
    Thread(s) per core:    1
    Core(s) per socket:    1
    Socket(s):             1
    NUMA node(s):          1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 62
    Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Stepping:              4
    CPU MHz:               2494.070
    BogoMIPS:              4988.14
    Hypervisor vendor:     Xen
    Virtualization type:   full
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              256K
    L3 cache:              25600K
    NUMA node0 CPU(s):     0

    [ec2-user@ip-172-30-0-62 ~]$ java -version
    openjdk version "1.8.0_31"
    OpenJDK Runtime Environment (build 1.8.0_31-b13)
    OpenJDK 64-Bit Server VM (build 25.31-b07, mixed mode)

RAM: 1 GB

    java -jar test.jar

    NAME: Book1.xls
    SIZE: 0
    TIME: 2413

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at com.aspose.cells.zatl.n(Unknown Source)
        at com.aspose.cells.zatl.a(Unknown Source)
        at com.aspose.cells.zan.c(Unknown Source)
        at com.aspose.cells.zan.a(Unknown Source)
        at com.aspose.cells.zan.a(Unknown Source)
        at com.aspose.cells.zan.a(Unknown Source)
        at com.aspose.cells.zan.b(Unknown Source)
        at com.aspose.cells.zan.c(Unknown Source)
        at com.aspose.cells.zan.a(Unknown Source)
        at com.aspose.cells.a.d.zeq.a(Unknown Source)
        at com.aspose.cells.zbuz.e(Unknown Source)
        at com.aspose.cells.zbuz.a(Unknown Source)
        at com.aspose.cells.Workbook.a(Unknown Source)
        at com.aspose.cells.Workbook.save(Unknown Source)
        at Main.Main$$anonfun$transform$1.apply$mcV$sp(Main.scala:36)
        at Main.Main$.withTimeCalculating(Main.scala:19)
        at Main.Main$.transform(Main.scala:32)
        at Main.Main$$anonfun$1.apply(Main.scala:14)
        at Main.Main$$anonfun$1.apply(Main.scala:13)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at Main.Main$.delayedEndpoint$Main$Main$1(Main.scala:13)
        at Main.Main$delayedInit$body.apply(Main.scala:7)
        at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
        at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
        at scala.App$class.main(App.scala:76)
        at Main.Main$.main(Main.scala:7)
        at Main.Main.main(Main.scala)

Also, CPU usage stayed at 100% for several minutes.

Regards,
Alexey
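The trace reports "Java heap space", meaning the JVM exhausted its heap limit, which can be well below the machine's physical RAM. A minimal Scala sketch for checking the heap ceiling the JVM actually has on this instance; the object name `HeapCheck` is hypothetical. On JDK 8 the default maximum heap is typically about a quarter of physical memory unless it is raised with the standard `-Xmx` option (for example `java -Xmx768m -jar test.jar`).

```scala
object HeapCheck {
  // Upper bound on the heap this JVM will try to use, in megabytes.
  // Runtime.maxMemory() returns Long.MaxValue if there is no inherent limit.
  def maxHeapMb: Long = Runtime.getRuntime().maxMemory() / (1024L * 1024L)

  def main(args: Array[String]): Unit =
    println(s"Max heap: $maxHeapMb MB")
}
```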

Hi,

Thanks for your posting and using Aspose.Cells.

We have looked into this issue and found that your workbook (Book2.xls) has a worksheet that contains more than 900 pages. All of these pages cannot be rendered onto a single page, so setting the OnePagePerSheet option to true will not work: it keeps the CPU busy and eventually throws an OutOfMemoryError. Please set OnePagePerSheet to false to avoid this exception.

I have attached a screenshot explaining the issue for reference.

So your code should look like this (the change is the setOnePagePerSheet(false) line):

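Scala

    import com.aspose.cells.{PdfSaveOptions, Workbook}

    // name and index are defined by the surrounding code in the test project
    val workbook = new Workbook(name)
    val saveOptions = new PdfSaveOptions()
    // Paginate each sheet normally instead of forcing one page per sheet
    saveOptions.setOnePagePerSheet(false)
    workbook.save(s"result$name$index.pdf", saveOptions)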

Hi,

Thanks for the quick answer.

Book2.xls has only 3 pages (i.e. worksheets):

The first sheet contains ~8,400 rows.

The second sheet contains ~600 rows.

The third sheet contains ~10 rows.

As far as I can tell, the conversion takes about 1.2 GB of RAM.

That suggests the library loads all of the data into RAM and processes it only after loading is complete.

From my experience, it would be better to process the data as a stream of rows.

That would consume less RAM, and the OutOfMemoryError would not occur. I hope my comment is useful.

Regards,
Alexey
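Alexey's suggestion can be illustrated in generic terms. A minimal sketch, not Aspose.Cells code: the `Row` type and the `rowSource` generator are hypothetical stand-ins for a spreadsheet reader. The first function materializes every row before doing any work, so peak memory grows with the row count; the second folds over the iterator, so each row can be discarded once it has been used.

```scala
object RowStreaming {
  // Hypothetical row type: the numeric cell values of one spreadsheet row.
  type Row = Vector[Double]

  // Materializing: every row is held in memory before any work starts,
  // so peak memory grows with the number of rows.
  def totalMaterialized(rows: Iterator[Row]): Double = {
    val all: List[Row] = rows.toList
    all.map(_.sum).sum
  }

  // Streaming: each row is used once and becomes unreachable before the
  // next one is pulled, so peak memory stays around one row.
  def totalStreaming(rows: Iterator[Row]): Double =
    rows.foldLeft(0.0)((acc, row) => acc + row.sum)

  // Hypothetical lazy row source standing in for a file reader.
  def rowSource(n: Int): Iterator[Row] =
    Iterator.range(0, n).map(i => Vector(i.toDouble, 1.0))
}
```

Whether a renderer can work fully row by row is a separate question: fitting a whole sheet onto one page, for example, requires knowing the sheet's full extent first.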

Hi Alexey,

Thanks for your posting and using Aspose.Cells.

I have rechecked Book2.xls. The sheet named Orders contains about 8,400 rows, and if you open this sheet in Microsoft Excel's Print Preview, you will see that it spans 972 pages.

972 pages cannot be fitted onto a single page, so OnePagePerSheet will not work. Please check my screenshot again and you will see the 972 pages.
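A worksheet is not a printed page: Excel splits a sheet into bands of rows and, when it is wider than the paper, bands of columns, and the printed page count is roughly the product of the two. A rough sketch of that arithmetic; `PrintPages`, `rowsPerPage` and `columnBands` are hypothetical names, and the values used below are illustrative, not measured from Book2.xls.

```scala
object PrintPages {
  // Ceiling division for positive integers.
  private def ceilDiv(a: Int, b: Int): Int = (a + b - 1) / b

  // Estimated printed pages for a sheet split vertically into row bands
  // and horizontally into column bands. Real values depend on paper size,
  // margins, row heights, column widths and print scaling.
  def estimatePages(rows: Int, rowsPerPage: Int, columnBands: Int): Int =
    ceilDiv(rows, rowsPerPage) * columnBands
}
```

For example, `PrintPages.estimatePages(8400, 54, 6)` gives 936, the same order of magnitude as the 972 pages reported in Print Preview.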

Once you set OnePagePerSheet to false, you will no longer get the exception. You can also use the SheetRender.PageCount property to get the number of pages in a worksheet programmatically.
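One possible guard that combines both points: count the printed pages of each sheet first, and only request one page per sheet when every sheet is small. A minimal sketch; the threshold of 20 pages and the names `PdfPageMode` and `useOnePagePerSheet` are hypothetical, and the page counts are passed in rather than computed here so the logic stays independent of the library. With Aspose.Cells they would come from SheetRender's page count (assumed to be exposed as `getPageCount()` in the Java API; worth confirming for your version).

```scala
object PdfPageMode {
  // Hypothetical threshold: a sheet that prints to more pages than this is
  // not squeezed onto a single PDF page.
  val DefaultMaxPages: Int = 20

  // pageCounts holds the printed page count of each worksheet.
  // An empty workbook falls back to normal pagination.
  def useOnePagePerSheet(pageCounts: Seq[Int], maxPages: Int = DefaultMaxPages): Boolean =
    pageCounts.nonEmpty && pageCounts.forall(_ <= maxPages)
}
```

The result can then be passed to `saveOptions.setOnePagePerSheet(...)`; for a workbook like Book2.xls, with one sheet at 972 pages, it returns false.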