ModuleNotFoundError: No module named 'com.aspose'

Hello there, I am getting an error while running simple Python code on a Spark cluster.

ModuleNotFoundError: No module named 'com.aspose'

---> 10 from com.aspose.cells.wrapper import StreamBuffer
     11 @JImplementationFor("com.aspose.cells.wrapper.StreamBuffer")
     12 class _StreamBuffer(object):
     13     @JOverride(sticky=False)
     14     def write(self, chunk):

ModuleNotFoundError: No module named 'com.aspose'

The same code works without any issues in a Python notebook. Are there any settings that need to be enabled to run asposecells in a Spark notebook?

Please advise.

SparkSession: in-memory
SparkContext version: v3.0.1
Python version: 3.10.12

image.png (11.4 KB)

While jpype works, I am unable to import modules from aspose.

image.png (23.2 KB)

@ramkumar.rajamani
From the provided screenshot information, it appears that there is an issue with your import. Workbooks do not exist in Cells. Please refer to the following code.

from asposecells.api import Workbook

For the classes contained in the Aspose.Cells for Python via Java library, please refer to the API reference.

Also, could you provide sample code and a sample file? We will check them soon.

@ramkumar.rajamani
It seems that the aspose.cells jars are not loading correctly. Please try the following code:

import jpype
import asposecells

jpype.startJVM(classpath=[r"path_to_jars/*"])
print(jpype.getClassPath())
from asposecells.api import Workbook

Please change “path_to_jars” to the directory where JavaClassBridge.jar is located. Thanks.
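
One way to find the directory to substitute for path_to_jars is to search the installed package for .jar files. The sketch below assumes the jars ship somewhere inside the asposecells package directory, which has not been confirmed in this thread. find_package_jars is a hypothetical helper, not part of Aspose.Cells.

```python
import importlib.util
from pathlib import Path


def find_package_jars(package_name):
    """Return sorted paths of .jar files found under an installed package.

    Hypothetical helper: it assumes the jars live inside the package
    directory, which may not hold for every install layout.
    """
    try:
        # find_spec locates a top-level package without importing it.
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        return []
    if spec is None or not spec.submodule_search_locations:
        return []
    jars = []
    for location in spec.submodule_search_locations:
        jars.extend(str(p) for p in Path(location).rglob("*.jar"))
    return sorted(jars)


if __name__ == "__main__":
    for jar in find_package_jars("asposecells"):
        print(jar)
```

If the list is empty, either the package is not visible to this interpreter or the jars are stored elsewhere.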

Thanks Nick. Tried your suggestion and got the same error.

image.png (43.3 KB)

@ramkumar.rajamani
It seems like you have installed aspose-cells, but the Python interpreter cannot find the installed package. Did you install the package using pip or conda? You can try uninstalling and reinstalling the package or restarting the Spark kernel. Alternatively, try manually deleting the aspose-cells package installed in the site-packages directory.
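
To check whether the interpreter can see the package, it may help to run the same small report in both the Python kernel and the Spark kernel and compare the results. This is a minimal sketch using only the standard library. describe_module is a hypothetical helper name, and a mismatch between kernels is only one possible explanation for the error.

```python
import importlib.util
import sys


def describe_module(name):
    """Report which interpreter is running and where it would load `name` from."""
    try:
        # find_spec locates a top-level module without importing it.
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    return {
        "interpreter": sys.executable,
        "python_version": sys.version.split()[0],
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for module in ("jpype", "asposecells"):
        print(module, describe_module(module))
```

If the two kernels report different interpreter paths, pip may have installed the package for one interpreter and not the other.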

Thanks Zhu. The module was installed using pip. Since this is an IBM Spectrum Spark cluster, I am planning to add these jars to the extra class path of the drivers and executors to see if the error goes away.

Thanks for your input; I will try reinstalling the modules as well.
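
The extra class path plan could be sketched as follows. spark.driver.extraClassPath and spark.executor.extraClassPath are standard Spark configuration keys. The helper name extra_classpath_conf and the jar directory are assumptions for illustration, and the same directory would need to exist on every executor host.

```python
import os
from pathlib import Path


def extra_classpath_conf(jar_dir):
    """Build Spark settings that put every .jar in `jar_dir` on the class path.

    Hypothetical helper: `jar_dir` is whichever directory holds the
    Aspose jars on the cluster nodes.
    """
    jars = sorted(str(p) for p in Path(jar_dir).glob("*.jar"))
    classpath = os.pathsep.join(jars)
    return {
        "spark.driver.extraClassPath": classpath,
        "spark.executor.extraClassPath": classpath,
    }
```

Each pair can be passed to SparkSession.builder.config(key, value) before the session is created, or written to the cluster's Spark defaults. In a standard PySpark setup, JPype starts its own JVM inside the Python process, separate from the JVM that Spark runs, so these settings may not by themselves change what jpype can import.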

@ramkumar.rajamani
Sure, for now, let’s first consider the machine environment configuration. If you have any further questions, feel free to communicate at any time.

Circling back on this issue.
I did try reinstalling, adding the jars to the extra class path, and restarting the kernel. I still have the same issue: no module named com.aspose. I also tried importing through py4j using the Spark JVM, and that does not help either.
Quick question: is it working for you in a PySpark environment?

      7 from jpype.types import *
      8 from jpype import imports
---> 10 from com.aspose.cells.wrapper import StreamBuffer
     11 @JImplementationFor("com.aspose.cells.wrapper.StreamBuffer")
     12 class _StreamBuffer(object):
     13     @JOverride(sticky=False)
     14     def write(self, chunk):

ModuleNotFoundError: No module named 'com.aspose'
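
One thing that may be worth checking in the Spark kernel is the state of JPype's JVM before the import runs. A classpath passed to jpype.startJVM only takes effect when the JVM is first started, so if something in the kernel had already started it, the Aspose jars might not be visible. That is an assumption to test, not a confirmed cause. jvm_report is a hypothetical helper name.

```python
def jvm_report():
    """Summarize JPype's JVM state without starting or changing anything."""
    try:
        import jpype
    except ImportError:
        # JPype is not installed for this interpreter.
        return {"jpype_installed": False}
    return {
        "jpype_installed": True,
        "jvm_started": jpype.isJVMStarted(),
        # Class path JPype has been configured with so far.
        "classpath": jpype.getClassPath(),
    }


if __name__ == "__main__":
    print(jvm_report())
```

Running this as the first cell in both the Python kernel and the Spark kernel shows whether the two start from the same JVM state.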

@ramkumar.rajamani
It has not been tested in a PySpark environment. We will set up a PySpark environment and test it. Thank you.

@ramkumar.rajamani
I tested Jupyter Notebook on Ubuntu 22.04 with the following steps and it worked fine:

  1. install default-jdk (openjdk 11)
  2. install python (3.10.12)
  3. pip install aspose-cells (24.2.0)
  4. pip install pyspark (3.5.0)
  5. pip install jupyter
  6. append the following line to ~/.bashrc
    export PATH=$PATH:/home/dev/.local/bin
  7. run source ~/.bashrc
  8. run jupyter-notebook
  9. create a ‘Notebook’ in Jupyter and run the example code to test aspose-cells

import jpype
import asposecells

# Start the JVM before importing anything from asposecells.api
jpype.startJVM()
from asposecells.api import Workbook, FileFormatType

# Create a workbook, write to cell A1, and save it as XLSX
workbook = Workbook(FileFormatType.XLSX)
workbook.getWorksheets().get(0).getCells().get("A1").putValue("Hello World")
workbook.save("output.xlsx")
print("hello")

jpype.shutdownJVM()

jupyter-notebook.png (52.0 KB)

Please check the difference between my test environment and yours. Thank you.

Thanks for your help Nick. It works for me too in a python kernel.

image.png (12.4 KB)

It doesn't work when I do the same in Spark Python.

image.png (33.0 KB)

All of our batch processes run through the Spark Python kernel, so we are stuck and unable to move forward with our proof of concept using asposecells because of this module-not-found error when running the job through the Spark Python kernel.

Hi @ramkumar.rajamani
We’ll run some more tests, and we’ll let you know if we make any progress.

@ramkumar.rajamani
My previous steps worked for running aspose-cells in the Python kernel. But how do you run it in Spark Python? Please give detailed steps so that I can reproduce this issue. Thanks.