Converting PDF to Markdown causes 'OOM'

Environment: python==3.8.1, aspose-words==24.10.0

Code:

import aspose.words as aw

document = aw.Document(source_file)
document.save(f"{result}.md")

Exception:

  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 613, in result_iterator
    yield fs.pop().result(end_time - time.monotonic())
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/kai_wis_ai_dps/kai_wis_ai_dps/app/service/document_convert_service.py", line 302, in convert_to_md
    document = aw.Document(source_file)
RuntimeError: Proxy error(OutOfMemoryException): Exception of type 'System.OutOfMemoryException' was thrown.

@David_Matin Could you please attach the problematic input PDF here for testing? We will check the issue and provide you more information.

This is the PDF URL: https://drive.google.com/file/d/1DRCrMLYNXXYcShPzJStqWFCRZYkoGbv_/view?usp=drive_link

@David_Matin Thank you for additional information. But the link is not accessible. Could you please simply attach the file here in the forum?

This is the link:
https://drive.google.com/file/d/1DRCrMLYNXXYcShPzJStqWFCRZYkoGbv_/view?usp=drive_link

@David_Matin Thank you for additional information. I tested conversion on my side and cannot reproduce the problem. But the conversion takes about 15 minutes on my side.

Thank you. What is your memory usage? On my PC, it is about 9 GB. On Linux, the memory limit is 8 GB, which causes the OOM exception.
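For comparing numbers, here is a minimal sketch of how peak memory could be read from inside the Python process on Linux or macOS, using only the standard `resource` module. The helper name `peak_rss_mb` is hypothetical and not part of Aspose.Words.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of the current process in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor
```

Calling `peak_rss_mb()` right after `document.save(...)` and logging the value should show how close the conversion gets to the 8 GB limit.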

@David_Matin Yes, memory usage is quite high on my side too.
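Until the memory usage itself is reduced, one possible mitigation is to run the conversion in a separate Python process, so that an out-of-memory failure ends only that child process and not the service's worker thread. The following is a rough sketch under that assumption. It does not lower memory consumption. The function name `convert_in_subprocess`, the `CHILD_SCRIPT` constant, and the default timeout are hypothetical; only `aw.Document(...)` and `document.save(...)` come from the code posted above.

```python
import subprocess
import sys

# Script executed by the child interpreter. It repeats the two calls from
# the original post; the file paths arrive as command-line arguments.
CHILD_SCRIPT = """
import sys
import aspose.words as aw

document = aw.Document(sys.argv[1])
document.save(sys.argv[2])
"""


def convert_in_subprocess(source_file, target_file, timeout_s=1800,
                          script=CHILD_SCRIPT):
    """Run the conversion in a child process.

    Returns (ok, stderr_text). A crash or an out-of-memory error in the
    child shows up as ok == False instead of an exception in the caller.
    subprocess.TimeoutExpired is raised if the child exceeds timeout_s.
    """
    proc = subprocess.run(
        [sys.executable, "-c", script, source_file, target_file],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return proc.returncode == 0, proc.stderr
```

The caller can then check the returned flag and log the child's stderr, for example `ok, err = convert_in_subprocess(source_file, f"{result}.md")`. The 30-minute timeout is only a guess based on the roughly 15-minute conversion time reported above.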

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-27508

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

OK, thank you!
