Converting PDF to Markdown with `save` in Aspose.Words hangs at page 19

David_Matin · May 29, 2024, 7:50am

Here is the code:

```python
import aspose.words as aw

result = get_filename_without_ext(source_file)  # poster's own helper; source_file is one split PDF
document = aw.Document(source_file)
document.save(f"{result}.md")
```

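I have split the PDF by page into several smaller PDFs and convert them one by one in a loop. With this file, the process hangs when it reaches page 19: no error is raised, and it still had not continued after waiting an hour.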
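The split-and-loop workflow described above might look roughly like the driver below. This is a sketch under assumptions: `to_md_path`, `aspose_convert` and `convert_pages` are hypothetical names, and `to_md_path` is only assumed to behave like the poster's `get_filename_without_ext` helper (here it keeps the directory part). Printing each file name with `flush=True` before converting it shows which split file was being processed when the loop stops responding.

```python
import os
from typing import Callable, Iterable


def to_md_path(pdf_path: str) -> str:
    """Hypothetical stand-in for get_filename_without_ext(...) + '.md'."""
    return os.path.splitext(pdf_path)[0] + ".md"


def aspose_convert(pdf_path: str, md_path: str) -> None:
    # Imported lazily so the driver itself can be exercised without Aspose installed.
    import aspose.words as aw
    aw.Document(pdf_path).save(md_path, aw.SaveFormat.MARKDOWN)


def convert_pages(pdf_paths: Iterable[str],
                  convert: Callable[[str, str], None] = aspose_convert) -> list[str]:
    """Convert each split PDF in order and return the Markdown paths written."""
    done = []
    for pdf in pdf_paths:
        # flush=True makes the last name visible even if the next call never returns.
        print(f"converting {pdf}", flush=True)
        md = to_md_path(pdf)
        convert(pdf, md)
        done.append(md)
    return done
```

For example, `convert_pages(sorted(glob.glob("pages/*.pdf")))` would walk the split files in name order; the last "converting ..." line printed before a hang names the problem page.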
VDA 6.2-2004_en.pdf (1.7 MB)

alexey.noskov · May 29, 2024, 8:02am

@David_Matin
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-27028

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

David_Matin · May 29, 2024, 9:08am

please

David_Matin · May 29, 2024, 11:32am

test_page19.pdf (637.0 KB)

Alternatively, use this file. Even wrapping the call with `@timeout(5)` in my code does not end the operation.

David_Matin · May 29, 2024, 11:32am

```python
import aspose.pdf as ap
import aspose.words as aw
from timeout_decorator import timeout

lic_dir = './Aspose.lic'
ap.License().set_license(lic_dir)
aw.License().set_license(lic_dir)

@timeout(5)
def test():
    document = aw.Document("test_page19.pdf")
    document.save("test", aw.SaveFormat.MARKDOWN)

if __name__ == '__main__':
    test()
```
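A likely reason `@timeout(5)` has no effect (an inference, not confirmed in this thread): `timeout_decorator` relies on `SIGALRM` by default, and CPython runs Python-level signal handlers only between bytecode instructions in the main thread, so a long call into the native Aspose.Words engine is not interrupted until it returns. A more dependable workaround is to run each conversion in a separate interpreter and kill it from outside. The sketch below does that with `subprocess.run(..., timeout=...)`; `run_isolated` and `CONVERT_SNIPPET` are hypothetical names, not part of Aspose or `timeout_decorator`.

```python
import subprocess
import sys

# Child script (hypothetical): converts one PDF to Markdown, then exits.
# sys.argv[1] is the input PDF, sys.argv[2] the output path.
CONVERT_SNIPPET = """
import sys
import aspose.words as aw
aw.License().set_license('./Aspose.lic')
aw.Document(sys.argv[1]).save(sys.argv[2], aw.SaveFormat.MARKDOWN)
"""


def run_isolated(snippet: str, args: list[str], timeout_s: float) -> str:
    """Run `snippet` in a fresh Python process; return 'ok', 'failed' or 'timeout'."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", snippet, *args],
            timeout=timeout_s,
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return "timeout"
    return "ok" if proc.returncode == 0 else "failed"
```

With this, `run_isolated(CONVERT_SNIPPET, ["test_page19.pdf", "test.md"], 60)` should report `"timeout"` instead of blocking the caller, at the cost of starting a new interpreter (and loading the license) for every file.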

alexey.noskov · May 29, 2024, 11:51am

@David_Matin We have reproduced the issue and logged it in our defect tracking system. We will notify you once it is resolved. We apologize for the inconvenience.

David_Matin · May 30, 2024, 6:38am

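I have another problematic file as well. I split a 300-page PDF by page into 300 PDFs and called `document.save(f"{result}.md")` on each, but memory usage exceeded 12 GB and the process ran out of memory (OOM).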
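If memory keeps growing across the 300 conversions in one process, one hedged mitigation (not confirmed in this thread as the cause or the fix) is to cap how many files a single process converts: run the files in batches, each in a fresh interpreter, so the operating system reclaims all of that batch's memory when it exits. This will not help if one page alone exhausts memory. `batches`, `convert_in_batches` and `BATCH_SNIPPET` are hypothetical names, and the batch size of 20 is an arbitrary starting point to tune.

```python
import subprocess
import sys
from itertools import islice

# Child script (hypothetical): converts every PDF path passed to it, then exits.
BATCH_SNIPPET = """
import os, sys
import aspose.words as aw
aw.License().set_license('./Aspose.lic')
for pdf in sys.argv[1:]:
    aw.Document(pdf).save(os.path.splitext(pdf)[0] + '.md', aw.SaveFormat.MARKDOWN)
"""


def batches(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def convert_in_batches(pdf_paths, batch_size=20, snippet=BATCH_SNIPPET):
    """Start one interpreter per batch; return the batches whose process failed."""
    failed = []
    for chunk in batches(pdf_paths, batch_size):
        proc = subprocess.run([sys.executable, "-c", snippet, *chunk])
        if proc.returncode != 0:
            failed.append(chunk)
    return failed
```

A smaller `batch_size` bounds peak memory more tightly but pays interpreter and license start-up more often; `batch_size=1` degenerates to one process per page.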

David_Matin · May 30, 2024, 6:53am

Here is a screenshot of my local test:
[Screenshot: local test] (PNG, 17.7 KB)

vyacheslav.deryushev · May 30, 2024, 12:29pm

@David_Matin Could you please share the file with us so we can investigate the issue?

David_Matin · May 31, 2024, 8:13am

This is the file with the excessive memory usage. The file that hangs during conversion is listed above.

vyacheslav.deryushev · May 31, 2024, 12:24pm

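@David_Matin Thank you for reporting this issue. We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies: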

Issue ID(s): WORDSNET-27039

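You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.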