I use the following code to convert a PDF to Markdown:
import traceback

import aspose.words as aw

try:
    document1 = aw.Document("自訂格式(電商匯入中心)11.pdf")
    save_options = aw.saving.MarkdownSaveOptions()
    save_options.image_resolution = 150
    document1.save("自訂格式(電商匯入中心)11.md", save_options)
except Exception:
    traceback.print_exc()
file:
自訂格式(電商匯入中心)11.pdf (228.2 KB)
Memory use:
@David_Matin
When converting a PDF to Markdown using Aspose.Words, you may encounter high memory usage, especially with large documents. To help mitigate this, you can enable memory optimization in the MarkdownSaveOptions class. This option reduces memory consumption during saving, although it may increase the time taken to save the document.
Here’s how you can modify your code to enable memory optimization:
import traceback

import aspose.words as aw

try:
    document1 = aw.Document("自訂格式(電商匯入中心)11.pdf")
    save_options = aw.saving.MarkdownSaveOptions()
    save_options.image_resolution = 150
    save_options.memory_optimization = True  # Enable memory optimization
    document1.save("自訂格式(電商匯入中心)11.md", save_options)
except Exception:
    traceback.print_exc()
By setting save_options.memory_optimization to True, you can significantly decrease the memory footprint during the conversion process. However, keep in mind that this may result in slower performance when saving the document.
For more details on the MarkdownSaveOptions class and its properties, refer to the official documentation [1]; for memory optimization specifics, see [2].
If you continue to experience issues, consider checking the size and complexity of the PDF document you are converting, as these factors can also impact memory usage.
Sources:
[1]: MarkdownSaveOptions Class | Aspose.Words for .NET
[2]: SaveOptions.MemoryOptimization | Aspose.Words for .NET
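To act on the suggestion about checking the size of the PDF and its effect on memory, a small measurement helper can make the numbers concrete. This is only a sketch: the names peak_rss_mb and run_and_report are hypothetical, it relies on the standard-library resource module (Unix only), and the convert callable is assumed to wrap the aw.Document(...) and save(...) calls shown above.

```python
import os
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_and_report(convert, input_path):
    """Run convert(input_path); return (input size in KiB, peak RSS in MiB).

    convert is a hypothetical callable supplied by the caller, for example
    a function that loads the PDF and saves it as Markdown.
    """
    size_kib = os.path.getsize(input_path) / 1024
    convert(input_path)
    return size_kib, peak_rss_mb()
```

Peak RSS is a high-water mark for the whole process, so comparing runs with and without memory_optimization is most meaningful when each conversion runs in a fresh process.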
@David_Matin It looks like the problem occurs because the image cannot be loaded properly. On my side Aspose.Words throws an exception upon loading your PDF document.
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): WORDSNET-27991
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
The issues you have found earlier (filed as WORDSNET-27991) have been fixed in this Aspose.Words for .NET 25.4 update also available on NuGet.
Hello! I have updated aspose-words to 25.4.0. Now a new problem has arisen.
document = aw.Document(source_file)
RuntimeError: Proxy error(FileLoadException): The file cannot be opened. It might have unsupported format or be corrupted. —> TypeInitializationException: The type initializer for ‘fY’ threw an exception. —> TypeInitializationException: The type initializer for ‘SkiaSharp.SKImageInfo’ threw an exception. —> …
@David_Matin Could you please describe the environment where the problem occurs? We will check the issue on our side and provide you more information.
@alexey.noskov It runs in k8s, on Ubuntu, with Python 3.8 and the newest aspose-words.
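When describing an environment for a report like this, exact versions are usually easier to compare than "newest". The following is a minimal sketch of collecting them with the standard library only; environment_report is a hypothetical helper name, and the default package names are assumed to be the pip distribution names mentioned in this thread.

```python
import platform
import sys
from importlib import metadata  # available from Python 3.8


def environment_report(packages=("aspose-words", "aspose-pdf")):
    """Return a short text report of the OS, Python and package versions."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)
```

Calling print(environment_report()) inside the container or on the server where the error occurs gives text that can be pasted directly into a reply.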
@David_Matin Thank you for additional information. Unfortunately, I cannot reproduce the problem on my side using the following simple code and Dockerfile:
import aspose.words as aw

doc = aw.Document("/temp/in.docx")
doc.save("/temp/out.pdf")
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update
RUN apt install -y python3.8 python3-pip
RUN apt install -y pkg-config libicu-dev
RUN python3.8 -m pip install aspose-words
# Copy function code
COPY app.py ./
ENTRYPOINT ["python3.8", "app.py"]
@alexey.noskov Did you use my document? I am saving the pdf in markdown format.
https://drive.google.com/file/d/1tVqPTe_CE0-GNuGS5sh-9ErIRm_586RZ/view?usp=sharing
code:
document = aw.Document(source_file)
save_options = aw.saving.MarkdownSaveOptions()
save_options.image_resolution = 300
document.save(f"{result}.md", save_options)
@David_Matin I have tested with your input document and still the problem is not reproducible on my side. The provided PDF document is successfully converted to Markdown.
@alexey.noskov Could this be related to parsing the file after I split it? I first use aspose-pdf (24.3.0) to split the PDF file into several PDFs by page number, and then convert each of those PDFs to Markdown.
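Since the load error mentions a possibly corrupted file and the input here comes from a splitting step, a quick structural check on each split file before conversion may help narrow things down. This is a rough sketch, not a validator: looks_like_pdf is a hypothetical helper that only checks for the "%PDF-" header near the start of the file and the "%%EOF" marker near the end, so a file can pass and still be unreadable.

```python
def looks_like_pdf(path, window=1024):
    """Return True if the file has a PDF header and an end-of-file marker.

    Lenient check: the header may appear anywhere in the first `window`
    bytes and the %%EOF marker anywhere in the last `window` bytes.
    """
    with open(path, "rb") as handle:
        head = handle.read(window)
        handle.seek(0, 2)  # move to the end to learn the file size
        size = handle.tell()
        handle.seek(max(0, size - window))
        tail = handle.read()
    return b"%PDF-" in head and b"%%EOF" in tail
```

Running this over the output of the split step would at least show whether any of the files were truncated before they reached the Markdown conversion.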
@David_Matin Could you please attach the PDF document that is passed to Aspose.Words and causes the problem? We will test with it and let you know the result.
This is the file:
https://drive.google.com/file/d/1tVqPTe_CE0-GNuGS5sh-9ErIRm_586RZ/view?usp=sharing
@David_Matin Thank you for the additional information. Unfortunately, the problem is still not reproducible on my side. Please try with the Dockerfile provided above and let us know if the problem is reproducible on your side.
@alexey.noskov Our Linux version is CentOS Linux release 7.9.2009 (Core). I reproduced the problem on a machine running this version: same code and file, same error message.
@David_Matin I tested with CentOS and still the problem is not reproducible on my side.
Dockerfile:
FROM centos/python-38-centos7
USER root
# Install ICU package.
RUN yum -y install icu
WORKDIR /usr/app/src
# Copy function code
COPY app.py ./
RUN pip install aspose-words
ENTRYPOINT [ "python3", "app.py"]
app.py:
import aspose.words as aw

doc = aw.Document("/temp/in.pdf")
save_options = aw.saving.MarkdownSaveOptions()
save_options.image_resolution = 300
doc.save("/temp/out.md", save_options)
This time I ran it on a Linux server, not in k8s.
@David_Matin Unfortunately, I still cannot reproduce the problem on my side.
@alexey.noskov After updating the version, a new error was reported:
Unhandled exception. System.TypeInitializationException: The type initializer for 'SkiaSharp.SKObject' threw an exception.
---> System.DllNotFoundException: Unable to load shared library 'libSkiaSharp' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: liblibSkiaSharp: cannot open shared object file: No such file or directory
at SkiaSharp.SkiaApi.sk_version_get_milestone()
at SkiaSharp.SkiaSharpVersion.get_Native()
at SkiaSharp.SkiaSharpVersion.CheckNativeLibraryCompatible(Boolean throwIfIncompatible)
at SkiaSharp.SKObject..cctor()
--- End of inner exception stack trace ---
at SkiaSharp.SKObject.DeregisterHandle(IntPtr handle, SKObject instance)
at SkiaSharp.SKObject.set_Handle(IntPtr value)
at SkiaSharp.SKNativeObject.Dispose(Boolean disposing)
at SkiaSharp.SKObject.Dispose(Boolean disposing)
at SkiaSharp.SKBitmap.Dispose(Boolean disposing)
at SkiaSharp.SKNativeObject.Finalize()
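The DllNotFoundException above says the native libSkiaSharp library, or one of its dependencies, could not be loaded. One way to investigate from Python is to look for the file and try loading it directly, because the loader's own error text usually names what is missing. This is a minimal sketch with hypothetical helper names (find_files, try_load); where the aspose-words package keeps its native files is an assumption, so the search root has to be chosen by the caller, for example the site-packages directory.

```python
import ctypes
import os


def find_files(root, prefix):
    """Return sorted paths under root whose file name starts with prefix."""
    matches = []
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            if name.startswith(prefix):
                matches.append(os.path.join(directory, name))
    return sorted(matches)


def try_load(path):
    """Return None if the shared library loads, else the loader's error text."""
    try:
        ctypes.CDLL(path)
    except OSError as error:
        return str(error)
    return None
```

If find_files(root, "libSkiaSharp") returns nothing, the library is not where the search looked; if it returns a path and try_load reports an error, the message should indicate whether a dependent shared object is missing on the machine.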