Convert PDF to Word Docs

I am trying to convert a PDF to Word in Python, using the code from the docs as is, and I am getting the following error:

Traceback (most recent call last):
  File "/home/jupyter/Tahoe_Test/Aspose/EnclaveTesting/WordHandler.py", line 69, in conv_pdf
    document.save(output_stream, save_options)
RuntimeError: Proxy error(NullReferenceException): Object reference not set to an instance of an object.

I am following the instructions from the documentation. Here is my function:

def conv_pdf(self, pdf_bytes):
    try:
        # Load the PDF document from bytes
        print("Loading the document from bytes")
        with io.BytesIO(pdf_bytes) as input_stream:
            document = apdf.Document(input_stream)

        # Configure save options for DOCX format
        print("Configuring save options")
        save_options = apdf.DocSaveOptions()
        save_options.format = apdf.DocSaveOptions.DocFormat.DOC_X

        # Save into an in-memory stream and return the resulting bytes
        with io.BytesIO() as output_stream:
            print("Saving output")
            document.save(output_stream, save_options)
            print("Returning output")
            return output_stream.getvalue()
    except Exception as e:
        print(f"An error occurred: {e}")
        raise

Removing the save options causes it to work, but obviously then the stream is a pdf not a word formatted document.

@acn

It seems you are encountering a NullReferenceException when trying to save a PDF document as a Word document using Aspose.PDF in Python. This error typically indicates that there is an issue with the object you are trying to use, which in this case is likely related to the save_options or the document object.

Here are a few steps you can take to troubleshoot and resolve the issue:

  1. Check the PDF Document: Ensure that the PDF document you are trying to convert is valid and not corrupted. You can do this by opening it in a PDF viewer.

  2. Review Save Options: Make sure that the DocSaveOptions object is being configured correctly. You can try simplifying the options to see if a specific setting is causing the issue. For example, you can start with just setting the format and see if that works:

    save_options = apdf.DocSaveOptions()
    save_options.format = apdf.DocSaveOptions.DocFormat.DOC_X
    
  3. Use a Basic Save Method: As you mentioned, removing the save options allows the conversion to work. You can try saving the document without any options first, and then gradually add options back in to identify which one is causing the problem.

  4. Check for Updates: Ensure that you are using the latest version of the Aspose.PDF library, as bugs and issues are often fixed in newer releases.

  5. Example Reference: You can refer to examples from the Aspose documentation or GitHub repositories that demonstrate how to convert PDF to DOCX. For instance, there are examples in C# that show how to use DocSaveOptions effectively, which might provide insights into how to structure your code correctly.
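
For step 1, and to confirm what an output stream actually contains, a quick byte-signature check can help. This is only a minimal sketch: `sniff_format` is a hypothetical helper name, not part of Aspose.PDF, and it looks at well-known file signatures only (`%PDF-` for PDF, the ZIP signature that DOCX containers start with, and the OLE signature used by legacy DOC files). It does not prove that a file is well-formed.

```python
def sniff_format(data: bytes) -> str:
    """Guess a container format from its leading bytes (hypothetical helper)."""
    # PDF readers generally accept the header within the first 1024 bytes.
    if b"%PDF-" in data[:1024]:
        return "pdf"
    # DOCX is a ZIP package, so it starts with the local file header signature.
    if data.startswith(b"PK\x03\x04"):
        return "zip"
    # Legacy DOC files use the OLE compound file signature.
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "ole"
    return "unknown"
```

Calling `sniff_format(pdf_bytes)` before conversion and `sniff_format(result)` afterwards shows whether the input looks like a PDF and whether the returned bytes are still a PDF or a ZIP-based document such as DOCX.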

Here is a simplified version of your function that you can try:

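import io

import aspose.pdf as apdf


def conv_pdf(self, pdf_bytes):
    try:
        # Load the PDF from the in-memory bytes
        with io.BytesIO(pdf_bytes) as input_stream:
            document = apdf.Document(input_stream)

        # Request DOCX output
        save_options = apdf.DocSaveOptions()
        save_options.format = apdf.DocSaveOptions.DocFormat.DOC_X

        # Save to an in-memory stream and return the resulting bytes
        with io.BytesIO() as output_stream:
            document.save(output_stream, save_options)
            return output_stream.getvalue()
    except Exception as e:
        # Report the failure, then re-raise so the caller does not receive None
        print(f"An error occurred: {e}")
        raise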
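
One more thing that may be worth ruling out: the function above closes the input stream before calling save. Whether the conversion still needs to read from that stream at save time is an assumption here, not a confirmed cause of the NullReferenceException. The sketch below keeps both streams open until the save has finished. The name `conv_pdf_keep_input_open` and the optional `apdf` parameter are hypothetical, added only so the function can be exercised without the library installed; the Aspose calls are the same ones used in the question.

```python
import io


def conv_pdf_keep_input_open(pdf_bytes, apdf=None):
    """Convert PDF bytes to DOCX bytes while the input stream is still open."""
    if apdf is None:
        # Imported lazily so the library is only required for a real conversion.
        import aspose.pdf as apdf

    with io.BytesIO(pdf_bytes) as input_stream, io.BytesIO() as output_stream:
        document = apdf.Document(input_stream)

        save_options = apdf.DocSaveOptions()
        save_options.format = apdf.DocSaveOptions.DocFormat.DOC_X

        # Assumption being tested: save may still read from input_stream,
        # so it runs before either stream is closed.
        document.save(output_stream, save_options)
        return output_stream.getvalue()
```

If this variant fails with the same error, stream lifetime is probably not the cause and the environment details become more relevant.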

If the issue persists, consider reaching out to the Aspose support team with the specific details of your PDF document and the error message for further assistance.

Sources:
[1]: PDFToDOC.cs
[2]: ConvertPDFToDOCOrDOCXFormat.java

I tried all of this; it did not work.

@acn

Would you please confirm that you have initialized and set the license before using any functionality of the API? Also, please make sure that all Windows fonts are properly installed on the system. If the issue still persists, please share your complete environment details, such as OS name and version, API version, and Python version. We will log an investigation ticket and share the ID with you.
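
The environment details requested above can be gathered with the standard library. This is a minimal sketch: `collect_environment` is a hypothetical helper, and it assumes the library was installed under the distribution name `aspose-pdf`; pass a different name if your installation differs.

```python
import platform
from importlib import metadata


def collect_environment(package="aspose-pdf"):
    """Return OS, Python, and installed package version details as a dict."""
    try:
        api_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution name is an assumption; report it rather than fail.
        api_version = "not installed"
    return {
        "os_name": platform.system(),
        "os_version": platform.release(),
        "python_version": platform.python_version(),
        "api_version": api_version,
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the thread covers the OS, API, and Python versions in one go.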