PDF to DOC conversion produces a null reference error

Hi,
I am trying to convert a PDF document to .doc but get the error below. The PDF contains ECG diagrams etc. Unfortunately I cannot share the file with you, as it was produced within the NHS and is protected by GDPR. What does this error mean? Is there a workaround?

Error in converting from source format pdf to target format: doc, System.NullReferenceException: Object reference not set to an instance of an object.
at #=zwrlaqk0l24zNVxh6XrXGv4fOJs62HUHT9in0F6uA3TJM9h03ug==.#=zRcf_SzZLtmw3h_zL2JFfy9PC1rhwHv_vMQ==(#=zFkTRLAitE1JXKlZSN9LuMTE= #=z$DE8q4mKsnbx, Double& #=zRDkKeWO73TVt35NbyA==, Double& #=zD_usODOf6F0osmKSDA==)
at #=zwrlaqk0l24zNVxh6XrXGv4fOJs62HUHT9in0F6uA3TJM9h03ug==.#=zxiIpuX429A$nRTKXdw==(#=zM8gldwPuW51LR$FlICwXkW1kuA3iXTLPLgYc54SsuJAO #=zkqzN9sE=, #=zg5ebno$rDoMgdIYZOQZI_mq3Xg412ELe5nNJL9toddOO #=zw2bhbiTV_7eI, Boolean #=z4VeG9d7H6dBAvjhc3Q==)
at #=zwrlaqk0l24zNVxh6XrXGv4fOJs62HUHT9in0F6uA3TJM9h03ug==.#=zGZ1QN_M=(#=zM8gldwPuW51LR$FlICwXkW1kuA3iXTLPLgYc54SsuJAO #=zkqzN9sE=, #=zg5ebno$rDoMgdIYZOQZI_mq3Xg412ELe5nNJL9toddOO #=zKocuSST4SQQ_, #=zL8H5iYFNJj2z8wCS5AWMDI2dgUWsLPRkiA== #=zDAme1ao=)
at #=zm0Nm6vA0_BB3XqEjQUvORu5P3BCFdKpBMpCPmomNcHYcW50RKg==.#=zGZ1QN_M=(#=zM8gldwPuW51LR$FlICwXkW1kuA3iXTLPLgYc54SsuJAO #=zkqzN9sE=, #=zg5ebno$rDoMgdIYZOQZI_mq3Xg412ELe5nNJL9toddOO #=zKocuSST4SQQ_)
at #=zPkOgAoks0N8t8aKVrRueQxVUc1EwDvSwxKhpjmc5Ntby.#=z1vTE5bC4JKjpaIeZmQ==(#=zM8gldwPuW51LR$FlICwXkW1kuA3iXTLPLgYc54SsuJAO #=zkqzN9sE=, List1 #=zBmdhfbs=)
at #=zPkOgAoks0N8t8aKVrRueQxVUc1EwDvSwxKhpjmc5Ntby.#=zRPYijLs5cMH5wNFAmQ==(#=zM8gldwPuW51LR$FlICwXkW1kuA3iXTLPLgYc54SsuJAO #=zkqzN9sE=, List1 #=zBmdhfbs=)
at #=zPkOgAoks0N8t8aKVrRueQxVUc1EwDvSwxKhpjmc5Ntby.#=zeRS8cHJ2j5Cj(#=zM8gldwPuW51LR$FlICwXkW1kuA3iXTLPLgYc54SsuJAO #=zkqzN9sE=, List1 #=z56SDIiMC4KGk)
at #=zPkOgAoks0N8t8aKVrRueQxVUc1EwDvSwxKhpjmc5Ntby.#=zTNCRf7Y=(#=zVHjcSoyqDDZssI6VM1YwrvqJvqen #=zlSxcLBk=, #=zRG_05qgBhqz6wd74$czdqSsFCYQ1g6OIpw== #=z0mcW988=, #=zM8gldwPuW51LR$FlICwXkW1kuA3iXTLPLgYc54SsuJAO #=zkqzN9sE=)
at #=zMy5LTYmcoxjWOgkzRMYzbHS9wOymG8ag_w==.#=zxusxeEg=(Int32 #=zkeku1Mc=, IList1 #=z_9XFXFc8QOT9k5_PLw==, #=zUJCvsdv3KeEO #=zCFcLX4A=)
at #=zMy5LTYmcoxjWOgkzRMYzbHS9wOymG8ag_w==.#=zTNCRf7Y=()
at #=zJi9gvmQAFEXMydBmx2ixaMEEYyRY.#=zXV_z796vDDAB(Document #=zlSxcLBk=, #=zM8gldwPuW51LR$FlICwXkW1kuA3iXTLPLgYc54SsuJAO& #=z4UPnTVOl_mqXqPijxg==, UnifiedSaveOptions #=zIO$KG6o=, Int32& #=zc0GE08BS_20V, Boolean #=zpKCPzDs=)
at #=zJi9gvmQAFEXMydBmx2ixaMEEYyRY.#=zRb56u8_kGuH6r97WgA==(Document #=zuhbINn8O67xG, #=zM8gldwPuW51LR$FlICwXkW1kuA3iXTLPLgYc54SsuJAO& #=zkqzN9sE=, UnifiedSaveOptions #=zIO$KG6o=, Int32& #=zJMJ9RuZqZa49, Boolean #=zpKCPzDs=)
at #=z6ToAXSRQETOwkFyWz3wHO1A=.#=zKJLnpuA=(Document #=zlSxcLBk=, Stream #=z9tmF_tppWnDG, DocSaveOptions #=zIO$KG6o=)
at Aspose.Pdf.Document.#=zs$2Ej5XJ9k_O(Stream #=z4bKpSMrylvwg, SaveOptions #=zIO$KG6o=)
at Aspose.Pdf.Document.#=zHJlNC$orLHsf(Stream #=z9tmF_tppWnDG, SaveOptions #=zIO$KG6o=)
at Aspose.Pdf.Document.#=zHJlNC$orLHsf(Stream #=z9tmF_tppWnDG, SaveFormat #=zrnkcU4g=)
at Aspose.Pdf.Document.#=zHJlNC$orLHsf(String #=zbFZxNwRXV7ss, SaveFormat #=zrnkcU4g=)
at Converters.PdfDocumentCls.Save(String path, OutputFormats outputFormat)
at Converters.DocumentConverter.PdfConverter.SaveConvertedDocument(String path, OutputFormats outputFormat)
at Converters.DocumentConverter.SaveTargetFileToWinvoice(OutputFormats outputFormat)
at Converters.DocumentConverter.SaveTargetFilesToWinvoice(OutputFormats sourceFormat)

@reechi

The System.NullReferenceException means that code tried to use an object that was never initialized. In your stack trace it is thrown inside Aspose.PDF's internal (obfuscated) conversion code, below the Document.Save call made from Converters.PdfDocumentCls.Save, rather than in your own code. This usually points to content in the PDF that the converter does not handle, which is more likely with complex documents that contain elements such as diagrams or images.

Here are some steps you can take to troubleshoot and potentially work around this issue:

  1. Check the PDF Document: Ensure that the PDF document is not corrupted and is compatible with the Aspose.PDF library. Sometimes, specific elements in the PDF can cause issues during conversion.

  2. Use Save Options: When converting, you can specify DocSaveOptions to control the conversion process. This can help in managing how the content is interpreted during the conversion. For example, you can set the recognition mode to Flow, which may help in handling complex layouts better.

    Here’s a sample code snippet demonstrating how to use DocSaveOptions:

    using Aspose.Pdf;
    
    // Load the PDF document
    Document pdfDocument = new Document("path/to/your/document.pdf");
    
    // Create DocSaveOptions object
    DocSaveOptions saveOptions = new DocSaveOptions();
    saveOptions.Mode = DocSaveOptions.RecognitionMode.Flow; // Set recognition mode
    saveOptions.RelativeHorizontalProximity = 2.5f; // Set horizontal proximity
    saveOptions.RecognizeBullets = true; // Enable bullet recognition
    
    // Save the document to DOC format
    pdfDocument.Save("path/to/output/document.doc", saveOptions);
    
  3. Update Aspose.PDF Library: Ensure that you are using the latest version of the Aspose.PDF library. Updates often include bug fixes and improvements that may resolve issues like the one you are experiencing.

  4. Simplify the Document: If possible, try to simplify the PDF document by removing complex elements or converting it to a simpler format before using Aspose.PDF for conversion.

  5. Contact Support: If the issue persists, consider reaching out to Aspose support with details about the PDF document structure (without sharing the document itself) and the specific error message. They may provide insights or fixes for your specific case.

By following these steps, you should be able to either resolve the issue or find a suitable workaround for converting your PDF document to DOC format.

Hi
Replying on behalf of my colleague @reechi.
Thank you for your quick response and advice. Unfortunately…

  1. These PDFs are not corrupted.
  2. I have added code matching the snippet and we have re-tested, but the same error message is returned.
  3. We updated to Aspose.Pdf version 25.3.0.0 last week and were getting the error on that version. I see there is now a version 25.4.0.0 available…I have tried that now, but the same error occurs.
  4. Unfortunately simplifying the PDF documents is not an option.
  5. Can you advise how we should contact support with this issue? Can this be done directly from this forum?

Thanks, Steve

@SteveCalder

Are you using the API in a non-Windows environment? If so, please switch to Aspose.Pdf.Drawing instead of Aspose.PDF for .NET. Such errors can also occur when fonts are missing from the system, so please try installing the msttcorefonts package. If the issue still persists, please share a sample document and your environment details with us so that we can test the scenario in our environment and address it accordingly.

Just to keep you updated, we are using Windows. There are some data governance issues with these documents, so it may take some time to source a sample document, but we will let you know when we have one. Thanks for your assistance so far.

@SteveCalder

Please take your time to gather the sample file and share it with us. We will then proceed accordingly.
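
In the meantime, one way to narrow the problem down without sharing the whole file is to convert the PDF one page at a time and note which pages throw. The following is only a minimal sketch, not a confirmed fix: the class and method names (PageIsolation, FindFailingPages) are hypothetical, and it assumes that a page copied into an empty document fails in the same way as it does inside the original, which may not hold if the error depends on resources shared across pages.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Aspose.Pdf;

public static class PageIsolation // hypothetical helper, not part of Aspose.PDF
{
    // Returns the 1-based numbers of the pages whose standalone
    // conversion to DOC throws an exception.
    public static List<int> FindFailingPages(string pdfPath)
    {
        var failing = new List<int>();

        using (var source = new Document(pdfPath))
        {
            // Aspose.PDF page collections are indexed from 1.
            for (int i = 1; i <= source.Pages.Count; i++)
            {
                try
                {
                    using (var single = new Document())
                    using (var output = new MemoryStream())
                    {
                        // Copy a single page into an empty document and
                        // convert it in memory, so nothing is written to disk.
                        single.Pages.Add(source.Pages[i]);
                        single.Save(output, SaveFormat.Doc);
                    }
                }
                catch (Exception ex)
                {
                    failing.Add(i);
                    Console.WriteLine($"Page {i}: {ex.GetType().Name}: {ex.Message}");
                }
            }
        }

        return failing;
    }
}
```

If only one or two pages fail, a redacted or synthetic page with the same kind of content (for example, a similar ECG diagram) may be easier to clear for sharing than the full document.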