Aspose.Words.Pdf2Word.dll Embedded in MVC Project fails to load after deployment, works locally

gabriel.vega · October 3, 2024, 2:00pm

I have an MVC project on .NET Core. My DLL has Aspose.Words.dll and Aspose.Words.Pdf2Word.dll embedded in it. I am loading a PDF file using this line:

var document = new Aspose.Words.Document(OriginalStream, pdfLoadOptions);

When I tested my code on localhost, the files loaded successfully. But when the site is deployed, I’m getting this exception:

‘Pdf document reader plugin cannot be loaded. Make sure Aspose.Words.Pdf2Word.dll is referenced in your project and placed next to Aspose.Words.dll’.

These are the lines I use to embed Aspose.Words.

[assembly: Obfuscation(Feature = "embed Aspose.Words.dll", Exclude = false)]
[assembly: Obfuscation(Feature = "embed Aspose.Words.Pdf2Word.dll", Exclude = false)]

If I remove the embedded Aspose.Words.Pdf2Word.dll, the error changes and says the assembly is not found. That suggests the DLL is embedded correctly but cannot be loaded.

Thanks in advance!

Professionalize.Discourse · October 3, 2024, 2:00pm

@gabriel.vega

It seems that the issue you are encountering with the Aspose.Words.Pdf2Word.dll not loading after deployment is likely due to the DLL not being included in the deployment package. This can happen because the DLL is not explicitly referenced in your project, leading to it being stripped during the deployment process.

To resolve this issue, you can take the following steps:

1. Ensure the DLL is deployed: First, make sure that Aspose.Words.Pdf2Word.dll is actually present in the deployment directory alongside Aspose.Words.dll. You can manually copy it from the installation folder of Aspose.Words for .NET if necessary.
2. Add a reference: Even though you are embedding the DLL, you should also add a reference to Aspose.Words.Pdf2Word.dll in your Visual Studio project. This ensures that it is included in the build and deployment process.
3. Prevent stripping: Since Aspose.Words.Pdf2Word.dll is used via reflection, it might be stripped during the deployment. To prevent this, you can add a dummy class to your project that references the PdfDocumentReaderPlugin. Here’s an example of how you can do this:
```
internal class LinkerPleaseInclude
{
    internal void Include()
    {
        Aspose.Words.Pdf2Word.PdfDocumentReaderPlugin pdf2Word = new Aspose.Words.Pdf2Word.PdfDocumentReaderPlugin();
    }
}
```
This code will ensure that the Pdf2Word.dll is not stripped during the deployment process.

By following these steps, you should be able to resolve the loading issue of Aspose.Words.Pdf2Word.dll in your deployed MVC project. If the problem persists, please check the deployment logs for any additional errors or warnings that might provide further insight.

gabriel.vega · October 3, 2024, 2:17pm

Thank you for the response. I already tried all the suggested solutions. Despite these efforts, the issue persists, and the exception still occurs after deployment, while everything works fine locally.

Is there anything else I could try to resolve this, or perhaps a more specific configuration I might be overlooking?

Thanks again for your help!

alexey.noskov · October 3, 2024, 2:20pm

@gabriel.vega Aspose.Words.Pdf2Word.dll is loaded via reflection in Aspose.Words code. So, most likely, after embedding and obfuscation Aspose.Words cannot find the Aspose.Words.Pdf2Word assembly.
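
One way to check this on the deployed machine is to test whether the runtime can resolve the assembly by its simple name. The following is a minimal diagnostic sketch using only standard .NET reflection calls. `PluginProbe` and `CanResolve` are hypothetical names, and the lookup here only approximates what Aspose.Words may do internally, since the actual loading logic is not shown in this thread.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;

// Hypothetical diagnostic helper, not part of Aspose.Words.
internal static class PluginProbe
{
    // Returns true if an assembly with the given simple name is already
    // loaded, or if the runtime can load it by name.
    internal static bool CanResolve(string simpleName)
    {
        bool alreadyLoaded = AppDomain.CurrentDomain.GetAssemblies()
            .Any(a => string.Equals(a.GetName().Name, simpleName,
                                    StringComparison.OrdinalIgnoreCase));
        if (alreadyLoaded)
        {
            return true;
        }

        try
        {
            Assembly.Load(new AssemblyName(simpleName));
            return true;
        }
        catch (Exception ex) when (ex is FileNotFoundException
                                   || ex is FileLoadException
                                   || ex is BadImageFormatException)
        {
            // Report why the lookup failed so it can be compared between
            // the local and the deployed environment.
            Console.WriteLine($"Could not load {simpleName}: {ex.Message}");
            return false;
        }
    }
}
```

Calling `PluginProbe.CanResolve("Aspose.Words.Pdf2Word")` just before creating the `Document`, both locally and after deployment, may show whether the embedded assembly is visible to a by-name lookup at all.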

gabriel.vega · October 7, 2024, 6:39pm

@alexey.noskov,

Thanks for the answer. Below I attach an example project to better explain the problem.

https://we.tl/t-WXqrZmvl5B

I attached the source code of the sample project. The setup involves embedding and obfuscating the necessary libraries, including Aspose.Words.dll and Aspose.Words.Pdf2Word.dll, within a custom example DLL called ConverterClassLibrary.dll. I’m using Eazfuscator.NET to manage the obfuscation and embedding process. This library contains the logic for converting PDF to Word using Aspose.Words.

Pdf2WordAppDemo is the .NET Core console application that consumes ConverterClassLibrary.dll to process PDF files by converting them to Word and then to HTML.

When attempting to convert a PDF document using ConverterClassLibrary.dll, the following exception is raised: Aspose.Words.DocumentReaderPluginLoadException: ‘Pdf document reader plugin cannot be loaded. Make sure Aspose.Words.Pdf2Word.dll is referenced in your project and placed next to Aspose.Words.dll’

As a temporary workaround, I am converting the PDF to Word first, then performing a second conversion from Word to HTML. This additional step incurs a significant memory overhead, making the process less efficient than directly converting from PDF to HTML.

I understand that Aspose.Words.Pdf2Word.dll is likely being loaded via reflection within the Aspose.Words codebase, but I would like to know if there is another potential solution for this case.

Thanks for your help.

alexey.noskov · October 8, 2024, 4:19am

@gabriel.vega Thank you for the additional information. It looks like my first assumption was right and Aspose.Words cannot locate Aspose.Words.Pdf2Word.dll when it is embedded. Unfortunately, we cannot suggest a workaround at the moment. We will investigate the scenario further and provide you with more information.

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-27451

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

gabriel.vega · October 24, 2024, 4:21am

Hi Aspose Team,

In my current project, I’m unable to use the embedded Aspose.Words.Pdf2Word.dll, so I’ve implemented the following workaround:

I’m converting a PDF to HTML and then back to PDF using Aspose.Words. Before the conversion, I save the page dimensions and margins like this:

var page = Document.Pages[1];

var pageDetails = new PageDetails
{
    Width = page.PageInfo.Width,
    Height = page.PageInfo.Height,
    //Width = page.MediaBox.Width,
    //Height = page.MediaBox.Height,
    MarginTop = page.PageInfo.Margin.Top,
    MarginBottom = page.PageInfo.Margin.Bottom,
    MarginLeft = page.PageInfo.Margin.Left,
    MarginRight = page.PageInfo.Margin.Right
};

I’ve tried storing the page height and width using both PageInfo and MediaBox. Afterward, I convert the PDF to Word, and the Word file looks well-formatted. I then convert it to HTML, make some edits, and convert it back to Word and finally to PDF.

However, I’m running into two issues:

Page dimensions are lost when generating the final PDF – After converting back to PDF, the page dimensions don’t retain their original values.
Character encoding issues – Despite setting the encoding to UTF-8 when converting Word to HTML, some characters are still not correctly encoded. Here’s the configuration I’m using:

HtmlSaveOptions htmlSaveOptions = new HtmlSaveOptions();
htmlSaveOptions.Encoding = System.Text.Encoding.UTF8;

Thanks for your help!

alexey.noskov · October 24, 2024, 6:04am

@gabriel.vega I am afraid such a complex round trip cannot guarantee conversion fidelity, because PDF, HTML and MS Word documents are quite different. Conversion between these models cannot be done without losses.
We have logged the issue WORDSNET-21713 to fully integrate Pdf2Word.dll into Aspose.Words.dll, so that two DLLs will no longer be required to load PDF documents into the Aspose.Words DOM.

As a temporary workaround, you can consider using Pdf2Word.dll directly:

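public WordConverter(Stream originalStream)
{
    try
    {
        var loadOptions = new Aspose.Words.Loading.LoadOptions
        {
            LoadFormat = LoadFormat.Pdf,
        };

        // Create an empty document and let the PDF reader plugin fill it,
        // instead of relying on Aspose.Words to locate the plugin itself.
        Document = new Aspose.Words.Document();
        var pdfReaderPlugin = new Aspose.Words.Pdf2Word.PdfDocumentReaderPlugin();
        pdfReaderPlugin.Read(originalStream, loadOptions, Document);
    }
    catch (Exception)
    {
        // Rethrow with "throw;" so the original stack trace is preserved
        // ("throw ex;" would reset it).
        throw;
    }
}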
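
Since the goal earlier in the thread was a direct PDF to HTML conversion, the same approach can be followed by a single save call. This is a minimal sketch, assuming both Aspose.Words.dll and Aspose.Words.Pdf2Word.dll are referenced by the project. `PdfWorkaround` and `ConvertPdfToHtml` are hypothetical names, and it has not been verified against the embedded and obfuscated setup described above.

```csharp
using System;
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Saving;

// Hypothetical helper built on the workaround above; not part of Aspose.Words.
public static class PdfWorkaround
{
    public static void ConvertPdfToHtml(Stream pdfInput, Stream htmlOutput)
    {
        if (pdfInput == null) throw new ArgumentNullException(nameof(pdfInput));
        if (htmlOutput == null) throw new ArgumentNullException(nameof(htmlOutput));

        var loadOptions = new Aspose.Words.Loading.LoadOptions
        {
            LoadFormat = LoadFormat.Pdf,
        };

        // Read the PDF through the plugin directly, as in the constructor above.
        var document = new Document();
        var pdfReaderPlugin = new Aspose.Words.Pdf2Word.PdfDocumentReaderPlugin();
        pdfReaderPlugin.Read(pdfInput, loadOptions, document);

        // Save straight to HTML with UTF-8, skipping the intermediate Word file.
        var saveOptions = new HtmlSaveOptions { Encoding = Encoding.UTF8 };
        document.Save(htmlOutput, saveOptions);
    }
}
```

This keeps one in-memory `Document` for the whole conversion rather than writing a Word file and loading it again, which is the step the earlier post reported as costly in memory.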

aspose.notifier · November 11, 2024, 12:05pm

The issues you found earlier (filed as WORDSNET-21713) have been fixed in the Aspose.Words for .NET 24.11 update, which is also available on NuGet.