I am getting the error below when using Aspose.Pdf to convert a number of PDF documents to plain text within SQL Server.
As I understand it, SQLCLR does not allow referenced DLLs to load embedded resources. In the trace, GetManifestResourceStream raises a resource-resolve event whose handler calls Assembly.Load for a GUID-named assembly, and that load fails. Is there a version of Aspose.Pdf for .NET 2.0 that does not rely on embedded resources? The full error follows.
Msg 6522, Level 16, State 1, Line 3
A .NET Framework error occurred during execution of user-defined routine or aggregate "ExtractTextFromPdf":
System.TypeInitializationException: The type initializer for '.' threw an exception. ---> System.IO.FileNotFoundException: Could not load file or assembly '{4bc43771-9c86-492b-b6fb-5aa535ddb7b4}, PublicKeyToken=3e56350693f7355e' or one of its dependencies. The system cannot find the file specified.
System.IO.FileNotFoundException:
at System.Reflection.Assembly._nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, Assembly locationHint, StackCrawlMark& stackMark, Boolean throwOnFileNotFound, Boolean forIntrospection)
at System.Reflection.Assembly.InternalLoad(AssemblyName assemblyRef, Evidence assemblySecurity, StackCrawlMark& stackMark, Boolean forIntrospection)
at System.Reflection.Assembly.InternalLoad(String assemblyString, Evidence assemblySecurity, StackCrawlMark& stackMark, Boolean forIntrospection)
at System.Reflection.Assembly.Load(String assemblyString)
at ..(Object , ResolveEventArgs )
at System.AppDomain.OnResourceResolveEvent(String resourceName)
at System.Reflection.Assembly._GetResource(String resourceName, UInt64& length, StackCrawlMark& stackMark, Boolean skipSecurityCheck)
at System.Reflection.Assembly.GetManifestResourceStream(String name, StackCrawlMark& stackMark, Boolean skipSecurityCheck)
at System.Reflection.Assembly.GetManifestResourceStream(String name)
at ...cctor()
System.TypeInitializationException:
at ..( )
at ..( )
at ..()
at ...ctor( )
at ..( )
at ..()
at ..( )
at ..()
at ..(String , , , , , , Double , Double , , )
at ..(Int32 , Int32 , , )
at ..(Int32 , )
at ..()
at ..(Queue , , )
at ..( , )
at Aspose.Pdf.Text.TextAbsorber.Visit(Document pdf)
at PdfToText.AsposePdfToText.ExtractText(Byte[] documentBytes)
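For anyone reproducing this, a query along the following lines could show which assemblies SQL Server actually registered in the database, including those pulled in implicitly as dependencies of PdfToText. It is only a diagnostic sketch: it assumes the DMS database created by the script further down, reads the sys.assemblies and sys.assembly_files catalog views, and changes nothing.

```sql
USE [DMS]
GO
-- List every assembly registered in this database with its backing files.
-- is_visible = 0 means the assembly is only available to managed callers,
-- which is how implicitly loaded dependencies are normally registered.
SELECT
    a.name AS assembly_name,
    a.clr_name,
    a.permission_set_desc,
    a.is_visible,
    f.name AS file_name
FROM sys.assemblies AS a
LEFT JOIN sys.assembly_files AS f
    ON f.assembly_id = a.assembly_id
ORDER BY a.name, f.file_id;
```

If the GUID-named assembly from the error message does not appear in the result, that would be consistent with it being loaded at run time from a resource rather than registered through CREATE ASSEMBLY.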
Instructions to reproduce:
On your DEV box (Win 7 x64, VS 2012)
- Create a new C# class library project
- Set the target framework to .NET 3.5 in the project properties
- Add a reference to Aspose.Pdf.dll for .NET 3.5 (Version=6.8.0.0, PublicKeyToken=47b2d0fcacdd3eb)
- Note: I also tried this with .NET 2.0 and the corresponding Aspose.Pdf.dll for .NET 2.0, with the same error
- Copy and paste the C# code below
- Build the PdfToText project in Release configuration
On your SQL box (Windows Server 2008 R2 x64, SQL Server 2008 R2 (10.50.2500))
- Create a new folder C:\PdfToText
- Place the PdfToText.dll and Aspose.Pdf.dll assemblies in the C:\PdfToText folder
- Open SSMS
- Create a new database called DMS
- Open a new query window
- Copy and paste the SQL script below
- Run the script
C# code:
using System.IO;
using System.Text;
using Aspose.Pdf; // C:\Program Files (x86)\Aspose\Aspose.Pdf for .NET\Bin\net3.5\Aspose.Pdf.dll Version=6.8.0.0, PublicKeyToken=47b2d0fcacdd3eb
using Aspose.Pdf.Text;
using Aspose.Pdf.Text.TextOptions;
using License = Aspose.Pdf.License;

namespace PdfToText
{
    public sealed class AsposePdfToText
    {
        private static readonly License license = new License();

        // Runs once, before the first call to ExtractText.
        static AsposePdfToText()
        {
            //license.Embedded = true;
            //license.SetLicense("PdfToText.Aspose.Total.lic");
            license.SetLicense("");
        }

        // Entry point bound to dbo.ExtractTextFromPdf in the SQL script.
        public static string ExtractText(byte[] documentBytes)
        {
            Stream stream = new MemoryStream(documentBytes);
            var document = new Document(stream);
            var textAbsorber = new TextAbsorber(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure));
            var extractedText = new StringBuilder();
            // The exception in the trace above is thrown from this call.
            textAbsorber.Visit(document);
            extractedText.Append(textAbsorber.Text);
            return extractedText.ToString();
        }
    }
}
SQL Script:
USE [DMS]
GO
IF OBJECT_ID('[dbo].[ExtractTextFromPdf]') IS NOT NULL
DROP FUNCTION [dbo].[ExtractTextFromPdf];
GO
IF OBJECT_ID('[dbo].[DocumentVersion]') IS NOT NULL
DROP TABLE [dbo].[DocumentVersion];
GO
IF EXISTS (SELECT [name] FROM sys.assemblies WHERE [name] = 'PdfToText')
DROP ASSEMBLY PdfToText;
GO
IF EXISTS (SELECT [name] FROM sys.assemblies WHERE [name] = 'System.Web')
DROP ASSEMBLY [System.Web];
GO
IF EXISTS (SELECT [name] FROM sys.assemblies WHERE [name] = 'System.Windows.Forms')
DROP ASSEMBLY [System.Windows.Forms];
GO
CREATE TABLE [dbo].[DocumentVersion](
    [DocumentVersionId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [Document] [image] NOT NULL,
    [FileExtension] [varchar](255) NOT NULL DEFAULT 'PDF',
    [Version] [timestamp] NOT NULL
)
GO
INSERT INTO DocumentVersion (Document)
VALUES (CAST(N'Test Pdf Document With Some Textual Data Should Go Here' as varbinary(max)))
GO
EXEC sp_configure 'clr enabled', 1
GO
RECONFIGURE
GO
---------------------------------------------------------------
ALTER DATABASE DMS
SET TRUSTWORTHY ON
GO
CREATE ASSEMBLY [System.Windows.Forms]
AUTHORIZATION dbo
FROM 'C:\Windows\Microsoft.NET\Framework64\v2.0.50727\System.Windows.Forms.dll'
WITH PERMISSION_SET = UNSAFE
GO
CREATE ASSEMBLY [System.Web]
AUTHORIZATION dbo
FROM 'C:\Windows\Microsoft.NET\Framework64\v2.0.50727\System.Web.dll'
WITH PERMISSION_SET = UNSAFE
GO
CREATE ASSEMBLY [PdfToText]
FROM 'c:\PdfToText\PdfToText.dll'
WITH PERMISSION_SET = UNSAFE;
GO
-------------------------------------------------------------------------
CREATE FUNCTION [dbo].[ExtractTextFromPdf](@documentBytes VARBINARY(MAX))
RETURNS nvarchar(MAX)
AS EXTERNAL NAME [PdfToText].[PdfToText.AsposePdfToText].[ExtractText]
GO
--------------------------------------------------------------------------
SELECT
DocumentVersionId,
documentText = dbo.ExtractTextFromPdf(d.Document)
FROM [dbo].[DocumentVersion] d
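As a sanity check before running the SELECT above, the two prerequisites the script sets (CLR integration and TRUSTWORTHY) could be confirmed with something like the following. This is a sketch that only reads sys.configurations and sys.databases, and it assumes the database is named DMS as in the script.

```sql
-- value_in_use = 1 means CLR integration is active on the instance.
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'clr enabled';

-- is_trustworthy_on = 1 reflects ALTER DATABASE ... SET TRUSTWORTHY ON.
SELECT name, is_trustworthy_on
FROM sys.databases
WHERE name = 'DMS';
```

Both queries returning 1 would suggest the failure is not caused by server or database configuration, which matches the fact that execution gets as far as ExtractText in the trace.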