Free Support Forum - aspose.com

ISO 8859-1 encoded file names

I have an application that uses Aspose.Words to open and read files whose names come from a database field. The database is encoded as ISO 8859-1, but VB.NET handles strings as Unicode (UTF-16), so some garbage characters are generated when the database data is converted. Will this be an issue when trying to open files? One workaround for the string conversion that was suggested to me is to use the System.Text.Encoding class to convert the file name strings to a sequence of bytes. Will this option work with Aspose.Words? Or is there another way you would recommend to deal with ISO 8859-1 encoded file names?
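Converting the names to bytes with System.Text.Encoding only helps if every character can be represented in the target charset. Also, File.Exists and the Document(string) constructor used later in this thread both take the path as a .NET string, so bytes would have to be decoded back into a string before use. A minimal C# sketch (Latin1Check and FitsLatin1 are hypothetical names) that checks whether a name survives a round trip through ISO-8859-1. The typographic apostrophe (’, U+2019) does not, because it lies outside that charset. SQL Server's Latin1_General collations store varchar data using Windows code page 1252, which does contain ’ (as byte 0x92), so "ISO-8859-1" and the code page the database actually uses may not be the same thing.

```csharp
using System;
using System.Text;

public static class Latin1Check
{
    // ISO-8859-1 is one of the encodings built into every .NET runtime.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // True when every character of s survives a round trip through ISO-8859-1 bytes.
    // Characters outside the charset are replaced during GetBytes ('?' or a best-fit
    // look-alike, depending on the runtime), so the round trip no longer matches.
    public static bool FitsLatin1(string s)
    {
        byte[] bytes = Latin1.GetBytes(s);
        return Latin1.GetString(bytes) == s;
    }
}
```

For example, FitsLatin1("Café.doc") is true, while FitsLatin1("Alzheimer’s.doc") is false.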

Hi


Thanks for your inquiry. Usually, string fields in a database are not stored in any special encoded form; this is what allows fast searching. Could you please tell me which database you are using? Also, please provide a code example that shows how you get the file names from the database. If the string fields in your database are encoded, you may need to decode them.

Best regards.

I have been told by the DBA that the database is SQL Server 2003 and uses ISO-8859-1 encoding. The code I use to pull the file name from the database is below. Some of the file names pulled from this database end up with garbage characters where regular punctuation should be, which causes a problem. For example,
PI Report_Alzheimer’s disease- current and emerging therapies_October 2007.pdf

should actually be
PI Report_Alzheimer’s disease- current and emerging therapies_October 2007.pdf

The garbage characters in the first name appear because the file name is converted incorrectly when it is read into the string variable “filestring”. This in turn causes the file_convert function to fail, because it cannot find a file with that name.
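One common cause of this kind of garbage is double decoding: the name was stored as UTF-8 bytes, but those bytes were later decoded as a single-byte charset. The C# sketch below assumes that cause (Mojibake, Garble and Repair are hypothetical names). It reproduces the effect with ISO-8859-1 and reverses it by turning the garbled string back into its original bytes and decoding those as UTF-8. If an apostrophe shows up as ’, the bytes were decoded as Windows-1252 (where 0x80 is € and 0x99 is ™), and the repair needs code page 1252 instead of ISO-8859-1; on .NET Core and .NET 5+ that code page is only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).

```csharp
using System;
using System.Text;

public static class Mojibake
{
    // Built-in single-byte encoding: byte 0xNN <-> character U+00NN.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Reproduces the damage: UTF-8 bytes are decoded as if they were ISO-8859-1.
    public static string Garble(string original) =>
        Latin1.GetString(Encoding.UTF8.GetBytes(original));

    // Reverses it: map each character back to its single byte, then decode those bytes as UTF-8.
    public static string Repair(string garbled) =>
        Encoding.UTF8.GetString(Latin1.GetBytes(garbled));
}
```

Apply Repair only to names known to be double-decoded; running it on a correctly decoded name that contains non-ASCII characters corrupts that name instead.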

Is there a way to get around this other than changing the file name?

Thank you.


CODE:

From the main function:

Dim SQLConn as ADODB.Connection
SQLConn = new ADODB.Connection
SQLConn.ConnectionString = "Provider= SQLOLEDB.1; Data Source = xxx.xxx.xxx.xxx; Initial Catalog = Clients_MedPanel; User ID = userID; Password= password;Persist Security Info = True"
SQLConn.Open()

query = "Select ReportID, ReportName, ReportType, Abstract, Summary, Industry, Category1, Category2, Category3, Category4, Category5, Deliverable, Active, Price, DateCreated, DateModified, DateAvailable from Report Where DateModified between '" & modGlobals.lastdate & "' and '" & Today & "' and DateAvailable between '" & modGlobals.lastdate & "' and '" & Today & "' Order By 1"

SSrs = SQLConn.Execute(query)
While Not SSrs.EOF
repID = SSrs.Fields("ReportID").Value
repName = SSrs.Fields("ReportName").Value
reptype = escape_string((SSrs.Fields("ReportType").Value))
If IsDBNull(SSrs.Fields("Abstract").Value) Then
abst = ""
desc = ""
Else
abst = abst_convert((SSrs.Fields("Abstract").Value))
desc = get_description(abst)
End If
abst = escape_string(abst)
desc = escape_string(desc)
' summary = escape_string((SSrs.Fields("Summary").Value))
If IsDBNull(SSrs.Fields("Industry").Value) Then
industry = " "
Else
industry = SSrs.Fields("Industry").Value
End If
If IsDBNull(SSrs.Fields("Category1").Value) Then
cat1 = 0
Else
cat1 = SSrs.Fields("Category1").Value
End If
If IsDBNull(SSrs.Fields("Category2").Value) Then
cat2 = 0
Else
cat2 = SSrs.Fields("Category2").Value
End If
If IsDBNull(SSrs.Fields("Category3").Value) Then
cat3 = 0
Else
cat3 = SSrs.Fields("Category3").Value
End If
If IsDBNull(SSrs.Fields("Category4").Value) Then
cat4 = 0
Else
cat4 = SSrs.Fields("Category4").Value
End If
If IsDBNull(SSrs.Fields("Category5").Value) Then
cat5 = 0
Else
cat5 = SSrs.Fields("Category5").Value
End If
If IsDBNull(SSrs.Fields("Deliverable").Value) Then
filestring = " "
Else
filestring = SSrs.Fields("Deliverable").Value
End If

If IsDBNull(SSrs.Fields("Active").Value) Then
active = False
Else
active = SSrs.Fields("Active").Value
End If
If IsDBNull(SSrs.Fields("Price").Value) Then
price = 0
Else
price = SSrs.Fields("Price").Value
End If
If IsDBNull(SSrs.Fields("DateAvailable").Value) Then
datecreated = Format(DateAdd(DateInterval.Year, 1, Today()), "yyyy%-MM%-dd")
Else
datecreated = Format(DateAdd(DateInterval.Year, 1, SSrs.Fields("DateAvailable").Value), "yyyy%-MM%-dd")
End If
If IsDBNull(SSrs.Fields("DateModified").Value) Then
datemodified = Format(DateAdd(DateInterval.Year, 1, Today()), "yyyy%-MM%-dd")
Else
datemodified = Format(DateAdd(DateInterval.Year, 1, SSrs.Fields("DateModified").Value), "yyyy%-MM%-dd")
End If
url_title = conv_url(repName)
repName = escape_string(repName)
If active = False Then
a_status = "closed"
Else
a_status = "open"
End If
' convert filestring to html
If LCase(Right(filestring, 3)) = "pdf" Then
html_out = file_convert(filestring)
Else
html_out = ""
End If
' ... rest of the loop body ...
SSrs.MoveNext()
End While



Public Function file_convert(ByVal Deliverable As String) As String
Dim filestring As String
Dim dirstring As String
Dim newfilestring As String
Dim FilePath As String
filestring = Deliverable
dirstring = Trim(Mid(Left(filestring, Len(filestring) - 4), 11))
dirstring = Replace(dirstring, "/", "")
dirstring = Replace(dirstring, ":", "")
FilePath = modGlobals.path & dirstring & "\Report" & Left(filestring, Len(filestring) - 3) & "doc"
If Not System.IO.File.Exists(FilePath) Then
FilePath = Replace(FilePath, "_ ", "_")
If Not System.IO.File.Exists(FilePath) Then
file_convert = “”
System.IO.File.AppendAllText("\\10.228.166.135\HTML_Reports\ErrorLog.txt", Today() & " " & filestring & vbCrLf & FilePath & vbCrLf)
Exit Function
End If
End If
Dim doc As Aspose.Words.Document = New Aspose.Words.Document(FilePath)
Dim stream As IO.MemoryStream = New IO.MemoryStream()
doc.SaveOptions.ExportImagesFolder = SAVEPATH
doc.SaveOptions.HtmlExportImagesFolderAlias = SAVEALIAS
doc.SaveOptions.HtmlExportHeadersFooters = False
doc.Save(stream, Aspose.Words.SaveFormat.Html)
' ToArray() returns only the bytes written; GetBuffer() would also return unused buffer capacity
Dim html As String = System.Text.Encoding.UTF8.GetString(stream.ToArray())
Dim startBody As String = "<body>"
Dim endBody As String = "</body>"
Dim startIndex As Integer = html.IndexOf(startBody) + startBody.Length
Dim endIndex As Integer = html.IndexOf(endBody) - startIndex
Dim lenth As Integer = html.Length
html = html.Substring(startIndex, endIndex)
newfilestring = conv_url(Left(filestring, Len(filestring) - 4)) & “.php”
System.IO.File.WriteAllText("\\10.228.166.135\HTML_Reports\" & newfilestring, html)
file_convert = newfilestring
End Function
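Two details in file_convert can be checked independently of the encoding question. MemoryStream.GetBuffer() returns the stream's whole internal buffer, including unused capacity, so decoding it can append trailing '\0' characters, whereas ToArray() returns only the bytes written. And the Substring call assumes that both body markers are found. A hedged C# sketch of both steps (HtmlBody, ReadAll and ExtractBody are hypothetical names); it searches for "<body" rather than an exact "<body>" in case the exported tag carries attributes, which is an assumption about the HTML that Aspose.Words produces.

```csharp
using System;
using System.IO;
using System.Text;

public static class HtmlBody
{
    // ToArray() copies only the bytes actually written to the stream.
    public static string ReadAll(MemoryStream stream) =>
        Encoding.UTF8.GetString(stream.ToArray());

    // Returns the markup between the opening <body ...> tag and </body>.
    // Falls back to the whole document (or everything after <body>) if a tag is missing.
    public static string ExtractBody(string html)
    {
        int open = html.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
        if (open < 0) return html;
        int start = html.IndexOf('>', open);
        if (start < 0) return html;
        start++; // first character after the '>' of the opening tag
        int end = html.IndexOf("</body>", start, StringComparison.OrdinalIgnoreCase);
        return end < 0 ? html.Substring(start) : html.Substring(start, end - start);
    }
}
```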

Hi

Thanks for the additional information. Unfortunately, I can’t reproduce this issue on my side. I set the collation to Latin1_General, inserted the value “test’test” into the database field (I use MS SQL Server 2005), and then ran the following code.

using System.Data;

// Create connection
string connectionString = "server=myhost;database=TestDB;uid=sa;pwd=mypwd;";
System.Data.SqlClient.SqlConnection conn = new System.Data.SqlClient.SqlConnection(connectionString);

// Create DataSet
DataSet ds = new DataSet();

// Create SQL command
string commandString = "SELECT FileName FROM TestTable WHERE ID=1";
System.Data.SqlClient.SqlCommand command = new System.Data.SqlClient.SqlCommand(commandString, conn);

// Open connection
conn.Open();

// Read data
System.Data.SqlClient.SqlDataReader reader = command.ExecuteReader();
while (reader.Read())
{
    string valueStr = reader.GetValue(0).ToString(); // returns correct value
}
conn.Close();

Best regards.

Could the problem be related to the fact that I was using VB.NET and ADO to access the data in the database? Your example appears to be either Java or C#, and it does not appear that you are using ADO or ADO.NET to access the database. For now, we have worked around the problem by making all file names URL-safe, which means they contain only letters, numbers, underscores (_), and hyphens (-).
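The URL-safe workaround can be expressed as a small filter. A minimal C# sketch, assuming every disallowed character in the base name is replaced by an underscore and the extension is kept; SafeNames and MakeUrlSafe are hypothetical names, and the thread does not say which replacement rule was actually used.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafeNames
{
    // Keeps ASCII letters, digits, '_' and '-' in the base name, replaces every
    // other character with '_', and keeps the extension (e.g. ".pdf").
    public static string MakeUrlSafe(string fileName)
    {
        int dot = fileName.LastIndexOf('.');
        string stem = dot > 0 ? fileName.Substring(0, dot) : fileName;
        string ext = dot > 0 ? fileName.Substring(dot) : string.Empty;

        string safeStem = Regex.Replace(stem, "[^A-Za-z0-9_-]", "_");
        // The extension keeps its leading dot; any other unusual character is dropped.
        string safeExt = Regex.Replace(ext, "[^A-Za-z0-9.]", string.Empty);
        return safeStem + safeExt;
    }
}
```

With this rule, names that differ only in punctuation (for example, an apostrophe versus a space) map to the same safe name, so collisions are possible.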

Hi

I created my example in C#, and I used ADO.NET to access the database.

Best regards.