Invalid filename in Content-Disposition header for non-US-ASCII characters

Aspose.Words.Document.Save() to an HttpResponse does not encode the file name properly.

Usually this does not seem to be much of a problem, because a) most file names use US-ASCII characters anyway, and b) most browsers are relatively robust when handling invalidly encoded values. Still, it is invalid according to the specs.

Suggestion: use the System.Net.Mime.ContentDisposition class to handle the encoding.

Cheers,
Alex

Hi

Thanks for your request. Could you please attach a sample document for testing? I will investigate how this could be resolved.

Best regards.

The code is straightforward. When I save a document with a simple US-ASCII file name like this:

Aspose.Words.Document doc = new Aspose.Words.Document(stream);
doc.JoinRunsWithSameFormatting();
doc.SaveOptions.HtmlExportCssStyleSheetType = 
Aspose.Words.CssStyleSheetType.Embedded;
doc.SaveOptions.ExportPrettyFormat = true;
doc.Save(
"test.mht",
Aspose.Words.SaveFormat.Mhtml,
Aspose.Words.SaveType.OpenInBrowser,
Response);

This is the HTTP response sent over the wire:

HTTP/1.1 200 OK
[uninteresting headers deleted]
content-disposition: inline; filename=test.mht

Now, if I save it using a Norwegian file name such as “Hær.mht” (the character after the H is æ, the a and e glued together), this is the file name sent over the wire:

content-disposition: inline; filename=H…r.mht

This is obviously not correct.
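For illustration only: the exact bytes behind “H…r.mht” are not known here, but one common way a non-ASCII file name gets mangled is that raw UTF-8 bytes are written into the header and the client reads them as ISO-8859-1. The sketch below (hypothetical class name `MojibakeDemo`) reproduces that effect.

```csharp
using System;
using System.Text;

// Hypothetical demo class: simulates a client that decodes raw UTF-8 header
// bytes as ISO-8859-1 (Latin-1).
public static class MojibakeDemo
{
    // Code page 28591 is ISO-8859-1: every byte 0x00-0xFF maps to the code point of the same value.
    static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    public static string Utf8ReadAsLatin1(string text) =>
        Latin1.GetString(Encoding.UTF8.GetBytes(text));
}
```

Here `Utf8ReadAsLatin1("Hær.mht")` returns "HÃ¦r.mht": the two UTF-8 bytes of æ (0xC3 0xA6) turn into two separate Latin-1 characters, while pure ASCII names such as "test.mht" come through unchanged.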

The remedy is for the Save() procedure to encode the file name instead of just putting it into the Content-Disposition header verbatim. The encoding routines are present in the Microsoft class mentioned in the original post.
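A minimal sketch of such encoding, assuming the later RFC 6266 / RFC 5987 convention rather than anything Aspose.Words actually does: send a plain ASCII `filename` as a fallback and the real name as a UTF-8, percent-encoded `filename*`. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of Aspose.Words): builds a Content-Disposition value
// with an ASCII fallback "filename" plus an RFC 5987 "filename*" parameter.
public static class ContentDispositionHeader
{
    public static string Build(string dispositionType, string fileName)
    {
        // ASCII fallback for clients that ignore filename*: replace control characters,
        // non-ASCII characters, quotes and backslashes with '_'.
        var fallback = new StringBuilder();
        foreach (char c in fileName)
            fallback.Append(c < 0x20 || c > 0x7E || c == '"' || c == '\\' ? '_' : c);

        return dispositionType + "; filename=\"" + fallback + "\"; filename*=UTF-8''"
            + EncodeRfc5987(fileName);
    }

    // Percent-encodes the UTF-8 bytes of the value, leaving only RFC 5987 attr-chars as-is.
    static string EncodeRfc5987(string value)
    {
        const string attrChars = "!#$&+-.^_`|~"; // in addition to ALPHA and DIGIT
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(value))
        {
            char c = (char)b;
            bool keep = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                     || (c >= '0' && c <= '9') || attrChars.IndexOf(c) >= 0;
            if (keep) sb.Append(c);
            else sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}
```

`Build("inline", "Hær.mht")` yields `inline; filename="H_r.mht"; filename*=UTF-8''H%C3%A6r.mht`. Using '_' in the fallback is an arbitrary choice; RFC 6266 says clients that understand `filename*` should prefer it over `filename`.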

Cheers,
Alex

Hi

Thanks for your request. You can encode the file name before passing it to the Save method. See the following code:

doc.Save(Server.UrlEncode("Hær.doc"), SaveFormat.Doc, SaveType.OpenInBrowser, Response);

Best regards.

Hi Alexey,

Thanks for your reply. Unfortunately, URL encoding is NOT the correct encoding for the file name in Content-Disposition. See http://tools.ietf.org/html/rfc2184, section 4 (RFC 2184 has since been obsoleted by RFC 2231).

I thought that the .NET ContentDisposition class was smart enough to handle this, but after some more testing it appears that it is not.

I’ll find a way around this, but it remains an invalid header that Aspose sends out.

Cheers,
Alex

Hi

Thanks for your request. Please take a look at the following article. I think it could be useful for you.

http://www.codeproject.com/KB/aspnet/NonUSASCII.aspx

Best regards.

Alexey,

Thanks for the link. The problem is more complex than that: there should be a * at the end of the filename parameter name, indicating that the value carries a character set (and optional language) and is percent-encoded (see RFC 2231).

What should be sent over the wire is something like this:

content-disposition: inline; filename*=iso-8859-1''H%E6r.mht

As you can see, there is no way I can make that happen if I use the Aspose Save() overload that writes to the HttpResponse.
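An RFC 2231 extended parameter has the form name*=charset'language'percent-encoded-bytes, where the language part may be empty; for this example that is `filename*=iso-8859-1''H%E6r.mht`. The sketch below uses a hypothetical `Rfc2231.ExtendedParameter` helper. It takes the charset label from `Encoding.WebName` and leaves only ASCII letters, digits, '.', '-' and '_' unescaped, which is a conservative subset of what the RFC allows.

```csharp
using System;
using System.Text;

// Hypothetical helper: builds name*=charset'language'percent-encoded-octets (RFC 2231).
public static class Rfc2231
{
    public static string ExtendedParameter(string name, string value, Encoding encoding, string language = "")
    {
        var sb = new StringBuilder();
        // e.g. "filename*=iso-8859-1''" when the language is left empty
        sb.Append(name).Append("*=").Append(encoding.WebName)
          .Append('\'').Append(language).Append('\'');

        // Percent-encode every byte of the value in the chosen charset,
        // except a conservative set of safe ASCII characters.
        foreach (byte b in encoding.GetBytes(value))
        {
            bool keep = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                     || (b >= '0' && b <= '9') || b == '.' || b == '-' || b == '_';
            if (keep) sb.Append((char)b);
            else sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}
```

With ISO-8859-1 (code page 28591) this produces `filename*=iso-8859-1''H%E6r.mht`; with `Encoding.UTF8` it produces `filename*=utf-8''H%C3%A6r.mht`.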

Cheers,
Alex

Hi

Thank you for the additional information. I logged this request in our defect database as issue #7166. I will notify you as soon as it is fixed.

As a workaround, you can use the following code:

MemoryStream stream = new MemoryStream();
doc.Save(stream, SaveFormat.Pdf);
//ToArray() copies only the bytes written; GetBuffer() would also return unused capacity.
byte[] bytes = stream.ToArray();
Response.Clear();
//Specify the document type.
Response.ContentType = "application/pdf";
//Other options:
//Response.ContentType = "application/msword";
//Response.ContentType = "text/plain";
//Response.ContentType = "text/html";
//Specify how the document is sent to the browser.
Response.AddHeader("content-disposition", "attachment; filename=test.pdf");
//Another option could be:
//Response.AddHeader("content-disposition", "inline; filename=MyDocument.pdf");
//Send the data bytes to the response.
Response.BinaryWrite(bytes);
Response.End();
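Why `ToArray()` rather than `GetBuffer()` in the workaround: `GetBuffer()` returns the stream's whole internal buffer, including unused capacity, so trailing zero bytes could be sent to the browser as part of the file. A small sketch (hypothetical `BufferDemo` class) that compares the two lengths:

```csharp
using System;
using System.IO;

// Hypothetical demo: compares GetBuffer() and ToArray() lengths after writing `count` bytes.
public static class BufferDemo
{
    public static (int bufferLength, int arrayLength) Lengths(int count)
    {
        using var stream = new MemoryStream();
        stream.Write(new byte[count], 0, count);
        // GetBuffer(): the internal array, sized by capacity.
        // ToArray(): a copy of exactly stream.Length bytes.
        return (stream.GetBuffer().Length, stream.ToArray().Length);
    }
}
```

After writing 3 bytes, `ToArray()` has length 3, while `GetBuffer()` is larger because the buffer grows in chunks rather than byte by byte.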

Best regards.

Hi Alex,

Thank you for your valuable suggestions.

I looked into the problem and it appears that Internet Explorer (I tested on IE6) does not support encoded words. I was not able to get it to display the file name properly. I used your example literally, and IE just used the name of the aspx page plus my extension instead of the document file name.

Also, if we were to implement this, we would avoid the ISO-8859 character sets, because a file name can contain any Unicode character. We would have to use UTF-8 or UTF-7 (and UTF-7 seems not to be recommended by the encoded word RFC).
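To illustrate the point about ISO-8859 character sets: a name such as “Hær.doc” survives an ISO-8859-1 round trip, but a Cyrillic name does not, because .NET's default encoder fallback substitutes '?' for characters Latin-1 cannot represent. UTF-8 handles both. The class below is a hypothetical sketch.

```csharp
using System;
using System.Text;

// Hypothetical demo: checks whether a file name survives encoding and decoding in a charset.
public static class CharsetCoverage
{
    static readonly Encoding Latin1 = Encoding.GetEncoding(28591); // ISO-8859-1

    // True if the text is unchanged after an encode/decode round trip.
    public static bool RoundTrips(string text, Encoding encoding) =>
        encoding.GetString(encoding.GetBytes(text)) == text;

    public static bool RoundTripsLatin1(string text) => RoundTrips(text, Latin1);
    public static bool RoundTripsUtf8(string text) => RoundTrips(text, Encoding.UTF8);
}
```

For example, "Жур.doc" comes back from ISO-8859-1 as "???.doc", while it round-trips unchanged through UTF-8.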

The code Alexey posted earlier can be used as a workaround; it is essentially what our Save method does. By writing to the browser yourself, you get complete control over the headers sent to the browser.

All this suggests to me that we should not change anything in Aspose.Words at this stage.

Thanks Roman,

The problem is that “the other browser” does a much better job of following standards than IE :)

Notice the little * directly after “filename”. This indicates that what follows is an RFC 2231 extended value (charset, optional language, then percent-encoded bytes). IE does not support this; Firefox and Safari (seem to) do. Instead, IE supports something that looks like URL encoding (e.g. a space becomes %20), which Firefox also supports. But that does not solve all encoding woes. The RFC 2231 mechanism does, if used properly.

It can be done right, but the different browsers need to be handled differently. Tomorrow I’ll write up a sample implementation that works in both IE and Firefox.
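A sketch of the per-browser approach described above, assuming User-Agent sniffing is acceptable: clients whose User-Agent contains “MSIE” get a percent-encoded UTF-8 plain `filename` (the URL-style encoding IE accepts), and every other client gets the RFC 2231 `filename*` form. The helper name is hypothetical, and User-Agent checks are easy to get wrong, so treat this as illustrative rather than a tested browser matrix.

```csharp
using System;

// Hypothetical helper: chooses a Content-Disposition form based on the User-Agent.
public static class BrowserAwareDisposition
{
    public static string Build(string fileName, string userAgent)
    {
        // Percent-encodes the UTF-8 bytes of every character outside A-Z a-z 0-9 - . _ ~
        string encoded = Uri.EscapeDataString(fileName);

        // Old Internet Explorer identifies itself with "MSIE" and ignores filename*,
        // but accepts a URL-encoded plain filename.
        bool isOldIe = userAgent != null &&
                       userAgent.IndexOf("MSIE", StringComparison.Ordinal) >= 0;

        return isOldIe
            ? "attachment; filename=\"" + encoded + "\""
            : "attachment; filename*=UTF-8''" + encoded;
    }
}
```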

Cheers,
Alex

Roman,

Sorry, I’ve come to the conclusion that IE is a poor performer in this area too; I’ve given up. Firefox just works.

Cheers,
Alex

OK, I’ve done it the way described in the CodeProject article: encode using UTF-8 and then hex-escape. This works in IE, and most likely in Firefox too.
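The “encode using UTF-8 and then hex escape” approach can be sketched as follows (hypothetical `Utf8HexFileName` class, not the actual Aspose.Words code). Unlike `Server.UrlEncode`, which turns a space into '+', this escapes a space as %20, the form IE is described above as decoding.

```csharp
using System;
using System.Text;

// Hypothetical helper: UTF-8 encode the file name, then hex-escape unsafe bytes.
public static class Utf8HexFileName
{
    // Keeps ASCII letters, digits, '.', '-' and '_'; every other byte becomes %XX.
    public static string Escape(string fileName)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(fileName))
        {
            bool safe = (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
                     || (b >= '0' && b <= '9') || b == '.' || b == '-' || b == '_';
            if (safe) sb.Append((char)b);
            else sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }

    public static string Header(string fileName) =>
        "attachment; filename=\"" + Escape(fileName) + "\"";
}
```

For example, `Escape("Hær doc.mht")` returns "H%C3%A6r%20doc.mht".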

The issues you have found earlier (filed as 7166) have been fixed in this update.


This message was posted using Notification2Forum from Downloads module by alexey.noskov.