Free Support Forum - aspose.com

Image quality problem when exporting Word document to HTML

We are evaluating the Aspose.Word control for a document management system on our website. The document management system will allow clients to call up a list of documents we hold relating to their case. Clients can then choose to view a document by clicking the hyperlinked document name from the list.

Currently we just serve the document directly to the client as a file download. We would like to use Aspose.Word to change this so the document is converted to HTML format and served to the user as a webpage. This avoids the problem of the client having to have Microsoft Word installed, and also means we can be sure that the client cannot modify our document themselves.

Testing this has shown a slight problem, however. Many of our Word Documents have embedded TIFF images inside them, as our internal paperless system allows staff to scan hard copies of letters received as TIFF files and then embed them within the Word Document which is then stored in our document management system.

When using Aspose.Word to convert the Word document to HTML the TIFF images do convert, but the resulting PNG image is of poor quality, which makes the text difficult to read. I've attached a sample document from our system, which has two embedded images, to this posting.

Is there any way to improve the quality of the output image? I notice that the resulting PNG file is of poor quality when loaded in Paint Shop Pro, but if you embed the PNG image back into Microsoft Word the image quality is much better. This makes me wonder whether Word is performing some sort of image smoothing or anti-aliasing which Aspose.Word doesn't when exporting the image?

Alternatively, is there any way we could choose the output format of images exported using the Aspose.Word control? It's possible that exporting the image from Word as GIF or JPEG may be more suited to our purposes.

Please don't hesitate to come back if there's any more information I can supply. We've been most impressed with our evaluation of Aspose.Word in terms of performance against price, but the image quality issue is a concern we would need to address before we could place an order.

Regards,

Andrew Dancy
Lovetts plc
http://www.lovetts.co.uk

PS.

The sample code I'm using to generate the HTML page is as follows:

---aspose_test.aspx---

<%@ Page language="cs" %>
<%@ Import Namespace="Aspose.Word" %>

<%
// Load the Word document from disk and stream it to the browser as HTML.
Document doc = new Document(@"c:\paperless_test\8.doc");
doc.Save(Response.OutputStream, SaveFormat.FormatHtml);
%>

Hi Andrew,

Thank you for considering Aspose.

The point is that TIFF images are not supported by Aspose.Word because they are not documented in the MS Word format. It is strange that it works at all. The image is probably stored in some other format inside the doc (maybe in addition to TIFF). However, we will investigate how to support TIFFs better and try to improve the quality of the exported images.

Are you still using Aspose.Words? Try the latest version; it may handle the issue you mentioned. I've never actually seen TIFF images in MS Word documents. If you have a DOC file with a TIFF image in it, please attach it here (only you and Aspose staff can download it).

Roman,

Sorry for the delay in responding - only just seen this post.

Yes we are still using Aspose.Words, although primarily for generation of Word documents in-house. I'm just about to look again at using Aspose.Words in a web environment - in fact I posted just today on this in the Aspose.PDF forum at http://www.aspose.com/Community/forums/post/115573/converting-word-documents-to-pdf-layout-issues.aspx as we could end up converting to PDF to display on the web instead - it all depends which method ends up more reliable.

As for your comment on TIFF files inside a Word document, I attach an example I created a few minutes ago. Since TIFF files are essentially just another form of bitmap, perhaps internally it is just stored as such? I hope it helps you. We've got loads of weird and wonderful Word documents spanning the last 10 years or so since Word 95 if you ever need more odd examples!

Hello Andrew!

This image is stored as PNG in the document. Most probably all TIFF images are converted this way when they are embedded in MS Word. When HTML is generated with default options this image is scaled from 200 dpi to 96 dpi. You can play with these two options if you are not satisfied with the quality of the images:

HtmlExportImageResolution (default is 96 dpi)

HtmlExportScaleImageToShapeSize (default is true)

Both are members of the SaveOptions class. You can access them via the Document.SaveOptions property.
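
As a rough sketch, the two options could be set in the test page from the original post as shown here. The property names come from the description above; the `Aspose.Words` namespace, the `SaveFormat.Html` member, the property types (an integer and a boolean) and the value 200 are assumptions and may differ between versions. Scaling a 200 dpi scan down to 96 dpi keeps only 96/200 = 48% of the pixels in each dimension, which is likely why scanned text becomes hard to read.

```aspx
<%@ Page language="cs" %>
<%@ Import Namespace="Aspose.Words" %>

<%
// Sketch only: the namespace, SaveFormat.Html and the property types are
// assumptions; check them against the version you have installed.
Document doc = new Document(@"c:\paperless_test\8.doc");

// Export images at the scan resolution instead of the 96 dpi default.
// 200 is an assumed value chosen to match the sample image.
doc.SaveOptions.HtmlExportImageResolution = 200;

// Assumed effect: do not resample the image down to the size of its shape.
doc.SaveOptions.HtmlExportScaleImageToShapeSize = false;

doc.Save(Response.OutputStream, SaveFormat.Html);
%>
```

The two settings can be tried independently. Larger images mean larger downloads, so it may be worth testing a few resolution values between 96 and 200 to find the lowest one that keeps the text readable.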

I also looked at the second thread, the one in the Aspose.Pdf forum. From our experience it could be hard to support such complicated documents. In general doc2pdf conversion will never be 100% exact for technical reasons. For instance, fonts with the same names are rendered slightly differently in MS Word and Adobe Acrobat. In such a case you might have 22 lines of text on the first page, but after conversion there could be 21 or 23. This is just an example, but it is from real life. The more complex the document is, the more differences will appear in the resulting PDF, even if we supported everything. Some PDF-related issues need to be fixed in Aspose.Pdf or in cooperation with them.

In spite of all the difficulties with conversion to PDF we are ready to improve it. Please let us know if you would like us to consider any improvements. Files from your collection could be of great help by providing specific examples of what is converted incorrectly.

Best regards,