Converting Word document with Images to HTML is slow

Hello there

We use Aspose amongst other things to create “preview” versions of documents. To do this we convert documents to HTML using Aspose so that the converted document can then be displayed in a browser.

For this we use Aspose.Words for .NET (version 21.1.0; we keep our NuGet packages up to date).

The code to do this is trivial (for clarity, I have left out the code that sets the licence and traps errors):

oDoc = New Aspose.Words.Document(sFile)
oDoc.Save(sToFileName, Aspose.Words.SaveFormat.Html)

test.zip (483.2 KB)

The attached document is quite small: 601 KB and only a few pages, three of which contain a page-sized image.

It can take 4 seconds for this to be converted, which seems to me excessive for a file of this size, especially considering I’m testing this on a new machine with an i9 processor and 32GB of internal memory… Our customers have much less powerful machines so for them this can become excruciatingly slow.

Is there anything I can do to speed this up? See the attached document for reference.

@kidkeogh

It is quite difficult to answer such questions because CPU performance and memory usage all depend on complexity and size of the documents you are loading/generating.

The simplest rule is: the first call of “new Document()” causes all related classes to be loaded and the system buffers to be instantiated. The static Aspose.Words resources (document styles, fonts, border arts, etc.) are loaded lazily, only when they are really needed, and once loaded they are cached for the rest of the session.

Moreover, loading huge Word documents into the Aspose.Words DOM requires more memory, because the whole document has to be held in memory during processing. Usually, Aspose.Words needs about 10 times the original document size in memory to build the DOM.
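
One way to check how much of the conversion time is this one-time start-up cost is to time several conversions in the same process. The following is only a minimal sketch: the C:\Temp paths and output file names are hypothetical, and licence setup and error handling are omitted.

```csharp
using System;
using System.Diagnostics;
using Aspose.Words;

class ConversionTiming
{
    static void Main()
    {
        // Hypothetical input path; replace with your own document.
        string input = @"C:\Temp\test.docx";

        // Convert the same document three times in one process.
        // The first run includes class loading and lazy resource
        // initialisation; later runs reuse the cached resources.
        for (int run = 1; run <= 3; run++)
        {
            Stopwatch sw = Stopwatch.StartNew();

            Document doc = new Document(input);
            doc.Save(@"C:\Temp\out" + run + ".html", SaveFormat.Html);

            sw.Stop();
            Console.WriteLine("Run " + run + ": " + sw.ElapsedMilliseconds + " ms");
        }
    }
}
```

If the first run is much slower than the later ones, the start-up cost described above dominates. If all runs take about the same time, the cost comes from the document content itself.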

Hello Tahir,

I’ve already addressed all the points you raise. I’m working on a brand new machine with an Intel i9 processor, on Windows 10 64-bit, with 32 GB of internal memory. It’s blisteringly fast. And yet it takes 4 seconds to convert a 3-page document containing 3 images to HTML. That just seems excessive. You can see the document I attached. What I am wondering is whether it is possible to improve performance by changing settings during the conversion?

@kidkeogh Your test document contains 4 quite large EMF metafiles (33 MB each). Manipulating such large images takes time. Also, upon conversion to HTML, metafiles are converted to raster images by default. If you disable this conversion by setting HtmlSaveOptions.MetafileFormat to HtmlMetafileFormat.EmfOrWmf, the save operation runs about 10% faster on my machine. But it still takes time to manipulate such large objects.
There is also an option, LoadOptions.ConvertMetafilesToPng, which makes Aspose.Words convert metafiles to PNG on load. This significantly improves the conversion time of your document (it is more than 2 times faster on my machine):

LoadOptions opt = new LoadOptions();
opt.ConvertMetafilesToPng = true;
Document doc = new Document(@"C:\Temp\test.docx", opt);
doc.Save(@"C:\Temp\out.html");
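
For comparison, the HtmlSaveOptions.MetafileFormat setting mentioned above could be applied as follows. This is a minimal sketch that assumes the same hypothetical C:\Temp paths; licence setup and error handling are omitted.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

class KeepMetafilesOnSave
{
    static void Main()
    {
        Document doc = new Document(@"C:\Temp\test.docx");

        // Keep metafiles as EMF/WMF in the HTML output instead of
        // converting them to raster images during save.
        HtmlSaveOptions saveOptions = new HtmlSaveOptions();
        saveOptions.MetafileFormat = HtmlMetafileFormat.EmfOrWmf;

        doc.Save(@"C:\Temp\out.html", saveOptions);
    }
}
```

Keep in mind that browsers generally do not display EMF or WMF images directly, so this setting may not suit a browser preview; the load option shown above is likely the better fit for that use case.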

@alexey.noskov

Thank you for that. Rest assured that we have already advised our client of better ways to include the data they wished to include in this document; the original document can be found on freely accessible Irish government websites, and the text can be copied as is, rather than screen captured the way they had done in this document.

That said, I tried out your suggestion to set ConvertMetafilesToPng in the load options and I observe a similar speed improvement here. That alone will make our customer a lot happier, even if they choose not to follow the advice :slightly_smiling_face: