Free Support Forum - aspose.com

Exposing an API to alter the HTML import behavior

My company is looking at using Aspose.Words for Java to perform HTML to DOC conversion. We are impressed by your company and your products, and we are likely to license Words. However, my biggest concern is that the import process doesn't give us much flexibility. You only support a subset of the possible style attributes, and only on specific HTML tags. We would be at the mercy of whatever your importer does, with little ability to alter its behavior. For example, if we wanted to handle a specific tag or style attribute in a custom way, I don't see a mechanism for doing that.

I'm wondering if you would consider expanding your API to allow custom behavior to run during the import process. For example, the Document class lets you add a NodeChangedEventHandler, but it does not provide any context. If, during an HTML import, we could register a listener that is called as each HTML element is handled and lets us edit or modify the resulting Node, that would go a long way toward giving us the flexibility we want. Something like the NodeChangedEventHandler could provide not only the node but also the HTML tag or style that generated it (perhaps even a string containing the entire tag).
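To make the request concrete, here is a minimal sketch of the kind of hook being asked for. Every type in it (`ImportedNode`, `HtmlElementContext`, `HtmlImportListener`, `ToyHtmlImporter`) is hypothetical and not part of Aspose.Words; the toy importer simply creates one node per element and hands each registered listener the tag name, the raw attributes, the full start-tag text, and the node it may modify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL sketch: none of these types exist in Aspose.Words.
// They only model the kind of import hook requested in this thread.

/** Stand-in for a document node created by the importer. */
class ImportedNode {
    final String kind;   // e.g. "Paragraph"
    double leftIndent;   // paragraph indentation in points; 0 by default

    ImportedNode(String kind) {
        this.kind = kind;
    }
}

/** Context handed to the listener for each HTML element the importer handles. */
class HtmlElementContext {
    final String tagName;                 // e.g. "li"
    final Map<String, String> attributes; // all source attributes, including ones the importer ignores
    final String rawTag;                  // full source text of the start tag

    HtmlElementContext(String tagName, Map<String, String> attributes, String rawTag) {
        this.tagName = tagName;
        this.attributes = attributes;
        this.rawTag = rawTag;
    }
}

/** Proposed callback, invoked after a node has been created for an element. */
interface HtmlImportListener {
    void elementImported(HtmlElementContext context, ImportedNode node);
}

/** Toy importer: creates one paragraph node per element and notifies each listener. */
class ToyHtmlImporter {
    private final List<HtmlImportListener> listeners = new ArrayList<>();

    void addListener(HtmlImportListener listener) {
        listeners.add(listener);
    }

    ImportedNode importElement(String tagName, Map<String, String> attributes, String rawTag) {
        ImportedNode node = new ImportedNode("Paragraph");
        HtmlElementContext context = new HtmlElementContext(tagName, attributes, rawTag);
        for (HtmlImportListener listener : listeners) {
            listener.elementImported(context, node); // the listener may modify the node in place
        }
        return node;
    }
}
```

Because `HtmlImportListener` has a single method, a lambda such as `importer.addListener((ctx, node) -> { if (ctx.tagName.equals("li")) node.leftIndent = 36.0; });` would be enough to customize one tag. The key design point is that the listener runs while the original attributes are still available, so it can act on data the importer itself would ignore.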

For example, if I want all <li> elements to have a specific paragraph indentation, I could add a listener and, when a Node is created for an <li>, modify the attributes of that Node. At the moment, I would have to wait until the import of the whole HTML document is complete and then search through the document for bullets. That may work in this case, but in other cases your importer ignores unsupported or invalid data (such as a color attribute on a tag). The importer throws that data away, so there is no way to access it after the conversion, whereas a listener could do something with it during the import.
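For contrast, the post-import workaround described above can be modeled like this. Again the types (`Para`, `FilteringImporter`, `PostImportPass`) are a hypothetical toy, not the Aspose.Words object model, and the supported-attribute set is an assumption made for illustration: the toy importer keeps only `align`, so a `color` attribute on an `<li>` never reaches the document, and the later pass can indent list paragraphs but has nothing left to read the color from.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// HYPOTHETICAL toy model, not the Aspose.Words object model.
// It shows why a pass that runs after import cannot see attributes the importer dropped.

/** Stand-in for an imported paragraph. */
class Para {
    final boolean isListItem;
    final Map<String, String> keptAttributes; // only the attributes the importer supports
    double leftIndent;                        // points; 0 by default

    Para(boolean isListItem, Map<String, String> keptAttributes) {
        this.isListItem = isListItem;
        this.keptAttributes = keptAttributes;
    }
}

/** Toy importer that silently discards unsupported attributes. */
class FilteringImporter {
    // Assumed supported set, for illustration only.
    static final Set<String> SUPPORTED = Set.of("align");

    static Para importElement(String tagName, Map<String, String> attributes) {
        Map<String, String> kept = new HashMap<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            if (SUPPORTED.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            } // any other attribute is thrown away here
        }
        return new Para(tagName.equals("li"), kept);
    }
}

/** The workaround available today: walk the finished document afterwards. */
class PostImportPass {
    static void indentListItems(List<Para> document, double indentPoints) {
        for (Para para : document) {
            if (para.isListItem) {
                para.leftIndent = indentPoints;
            }
        }
    }
}
```

The indentation part of the task still works after the fact, but the `<li>` paragraph's `keptAttributes` no longer contains `color`, which is exactly the information loss the post describes.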

Short of adding such a hook, can you tell me what your plans are for improving your HTML import process and adding support for more styles (including external styles) and HTML elements?

Our plan is to do significant work on HTML import so that it supports as much HTML and CSS as humanly possible when converting to a Word document.

There is a slight problem in your case: we are doing this work on the .NET code first and will then port it to Java. While I hope we will see improved HTML and CSS import in our .NET code this year, it will take a bit longer to appear in Java.

We will certainly consider the feature you are talking about, but I cannot promise anything at this stage.