Can I simply get one question answered please? (InsertHTML 8-month-old bug)


#1

There has been an ongoing issue with InsertHTML and how it handles unsupported tags and attributes. InsertHTML simply errors out and throws an exception when it finds something it doesn't understand.

Something changed recently and I need to find out exactly what. I downloaded v3.3.2.0 and it looks like InsertHTML is now working and any unsupported HTML tags/attributes are ignored. However, the exception is still raised, which is fine as long as that's the expected behavior. Can you verify that this is the expected behavior?

I need to know in order to code for an expected behavior. It appears as if Aspose.Word does the following regarding InsertHTML:

  1. Ignores unsupported tags and attributes, but inserts what it does understand.
  2. Raises an exception.

Previously, Aspose.Word simply raised an exception without inserting. What is it actually doing with the current version?

BTW, I've posted this question twice without a sufficient answer or a full understanding of my question...


#2

This is a work in progress right now. I don't want to document what happens with unsupported HTML today because the behavior will change when we finish: unsupported HTML will be stripped and the document will still load. Thanks for understanding.


#3

InsertHTML may change, but v3.3.2.0 is released and we're using it. We need to know what occurs within v3.3.2.0 today, not future versions tomorrow...


#4

In this case, the behaviour is exactly as you described it. Sorry, I can’t be more helpful about the import of unsupported HTML elements at the moment.


#5

Below is a snippet of code that inserts HTML from the user into a document. In the older versions, we used snippet A. So just to be perfectly clear (we're planning to put this into production this weekend), using the current version, I should go with snippet B, correct?

A.  try { m_builder.InsertHtml(htmlText); }
    catch { m_builder.InsertHtml(CuteEditor.EditorUtility.ExtractPlainTextOutOfHtml(htmlText)); }

or

B.  try { m_builder.InsertHtml(htmlText); }
    catch { }


#6

The way it works now in Aspose.Word:

1. Go through all HTML elements and process each.

2. When processing an HTML element, build/modify the Document accordingly. Create new nodes, set formatting etc.

3. If an HTML element is not supported, it is ignored.

4. Aspose.Word retrieves only HTML and CSS attributes that it can understand. It does not enumerate through all attributes on an element.

5. If Aspose.Word requests an attribute it needs (most often a CSS or width attribute) and cannot parse it, because the value is in percent or some other unit type Aspose.Word does not support, it throws an exception.

6. When the exception is thrown, further HTML processing stops. Since the document model is not transactional, whatever changes were already made to the document remain in the document.

Although this might not be a perfect import model, I hope this helps with your release.
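
The six steps above can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyElement`, `ToyImporter`, and `Demo` are hypothetical names invented here, the "document" is a plain list of strings, and none of it is Aspose.Word code or its actual implementation. It shows just the control flow described in the steps: unsupported tags are skipped, an unparsable width throws, and content added before the throw is not rolled back.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Toy stand-ins for illustration only; none of these types are Aspose.Word API.
public sealed class ToyElement
{
    public readonly string Tag;
    public readonly string Text;
    public readonly string Width; // null when the attribute is absent

    public ToyElement(string tag, string text, string width = null)
    {
        Tag = tag;
        Text = text;
        Width = width;
    }
}

public static class ToyImporter
{
    // Assumed set of "supported" tags for the sketch.
    private static readonly HashSet<string> SupportedTags =
        new HashSet<string> { "p", "td" };

    // Appends the text of each supported element to `document`, in order.
    public static void Insert(List<string> document, IEnumerable<ToyElement> elements)
    {
        foreach (ToyElement element in elements)      // step 1: visit every element
        {
            if (!SupportedTags.Contains(element.Tag))
                continue;                             // step 3: ignore unsupported tags

            if (element.Width != null)
                ParseWidth(element.Width);            // steps 4-5: may throw

            document.Add(element.Text);               // step 2: modify the document
        }
        // Step 6: there is no rollback, so entries added before a throw remain.
    }

    // Accepts plain numbers only; percentages and other units throw.
    private static double ParseWidth(string value)
    {
        double result;
        if (!double.TryParse(value, NumberStyles.Float,
                             CultureInfo.InvariantCulture, out result))
            throw new FormatException("Unsupported width: " + value);
        return result;
    }
}

public static class Demo
{
    public static List<string> Run()
    {
        var document = new List<string>();
        var elements = new List<ToyElement>
        {
            new ToyElement("p", "first"),
            new ToyElement("blink", "ignored"),
            new ToyElement("td", "bad cell", "50%"),
            new ToyElement("p", "never reached"),
        };

        try
        {
            ToyImporter.Insert(document, elements);
        }
        catch (FormatException)
        {
            // Same shape as snippet B: swallow the error, keep the partial result.
        }

        return document;
    }
}
```

Calling `Demo.Run()` returns a list holding only `first`: the unsupported tag is skipped, the percent width aborts the loop, and the text added before the failure stays. If the real importer behaves as the steps describe, the fallback in snippet A would run against a document that already holds that partial content, so the plain text would be added on top of it.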