Insert HTML does not maintain Defined HTML Formatting

While evaluating the latest release to see if we want to upgrade, I am finding that the InsertHTML method still does not maintain the defined HTML formatting from the source.

For example, taking formatted text from Word, pasting it into a WYSIWYG editor, stripping the Word formatting, and using standard HTML does not insert all items correctly.

When viewing the HTML markup, the markup is correct and can be viewed in an HTML file as defined. When you take that same markup and use the InsertHTML method, the formatting is lost, specifically where you have bullets that are handled by span tags and CSS.

Tables, font formatting and general formatting are maintained. The issue is most apparent with bullets, which get converted to normal text while their layout is controlled through HTML styling.

I would expect the defined HTML to be inserted exactly as it appears in the editors or from the source document itself. I have looked through the forums and I do not see a true answer to this question.

Using the text from the QuickTest.doc file and pasting it into the Editor in the project and clicking the process button you can easily see the formatting is not maintained, but appears correctly in the HTML view of the editor.

Demo Project Attached with sample word file that is being used.

Hi Harry,


Thanks for your inquiry.

I’m afraid it’s not quite clear how to reproduce the issue using your sample project. Could you please also attach the source HTML that you are inserting into your document?

Thanks,

Here are the steps:

  • Run the project
  • Open the QuickTest.doc file
  • Select all the text in the word document
  • Paste it into the WYSIWYG Editor using the "Paste From Word" option with the strip font option selected
  • Complete the paste process
  • Once the formatted HTML appears in the editor, select either Word or PDF and click the "Process Text" button

The newly generated document does not maintain the formatting of the HTML that is present in the WYSIWYG editor.

If you take the HTML markup and create a new HTML file, it displays correctly as a web page. I would expect the HTML that is in the Editor to be correctly inserted into the document when using the DocumentBuilder InsertHTML method.

Have you had a chance to review this, using the steps listed?

Hello

Thanks for your request. It seems you forgot to attach the DevExpress.Data.v11.1 DLL. Could you please attach it here?

Best regards,

Here it is.

Also, I wanted to note that I have tried this with various WYSIWYG editors (DevExpress, Telerik, CKEditor, FreeTextBox and CuteEditor).

Out of all of those, the only one that has the HTML formatting maintained when creating a new document and using InsertHTML is CuteEditor.

If you take the HTML (That is generated from word pasting, as in my example project) from any of the others, then the HTML does not carry over correctly when using InsertHTML.

Also to note, if you take the HTML from any of the above and create a sample htm or html page and convert it directly to PDF using the PDF Generator the htm/html to PDF is always fine. This seems to be a very specific issue to how InsertHTML handles different HTML markup.

Unfortunately I have to use the InsertHTML method to do what I am doing.

Hello

Thank you for the additional information. You should note that Aspose.Words was originally designed to work with MS Word documents. That is why some HTML features might be lost when processing HTML. You can find a list of limitations on exporting to and importing from HTML here:

http://www.aspose.com/documentation/.net-components/aspose.words-for-.net/save-in-the-html-format.html

Also, you forgot to attach the DevExpress.Web.ASPxSpellChecker.v11.1 DLL.

Best regards,

After reading the list, the items I am referring to that do not format correctly, such as bullet numbering, are supported. Also, why would InsertHTML work with some HTML and not others?

I will attach the spellchecker later today.

Hi there,

Additionally, it would be easier for us to reproduce the issue if you were to copy the HTML created from your tool and your input, paste it into a text file and attach it to this thread.

Thanks,

Here is sample data from the DevExpress Editor and CuteEditor. The sample contains the HTML generated by each, along with a document generated from that HTML using the InsertHTML method.

Hi

Thank you for the additional information. As far as I can see, the document generated from the HTML produced by CuteEditor looks quite good.

With the DevExpress HTML, the resulting document looks a little worse. There are problems with indents. This occurs because DIVs are used instead of Ps (paragraphs).

Best regards,

Since both are valid HTML elements, why is one being treated different than the other with the InsertHTML method?

Is there a workaround with the InsertHTML method? Should this be seen as by design, or is this an issue that needs to be addressed in the InsertHTML method itself?

Hi

Thanks for your request. Yes, both are valid HTML elements. But as I mentioned earlier, Aspose.Words was originally designed to work with MS Word documents. Currently, Aspose.Words takes formatting only from HTML elements that correspond to elements in Word documents. This means Aspose.Words takes paragraph formatting from P, H1, H2…H9 elements, text formatting from SPAN elements, etc.

DIV does not directly correspond to any element in Word documents, so Aspose.Words does not take formatting from it.
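Given that explanation, one possible workaround is to rewrite DIV tags as P tags before the HTML is passed to InsertHTML. The following is only a minimal sketch of that idea in Python using the standard library; the function name `divs_to_paragraphs` and the sample markup are hypothetical and not part of Aspose.Words or any of the editors mentioned. It assumes the DIVs are flat, paragraph-like blocks. Nested DIVs, or DIVs that wrap tables or lists, would end up as invalid nested paragraphs and would need a real HTML parser instead.

```python
import re

# Matches opening or closing DIV tags, capturing the optional slash and
# any attributes (e.g. style="...") so they can be carried over unchanged.
_DIV_TAG = re.compile(r"<(/?)div\b([^>]*)>", re.IGNORECASE)


def divs_to_paragraphs(html: str) -> str:
    """Rewrite <div ...> / </div> as <p ...> / </p>, preserving attributes.

    Hypothetical helper: assumes flat, non-nested DIVs that act as paragraphs.
    """
    return _DIV_TAG.sub(lambda m: f"<{m.group(1)}p{m.group(2)}>", html)


# Made-up sample markup resembling editor output that uses DIVs for indents.
sample = '<div style="margin-left:36pt">First item</div><DIV>Second item</DIV>'
converted = divs_to_paragraphs(sample)
print(converted)
```

The converted string would then be handed to InsertHTML in place of the original editor output. Whether the indents then come through as expected depends on which paragraph styles Aspose.Words reads from P elements, so the result should be checked against a real document.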

Sure, we are always working on improving our HTML module. Unfortunately, at the moment our documentation has no list of the HTML features supported on import. Currently, we have only a list of the features supported on export to HTML:

http://www.aspose.com/documentation/.net-components/aspose.words-for-.net/save-in-the-html-format.html

Later we will also provide a list of the features supported on import. For now, you can assume that features that are exported to HTML are also supported on import. For example, if Aspose.Words exports font formatting as inline styles, it will be able to read those formatting options back into the model.

Best regards,

Thanks for the further explanation. Knowing this I know I have to change out our WYSIWYG editor to one that is more robust and offers more control over the formatting.

Regards,

Harry

The issues you have found earlier (filed as WORDSNET-5557) have been fixed in this .NET update and this Java update.

The issues you have found earlier (filed as WORDSNET-2021) have been fixed in this .NET update and this Java update.