Convert Word DOCX to HTML using Java to Render & Edit in TinyMCE | Remove -aw-import:ignore from Spans with Non-Breaking Space

Hi,

Greetings.

We use Aspose.Words for Java to convert a DOCX file to HTML.
We then use TinyMCE to render the HTML for editing.

When the user enters text in one of the blank-line areas, that text is not reflected in the DOCX file after we convert the HTML back to DOCX.

On further analysis, we found the cause: the user's text ends up in a p tag whose span tag has -aw-import:ignore specified, so the text is skipped on import and never reaches the DOCX file.

Please suggest ways we can mitigate this.

Please see the attached screenshot of the HTML.

*smaple html with aw-import ignore with user text in it.GIF (59.7 KB)

Best regards
Moses

@mosesm,

When the HtmlSaveOptions.ExportRoundtripInformation property is set to ‘true’ (the default), Aspose.Words exports custom “-aw-*” CSS properties in the HTML as round-trip information. Aspose.Words writes “-aw-import:ignore” when it needs to make certain elements visible in HTML that web browsers would otherwise collapse and hide, e.g. empty paragraphs and sequences of spaces. To work around this problem, you can explicitly disable HtmlSaveOptions.ExportRoundtripInformation using the following code:

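import com.aspose.words.Document;
import com.aspose.words.HtmlSaveOptions;
import com.aspose.words.SaveFormat;

Document doc = new Document("C:\\Temp\\a.docx");
HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.HTML);
opts.setPrettyFormat(true);
// Do not write the custom "-aw-*" round-trip CSS properties.
opts.setExportRoundtripInformation(false);
doc.save("C:\\Temp\\20.8.html", opts);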

P.S. Currently we mark only the following elements with “-aw-import:ignore”:

* Sequences of spaces and non-breaking spaces that are used to simulate padding on native list item ( <li> ) elements.
* Non-breaking spaces that are used to prevent empty paragraphs from collapsing.

However, note that this list is not fixed and we may add more cases to it in the future.

Also, please note that Aspose.Words writes &#xa0; instead of &nbsp; because &nbsp; is not defined in XML, and by default Aspose.Words generates XHTML documents (i.e. HTML documents that comply with XML rules).

Thanks @awais.hafeez for the quick turnaround.

Since my use case needs the round-trip information, I cannot turn it off. If I do, I will lose the blank lines etc. when I convert the HTML back to DOCX.

The ask: Currently, Aspose.Words does not import text inside a span tag that has the “-aw-import:ignore” style specified. Could Aspose.Words import the text inside such a span even when the “-aw-import:ignore” style is present? This would resolve the problem we are facing. Please suggest.

Best Regards
Moses

@mosesm,

I think you can post-process the HTML generated by Aspose.Words before loading it into the TinyMCE editor. For example, you could remove “-aw-import:ignore” from spans that contain non-breaking space characters, using text replacement with regular expressions. If we can help you with anything else, please feel free to ask.
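
A minimal sketch of such a post-processing step is shown here, using only java.util.regex. The class name AwImportIgnoreCleaner and the method stripImportIgnore are hypothetical names chosen for illustration and are not part of Aspose.Words. The sketch assumes the spans look like <span style="...-aw-import:ignore...">&#xa0;</span> with a double-quoted style attribute and nothing but non-breaking spaces inside; please check this against your actual HTML output before relying on it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
final class AwImportIgnoreCleaner {

    // Assumption: a span with a double-quoted style attribute whose content
    // is only non-breaking spaces (as an entity or as the literal character).
    private static final Pattern NBSP_SPAN = Pattern.compile(
            "<span\\s+style=\"([^\"]*)\"\\s*>((?:&#xa0;|&#160;|&nbsp;|\\u00A0)+)</span>",
            Pattern.CASE_INSENSITIVE);

    // The "-aw-import:ignore" declaration, with an optional trailing semicolon.
    private static final Pattern IGNORE_DECL = Pattern.compile(
            "\\s*-aw-import\\s*:\\s*ignore\\s*;?",
            Pattern.CASE_INSENSITIVE);

    static String stripImportIgnore(String html) {
        Matcher m = NBSP_SPAN.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String style = m.group(1);
            if (!IGNORE_DECL.matcher(style).find()) {
                continue; // span is not marked; leave it untouched
            }
            // Drop only the marker and keep any other style declarations.
            String cleaned = IGNORE_DECL.matcher(style).replaceAll("").trim();
            String open = cleaned.isEmpty()
                    ? "<span>"
                    : "<span style=\"" + cleaned + "\">";
            String replacement = open + m.group(2) + "</span>";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

You would call AwImportIgnoreCleaner.stripImportIgnore(html) on the saved HTML string before handing it to TinyMCE. Note that this pattern does not distinguish empty-paragraph spans from list-padding spans that contain only non-breaking spaces, and once the marker is removed the non-breaking space itself will presumably be imported as ordinary text, so you may want to narrow the pattern for your documents.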


Thanks again for the quick turnaround @awais.hafeez

Let me try your suggestion and check.