Need to remove inline span tags when converting DOCX to HTML

Hi Team,

We are using CKEditor in our application. When we load HTML content into it, the editor takes a long time to load because of the large number of inline styles.

Is there any API to minimize the number of span tags in the generated HTML, or a way to wrap most of the content in paragraph “p” tags with styles?

Please check the attached screenshot and the Java code I am using.

Note: The HTML we load is the output of converting DOCX to HTML with the Aspose Java API.
Thanks

query.zip (77.6 KB)

@purusadh

Please note that formatting is applied on a few different levels. For example, consider the formatting of simple text. Text in a document is represented by Run elements, and a Run can only be a child of a Paragraph. You can apply formatting:

  1. to Run nodes by using Character Styles, e.g. a Glyph Style.
  2. to the parent of those Run nodes, i.e. a Paragraph node (possibly via Paragraph Styles).
  3. directly to Run nodes by using Run attributes (Font). In this case the Run takes its formatting from the Paragraph Style first, then the Glyph Style, and finally the direct formatting.
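
To make the layering above concrete, here is a minimal sketch in plain Java. It is a toy model and not the Aspose.Words API: the class name EffectiveFormatting, the resolve method, and the use of CSS-like property names as map keys are all assumptions made for illustration. It only shows the general idea that a more specific layer overrides a less specific one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of layered formatting (hypothetical, not the Aspose.Words API). */
final class EffectiveFormatting {

    /**
     * Applies the layers from least to most specific: paragraph style,
     * then character style, then direct formatting. A later layer
     * overrides any property that an earlier layer also sets.
     */
    static Map<String, String> resolve(Map<String, String> paragraphStyle,
                                       Map<String, String> characterStyle,
                                       Map<String, String> directFormatting) {
        Map<String, String> effective = new LinkedHashMap<>(paragraphStyle);
        effective.putAll(characterStyle);
        effective.putAll(directFormatting);
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> paragraphStyle = new LinkedHashMap<>();
        paragraphStyle.put("font-family", "Calibri");
        paragraphStyle.put("font-size", "11pt");

        Map<String, String> characterStyle = new LinkedHashMap<>();
        characterStyle.put("font-style", "italic");

        Map<String, String> direct = new LinkedHashMap<>();
        direct.put("font-size", "14pt");

        // prints {font-family=Calibri, font-size=14pt, font-style=italic}
        System.out.println(resolve(paragraphStyle, characterStyle, direct));
    }
}
```

Only the properties that differ from the paragraph level (here the italic style and the 14pt size) would need to be written on the span itself, which is why direct run formatting tends to show up as inline styles.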

You can call the Document.JoinRunsWithSameFormatting method before saving the document to join runs with the same formatting in all paragraphs of the document.

You may use HtmlSaveOptions.CssStyleSheetType as shown below to reduce inline styles. Hope this helps you.

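// Merge adjacent runs that share the same formatting so fewer <span> elements are written.
doc.joinRunsWithSameFormatting();
HtmlSaveOptions options = new HtmlSaveOptions();
options.setPrettyFormat(true);
// Write styles to an embedded <style> block instead of repeating them inline.
options.setCssStyleSheetType(CssStyleSheetType.EMBEDDED);
doc.save(MyDir + "19.9.html", options);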
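
For intuition about why joining runs reduces the number of span tags, here is a small standalone sketch. It is a toy model only and does not show how Aspose.Words implements this internally: the RunJoinSketch class, its Run type, and the idea of representing a run's formatting as a single style string are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Toy model of run joining (hypothetical, not the Aspose.Words implementation). */
final class RunJoinSketch {

    /** A piece of text plus the formatting that would become a span's style. */
    static final class Run {
        final String text;
        final String style;

        Run(String text, String style) {
            this.text = text;
            this.style = style;
        }
    }

    /** Merges each run into its predecessor when both carry the same style. */
    static List<Run> joinRunsWithSameFormatting(List<Run> runs) {
        List<Run> joined = new ArrayList<>();
        for (Run run : runs) {
            int last = joined.size() - 1;
            if (last >= 0 && Objects.equals(joined.get(last).style, run.style)) {
                Run previous = joined.get(last);
                joined.set(last, new Run(previous.text + run.text, run.style));
            } else {
                joined.add(run);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Run> runs = new ArrayList<>();
        runs.add(new Run("Hello ", "font-weight:bold"));
        runs.add(new Run("wor", "font-weight:bold"));
        runs.add(new Run("ld", "font-style:italic"));

        // Three input runs collapse to two, so two spans are printed instead of three.
        for (Run run : joinRunsWithSameFormatting(runs)) {
            System.out.println("<span style=\"" + run.style + "\">" + run.text + "</span>");
        }
    }
}
```

Runs whose formatting differs stay separate, so a document with a lot of mixed direct formatting will still produce many spans after joining.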

Thanks Tahir,

The attached sample code is a great help for me.

I see that when we convert DOCX to HTML, class attributes are now added to the tags, e.g. Body, default, ListParagraph.

I have the same query for inline styles. Is it possible to remove inline styles from the HTML that are not required in our case, e.g. style="margin-bottom:12pt"?

Please find the attached document for more details.

Thank you.
Purushottam

query for inline style.zip (169.5 KB)

@purusadh

You are getting the correct output. This is by design. In a Word document, different formatting can be applied to Paragraph and Run nodes, so it is not feasible to export every inline style into the embedded CSS styles.
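
If the remaining inline styles really are not needed on your side, one option outside of Aspose.Words is to post-process the saved HTML yourself. The following is only a rough sketch using java.util.regex: the InlineStyleStripper class and its method are hypothetical names, it assumes double-quoted style attributes, and it would also alter matching text outside of tags. A proper HTML parser would be more robust. Note that it removes the whole style attribute, so any formatting carried only by inline styles is lost.

```java
import java.util.regex.Pattern;

/** Hypothetical post-processing helper; not part of Aspose.Words. */
final class InlineStyleStripper {

    // Matches a whitespace-preceded, double-quoted style attribute,
    // e.g. the  style="margin-bottom:12pt"  part of a tag.
    private static final Pattern STYLE_ATTRIBUTE =
            Pattern.compile("\\s+style\\s*=\\s*\"[^\"]*\"", Pattern.CASE_INSENSITIVE);

    /** Returns the HTML with every matched style attribute removed. */
    static String stripInlineStyles(String html) {
        return STYLE_ATTRIBUTE.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p class=\"Body\" style=\"margin-bottom:12pt\">Text</p>";
        // prints <p class="Body">Text</p>
        System.out.println(stripInlineStyles(html));
    }
}
```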

Thanks Tahir