Lost content when converting from Html to Doc

Hi,


We found that some content is lost when converting from HTML to DOCX using Aspose.Words 17.3.0.

Here is the code to reproduce the issue:

public static void main(final String... s) throws Exception {
    // The span carries Aspose.Words' custom -aw-tabstop-* CSS properties.
    final String cleanHtml = "<span style=\"width: 175.5pt; display: inline-block; -aw-tabstop-align: left; -aw-tabstop-pos: 207pt;\">HOHOHO</span>";

    final Document document = new Document();
    final DocumentBuilder builder = new DocumentBuilder(document);

    builder.insertHtml(cleanHtml);
    final OoxmlSaveOptions so = new OoxmlSaveOptions(SaveFormat.DOCX);
    so.setUseAntiAliasing(true);
    so.setUseHighQualityRendering(true);
    document.save("mydoc.docx", so);
}

When running this code, I got a document where the word “HOHOHO” cannot be found.
This bug is a very serious issue.

If I manually remove the Aspose.Words styles (-aw-tabstop-align and -aw-tabstop-pos), the document is correct, but I'm afraid of what the consequences of always removing those styles from the HTML could be.

Can you help us with this?

Hi there,


Thanks for your inquiry. We have tested the scenario with your sample code and reproduced the reported issue. We have logged it as ticket WORDSJAVA-1530 in our issue tracking system for further investigation and rectification. We will notify you as soon as it is resolved.

We are sorry for the inconvenience.

Best Regards,

Hi,


As the consequences of this issue (lost content) are dramatic for me, I am seriously considering removing those styles from the HTML until you have a proper fix.

Can you help me understand what the consequences of removing those two styles could be?

Thx

Hi there,


Thanks for your feedback. It seems Aspose.Words is not loading text into the DOM when the span has the "display: inline-block;" style. As a workaround, you can remove this style to resolve the issue. Hopefully it will help.

We will keep you updated on the progress of the issue logged above.

Best Regards,

@PS-CL

Thanks for your patience. We have investigated the issue reported above and found that it is not a bug.

The element in your document has a special format telling Aspose.Words that this is a 'tab stop emulation' span, as we call it. Such spans are produced by Aspose.Words in order to emulate the tab stops of the source document and make a round trip of tab stops possible. When such a span is imported back from HTML by Aspose.Words, its contents are ignored and a tab character is inserted into the resulting document instead.

If you need a normal span with arbitrary text, you can remove the Aspose.Words custom CSS properties ('-aw-tabstop-align' and '-aw-tabstop-pos').
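
As an illustration of that suggestion, here is a minimal sketch of stripping the two custom properties from the HTML before passing it to insertHtml. It uses only java.util.regex; the class name TabStopCssStripper and the method stripTabStopProperties are hypothetical helpers, not part of Aspose.Words. It assumes a plain text-level regex is acceptable for your input: it does not parse the HTML, so it would also remove a matching declaration that happened to appear in visible text.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
public class TabStopCssStripper {

    // Matches one "-aw-tabstop-<name>: <value>;" declaration,
    // including leading whitespace; the trailing semicolon is optional.
    private static final Pattern TAB_STOP_DECLARATION = Pattern.compile(
            "\\s*-aw-tabstop-[a-z-]+\\s*:[^;\"']*;?", Pattern.CASE_INSENSITIVE);

    /** Returns the HTML with every -aw-tabstop-* CSS declaration removed. */
    public static String stripTabStopProperties(final String html) {
        return TAB_STOP_DECLARATION.matcher(html).replaceAll("");
    }

    public static void main(final String[] args) {
        final String html = "<span style=\"width: 175.5pt; display: inline-block; "
                + "-aw-tabstop-align: left; -aw-tabstop-pos: 207pt;\">HOHOHO</span>";
        // Prints: <span style="width: 175.5pt; display: inline-block;">HOHOHO</span>
        System.out.println(stripTabStopProperties(html));
    }
}
```

The cleaned string can then be passed to builder.insertHtml as in the original sample. Keep in mind that spans which really were tab stop emulations will then be imported as ordinary spans instead of tab characters, so the tab stops of the source document are no longer round-tripped.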

Best Regards,