Incorrect OpenXml output for texts containing Italic font styles

Hello. In Aspose.Words 23.8, Document.Save(result, SaveFormat.FlatOpc) produces output that MS Word doesn’t interpret properly.
In the attached example, a Document is created from HTML, saved with the SaveFormat.FlatOpc option, and then inserted into an MS Word document.
Text with the Italic font style is not displayed properly by MS Word.
HTMLtoOpenXmlAspose.zip (12.1 KB)

@AndrewStx As far as I can see, the following code produces a valid MS Word 2007 XML document:

Document doc = new Document(@"C:\Temp\in.html");
doc.Save(@"C:\Temp\out.xml", SaveFormat.FlatOpc);

out.zip (4.9 KB)

If you open it in MS Word, the content looks correct.

The MS Word Range.InsertXML method does not insert an MS Word 2007 XML document; it inserts custom XML. So the Aspose.Words output is correct.

From my understanding, if an “MS Word 2007 XML document” is valid XML, it should be inserted successfully by Range.InsertXML. And it actually was inserted, but the Italic font formatting was messed up.

If you create an MS Word document with the same content as in the code example (attached Text.docx), extract the document.xml and styles.xml parts, replace the corresponding parts in the XML produced by Document.Save (attached AdjustedXml.zip), and then insert that XML using Range.InsertXML, then MS Word displays the content correctly.
This suggests that Document.Save produces incorrect output.
Text.docx (12.2 KB)
AdjustedXml.zip (6.8 KB)
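
The part swap described above can also be done programmatically. The following is a minimal sketch, assuming both documents are available as Flat OPC XML (for example, the reference document saved from MS Word as a "Word XML Document"). FlatOpcParts, FindPart and ReplacePart are hypothetical helper names, not part of Aspose.Words or the Word object model. The sketch relies only on the pkg:package / pkg:part / pkg:xmlData layout of Flat OPC files.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for swapping parts between two Flat OPC packages.
public static class FlatOpcParts
{
    // Namespace used by Flat OPC package, part and xmlData elements.
    static readonly XNamespace Pkg = "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Returns the <pkg:part> element with the given pkg:name, or null if it is absent.
    public static XElement FindPart(XDocument flatOpc, string partName)
    {
        return flatOpc.Root?
            .Elements(Pkg + "part")
            .FirstOrDefault(p => (string)p.Attribute(Pkg + "name") == partName);
    }

    // Replaces the XML payload of one part in 'target' with the payload
    // of the part that has the same name in 'source'.
    public static void ReplacePart(XDocument target, XDocument source, string partName)
    {
        XElement targetData = FindPart(target, partName)?.Element(Pkg + "xmlData")
            ?? throw new ArgumentException("Part not found in target: " + partName);
        XElement sourceData = FindPart(source, partName)?.Element(Pkg + "xmlData")
            ?? throw new ArgumentException("Part not found in source: " + partName);

        // ReplaceNodes copies nodes that already belong to another tree,
        // so 'source' is left unchanged.
        targetData.ReplaceNodes(sourceData.Nodes());
    }
}
```

Calling ReplacePart for "/word/document.xml" and "/word/styles.xml" and saving the target with XDocument.Save would give a package comparable to the adjusted one, which can then be compared with the original Aspose.Words output part by part.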

@AndrewStx As far as I can see, the formatting is properly specified in the output XML. For example, the last bold italic word:

<w:r>
	<w:rPr>
		<w:rStyle w:val="font4" />
		<w:b />
		<w:bCs />
		<w:i />
		<w:iCs />
		<w:lang w:val="en" w:eastAsia="en" />
	</w:rPr>
	<w:t>ItalicAndBOLD</w:t>
</w:r>

and style:

<w:style w:type="character" w:customStyle="1" w:styleId="font4">
	<w:name w:val="font4" />
	<w:basedOn w:val="DefaultParagraphFont" />
	<w:rPr>
		<w:rFonts w:ascii="Calibri" w:eastAsia="Calibri" w:hAnsi="Calibri" w:cs="Calibri" />
		<w:b />
		<w:bCs />
		<w:i />
		<w:iCs />
		<w:color w:val="000000" />
		<w:sz w:val="22" />
		<w:szCs w:val="22" />
	</w:rPr>
</w:style>

As you can see, bold and italic formatting is properly specified. The Aspose.Words output conforms to the OOXML specification and is displayed properly when opened in MS Word. Unfortunately, it is difficult to say why Range.InsertXML does not interpret it properly.
Could you please explain the purpose of inserting XML into the document using MS Word automation in your application? If you need to insert HTML content into the document, you can simply use the DocumentBuilder.InsertHtml method.
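
For illustration, a minimal sketch of the DocumentBuilder.InsertHtml approach could look like the following. It assumes the Aspose.Words for .NET package is referenced; HtmlInsertion and BuildFromHtml are hypothetical names used only for this example.

```csharp
using Aspose.Words;

// Hypothetical wrapper around DocumentBuilder.InsertHtml.
public static class HtmlInsertion
{
    // Creates a new document and inserts the given HTML fragment at the builder's cursor.
    public static Document BuildFromHtml(string html)
    {
        var doc = new Document();
        var builder = new DocumentBuilder(doc);
        builder.InsertHtml(html);
        return doc;
    }
}
```

This works on a document held by Aspose.Words, so it may not fit a scenario where the content has to go into a document that is currently open in MS Word.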

@alexey.noskov, I checked the XML output as well. At first glance everything is correct, but as we can see, Word doesn’t process it correctly :(

The real application is implemented as a VSTO add-in for MS Word, and as part of its business logic it updates portions of the open document (which the user is actively editing) when needed.

@AndrewStx
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-25850

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

I have tested the same scenario without HTML, and in this case the content looks good. I used the following code to generate a document with formatting:

Document document = new Document();
DocumentBuilder builder = new DocumentBuilder(document);
builder.Write("Text: ");
builder.PushFont();
builder.Italic = true;
builder.Write("Italic ");
builder.PopFont();
builder.PushFont();
builder.Bold = true;
builder.Write("BOLD ");
builder.PopFont();
builder.Write("Regular ");
builder.PushFont();
builder.Italic = true;
builder.Bold = true;
builder.Write("ItalicAndBOLD ");
builder.PopFont();

Also, the content is inserted properly if the styles applied to the Run nodes are reset:

// ....
var document = new Document(ms, htmlLoadOptions);
document.Cleanup(cleanupOptions);
                
// Dummy character style that replaces the style assigned to every Run.
Style style = document.Styles.Add(StyleType.Character, "test");
document.GetChildNodes(NodeType.Run, true).Cast<Run>().ToList()
    .ForEach(r => r.Font.Style = style);
// ....

Hello,
what is the status of the above-mentioned ticket, WORDSNET-25850?
Thank you,
Andrew

@AndrewStx We have completed analyzing the issue, and this looks like an MS Word bug to me. It seems that MS Word incorrectly resolves toggle property values in the direct formatting of text.

When Aspose.Words converts the following HTML to DOCX, it applies “italic” formatting both to direct formatting of text and to formatting of the “font0” style.

<html>
    <style>
        .font0 { 
            font-style: italic
        }
    </style>
    <span class='font0'>Italic</span>
</html>

Text is imported like this:

<w:p>
    <w:r>
        <w:rPr>
            <w:rStyle w:val="font0"/>
            <w:i/>
            <w:iCs/>
        </w:rPr>
        <w:t>Italic</w:t>
    </w:r>
</w:p>

And the style is imported like this:

<w:style w:type="character" w:customStyle="1" w:styleId="font0">
    <w:name w:val="font0"/>
    <w:basedOn w:val="DefaultParagraphFont"/>
    <w:rPr>
        <w:i/>
        <w:iCs/>
    </w:rPr>
</w:style>

The text in the resulting DOCX document generated from this HTML will have the “italic” formatting, because “ECMA-376:2006 Part 4 Section 2.3.2.14 ‘i (Italics)’” says that “… when used as direct formatting, setting this property to true or false shall set the absolute state of the resulting property”.

Although direct formatting specifies the final state of the toggle property, when MS Word inserts this document into another document, it somehow combines the direct formatting of this text with its style formatting and inverts the direct formatting:

<w:r>
    <w:rPr>
        <w:rStyle w:val="font0"/>
        <w:i w:val="0"/>
        <w:iCs w:val="0"/>
    </w:rPr>
    <w:t>Italic</w:t>
</w:r>
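
The difference between the two interpretations can be illustrated with a small model. This is only a sketch: ToggleProperty, ResolvePerSpec and ResolveAsToggle are hypothetical names, and the XOR rule in ResolveAsToggle is an assumption about what would produce the observed result, not confirmed MS Word behavior.

```csharp
// Hypothetical model of how a toggle property such as italic could be resolved
// for a run that has a character style plus optional direct formatting.
public static class ToggleProperty
{
    // Reading of the specification text quoted above: a value set in direct
    // formatting is the absolute state; without it, the style value applies.
    public static bool ResolvePerSpec(bool styleValue, bool? directValue)
    {
        return directValue ?? styleValue;
    }

    // Assumed alternative: direct formatting is treated as one more toggle
    // level and XOR-ed with the style value.
    public static bool ResolveAsToggle(bool styleValue, bool? directValue)
    {
        return styleValue ^ (directValue ?? false);
    }
}
```

With italic set both in the style and in direct formatting, ResolvePerSpec(true, true) keeps the text italic, whereas ResolveAsToggle(true, true) turns italic off, which matches the symptom reported in this thread.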

The issue is not yet scheduled for development, since it is not clear whether we should mimic MS Word behavior if it violates the specification.