Hi team, I am trying to convert DOCX to XML, but it throws “Exporting fragments of a document in this format is not supported” for both
document.ToString(SaveFormat.WordML) and document.ToString(SaveFormat.FlatOpc).
Attached is the file I used. Is there any way I can convert it to XML?
Test 2.docx (76.6 KB)
Regards,
James
@JamesNguyen The Node.ToString method accepts only SaveFormat.Text and SaveFormat.Html. If you need to convert the document to Word 2007 XML, use the following code:
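```csharp
using System.IO;
using System.Text;
using Aspose.Words;

Document doc = new Document(@"C:\Temp\in.docx");
// Save the document as Word 2007 XML (Flat OPC) into memory and decode it to a string.
string xml;
using (MemoryStream ms = new MemoryStream())
{
    doc.Save(ms, SaveFormat.FlatOpc);
    xml = Encoding.UTF8.GetString(ms.ToArray());
}
```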
Hi @alexey.noskov, thanks for your sample, it works perfectly. With that XML string, is there a way I can convert it back to DOCX?
@JamesNguyen Sure you can. Please see the following code that converts DOCX to an XML string and then the XML string back to DOCX:
```csharp
using System.IO;
using System.Text;
using Aspose.Words;

Document doc = new Document(@"C:\Temp\in.docx");

// Convert the document to Word 2007 XML
string xml;
using (MemoryStream ms = new MemoryStream())
{
    doc.Save(ms, SaveFormat.FlatOpc);
    xml = Encoding.UTF8.GetString(ms.ToArray());
}

// Convert the Word 2007 XML string back to DOCX
using (MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
{
    Document doc1 = new Document(ms);
    doc1.Save(@"C:\Temp\out.docx");
}
```
Great support as always, you are a star, thanks @alexey.noskov. After converting DOCX to XML and back to DOCX, the formatting of the original document is still preserved, so I have switched to this format instead of HTML now.
@JamesNguyen Yes, if your goal is simply to convert the document to a string representation, then Word 2007 XML is a much better choice than HTML, since Word 2007 XML is the same as DOCX, just in a flat OPC representation. So all DOCX features can be preserved in this format.
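To see why nothing is lost, note that a Flat OPC file is the DOCX package flattened into a single XML document: each part of the ZIP package (for example /word/document.xml) becomes a `pkg:part` element under a `pkg:package` root. The minimal sketch below lists those part names from an XML string such as the one produced by `doc.Save(ms, SaveFormat.FlatOpc)` above. It uses only `System.Xml.Linq`, not Aspose.Words; the class and method names are hypothetical, and it assumes the standard Flat OPC namespace `http://schemas.microsoft.com/office/2006/xmlPackage`.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FlatOpcInspector
{
    // Assumed Flat OPC namespace used by pkg:package, pkg:part and pkg:name.
    private static readonly XNamespace Pkg =
        "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Returns the pkg:name attribute of every top-level pkg:part,
    // e.g. "/_rels/.rels" or "/word/document.xml".
    public static List<string> ListPartNames(string flatOpcXml)
    {
        // Drop a leading byte-order mark in case the string was decoded with one.
        XDocument package = XDocument.Parse(flatOpcXml.TrimStart('\uFEFF'));

        return package.Root
            .Elements(Pkg + "part")
            .Select(part => (string)part.Attribute(Pkg + "name"))
            .ToList();
    }
}
```

Calling `FlatOpcInspector.ListPartNames(xml)` on the string from the round-trip code should show the same parts you would find inside the unzipped DOCX, which is why the conversion back to DOCX can preserve everything.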
Yes, at first I went with HTML because it makes the file smaller and easier to read, but because of formatting issues, XML is my choice now. Thanks @alexey.noskov
Hi @alexey.noskov, I noticed that the generated XML is much bigger than the HTML, with a lot of elements and tags. Is there any setting to make the converter generate a simpler file?
@JamesNguyen No, I am afraid there is no way to produce more compact Word 2007 XML. As an option, you can try using Word 2003 XML (SaveFormat.WordML). But you should note that this format does not support all features of the DOCX format.
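If you want to check whether WordML is worth the feature loss for your own documents, a rough sketch like the one below saves the same document in both XML formats and prints their lengths. It assumes Aspose.Words for .NET and reuses the same `Document.Save(Stream, SaveFormat)` and UTF-8 decoding pattern as the earlier code; the class and method names are hypothetical, and no particular size difference is implied.

```csharp
using System;
using System.IO;
using System.Text;
using Aspose.Words;

public static class XmlFormatComparison
{
    // Saves the document in the given format into memory and decodes the
    // bytes as UTF-8, as done for SaveFormat.FlatOpc earlier in this thread.
    public static string SaveToXmlString(Document doc, SaveFormat format)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            doc.Save(ms, format);
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    // Prints the length of the Word 2007 XML (Flat OPC) output and the
    // Word 2003 XML (WordML) output for one document.
    public static void CompareSizes(string path)
    {
        Document doc = new Document(path);
        string flatOpc = SaveToXmlString(doc, SaveFormat.FlatOpc);
        string wordMl = SaveToXmlString(doc, SaveFormat.WordML);
        Console.WriteLine($"FlatOpc: {flatOpc.Length} chars, WordML: {wordMl.Length} chars");
    }
}
```

For example, `XmlFormatComparison.CompareSizes(@"C:\Temp\in.docx");`. Keep in mind that whatever WordML cannot represent will not survive a round trip through it.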