I am trying to take XML input and insert it into a Word document. I have an XML tag with xml:space="preserve" set, so I would like to retain the leading spaces. When I read the tag's value into a string and create a Run from that string, the resulting Word document honors the newlines in the Run but drops all leading spaces. I confirmed in the debugger that the string contains the spaces.
For example:
XML:
<Run nodeID="74" xml:space="preserve">
eam
sms-sdk
spbm
docs
java
JAXWS/samples/javadoc/index.html
ReferenceGuide
index.html
java
JAXWS
lib
samples
build and run scripts
wsdl
pbmService.wsdl
pbm.wsdl
ssoclient
vsphere-ws</Run>
code:
string textToWrite = curElem.Value;
Run newRun = new Run(this.wordDoc, textToWrite);
textPara.AppendChild(newRun);
I have also tried textToWrite = textToWrite.Replace(' ', ControlChar.SpaceChar);
but that does not seem to have any effect.
Sample Word output:
eam
sms-sdk
spbm
docs
java
JAXWS/samples/javadoc/index.html
ReferenceGuide
index.html
java
JAXWS
lib
samples
build and run scripts
wsdl
pbmService.wsdl
pbm.wsdl
ssoclient
vsphere-ws
@mhimlin the whitespace is truncated at the moment of saving, and that happens only if you have a multiline Run with whitespace after a line break. I'm going to escalate the issue to our development team for analysis. Alternatively, you can split the text by line breaks and insert a Run for each line, which will keep the whitespace:
var xmlString = @"<Run nodeID=""74"" xml:space=""preserve"">
eam
sms-sdk
spbm
docs
java
JAXWS/samples/javadoc/index.html
ReferenceGuide
index.html
java
JAXWS
lib
samples
build and run scripts
wsdl
pbmService.wsdl
pbm.wsdl
ssoclient
vsphere-ws</Run>";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xmlString);
Document doc = new Document();
Paragraph p = new Paragraph(doc);
var xmlLines = xmlDoc.InnerText.Split(new string[] { "\r\n" }, StringSplitOptions.None);
foreach (var line in xmlLines)
{
var lineText = line + "\r\n";
p.AppendChild(new Run(doc, lineText));
}
doc.FirstSection.Body.AppendChild(p);
doc.Save("C:\\Temp\\output.docx", SaveFormat.Docx);
@mhimlin
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): WORDSNET-25100
@mhimlin I have closed WORDSNET-25100 as not a bug. The leading whitespace is in fact preserved in the output DOCX document:
<w:p><w:r><w:cr /><w:t>
eam</w:t><w:cr /><w:t>
sms-sdk</w:t><w:cr /><w:t>
spbm</w:t><w:cr /><w:t>
docs</w:t><w:cr /><w:t>
java</w:t><w:cr /><w:t>
JAXWS/samples/javadoc/index.html</w:t><w:cr /><w:t>
ReferenceGuide</w:t><w:cr /><w:t>
index.html</w:t><w:cr /><w:t>
java</w:t><w:cr /><w:t>
JAXWS</w:t><w:cr /><w:t>
lib</w:t><w:cr /><w:t>
samples</w:t><w:cr /><w:t>
build and run scripts</w:t><w:cr /><w:t>
wsdl</w:t><w:cr /><w:t>
pbmService.wsdl</w:t><w:cr /><w:t>
pbm.wsdl</w:t><w:cr /><w:t>
ssoclient</w:t><w:cr /><w:t>
vsphere-ws</w:t></w:r></w:p>
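To check this yourself, you can read the main document part straight out of the saved DOCX, since a DOCX file is a ZIP archive. The following is only a minimal sketch using the standard System.IO.Compression classes; the names DocxInspector and ReadMainDocumentXml are hypothetical helpers, not part of Aspose.Words, and on .NET Framework it assumes a reference to System.IO.Compression.FileSystem:

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for inspecting a saved DOCX; not an Aspose.Words API.
static class DocxInspector
{
    // Returns the raw WordprocessingML of the main document part.
    public static string ReadMainDocumentXml(string docxPath)
    {
        // A DOCX file is a ZIP archive; the body text lives in word/document.xml.
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found in " + docxPath);

            using (var reader = new StreamReader(entry.Open()))
                return reader.ReadToEnd();
        }
    }
}
```

For example, Console.WriteLine(DocxInspector.ReadMainDocumentXml("C:\\Temp\\output.docx")); prints the XML, so you can see whether the spaces inside each w:t element survived the save.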
The problem is actually not with the whitespace but with the \r\n line breaks. Normally, the text of a Run in an MS Word document should not contain carriage return and line feed characters. In your case you can simply replace the \r\n line breaks with \v (soft line break) to get the expected output:
var xmlString = @"<Run nodeID=""74"" xml:space=""preserve"">
eam
sms-sdk
spbm
docs
java
JAXWS/samples/javadoc/index.html
ReferenceGuide
index.html
java
JAXWS
lib
samples
build and run scripts
wsdl
pbmService.wsdl
pbm.wsdl
ssoclient
vsphere-ws</Run>";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xmlString);
Document doc = new Document();
Paragraph p = new Paragraph(doc);
p.AppendChild(new Run(doc, xmlDoc.InnerText.Replace(ControlChar.CrLf, ControlChar.LineBreak)));
doc.FirstSection.Body.AppendChild(p);
doc.Save("C:\\Temp\\out.docx", SaveFormat.Docx);
In this case the inner XML representation of the document will be the following:
<w:p>
<w:r>
<w:br />
<w:t xml:space="preserve"> eam</w:t>
<w:br />
<w:t xml:space="preserve"> sms-sdk</w:t>
<w:br />
<w:t xml:space="preserve"> spbm</w:t>
<w:br />
<w:t xml:space="preserve"> docs</w:t>
<w:br />
<w:t xml:space="preserve"> java</w:t>
<w:br />
<w:t xml:space="preserve"> JAXWS/samples/javadoc/index.html</w:t>
<w:br />
<w:t xml:space="preserve"> ReferenceGuide</w:t>
<w:br />
<w:t xml:space="preserve"> index.html</w:t>
<w:br />
<w:t xml:space="preserve"> java</w:t>
<w:br />
<w:t xml:space="preserve"> JAXWS</w:t>
<w:br />
<w:t xml:space="preserve"> lib</w:t>
<w:br />
<w:t xml:space="preserve"> samples</w:t>
<w:br />
<w:t xml:space="preserve"> build and run scripts</w:t>
<w:br />
<w:t xml:space="preserve"> wsdl</w:t>
<w:br />
<w:t xml:space="preserve"> pbmService.wsdl</w:t>
<w:br />
<w:t xml:space="preserve"> pbm.wsdl</w:t>
<w:br />
<w:t xml:space="preserve"> ssoclient</w:t>
<w:br />
<w:t xml:space="preserve"> vsphere-ws</w:t>
</w:r>
</w:p>
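The replacement above only matches \r\n pairs. If the input might also contain bare \n or \r line endings (an assumption about your data, for example XML files saved with Unix line endings), a small helper along these lines could normalize all of them first. This is just a sketch: RunTextHelper and ToSoftLineBreaks are hypothetical names, and the "\v" literal is the same character that ControlChar.LineBreak represents:

```csharp
using System;

// Hypothetical helper; plain string handling, no Aspose.Words types required.
static class RunTextHelper
{
    // '\v' (vertical tab) is the soft line break character, i.e. ControlChar.LineBreak.
    public const string SoftLineBreak = "\v";

    // Converts \r\n, bare \r and bare \n into soft line breaks.
    // Spaces are left untouched, so leading indentation is kept as-is.
    public static string ToSoftLineBreaks(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // Replace the two-character pair first so it yields one break, not two.
        return text.Replace("\r\n", SoftLineBreak)
                   .Replace("\r", SoftLineBreak)
                   .Replace("\n", SoftLineBreak);
    }
}
```

It would then be used as p.AppendChild(new Run(doc, RunTextHelper.ToSoftLineBreaks(xmlDoc.InnerText)));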
Alternatively, you can use DocumentBuilder to insert text. In this case \r\n line breaks will be interpreted as proper paragraph breaks:
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.Write(xmlDoc.InnerText);
doc.Save("C:\\Temp\\out_db.docx", SaveFormat.Docx);
The issues you found earlier (filed as WORDSNET-25100) have been fixed in the Aspose.Words for .NET 23.4 update, which is also available on NuGet.