Issue in Aspose Word 18.7

Hi,

Previously we were using Aspose.Words 17.4 and it was working fine, but since our recent upgrade to version 18.7 we are seeing this issue. I know the latest version is 18.10, but we are trying to avoid that route because we would then have to retest everything in our product for quality assurance purposes.

We have a process where we insert custom XML (containing the data) into the Word document and then save it as DOCX. As a result of the merge, Word replaces the content controls with the XML data. Sometimes this XML data contains carriage returns/line feeds, for example:

  <DataParts>
    <DataPartText type="Text">123456 any street


	City, Any State, 12345</DataPartText>
    <DataPartText>Walter</DataPartText>
  </DataParts>

When we save the document as DOCX using Aspose.Words 17.4, document.xml looks like this:

<w:t>123456 any street</w:t>
<w:br />
<w:br />
<w:br />
<w:t>City, Any State, 12345</w:t>

But when we save the document using Aspose.Words 18.7, document.xml has no <w:br /> elements:

<w:t>123456 any street City, Any State, 12345</w:t>

It looks like the newer version of Aspose.Words ignores the carriage return/line feed characters in the XML. Is there any workaround for this?

What is document.xml:

Ignore this section if you already know what document.xml is. Here are the steps to get it:

1- Change the document extension from .DOCX to .ZIP.
2- Extract the ZIP file.
3- Open the extracted folder and go to “word”; this folder should contain document.xml.
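The manual steps above can also be scripted. The following is a minimal sketch, assuming a .NET project that references System.IO.Compression (and System.IO.Compression.FileSystem on .NET Framework); the class and method names are hypothetical and are not part of Aspose.Words.

```csharp
using System.IO;
using System.IO.Compression;

static class DocxInspector
{
    // Returns the raw markup of word/document.xml from a DOCX package.
    // A DOCX file is a ZIP archive, so no renaming or extraction is needed.
    public static string ReadDocumentXml(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found in " + docxPath);

            using (var reader = new StreamReader(entry.Open()))
                return reader.ReadToEnd();
        }
    }
}
```

Calling DocxInspector.ReadDocumentXml on the 17.4 and 18.7 outputs and comparing the two strings should show whether the <w:br /> elements are present.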

@imran.khan1,

Thanks for your inquiry. To ensure a timely and accurate response, please ZIP and attach the following resources here for testing:

  • Your simplified input Word document
  • XML data file
  • Aspose.Words 18.7 generated output DOCX file showing the undesired behavior
  • Your expected DOCX Word document. We will investigate the structure of your expected document to see how you want your final output to be generated. You can create the expected document by using Aspose.Words 17.4.
  • Please also create a standalone simple console application (source code without compilation errors) that helps us to reproduce your problem on our end and attach it here for testing. Please do not include Aspose.Words.dll files in it to reduce the file size.

As soon as you have these pieces of information ready, we will start investigating your issue and provide you with more information. Thanks for your cooperation.

I have made it simpler: instead of generating DOCX and looking into document.xml, I have generated PDF files using two different versions of Aspose.Words. The PDF file generated using version 18.7 doesn’t have the carriage returns/line feeds. I have attached the following files:

  1. MultilineData.zip - Source Code that reads “AsposeMultiline.docx” file, replaces the custom XML with “AsposeMultiline_Data.xml” and generates PDF file.

  2. AsposeMultiline.docx - Word Template ( Input Word Document )

  3. AsposeMultiline_Data.xml - XML data file

  4. Output_AsposeMultiline_17_4.pdf - PDF file generated using Aspose Word 17.4 DLL ( Expected Result )

  5. Output_AsposeMultiline_18_7.pdf - PDF file generated using Aspose Word 18.7 DLL

AsposeMultiline.zip (81.6 KB)

Please let me know if you need anything else.
Thanks
Imran

@imran.khan1,

We tested the scenario and have managed to reproduce the same problem on our end. We have logged this problem in our issue tracking system so that it can be corrected. The ID of this issue is WORDSNET-17609. We will look further into the details of this problem and will keep you updated on the status of the correction. We apologize for the inconvenience.

Thank you for the quick response.

Until we get the fix, is there any workaround for this?

Also, any idea when this will be fixed? Is there a timeline?

Thanks
Imran

@imran.khan1,

Unfortunately, your issue is not resolved yet. It is currently in the queue, pending analysis. Once the analysis is completed and the root cause is determined, we may be able to share an estimate or a workaround with you. We apologize for the inconvenience.

@imran.khan1,

Regarding WORDSNET-17609, we have completed the analysis of your issue and have concluded that it is actually not a bug in the latest versions of Aspose.Words. Please see the following analysis details:

You generate the document using “DocumentFormat.OpenXml” and “System.Xml” functionality. Aspose.Words just opens the resulting document (the result of your “MergeTemplate” method) and converts it to PDF. When you open the generated document in MS Word, you will see that the text is rendered on a single line. Please see this document: clientCodeGenereated.zip (11.7 KB). So the current behavior is not a bug and not a regression.

The Aspose.Words 17.4 behavior was changed by WORDSNET-15134. The problematic document contains a “StructuredDocumentTag” of type “SdtType.PlainText”. To hold multi-line text, such a “StructuredDocumentTag” must have its “Multiline” property set to “true”.

Possible solutions for your case:

  1. Change the template (use a multi-line “StructuredDocumentTag”).
    The following tag is expected in the “sdtPr” element of the markup:

<w:text w:multiLine="true"/>

  2. Update the problematic “StructuredDocumentTag” at run time:

var pdfDoc = new Document(stream);
// Retrieve the required "StructuredDocumentTag" (here, the first one in the body) and update its setting.
StructuredDocumentTag sdt =
  (StructuredDocumentTag)pdfDoc.FirstSection.Body.GetChild(NodeType.StructuredDocumentTag, 0, true);
sdt.Multiline = true;
pdfDoc.Save("C:\\Temp\\Output_AsposeMultiline_18_10.pdf");

So, you must use a multi-line structured document tag if you want to preserve \r\n characters.
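To see which existing templates would need the change from solution 1, the markup can be checked for plain-text content controls that lack the multi-line setting. This is only a sketch under stated assumptions: it uses System.Xml.Linq rather than Aspose.Words, the class and method names are hypothetical, and it looks at word/document.xml only (content controls in headers and footers live in other package parts).

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

static class TemplateAudit
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Counts <w:text> elements inside <w:sdtPr> whose w:multiLine
    // attribute is missing or not switched on.
    public static int CountSingleLineTextControls(XDocument documentXml)
    {
        return documentXml
            .Descendants(W + "sdtPr")
            .Elements(W + "text")
            .Count(text => !IsOn((string)text.Attribute(W + "multiLine")));
    }

    // Accepts the common "on" spellings of an OOXML on/off attribute.
    static bool IsOn(string value)
    {
        return value == "true" || value == "1" || value == "on";
    }

    // Convenience overload that reads document.xml straight from a DOCX file.
    public static int CountSingleLineTextControls(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found in " + docxPath);

            using (Stream stream = entry.Open())
                return CountSingleLineTextControls(XDocument.Load(stream));
        }
    }
}
```

A result greater than zero for a template means it still contains at least one single-line plain-text content control.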

So does this mean none of the old templates will work with multi-line data, unless we convert all of them to use a multi-line “StructuredDocumentTag”?

About the 2nd step: we would have to scan the whole document to find that tag. These templates are designed by users and the data is generated dynamically, so how would we know where the “StructuredDocumentTag” is?

@imran.khan1,

Thanks for your inquiry. We are working over your query and will get back to you soon.

@imran.khan1,

Templates will behave the same way they do in MS Word.

This is one of the solutions, not a step of a solution. The code below performs the merge and takes into account whether the content is multi-line.

const string customNamespace = @"http://www.trizetto.com/tcs/data";
Document doc = new Document(@"AsposeMultiline.docx");
byte[] mergeData = File.ReadAllBytes(@"AsposeMultiline_Data.xml");
// Determine whether the data to merge is multi-line.
bool isMultiline = Encoding.ASCII.GetString(mergeData).Contains(Environment.NewLine);

NodeCollection sdts = doc.FirstSection.Body.GetChildNodes(NodeType.StructuredDocumentTag, true);
foreach (StructuredDocumentTag sdt in sdts)
{
    if (sdt.XmlMapping.CustomXmlPart.Schemas.IndexOf(customNamespace) < 0)
        continue;

    // Skip parts that were already updated (the "{{Data1}}" placeholder is gone).
    if (!Encoding.ASCII.GetString(sdt.XmlMapping.CustomXmlPart.Data).Contains("{{Data1}}"))
        continue;

    sdt.XmlMapping.CustomXmlPart.Data = mergeData;
    sdt.Multiline = isMultiline;
}

doc.Save(@"AsposeMultiline_AW_mereged.docx");

Hope this helps.

So these are actually two separate solutions: either we modify the Word template to use the “multi-line” tag, OR we add some code at PDF generation time to set the multi-line tag. Is my understanding correct?

We are leaning more towards solution #1.

About my comment:

So does this mean none of the old templates will work with multi-line data, unless we convert all of them to use a multi-line “StructuredDocumentTag”?

What I meant is that unless we have that multi-line tag in the template, this won’t work. So to make multi-line data work in all previous templates, we have to add this tag to all of those templates, correct?

Also, can we use the multi-line tag all the time, even if the data we are providing is not multi-line?

Thanks
Imran

@imran.khan1,

Thanks for your questions. We are looking into these and will keep you posted on further updates.

Any update on this?

@imran.khan1,

The answer is yes.

Yes, it is how MS Word behaves.

Yes.

About the second question: sorry, what I wanted to ask was:

So does this mean none of the old templates will work without the multi-line tag? If that’s the case, we have to modify all our old templates and insert the multi-line tag. Is my understanding correct?

@imran.khan1,

Yes. You must use a multi-line structured document tag if you want to preserve \r\n characters.
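For readers who cannot edit every old template, the run-time option discussed earlier in this thread could be applied to all content controls at once. The following is a minimal sketch only, not an official recommendation: the helper class and method names are hypothetical, it assumes that turning on “Multiline” for every plain-text “StructuredDocumentTag” is acceptable for your templates, and it has not been verified against a specific Aspose.Words version.

```csharp
using Aspose.Words;
using Aspose.Words.Markup;

static class MultilineFix
{
    // Marks every plain-text content control in the document as multi-line,
    // so line breaks in the mapped XML data are kept when the document is saved.
    public static void EnableMultiline(Document doc)
    {
        // "true" searches the whole node tree, including headers and footers.
        NodeCollection sdts = doc.GetChildNodes(NodeType.StructuredDocumentTag, true);
        foreach (StructuredDocumentTag sdt in sdts)
        {
            // Only touch plain-text controls; other types are left unchanged.
            if (sdt.SdtType == SdtType.PlainText)
                sdt.Multiline = true;
        }
    }
}

// Possible usage (file names are placeholders):
//   Document doc = new Document("AsposeMultiline.docx");
//   MultilineFix.EnableMultiline(doc);
//   doc.Save("Output_AsposeMultiline.pdf");
```

Because the loop does not depend on where a tag sits in the document, it sidesteps the question of locating individual tags in user-designed templates.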