When copying special headers from an input docx file to a destination file, tabs get inserted and numberings get ruined

guenther.zwetti · December 6, 2021, 8:33pm

Hello!

After a very good hint from @sergey.lobanov we made a big step forward. Section copy and copy operations for different node types now work well, which lets us generate a docx output file from one or more docx input files. (I only added code for a single input file, as the error can also be reproduced with a very simple use case.)

Unfortunately we found another input file that produces an unexpected result, shown here: error.png (6.6 KB)

I attached the input and output docx files and the complete code for generating the output file.

Could you please advise us on what is going wrong here and how we could fix this kind of problem?

Thanks a lot for your help in advance,
Kind regards, Günther

headerNumbering.zip (32.8 KB)

guenther.zwetti · December 6, 2021, 8:42pm

Additional info: I use the Aspose.Words library (Aspose.Words for Java, version 21.10) to create output docx files from input docx files. In a real-world example the input docx file contains placeholders that are replaced by content retrieved from an underlying database, but this works fine and is not part of the problem reported.

Also see “When copying sections and nodes from an input docx file to a destination file, line numberings get ruined and additional paragraphs are added”, the first issue I reported, which is possibly related to this topic. Nevertheless, the new problem can only be reproduced when the header of a docx file contains numbering or bullet points.

In such a case tabs are automatically inserted, and in some more complex use cases with such a header this leads to a big mess in the body of the docx file.

sergey.lobanov · December 7, 2021, 12:51am

@guenther.zwetti,
This issue is related to different compatibility options of your input and output documents. To get the desired result please use CompatibilityOptions.OptimizeFor method as shown below:

outputDocument.getCompatibilityOptions().optimizeFor(MsWordVersion.WORD_2016);

guenther.zwetti · December 7, 2021, 8:30am

@sergey.lobanov perfect!
First question: Is there a special reason why you set it to 2016? I set it to MsWordVersion.WORD_2019 and for my currently known use cases this seems to work (it works fine for at least 2016 and 2019).
Or should we always set the compatibility options to 2019?

And just one more question: Is there a way to check the input documents in order to find out the version? If so, we would set this version as the compatibility version of the output document. Or do you think this is not a good idea?

I tried srcDoc.getBuiltInDocumentProperties().getVersion() >> 16 to read the version from the input document, but I always get 16 when I save my input template file with Office 365 (even if I save it using “Save as Word 97-2003”). Is there any known way to set the output document's compatibility mode depending on the MS Word version of the input template, or does this not make sense?

sergey.lobanov · December 7, 2021, 4:19pm

@guenther.zwetti,

The 2019 version is the newer one, so some problems may appear when editing such a document in previous versions of MS Word. In the case of your issue, the layout of your document looks fine for both the 2016 and 2019 versions, so you can use either of them.
The getBuiltInDocumentProperties().getVersion() property represents the version number of the application that created the document. If you create a DOC file using Office 365, the version property will be 16.
This approach seems to work fine for DOCX documents, because their compatibility options depend on the version of the application they were created in.
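
To illustrate why the value is always 16, here is a minimal plain-Java sketch of decoding the packed version number. It assumes, as the >> 16 shift in the question suggests, that the major version sits in the high 16 bits and the build number in the low 16 bits. The class WordVersionDecoder, its method names, and the sample value are hypothetical and not part of Aspose.Words.

```java
public class WordVersionDecoder {

    // Assumption: the major version is stored in the high 16 bits.
    static int majorVersion(int packedVersion) {
        return packedVersion >>> 16;
    }

    // Assumption: the build number is stored in the low 16 bits.
    static int buildNumber(int packedVersion) {
        return packedVersion & 0xFFFF;
    }

    // Maps a major version to a product name. Word 2016, Word 2019 and
    // Office 365 all report major version 16, so they cannot be told apart.
    static String describe(int major) {
        switch (major) {
            case 12: return "Word 2007";
            case 14: return "Word 2010";
            case 15: return "Word 2013";
            case 16: return "Word 2016 or later";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        // Made-up sample value: major version 16, build number 4266.
        int packed = (16 << 16) | 4266;
        int major = majorVersion(packed);
        System.out.println(major + " / build " + buildNumber(packed)
                + " / " + describe(major));
    }
}
```

Because the property describes the creating application rather than the file format, a file saved from Office 365 via “Save as Word 97-2003” still decodes to major version 16, so this value alone cannot distinguish WORD_2016 from WORD_2019.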

guenther.zwetti · December 7, 2021, 4:21pm

Thanks again for the quick response and the perfect support, @sergey.lobanov!

guenther.zwetti · December 17, 2021, 7:38pm

importMode.zip (140.6 KB)

Hi again, @sergey.lobanov!

Most of our tests are finished and nearly 99% of our tests succeeded.
Unfortunately there’s one more problem I’m not able to solve.

As you can see in my attached example, I have a single input file that uses only the font Arial 10.

By using ImportMode KEEP_DIFFERENT_STYLES the first part of my output document suddenly uses Arial 11 instead of Arial 10 and the other part works as expected.

By using ImportMode KEEP_SOURCE_FORMATTING the first part of my output document works as expected whereas the second part suddenly uses Times New Roman 10 instead of Arial 10.

So two questions arise:
(1) Is there any way to create an output file that is 100% correct from the attached input file? I really have no idea what is going wrong here and how to fix it.
(2) As I am creating a completely new output file, KEEP_SOURCE_FORMATTING seems to me to be the only correct import mode, since there is no other (“competing”) file forcing me to use KEEP_DIFFERENT_STYLES. Is this correct, or am I misinterpreting the names of those two modes?

Additional info: There’s no difference using MsWordVersion.WORD_2016 or MsWordVersion.WORD_2019.

Thanks for your highly appreciated help,
Kind regards, Günther

sergey.lobanov · December 18, 2021, 2:32am

@guenther.zwetti,
To get the desired result, please use the following code example, which applies the font settings from the source document to the destination document instead of the default font settings:

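import com.aspose.words.Document;
import com.aspose.words.ImportFormatMode;
import com.aspose.words.NodeType;
import com.aspose.words.Paragraph;

// Destination: a new, empty document that still has the default font settings.
Document doc = new Document();
// Source: the input document whose font settings should be kept.
Document newDoc = new Document("C:\\Temp\\input\\input.docx");

// Copy the font name and size from the style of the source's first paragraph to the style of the destination's first paragraph.
Paragraph firstPara = (Paragraph)doc.getChild(NodeType.PARAGRAPH, 0, true);
Paragraph firstParaNew = (Paragraph)newDoc.getChild(NodeType.PARAGRAPH, 0, true);
firstPara.getParagraphFormat().getStyle().getFont().setName(firstParaNew.getParagraphFormat().getStyle().getFont().getName());
firstPara.getParagraphFormat().getStyle().getFont().setSize(firstParaNew.getParagraphFormat().getStyle().getFont().getSize());
// Remove the destination's default content before appending.
doc.removeAllChildren();

// The import mode constant is defined on ImportFormatMode.
doc.appendDocument(newDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);

doc.save("C:\\Temp\\out.docx");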

guenther.zwetti · December 18, 2021, 12:16pm

Thanks a lot again, @sergey.lobanov. As always, your tips work perfectly!
Just one more question: What is the difference between the import modes KEEP_DIFFERENT_STYLES and KEEP_SOURCE_FORMATTING? When copying content from an input document into an (initially empty) output document, I would always use KEEP_SOURCE_FORMATTING. So when do I need to use KEEP_DIFFERENT_STYLES? KR, Günther

sergey.lobanov · December 20, 2021, 2:24am

@guenther.zwetti,
When using KEEP_SOURCE_FORMATTING, if a matching style already exists in the destination document, the source style formatting is expanded into direct Node attributes and the style is changed to Normal. The drawback of KEEP_SOURCE_FORMATTING is that after several imports you could end up with many styles in the destination document, which could make it difficult to apply consistent style formatting to this document in Microsoft Word.

The KEEP_DIFFERENT_STYLES option allows destination styles to be reused if the formatting they provide is identical to that of the styles in the source document. If a style in the destination document differs from the one in the source, the source style is imported.