When copying sections and nodes from an input DOCX file to a destination file, line numbering gets broken and additional paragraphs are added

guenther.zwetti · November 23, 2021, 10:47am

Hello everybody!

I use the Aspose.Words library (Aspose.Words for Java, version 21.10) to create output DOCX files from input DOCX files. In the real-world use case, the input DOCX file contains placeholders that are replaced by content retrieved from an underlying database; this works fine and is not part of the problem reported here.

The problem I want to report, and which I need to get resolved, seems to be a very simple one.
Nevertheless, I could not find out why the generated output file does not look exactly like the input file, and why the library (or my program, I don't know ;-)) seems to break numbered lists after section breaks are inserted. I also find additional paragraphs that I never inserted programmatically.

So I attached a very simple unit test, 3 input DOCX files, the output DOCX files (which the unit test generates), and some PNG screenshots with short explanations of the errors I found.
linenumbering.zip (257.4 KB)

Could you please take a look at my code and at the generated error output, and check what's going wrong here?

Thanks a lot for your answer in advance,
Kind regards, Günther

sergey.lobanov · November 24, 2021, 4:02am

@guenther.zwetti,
A blank document contains one section, one body and one paragraph. You need to call removeAllChildren() at the beginning of your method to remove all of those nodes, so that you end up with a document node that has no children. That will fix the problem with the additional paragraphs.

Document outputDocument = new Document();
outputDocument.removeAllChildren();
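To make the claim about blank documents concrete, here is a minimal sketch, assuming the Aspose.Words for Java jar is on the classpath; the class and method names are hypothetical. It counts the sections and paragraphs of a freshly created `Document` and checks that `removeAllChildren()` leaves the document node without children:

```java
import com.aspose.words.Document;
import com.aspose.words.NodeType;

public class BlankDocumentCheck {

    /** Counts sections and paragraphs in a freshly created blank Document. */
    static int[] countBlankStructure() throws Exception {
        Document doc = new Document();
        int sections = doc.getSections().getCount();
        int paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true).getCount();
        return new int[] { sections, paragraphs };
    }

    /** Reports whether a blank Document still has child nodes after removeAllChildren(). */
    static boolean hasChildrenAfterClear() throws Exception {
        Document doc = new Document();
        doc.removeAllChildren(); // drops the default Section together with its Body and Paragraph
        return doc.hasChildNodes();
    }

    public static void main(String[] args) throws Exception {
        int[] counts = countBlankStructure();
        System.out.println("sections = " + counts[0] + ", paragraphs = " + counts[1]);
        System.out.println("children after removeAllChildren(): " + hasChildrenAfterClear());
    }
}
```

After clearing, the document cannot hold content until a Section with a Body is added again, which is why the loop below creates them on demand.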

The problem with line numbering is caused by the first condition of your loop. Please check the updated loop:

for (int currentSectionIndex = 0; currentSectionIndex < inputTemplateNodes.size(); currentSectionIndex++) {
	// loop over all nodes of current input section and add them to body of current output section
	for (int inputNodeIndex = 0; inputNodeIndex < inputTemplateNodes.get(currentSectionIndex)
			.size(); inputNodeIndex++) {
		Node inputNode = inputTemplateNodes.get(currentSectionIndex).get(inputNodeIndex);
		NodeImporter importer = new NodeImporter(inputTemplateDocument, outputDocument,
				ImportFormatMode.KEEP_SOURCE_FORMATTING);
		Node destinationNode = importer.importNode(inputNode, true);
		if (outputDocument.getSections().get(currentSectionIndex) != null)
			outputDocument.getSections().get(currentSectionIndex).getBody().getChildNodes().insert(inputNodeIndex,
					destinationNode);
		else
		{
			outputDocument.getSections().add(new Section(outputDocument));
			outputDocument.getSections().get(currentSectionIndex).getChildNodes().insert(0, new Body(outputDocument));
			outputDocument.getSections().get(currentSectionIndex).getBody().getChildNodes().insert(inputNodeIndex,
					destinationNode);
		}
	}
}
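The loop relies on `inputTemplateNodes`, which is not shown in the thread. A minimal sketch, assuming it is a `List<List<Node>>` with one inner list per input section holding that section's top-level body nodes (all helper names here are hypothetical), could look like the following. It also creates a single `NodeImporter` for the whole copy instead of one per node: `NodeImporter` is meant for repeated imports between the same pair of documents, and reusing one instance keeps a single style/list mapping, which may be relevant for list numbering (an assumption, not confirmed in this thread).

```java
import com.aspose.words.Body;
import com.aspose.words.BreakType;
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.ImportFormatMode;
import com.aspose.words.Node;
import com.aspose.words.NodeImporter;
import com.aspose.words.Section;

import java.util.ArrayList;
import java.util.List;

public class SectionCopySketch {

    /** Hypothetical helper: one inner list per input section with that section's top-level body nodes. */
    static List<List<Node>> collectBodyNodes(Document input) {
        List<List<Node>> perSection = new ArrayList<>();
        for (int s = 0; s < input.getSections().getCount(); s++) {
            List<Node> nodes = new ArrayList<>();
            Body body = input.getSections().get(s).getBody();
            for (Node node = body.getFirstChild(); node != null; node = node.getNextSibling()) {
                nodes.add(node);
            }
            perSection.add(nodes);
        }
        return perSection;
    }

    /** Copies all section bodies into a new document, reusing ONE NodeImporter for the whole copy. */
    static Document copyBodies(Document input) throws Exception {
        Document output = new Document();
        output.removeAllChildren(); // start without the default section/body/paragraph
        NodeImporter importer =
                new NodeImporter(input, output, ImportFormatMode.KEEP_SOURCE_FORMATTING);

        for (List<Node> nodes : collectBodyNodes(input)) {
            // New sections get default page setup, so the input's break type is NOT preserved here.
            Section section = new Section(output);
            output.appendChild(section);
            Body body = new Body(output);
            section.appendChild(body);
            for (Node node : nodes) {
                body.appendChild(importer.importNode(node, true));
            }
        }
        return output;
    }

    /** Builds a small two-section input document for trying the copy out. */
    static Document sampleInput() throws Exception {
        Document input = new Document();
        DocumentBuilder builder = new DocumentBuilder(input);
        builder.writeln("First section");
        builder.insertBreak(BreakType.SECTION_BREAK_NEW_PAGE);
        builder.writeln("Second section");
        return input;
    }

    public static void main(String[] args) throws Exception {
        Document output = copyBodies(sampleInput());
        System.out.println("sections copied: " + output.getSections().getCount());
    }
}
```

Appending to a fresh `Body` in order gives the same result as inserting at `inputNodeIndex`, since each input section's nodes are visited from first to last.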

The source of the extra line break after the page break is not quite clear. The reason may be the format of the page break: in the input document, the page break and the line break sit on the same line, but when processed by your code they are treated as two different nodes and placed in the output document separately from each other.

guenther.zwetti · November 24, 2021, 8:39am

Wow!!! Thanks a lot @sergey.lobanov for this quick and quite perfect answer, great support!

There’s just one thing that still doesn’t work with section breaks:
IMHO the code part

outputDocument.getSections().add(new Section(outputDocument));

which inserts a new section, does not take the section break type of my input document into account. So I always get the type “section break - next page”, even if the input document contains a section break of another type. Could you please tell me how I can carry over that important attribute from my input file? (My first solution, importing a whole section with importNode, does not seem to work: it seems to be responsible for the wrong line numbering and, as you already suggested, cannot be used.) Please see the output document (already generated with your suggested and quite perfect solution). Thanks for your answer in advance! kr, günther
wrong_section_break_type.png (182.5 KB)

sergey.lobanov · November 25, 2021, 1:49am

@guenther.zwetti,
Please check the following updated loop:

for (int currentSectionIndex = 0; currentSectionIndex < inputTemplateNodes.size(); currentSectionIndex++) {
	// loop over all nodes of current input section and add them to body of current output section
	for (int inputNodeIndex = 0; inputNodeIndex < inputTemplateNodes.get(currentSectionIndex)
			.size(); inputNodeIndex++) {
		Node inputNode = inputTemplateNodes.get(currentSectionIndex).get(inputNodeIndex);
		NodeImporter importer = new NodeImporter(inputTemplateDocument, outputDocument,
				ImportFormatMode.KEEP_SOURCE_FORMATTING);
		Node destinationNode = importer.importNode(inputNode, true);

		if (outputDocument.getSections().get(currentSectionIndex) == null)
		{
			Section newSection = inputTemplateDocument.getSections().get(currentSectionIndex);
			newSection.removeAllChildren();
			outputDocument.getSections().insert(currentSectionIndex, importer.importNode(newSection, true));

			outputDocument.getSections().get(currentSectionIndex).getChildNodes().add(new Body(outputDocument));

		}
		outputDocument.getSections().get(currentSectionIndex).getBody().getChildNodes().insert(inputNodeIndex,
				destinationNode);
	}
}

The idea of this code is similar to your first solution. The difference is that you don't import the content of the input section into the document; instead, you remove its children first. Then you insert this blank section into your output document so that the section break type is carried over.
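Note that calling `removeAllChildren()` on `inputTemplateDocument.getSections().get(...)` empties that section of the template itself, which matters if the same template is reused (for example, for several database records). A minimal sketch of a variant that leaves the template untouched, assuming Aspose.Words for Java and using hypothetical helper names: clone each input section without its children via `deepClone(false)`, import that empty clone so it carries the section properties (including the break type reported by `PageSetup.getSectionStart()`), then import the body nodes into a fresh `Body`.

```java
import com.aspose.words.Body;
import com.aspose.words.BreakType;
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.ImportFormatMode;
import com.aspose.words.Node;
import com.aspose.words.NodeImporter;
import com.aspose.words.Section;
import com.aspose.words.SectionStart;

public class SectionBreakTypeSketch {

    /** Copies every section, keeping its section properties (incl. break type) without modifying the input. */
    static Document copyKeepingBreakTypes(Document input) throws Exception {
        Document output = new Document();
        output.removeAllChildren(); // start without the default section/body/paragraph
        NodeImporter importer =
                new NodeImporter(input, output, ImportFormatMode.KEEP_SOURCE_FORMATTING);

        for (int s = 0; s < input.getSections().getCount(); s++) {
            Section source = input.getSections().get(s);
            // Clone without children: section properties only; the template section stays intact.
            Node emptyClone = source.deepClone(false);
            Section target = (Section) importer.importNode(emptyClone, true);
            output.appendChild(target);

            Body body = new Body(output);
            target.appendChild(body);
            for (Node node = source.getBody().getFirstChild(); node != null; node = node.getNextSibling()) {
                body.appendChild(importer.importNode(node, true));
            }
        }
        return output;
    }

    /** Builds a two-section input whose second section starts with a continuous break. */
    static Document sampleInput() throws Exception {
        Document input = new Document();
        DocumentBuilder builder = new DocumentBuilder(input);
        builder.writeln("First section");
        builder.insertBreak(BreakType.SECTION_BREAK_CONTINUOUS);
        builder.writeln("Second section");
        return input;
    }

    public static void main(String[] args) throws Exception {
        Document output = copyKeepingBreakTypes(sampleInput());
        boolean continuous =
                output.getSections().get(1).getPageSetup().getSectionStart() == SectionStart.CONTINUOUS;
        System.out.println("second section is continuous: " + continuous);
    }
}
```

If only the break type is needed, another option would be to keep `new Section(output)` and copy the value explicitly with `target.getPageSetup().setSectionStart(source.getPageSetup().getSectionStart())`; cloning the section instead also carries over the remaining page setup, such as margins and line numbering settings.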

guenther.zwetti · November 25, 2021, 12:22pm

@sergey.lobanov great answer, great support, it seems to work fine!
I'll do a few more tests, but as far as I can tell by now, this seems to be the perfect solution. Thanks a lot!