Preserve list numbering while joining documents with KEEP_SOURCE_FORMATTING using Java

einkaufidv · November 29, 2019, 10:58am

When I append RTF documents that contain an ordered list numbered with lowercase letters, the list in each imported RTF document continues the numbering of the previous one. However, I expected the list to restart for each imported document.

Template_with_list.zip (11.7 KB)

When I append the same document several times (at least 3 times), the result looks like this:

Some text
a) Item 1
b) Item 2
c) Item 3

Some text.
d) Item 1
e) Item 2
f) Item 3

Some text.
g) Item 1
h) Item 2
i) Item 3

instead of:

Some text
a) Item 1
b) Item 2
c) Item 3

Some text.
a) Item 1
b) Item 2
c) Item 3

Is there a way to fix that problem?
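One simplified way to picture the symptom: list labels are counted per list, so if the imported paragraphs end up referencing the same list as the paragraphs already in the destination, the letters keep counting. The sketch below is a hypothetical, dependency-free model of that idea (the class and method names are illustrative and are not Aspose.Words internals).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, NOT Aspose.Words code: each list paragraph points at a list ID,
// and one running counter is kept per list ID.
public class ListNumberingModel {

    // Converts a 1-based counter into a lowercase-letter label: 1 -> "a)", 2 -> "b)".
    static String letterLabel(int n) {
        return (char) ('a' + n - 1) + ")";
    }

    // Produces one label per list paragraph, given the list ID each paragraph uses.
    static List<String> render(int[] listIdPerParagraph) {
        Map<Integer, Integer> counters = new HashMap<>();
        List<String> labels = new ArrayList<>();
        for (int listId : listIdPerParagraph) {
            // merge() returns the incremented counter for this list ID.
            int next = counters.merge(listId, 1, Integer::sum);
            labels.add(letterLabel(next));
        }
        return labels;
    }

    public static void main(String[] args) {
        // Three imported copies that all share list ID 1: numbering continues.
        System.out.println(render(new int[] {1, 1, 1, 1, 1, 1, 1, 1, 1}));
        // Each copy has its own list ID: numbering restarts for each copy.
        System.out.println(render(new int[] {1, 1, 1, 2, 2, 2, 3, 3, 3}));
    }
}
```

In this model, the fix is either to give each imported copy its own list or to reset the counter at some boundary, which is what the replies below explore.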

I used the following method for appending RTF documents:

Document dstDocument = new Document(new ByteArrayInputStream(byteArrayOfDstFile));
Document srcDocument = new Document(new ByteArrayInputStream(byteArrayOfSrcFile));

srcDocument.getFirstSection().getPageSetup().setSectionStart(SectionStart.NEW_PAGE);
srcDocument.getFirstSection().getHeadersFooters().linkToPrevious(false);
srcDocument.getFirstSection().getPageSetup().setRestartPageNumbering(true);

srcDocument.getSections().forEach(section -> {

    Node importNode = dstDocument.importNode(section, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);

    dstDocument.appendChild(importNode);

});

SaveOptions saveOptions = SaveOptions.createSaveOptions(SaveFormat.RTF);
dstDocument.save(outputStream, saveOptions);

return outputStream.toByteArray();

Many thanks in advance.

tahir.manzoor · November 29, 2019, 2:06pm

@einkaufidv

In your case, we suggest the following solution:

After appending the documents, iterate over the lists of the document and set the List.IsRestartAtEachSection property of each list to true.
Save the document with an OoxmlCompliance level higher than Ecma376_2006.

Please check the following code snippet. Hope this helps you.

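// Set true for every list to specify that it has to be restarted at each section.
// Note: List here is com.aspose.words.List, not java.util.List.
for (int i = 0; i < dstDocument.getLists().getCount(); i++) {
    List list = dstDocument.getLists().get(i);
    list.isRestartAtEachSection(true);
}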

OoxmlSaveOptions options = new OoxmlSaveOptions();
options.setCompliance(OoxmlCompliance.ISO_29500_2008_TRANSITIONAL); 

dstDocument.save(MyDir + "output.docx",options);
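The idea behind the restart-at-each-section flag can be sketched with a small, hypothetical model (plain Java, no Aspose.Words types): all paragraphs still share one list, but the counter resets whenever a list paragraph belongs to a new section. The names below are illustrative only. As the reply says, the real flag is only honoured in DOCX output when the compliance level is above Ecma376_2006, which is why the example saves with ISO_29500_2008_TRANSITIONAL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT Aspose.Words code, of a "restart at each section" flag.
public class RestartPerSectionModel {

    // sectionPerParagraph[i] is the index of the section that list paragraph i is in.
    static List<String> render(int[] sectionPerParagraph, boolean restartAtEachSection) {
        List<String> labels = new ArrayList<>();
        int counter = 0;
        int lastSection = -1;
        for (int section : sectionPerParagraph) {
            // Reset the shared counter only when the flag is set and a new section begins.
            if (restartAtEachSection && section != lastSection) {
                counter = 0;
            }
            counter++;
            lastSection = section;
            labels.add((char) ('a' + counter - 1) + ")");
        }
        return labels;
    }

    public static void main(String[] args) {
        int[] sections = {0, 0, 0, 1, 1, 1, 2, 2, 2};
        System.out.println(render(sections, false)); // [a), b), c), d), e), f), g), h), i)]
        System.out.println(render(sections, true));  // [a), b), c), a), b), c), a), b), c)]
    }
}
```

Note that this model only resets counters; it says nothing about which number format each list uses, which is the problem raised in the next post.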

einkaufidv · December 12, 2019, 12:14pm

Unfortunately, your suggestion didn’t work for me. When I also concatenate RTF documents with numbered lists, another problem occurs: I obtain a document where all lists are of the same type.

It looks like:

Some text
a) Item 1
b) Item 2
c) Item 3

Some text.
a) Item 1
b) Item 2
c) Item 3

instead of:

Some text.
a) Item 1
b) Item 2
c) Item 3

Some text.
1) Item 1
2) Item 2
3) Item 3

Is there maybe another solution?

tahir.manzoor · December 12, 2019, 4:04pm

@einkaufidv

To ensure a timely and accurate response, please attach the following resources here for testing:

Your input Word document.
Please attach the output Word file that shows the undesired behavior.
Please attach the expected output Word file that shows the desired behavior.
Please create a simple Java application (source code without compilation errors) that helps us to reproduce your problem on our end and attach it here for testing.

As soon as you have these pieces of information ready, we will start investigating your issue and provide you with more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.

einkaufidv · December 13, 2019, 8:54am

RTF_documents.zip (44.1 KB)

Thanks for your reply. I have attached the two documents I used for the concatenation. As the destination document I used “Template_with_list1.rtf”, which contains a list with lowercase letters. I then appended the same document to the destination document, and afterwards I appended the document “Template_with_list2.rtf”, which contains a numbered list, two times. I expected an output document with two lists using letters (each starting with a, b, c) and two lists using numbers (each starting with 1, 2, 3). Instead, I obtained a document in which all lists use letters, although at least each of them starts with a, b, c. I used the following source code to concatenate the RTF documents:

Document dstDocument = new Document(new ByteArrayInputStream(byteArrayOfDstFile));
Document srcDocument = new Document(new ByteArrayInputStream(byteArrayOfSrcFile));

srcDocument.getFirstSection().getPageSetup().setSectionStart(SectionStart.NEW_PAGE);
srcDocument.getFirstSection().getHeadersFooters().linkToPrevious(false);
srcDocument.getFirstSection().getPageSetup().setRestartPageNumbering(true);

// Keep track of the lists that are created.
HashMap<Integer, List> newLists = new HashMap<>();

// Iterate through all paragraphs in the document.
for (Paragraph para : (Iterable<Paragraph>)srcDocument.getChildNodes(NodeType.PARAGRAPH, true)) {

    if (para.isListItem()) {
        int listId = para.getListFormat().getList().getListId();

        // Check if the destination document contains a list with this ID already. If it does then this may
        // cause the two lists to run together. Create a copy of the list in the source document instead.
        if (dstDocument.getLists().getListByListId(listId) != null) {

            List currentList;
            // A newly copied list already exists for this ID, retrieve the stored list and
            // use it on the current paragraph.
            if (newLists.containsKey(listId)) {
                currentList = newLists.get(listId);
            } else {
                // Add a copy of this list to the document and store it for later reference.
                currentList = srcDocument.getLists().addCopy(para.getListFormat().getList());
                currentList.isRestartAtEachSection(true);
                newLists.put(listId, currentList);
            }

            // Set the list of this paragraph to the copied list.
            para.getListFormat().setList(currentList);
        }
    }
}

srcDocument.getSections().forEach(section -> {

    Node importNode = dstDocument.importNode(section, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    dstDocument.appendChild(importNode);
});

SaveOptions saveOptions = SaveOptions.createSaveOptions(SaveFormat.RTF);
dstDocument.save(outputStream, saveOptions);

return outputStream.toByteArray();
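The remapping loop in the code above follows a common pattern: when a source list ID already exists in the destination, create one copy per clashing ID and point every paragraph of that list at the same copy. A dependency-free sketch of just the ID bookkeeping is shown below; it is a hypothetical model, and "next free ID" stands in for whatever ID Aspose.Words actually assigns in ListCollection.addCopy().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the list-ID remapping loop; NOT Aspose.Words code.
public class ListIdRemapModel {

    // Returns the list ID each source paragraph should use after remapping.
    static List<Integer> remap(int[] sourceListIds, Set<Integer> destinationListIds) {
        // Pick new IDs above every ID already in use (illustrative allocation only).
        int nextFreeId = 0;
        for (int id : destinationListIds) nextFreeId = Math.max(nextFreeId, id + 1);
        for (int id : sourceListIds) nextFreeId = Math.max(nextFreeId, id + 1);

        Map<Integer, Integer> copies = new HashMap<>(); // clashing ID -> its single copy
        List<Integer> result = new ArrayList<>();
        for (int id : sourceListIds) {
            if (!destinationListIds.contains(id)) {
                result.add(id); // no clash: keep the original list
                continue;
            }
            Integer copy = copies.get(id);
            if (copy == null) {
                copy = nextFreeId++; // first paragraph of this list: create the copy
                copies.put(id, copy);
            }
            result.add(copy); // later paragraphs reuse the same copy
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> destination = new HashSet<>(Arrays.asList(5));
        System.out.println(remap(new int[] {5, 5, 5, 7}, destination)); // [8, 8, 8, 7]
    }
}
```

Keeping the lists apart by identity is only half of the problem, though; as the later replies show, the number format of clashing lists was governed separately by ImportFormatOptions.setKeepSourceNumbering().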

If you need further information please let me know.

Best regards and many thanks in advance

tahir.manzoor · December 13, 2019, 6:47pm

@einkaufidv

The shared code example joins only two documents. Please share the complete code example that you are using to generate ‘actual_output.rtf’. Thanks for your cooperation.

einkaufidv · December 16, 2019, 9:16am

I use the code block above as a method, which I call several times:

public static byte[] concatenateDocuments(byte[] destination, byte[] source) {
    try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {    
        ...
        return outputStream.toByteArray();
    } catch (Exception ex) {
        ...
    }
}

For testing, you can use the following code for the concatenation. You just need to adapt the path where the test files are located and the path where you want to save the output file.

ConcatenationMain.java.zip (2.3 KB)

tahir.manzoor · December 16, 2019, 5:04pm

@einkaufidv

We have tested the scenario and have managed to reproduce the same issue on our side. For the sake of correction, we have logged this problem in our issue tracking system as WORDSNET-19694. You will be notified via this forum thread once this issue is resolved.

We apologize for your inconvenience.

tahir.manzoor · January 1, 2020, 8:01pm

@einkaufidv

We would like to inform you that the issue you are facing is actually not a bug in Aspose.Words, so we have closed this issue (WORDSNET-19694) as ‘Not a Bug’.

Please use the ImportFormatOptions.KeepSourceNumbering property as shown below to get the desired output.

Document dstDoc = new Document(MyDir + "Template_with_list1.rtf");
Document srcDoc1 = new Document(MyDir + "Template_with_list1.rtf");
Document srcDoc2 = new Document(MyDir + "Template_with_list2.rtf");

ImportFormatOptions options = new ImportFormatOptions();
options.setKeepSourceNumbering(true);

dstDoc.appendDocument(srcDoc1, ImportFormatMode.USE_DESTINATION_STYLES, options);
dstDoc.appendDocument(srcDoc2, ImportFormatMode.USE_DESTINATION_STYLES, options);

dstDoc.save(MyDir + "19.12.rtf");
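Why this option also fixes the "all lists use letters" symptom can be pictured with a simplified, hypothetical model: when an imported list clashes with one in the destination and source numbering is not kept, the imported paragraphs adopt the destination's list (its format and its running count); when it is kept, the source list definition survives. Here a clash is identified by a shared list ID purely for simplicity; the class and method names are illustrative and are not the Aspose.Words implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of list-clash resolution on import; NOT Aspose.Words code.
public class KeepSourceNumberingModel {

    // Returns the number format an imported list paragraph ends up with.
    static String resolveFormat(int listId, Map<Integer, String> destinationLists,
                                Map<Integer, String> sourceLists, boolean keepSourceNumbering) {
        boolean clashes = destinationLists.containsKey(listId);
        if (clashes && !keepSourceNumbering) {
            // The paragraph joins the destination's list and adopts its format.
            return destinationLists.get(listId);
        }
        // No clash, or keepSourceNumbering is true: the source list definition is kept.
        return sourceLists.get(listId);
    }

    public static void main(String[] args) {
        Map<Integer, String> destination = new HashMap<>();
        destination.put(1, "a)");
        Map<Integer, String> source = new HashMap<>();
        source.put(1, "1)");
        System.out.println(resolveFormat(1, destination, source, false)); // a)
        System.out.println(resolveFormat(1, destination, source, true));  // 1)
    }
}
```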

einkaufidv · January 7, 2020, 10:58am

Thank you for your effort to find a solution. Indeed, it’s working now: the lists restart with each section as expected. That is great news. However, I observed another issue when merging documents that contain headers and footers. I have attached the documents I used for testing. Both have headers and footers that contain text in Arial and a symbol (a small dot). After merging document 1 and document 2 and then appending document 2 again, the formatting of the headers and footers changes: the text is now in Times New Roman and the symbol is lost. However, the symbol is still present in the last section of the output file, which seems very strange. I used the following source code:

Document dstDoc = new Document(MyDir + "Document_with_list_header_and_footer1.rtf");
Document srcDoc1 = new Document(MyDir + "Document_with_list_header_and_footer2.rtf");
Document srcDoc2 = new Document(MyDir + "Document_with_list_header_and_footer2.rtf");

srcDoc1.getFirstSection().getPageSetup().setSectionStart(SectionStart.NEW_PAGE);
srcDoc1.getFirstSection().getHeadersFooters().linkToPrevious(false);
srcDoc1.getFirstSection().getPageSetup().setRestartPageNumbering(true);
srcDoc1.getFootnoteOptions().setRestartRule(FootnoteNumberingRule.RESTART_SECTION);

srcDoc2.getFirstSection().getPageSetup().setSectionStart(SectionStart.NEW_PAGE);
srcDoc2.getFirstSection().getHeadersFooters().linkToPrevious(false);
srcDoc2.getFirstSection().getPageSetup().setRestartPageNumbering(true);
srcDoc2.getFootnoteOptions().setRestartRule(FootnoteNumberingRule.RESTART_SECTION);

ImportFormatOptions options = new ImportFormatOptions();
options.setKeepSourceNumbering(true);

dstDoc.appendDocument(srcDoc1, ImportFormatMode.KEEP_SOURCE_FORMATTING, options);
dstDoc.appendDocument(srcDoc2, ImportFormatMode.KEEP_SOURCE_FORMATTING, options);

SaveOptions saveOptions = SaveOptions.createSaveOptions(SaveFormat.RTF);
dstDoc.save(MyDir + "output-file.rtf", saveOptions);

RTF_documents.zip (107.0 KB)

tahir.manzoor · January 7, 2020, 6:10pm

@einkaufidv

Please note that Aspose.Words mimics the behavior of MS Word. If you append the documents using MS Word, you will get the same output.

einkaufidv · January 9, 2020, 11:12am

That’s true, but if I use the following method

srcDocument.getSections().forEach(section -> {

    Node importNode = dstDocument.importNode(section, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    dstDocument.appendChild(importNode);
});

to merge RTF documents, the formatting of the headers and footers is not lost: the text of the headers and footers stays in Arial as expected. However, since ImportFormatOptions cannot be passed as a parameter to this method, I observe the issue with ordered lists described above.

Indeed, if I use the method

ImportFormatOptions options = new ImportFormatOptions();
options.setKeepSourceNumbering(true);
dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING, options);

merging documents with ordered lists works fine. Unfortunately, using this method means that the font of headers and footers may change.

Would it be possible to support ImportFormatOptions as a parameter of the importNode() method? This would help me a lot. Or is there another way to keep the font of headers and footers from being lost after merging documents?

tahir.manzoor · January 9, 2020, 6:39pm

@einkaufidv

The behavior of the ImportNode and AppendDocument methods should be the same when ImportFormatMode is KeepSourceFormatting. For the sake of correction, we have logged this problem in our issue tracking system as WORDSNET-19796. You will be notified via this forum thread once this issue is resolved.

We apologize for your inconvenience.

einkaufidv · February 4, 2020, 8:04am

Is there any news concerning WORDSNET-19796?

tahir.manzoor · February 4, 2020, 2:31pm

@einkaufidv

Currently, your issue is in the analysis phase. Once we complete the analysis of this issue, we will provide you with its ETA. Thanks for your patience.

aspose.notifier · April 22, 2020, 9:06am

The issues you have found earlier (filed as WORDSNET-19796) have been fixed in the Aspose.Words for .NET 20.4 and Aspose.Words for Java 20.4 updates.

tahir.manzoor · May 30, 2020, 4:04pm

@einkaufidv

Regarding WORDSNET-19796, please use the following code example to get the desired output.

Document dstDoc = new Document(MyDir + "Document_with_list_header_and_footer1.rtf");
Document srcDoc1 = new Document(MyDir + "Document_with_list_header_and_footer2.rtf");
Document srcDoc2 = new Document(MyDir + "Document_with_list_header_and_footer2.rtf");

ImportFormatOptions options = new ImportFormatOptions();
options.setKeepSourceNumbering(true);

// Unlike Document.importNode(), NodeImporter accepts ImportFormatOptions.
NodeImporter nodeImporter1 = new NodeImporter(srcDoc1, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING, options);

for (Section section : (Iterable<Section>) srcDoc1.getSections()) {
    Node importNode = nodeImporter1.importNode(section, true);
    dstDoc.appendChild(importNode);
}

NodeImporter nodeImporter2 = new NodeImporter(srcDoc2, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING, options);

for (Section section : (Iterable<Section>) srcDoc2.getSections()) {
    Node importNode = nodeImporter2.importNode(section, true);
    dstDoc.appendChild(importNode);
}

dstDoc.save(MyDir + "output using importNode and NodeImporter.rtf");