Appending multiple Word documents and preserving paragraph list identifiers with ImportFormatOptions.setKeepSourceNumbering

tkalactomo · December 7, 2022, 12:32pm

Greetings,
I need to merge several Word documents that have identical content
(or originate from the same template).

When merging these documents, I explicitly pass ImportFormatOptions with setKeepSourceNumbering(false) so that list paragraphs with the same list ID continue their numbering.
This works for the second and every later list paragraph in the document, but not for the first one.
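To pin down the expected behaviour, here is a toy model in plain Java with no Aspose dependency. It is a simplification built on stated assumptions, not how Aspose.Words implements ImportFormatOptions: each list paragraph is reduced to the ID of its list, `keepSourceNumbering == false` is assumed to keep appended paragraphs in the destination list with the same ID, and `true` gives every source list a fresh ID. The list IDs A to D and their start values (11111, 1, 762, 762) are read off the merged output in this post; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT Aspose.Words internals: a "document" is just the ordered list IDs
// of its list paragraphs, and a paragraph's label is its list's start value plus
// the number of earlier paragraphs in the same list.
public class ListNumberingModel {

    // Strips the "#n" suffix that append() adds when it creates a fresh list.
    static String baseId(String id) {
        int hash = id.indexOf('#');
        return hash < 0 ? id : id.substring(0, hash);
    }

    // Computes the label number of every list paragraph in document order.
    static List<Integer> labels(List<String> listIds, Map<String, Integer> startAt) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> result = new ArrayList<>();
        for (String id : listIds) {
            int occurrence = seen.merge(id, 1, Integer::sum); // 1 for the first item of a list
            result.add(startAt.get(baseId(id)) + occurrence - 1);
        }
        return result;
    }

    // Assumed semantics: false = source paragraphs join the destination list with the same ID
    // (numbering continues); true = each source list gets a fresh ID (numbering restarts).
    static List<String> append(List<String> destination, List<String> source, boolean keepSourceNumbering) {
        String suffix = "#" + destination.size(); // computed once, so one fresh ID per source list
        List<String> merged = new ArrayList<>(destination);
        for (String id : source) {
            merged.add(keepSourceNumbering ? id + suffix : id);
        }
        return merged;
    }

    public static void main(String[] args) {
        // List layout read off the listings in this post: A starts at 11111, B at 1, C and D at 762.
        List<String> doc = List.of("A", "B", "B", "B", "C", "C", "D");
        Map<String, Integer> startAt = Map.of("A", 11111, "B", 1, "C", 762, "D", 762);

        // [11111, 1, 2, 3, 762, 763, 762, 11112, 4, 5, 6, 764, 765, 763]
        System.out.println(labels(append(doc, doc, false), startAt));
        // [11111, 1, 2, 3, 762, 763, 762, 11111, 1, 2, 3, 762, 763, 762]
        System.out.println(labels(append(doc, doc, true), startAt));
    }
}
```

With `false`, the second copy continues at 11112, 4, 5, 6, 764, 765, 763, which is the behaviour expected here; the reported problem is that, without the dummy paragraph, the real merge matches this only from the second list paragraph onward.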

I am using Aspose.Words 22.10. My current workaround is to insert an “invisible” dummy list paragraph (font size 1, white color) at the very beginning of the document, so that every subsequent list paragraph keeps the correct list instance ID.

The following code reproduces the situation (adjust the file paths for your environment):

package hr.spi.logic.lcfmw.converters;

import com.aspose.words.*;
import org.apache.commons.io.FileUtils;

import java.io.*;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class AsposeWordsAppendForumTest {

    // Adjust to an existing output folder on your machine.
    private String outputDirectoryLocation = "D:\\temp\\";

    public static void main(String[] args) throws Exception {
        AsposeWordsAppendForumTest test = new AsposeWordsAppendForumTest();
        test.test();
    }

    public void test() throws Exception {
        loadLicence();

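        testWorks();    // source contains the invisible "dummy" list paragraph
        testWorksNot(); // source has no dummy paragraph; the first list item does not continue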
    }

    public void testWorks() throws Exception {
        byte[] wordBytes = getTestData("word/asposeForum/TestWorks.docx");

        FileUtils.writeByteArrayToFile(new File(outputDirectoryLocation + "TestWorks_merged.docx"), wordBytes);
    }

    public void testWorksNot() throws Exception {
        byte[] wordBytes = getTestData("word/asposeForum/TestWorksNot.docx");

        FileUtils.writeByteArrayToFile(new File(outputDirectoryLocation + "TestWorksNot_merged.docx"), wordBytes);
    }

    // The same file is used as the destination and is then appended to itself three times.
    private byte[] getTestData(String fileLocation) throws Exception {
        byte[] sourceDoc = loadResource(this.getClass(), fileLocation);

        List<byte[]> inputDocuments = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            inputDocuments.add(loadResource(this.getClass(), fileLocation));
        }

        return assembleDocuments(sourceDoc, inputDocuments);
    }

    public byte[] assembleDocuments(byte[] sourceDoc, List<byte[]> inputDocuments) throws Exception {
        Document srcDoc;

        // KeepSourceNumbering = false (also the default): when numbering clashes, the
        // destination's lists are used, so appended list items are expected to continue numbering.
        ImportFormatOptions importFormatOptions = new ImportFormatOptions();
        importFormatOptions.setKeepSourceNumbering(false);

        try (ByteArrayInputStream sourceIS = new ByteArrayInputStream(sourceDoc)) {
            srcDoc = new Document(sourceIS);

            for (byte[] input : inputDocuments) {
                try (ByteArrayInputStream inputStream = new ByteArrayInputStream(input)) {
                    Document inputDoc = new Document(inputStream);
                    srcDoc.appendDocument(inputDoc, ImportFormatMode.KEEP_DIFFERENT_STYLES, importFormatOptions);
                }
            }
        }

        try (ByteArrayOutputStream outStream = new ByteArrayOutputStream()) {
            srcDoc.save(outStream, SaveFormat.DOCX);
            return outStream.toByteArray();
        }
    }

    private void loadLicence() throws Exception {
        // Placeholder: replace with the contents of your own Aspose license file.
        String licenceString = "dummyLicenceStringNotWorking";
        License license = new License();

        try (ByteArrayInputStream stream = new ByteArrayInputStream(licenceString.getBytes())) {
            license.setLicense(stream);
        }
    }

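    // Reads a classpath resource; Paths.get(URI) only works when the resource is on the file system, not inside a JAR.
    public byte[] loadResource(Class<?> klazz, String path) throws IOException, URISyntaxException {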
        return Files.readAllBytes(Paths.get(klazz.getClassLoader().getResource(path).toURI()));
    }
}

The file TestWorks.docx (13.9 KB) contains a “dummy” list paragraph on line 3.
Merging it with itself as shown above produces the desired result, TestWorks_merged.docx (11.7 KB):

Test with „dummy“ list paragraph

1	This is invisible
URBROJ: 238-27-06/04-22-11111	Should continue on next page

URBROJ: 238-27-06/04-22-1	Should continue on next page
URBROJ: 238-27-06/04-22-2	Should continue on next page
URBROJ: 238-27-06/04-22-3	Should continue on next page

URBROJ: 238-27-06/04-22-762	Should continue on next page
URBROJ: 238-27-06/04-22-763	Should continue on next page

URBROJ: 238-27-06/04-22-762	Should continue on next page

<<Next page>>

1	This is invisible
URBROJ: 238-27-06/04-22-11112	Should continue on next page

URBROJ: 238-27-06/04-22-4	Should continue on next page
URBROJ: 238-27-06/04-22-5	Should continue on next page
URBROJ: 238-27-06/04-22-6	Should continue on next page

URBROJ: 238-27-06/04-22-764	Should continue on next page
URBROJ: 238-27-06/04-22-765	Should continue on next page

URBROJ: 238-27-06/04-22-763	Should continue on next page

The file TestWorksNot.docx (14.0 KB) is merged the same way but does not contain a “dummy” paragraph.
Since ImportFormatOptions with setKeepSourceNumbering(false) is used, I expect every list paragraph on the next page, including the first one, to continue the corresponding list from the original document. Instead, the first list paragraph on the next page is numbered 11111 again rather than 11112, while all later list paragraphs continue correctly, as seen in TestWorksNot_merged.docx (11.7 KB):

Test without „dummy“ list paragraph
 

URBROJ: 238-27-06/04-22-11111	Should continue on next page

URBROJ: 238-27-06/04-22-1	Should continue on next page
URBROJ: 238-27-06/04-22-2	Should continue on next page
URBROJ: 238-27-06/04-22-3	Should continue on next page

URBROJ: 238-27-06/04-22-762	Should continue on next page
URBROJ: 238-27-06/04-22-763	Should continue on next page

URBROJ: 238-27-06/04-22-762	Should continue on next page

<<Next page>>

URBROJ: 238-27-06/04-22-11111	Should continue on next page

URBROJ: 238-27-06/04-22-4	Should continue on next page
URBROJ: 238-27-06/04-22-5	Should continue on next page
URBROJ: 238-27-06/04-22-6	Should continue on next page

URBROJ: 238-27-06/04-22-764	Should continue on next page
URBROJ: 238-27-06/04-22-765	Should continue on next page

URBROJ: 238-27-06/04-22-763	Should continue on next page
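The expectation can also be written as a simple check on the label numbers, grouped per list, that appear on the first and second page of TestWorksNot_merged.docx. The helper below is hypothetical and only works on numbers copied from the listing above; it does not open or parse the .docx file.

```java
import java.util.List;

// Hypothetical helper: checks whether the labels of one list on the second copy
// continue directly from the last label of the same list on the first copy.
public class ContinuationCheck {

    static boolean continues(List<Integer> firstCopy, List<Integer> secondCopy) {
        int expected = firstCopy.get(firstCopy.size() - 1) + 1; // next number after copy 1
        for (int label : secondCopy) {
            if (label != expected) {
                return false;
            }
            expected++;
        }
        return true;
    }

    public static void main(String[] args) {
        // Label numbers copied from TestWorksNot_merged.docx, grouped per list.
        System.out.println(continues(List.of(11111), List.of(11111)));       // false
        System.out.println(continues(List.of(1, 2, 3), List.of(4, 5, 6)));   // true
        System.out.println(continues(List.of(762, 763), List.of(764, 765))); // true
        System.out.println(continues(List.of(762), List.of(763)));           // true
    }
}
```

Only the first list fails the check, which is exactly the paragraph the dummy-paragraph workaround compensates for.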

How can I get the desired result, or is this a shortcoming of the Aspose library?

Thank you in advance and have a nice day!

alexey.noskov · December 7, 2022, 1:49pm

@tkalactomo Thank you for reporting the problem to us. I have managed to reproduce it on my side, and it has been logged for correction as WORDSNET-24688. We will keep you posted and let you know once it is resolved or we have more information for you.