List labels not retained in split documents

I am splitting a large document into smaller documents using document.extractPages() with the code below:

private List<Document> splitDocumentWithAspose(Document document, int noOfSplits, int totalPageCount) throws Exception {
    log.info("splitDocumentWithAspose - split document into {} sub-document", noOfSplits);
    List<Document> documents = new ArrayList<>();
    int splitDocPageLimit = 10;
    for (int split = 0; split < noOfSplits; split++) {
        int startIndex = split * splitDocPageLimit;
        int endIndex = startIndex + splitDocPageLimit;
        int count = totalPageCount > endIndex ? splitDocPageLimit : (totalPageCount - startIndex);
        Document splitDoc = document.extractPages(startIndex, count);
        log.debug("splitDocumentWithAspose - adding splitted document at position {} into documentList", split);
        documents.add(splitDoc);
    }
    log.info("splitDocumentWithAspose - total number of split documents are: {} ", lazy(documents::size));
    return documents;
}

While iterating over the paragraphs in a split document and fetching the list label, I do not get the same label as in the original document. I use the code below to fetch the list label:

if (paragraph.isListItem()) {
    String listLabel = paragraph.getListLabel().getLabelString();
    log.info("the text label : {}", listLabel);
}

@SATISHSATYAEESH

Cause

The issue you’re experiencing with list labels not being retained in the split documents may stem from how the extractPages method handles the document structure, particularly with lists and their formatting. When splitting a document, the context of the list may not be preserved correctly, leading to missing or incorrect labels in the resulting sub-documents.

Solution

To ensure that list labels are retained in the split documents, you can try the following approaches:

  1. Check List Formatting: Ensure that the list formatting is correctly applied in the original document. Sometimes, if the list is not properly defined, it may not carry over during the extraction process.

  2. Use ImportFormatOptions: When importing nodes from the original document to the new document, consider using an ImportFormatOptions object to maintain the source formatting, including lists. This can help preserve the list structure and labels.

  3. Update List Labels: After splitting the document, you may need to recompute the list labels in the new documents. List labels are not fields, so Document.updateFields() does not refresh them; Document.updateListLabels() is the method meant for this.

Code

Here’s an example of how you might implement the above suggestions:

private List<Document> splitDocumentWithAspose(Document document, int noOfSplits, int totalPageCount) throws Exception {
    List<Document> documents = new ArrayList<>();
    int splitDocPageLimit = 10;

    for (int split = 0; split < noOfSplits; split++) {
        int startIndex = split * splitDocPageLimit;
        int endIndex = startIndex + splitDocPageLimit;
        int count = totalPageCount > endIndex ? splitDocPageLimit : (totalPageCount - startIndex);

        Document splitDoc = document.extractPages(startIndex, count);

        // Recompute list labels before they are read from the split document
        splitDoc.updateListLabels();

        documents.add(splitDoc);
    }
    return documents;
}

Next Steps

  1. Implement the above code and test to see if the list labels are retained in the split documents.
  2. If the issue persists, check the original document for any inconsistencies in list formatting.
  3. Review the Aspose documentation for any updates or changes related to list handling in the version you are using.

Please proceed with these steps, and let me know if you need further assistance!

@SATISHSATYAEESH Could you please attach your problematic input and output documents here for testing? We will check the issue and provide you more information.

I have attached the document here:
613079.docx (2.5 MB)

@ritikrajjalu Thank you for the additional information. There are 159 pages in your document. Could you please specify on which page the problem can be observed?

Hi @alexey.noskov

Please find below the latest document and the split documents, with screenshots showing where the list label changed.

Latest Code Snippet:

public List<Document> splitDocumentWithAspose(Document document, int noOfSplits, int totalPageCount) throws Exception {
    log.info("splitDocumentWithAspose - split document into {} sub-document", noOfSplits);
    List<Document> documents = new ArrayList<>();
    for (int split = 0; split < noOfSplits; split++) {
        int startIndex = split * AsposePropertySetterConstants.SPLIT_DOC_PAGE_LIMIT;
        int endIndex = startIndex + AsposePropertySetterConstants.SPLIT_DOC_PAGE_LIMIT;
        int count = totalPageCount > endIndex ? AsposePropertySetterConstants.SPLIT_DOC_PAGE_LIMIT : (totalPageCount - startIndex);
        Document splitDoc = document.extractPages(startIndex, count);
        var fileName = String.format("/home/satishgupta/split_%s.docx", split + 1);
        splitDoc.save(fileName);
        log.debug("splitDocumentWithAspose - adding splitted document at position {} into documentList", split);
        documents.add(splitDoc);
    }
    log.info("splitDocumentWithAspose - total number of split documents are: {} ", lazy(documents::size));
    return documents;
}

parameters:

noOfSplits = 3 
SPLIT_DOC_PAGE_LIMIT = 30
totalPageCount = 61
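
To make it clear which pages end up in which split document, here is a small standalone sketch of the same page-range arithmetic with the parameters above. It does not call Aspose at all; PageRangeSketch and pageRanges are hypothetical names used only for illustration, and the start index is zero-based because that is what extractPages takes as its first argument. Unlike the method above, the sketch derives the number of splits from the page count rather than taking noOfSplits as an input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Aspose.Words: computes the
// {startIndex, count} pairs that would be passed to extractPages.
class PageRangeSketch {

    static List<int[]> pageRanges(int totalPageCount, int pageLimit) {
        if (totalPageCount < 0 || pageLimit <= 0) {
            throw new IllegalArgumentException("totalPageCount must be >= 0 and pageLimit > 0");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < totalPageCount; start += pageLimit) {
            // The last range may be shorter than pageLimit.
            int count = Math.min(pageLimit, totalPageCount - start);
            ranges.add(new int[] {start, count});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Parameters from this post: totalPageCount = 61, SPLIT_DOC_PAGE_LIMIT = 30
        for (int[] range : pageRanges(61, 30)) {
            System.out.println("startIndex=" + range[0] + ", count=" + range[1]);
        }
        // prints:
        // startIndex=0, count=30
        // startIndex=30, count=30
        // startIndex=60, count=1
    }
}
```

So split_1.docx covers pages 1 to 30, split_2.docx covers pages 31 to 60, and split_3.docx contains only page 61.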

In the attached original document, the heading “Omitted Services” has the list label “3”, but in the split document (see split_2.docx) it has changed to “2”. As a result, the levels of the subsequent paragraphs have also changed.
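
For anyone who wants to pin down exactly which paragraphs are affected, a helper along these lines might be useful. It is only a sketch: ListLabelDiffSketch and changedLabels are hypothetical names, and it works on plain {paragraph text, list label} pairs rather than Aspose objects. The pairs would have to be collected separately, for example from paragraph.getText() and paragraph.getListLabel().getLabelString() for list items, preferably after calling Document.updateListLabels() on each document. Matching paragraphs by their text is an assumption and will not work well if the same text occurs more than once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Aspose.Words: reports paragraphs whose
// list label in a split document differs from the label in the original.
class ListLabelDiffSketch {

    // Each entry is {paragraphText, listLabel}.
    static List<String> changedLabels(List<String[]> original, List<String[]> split) {
        Map<String, String> originalLabels = new LinkedHashMap<>();
        for (String[] paragraph : original) {
            // Keep the first label seen for a given text (assumes texts are unique).
            originalLabels.putIfAbsent(paragraph[0].trim(), paragraph[1]);
        }
        List<String> report = new ArrayList<>();
        for (String[] paragraph : split) {
            String text = paragraph[0].trim();
            String expected = originalLabels.get(text);
            if (expected != null && !expected.equals(paragraph[1])) {
                report.add(text + ": " + expected + " -> " + paragraph[1]);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<String[]> original = new ArrayList<>();
        original.add(new String[] {"Omitted Services", "3"});
        List<String[]> split = new ArrayList<>();
        split.add(new String[] {"Omitted Services", "2"});
        // prints: [Omitted Services: 3 -> 2]
        System.out.println(changedLabels(original, split));
    }
}
```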

Please find below the original document, the split documents, and screenshots from the original document and split_2, where the issue is observed.

split_3.docx (37.1 KB)

split_2.docx (81.4 KB)

split_1.docx (88.3 KB)

OriginalDocument.docx (134.0 KB)

Original: [screenshot]

Split: [screenshot]

@SATISHSATYAEESH
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-28504

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@alexey.noskov We are using the Aspose Java library, but the internal ticket that was raised appears to be for .NET. Could you please check this on your end?

@AdityaSirion The .NET version of Aspose.Words is the main version, so all fixes are first implemented in the .NET version and then ported to Java.
