Aspose.Words Evaluation License: Word elements not being recognized

There seems to be an issue with how Aspose.Words for Java recognizes numbered lists.
As an example, I have attached this NDA:
NDA 01.docx (24.3 KB)
If you open this document, you will see numbered lists, and some of them even have a hierarchy with a non-numbered ordered sublist. However, Aspose.Words handles neither kind of list and returns ? for the missing items. I added a warning callback to see what was actually happening, and here are the logs:

WARNING: 65536 source:9
        Description: Import of element 'shapedefaults' is not supported in Docx format by Aspose.Words.
WARNING: 65536 source:9
        Description: Import of element 'shapedefaults' is not supported in Docx format by Aspose.Words.
WARNING: 65536 source:9
        Description: Import of element 'extraClrSchemeLst' is not supported in Docx format by Aspose.Words.

This makes me wonder whether this is actually unsupported by Aspose.Words or whether I am missing some configuration. So far I have tried all sorts of load options with different MS Word versions, and none of them seem to work. I also tried re-saving the document with Save As in newer versions of Word, and that did not work either.

Here is the extracted text:

This Non-Disclosure Agreement (Agreement) is entered into on the date of the last signature set forth below (the ?Effective Date?), between [Company Name], [company address] and WHEREAS, the parties wish to enter into discussions for the purpose [insert description of purpose, such as: of initiating a collaboration and developing research projects of mutWHEREAS, each party may have proprietary interests such as patentable subject matter not yet covered by a patent application, other intellectual property, or other interests whicWHEREAS, in connection with the Project, each party may disclose to the other certain proprietary technical, procedural, or business information which the disclosing party desireAll information disclosed by one party to the other to evaluate the Project and/or Purpose that is designated in writing as ?Confidential? at the time of disclosure or if disclosed orally is designated in writing as ?Confidential? within fifteen (15) days of disclosure is ?Confidential Information.?  Confidential Information does not include information The receiving party agrees to disclose Confidential Information only to their respective employees, agents, or representatives who have been determined to have a need to know and have been advised of their obligation to comply with the terms of this Agreement.  To the extent allowed by the law applicable to the receiving party, the receiving party will bThe receiving party shall take such steps as may be reasonably necessary to prevent disclosure of the Confidential Information to third parties, but shall apply at least the sameThe receiving party will return or destroy Confidential Information provided by the disclosing party upon termination of the Agreement.  
The receiving party?s designated representative may maintain one copy of all Confidential Information for the purpose of addressing any claim that may be brought under this Agreement and to comply with any other legal or recordkeeping requirements, and neither party will be obligated to destroy any Confidential Information that is stored electronically on back-up systems or computer hard drivesConfidential Information shall not be provided in any form by the receiving party to any third party without the prior permission of the disclosing party, unless otherwise required by law. In the event receiving party is required to disclose any Confidential Information of disclosing party pursuant to any law or governmental or judicial authority, process or order, receiving party shall provide prompt notice thereof to disclosing party in order that disclosing party can assess its right to seek a protective order or injunctive relief, or otherwise contest disclosure. In the event that such a protective order or other remedy is not obtained, or the disclosing party waives their right to obtain such an orThe receiving party expressly acknowledges that the disclosing party owns the Confidential Information they disclose, and that the transmission by the disclosing party of their Confidential Information (or any third party?s Confidential Information entrusted to the disclosing party) shall not be construed to grant the receiving party any patent, know-how, copyright, trade secret, trademark, or other intellectual property rights in, or arising from, the Confidential Information disclosed. 
If any such rights are to be granted to tThe disclosing party represents and agrees (i) it has the right to share its Confidential Information with the receiving party, (ii) the receiving party is authorized to use Confidential Information it receives from the disclosing party for the Purpose, and (iii) to the extent allowed by the law applicable to the disclosing party, the disclosing party wiThe parties agree to comply with all applicable laws and regulations including U.S. export control. The disclosing party agrees to notify the receiving party in writing prior to providing receiving party with access to any export regulated information and materials.  Such notification shall include all associated classification numbers.  The receiving paThe term of this Agreement shall begin on the date of the last signature on this Agreement and expire after one (1) year, unless terminated earlier by a party with ten (10) days written notice.  The obligations and restrictions of the receiving party under this Agreement shall continue for a period of three (3) years from the date of termination of this This Agreement shall supersede and prevail over any other prior arrangements, either oral or written, as to the Confidential Information received under this Agreement.  This AgreDate:   ] of Regents of the University of Wisconsin System on Behalf of the University of Wisconsin - Madison date listed below. writing signed by the parties.

Issues:

  1. Punctuation is missing.
  2. Numbered lists and their hierarchies are missing, but only for actual Word lists. Numbering typed into the text itself shows up just fine, for example points (i) …
  3. Text is cut off abruptly in multiple places.
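Regarding the first issue, one plausible explanation for the ? characters, assuming the console or output stream uses a charset that has no curly quotes (for example US-ASCII), is that Java replaces every character it cannot encode with ?. The small, library-independent sketch below reproduces that effect; the class and method names are illustrative, and it assumes the NDA uses curly quotes and apostrophes.

```java
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuoteEncodingDemo {
    // Encodes the text with the given charset and decodes it again, which is
    // roughly what happens when a stream using that charset prints the text.
    // String.getBytes replaces characters the charset cannot represent with
    // the charset's replacement byte, which is '?' for US-ASCII.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Curly quotes (U+201C, U+201D) and a curly apostrophe (U+2019),
        // assumed to be what the NDA contains.
        String sample = "the \u201CEffective Date\u201D and the receiving party\u2019s copy";

        // Prints: the ?Effective Date? and the receiving party?s copy
        System.out.println(roundTrip(sample, StandardCharsets.US_ASCII));

        // Writing through an explicitly UTF-8 stream keeps the characters,
        // provided the terminal also decodes UTF-8 (this constructor needs Java 10+).
        PrintStream utf8Out = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        utf8Out.println(sample);
    }
}
```

If this is the cause, the text Aspose.Words extracts is already correct and only the printing step loses the characters.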

Here’s the code for reference. In the meantime, I will look through the API docs.

    LoadOptions loadOptions = new LoadOptions();
    // loadOptions.setMswVersion(MsWordVersion.WORD_2019);
    loadOptions.setWarningCallback(new DocumentLoadingWarningCallback());
    Document existingDocument = new Document(myDir + "NDA 01.docx", loadOptions);
    String documentText = existingDocument.getText();
    System.out.println(documentText);
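The thread never shows the DocumentLoadingWarningCallback class. A minimal sketch of what it might look like, assuming it simply implements the Aspose.Words IWarningCallback interface and prints each WarningInfo in the format seen in the log above (the class body is a hypothetical reconstruction, not the poster's code):

```java
import com.aspose.words.IWarningCallback;
import com.aspose.words.WarningInfo;

import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of the callback used in the snippet above.
public class DocumentLoadingWarningCallback implements IWarningCallback {
    private final List<WarningInfo> warnings = new ArrayList<>();

    @Override
    public void warning(WarningInfo info) {
        // Aspose.Words calls this once for each issue it reports during loading.
        warnings.add(info);
        System.out.println("WARNING: " + info.getWarningType() + " source:" + info.getSource());
        System.out.println("        Description: " + info.getDescription());
    }

    // Lets the caller inspect the collected warnings after the document loads.
    public List<WarningInfo> getWarnings() {
        return warnings;
    }
}
```

Keeping the WarningInfo objects, not just printing them, makes it possible to check them programmatically after loading, for example with getWarnings().size().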

@ianjwilson

It seems you are experiencing issues with Aspose.Words not recognizing numbered lists and their hierarchies in your document, as well as some missing punctuation and unexpected line breaks. The warnings you are seeing indicate that certain elements in the DOCX file are not supported by Aspose.Words and were skipped during import.

Here are a few suggestions to help you troubleshoot and potentially resolve these issues:

  1. Load Options Configuration: Ensure that you are using the correct load options. You can specify the version of Microsoft Word that Aspose.Words should emulate when loading the document. For example, you can set it to WORD_2019 or WORD_2007 depending on the version you believe the document was created with. This can sometimes help with compatibility issues.

    LoadOptions loadOptions = new LoadOptions();
    loadOptions.setMswVersion(MsWordVersion.WORD_2019); // or WORD_2007
    loadOptions.setWarningCallback(new DocumentLoadingWarningCallback());
    Document existingDocument = new Document("path/to/NDA 01.docx", loadOptions);
    
  2. Warning Callback: You are already using a warning callback to capture issues during document loading. Review the warnings carefully, as they show which elements are not being processed. Note, however, that shapedefaults (VML shape defaults in the document settings) and extraClrSchemeLst (extra theme color schemes) are not related to list numbering, so these particular warnings are unlikely to explain the missing list labels.

  3. Document Structure: If the numbered lists are not being recognized, check the structure of the document in Microsoft Word. Sometimes, lists may not be formatted correctly, or they may be nested in a way that Aspose.Words does not handle well. Try simplifying the list structure or reformatting it in Word before saving it again.

  4. Text Extraction: If you are extracting text and run into issues with punctuation and line breaks, consider alternatives to Document.getText(), such as Node.toString(SaveFormat.TEXT), or iterate through the paragraphs and extract the text of each one individually.

  5. Testing with Different Documents: If possible, test with other DOCX files that contain numbered lists to see if the issue is specific to the NDA document or if it occurs with other documents as well. This can help determine if the problem lies with the document itself or with the Aspose.Words library.

If these suggestions do not resolve the issue, you may want to reach out to Aspose support with the specific document and the warnings you are encountering for further assistance.


Well, the closest to what I need is getSections(), like this:

        String documentText = existingDocument.getSections().get(0).getBody().toString(SaveFormat.TEXT);

However, that only solves the problem of getting the full text, not the problem of getting the numbered lists.

I found a roundabout way to do this in the API docs:

import com.aspose.words.*;

public class ListExtraction {
    public static void processDocuments(Document emptyDocument, DocumentBuilder builder, String myDir, String artifactsDir) throws Exception {
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.setMswVersion(MsWordVersion.WORD_2007);
        loadOptions.setWarningCallback(new DocumentLoadingWarningCallback());

        Document existingDocument = new Document(myDir + "NDATEST.docx", loadOptions);

        // The current approach doesn't preserve list formatting
        String documentText = existingDocument.getSections().get(0).getBody().toString(SaveFormat.TEXT);
        System.out.println("Plain text conversion (doesn't preserve lists):");
        System.out.println(documentText);

        // Extract ordered lists by traversing document nodes
        System.out.println("\nExtracted Lists:");
        extractLists(existingDocument);
    }

    private static void extractLists(Document doc) throws Exception {
        // Collect every paragraph in the document, searching recursively
        NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);

        for (Paragraph para : (Iterable<Paragraph>) paragraphs) {
            if (para.isListItem()) {
                // Check if it's an ordered (numbered) list rather than a bullet list
                ListFormat listFormat = para.getListFormat();
                if (listFormat.getListLevel().getNumberStyle() != NumberStyle.BULLET) {
                    // The label is read from the paragraph via getListLabel()
                    String listNumber = para.getListLabel().getLabelString();
                    String listText = para.getText();
                    System.out.println(listNumber + " " + listText);
                }
            }
        }
    }
}



and I get this:
Extracted Lists:
0. All information disclosed by one party to the other to evaluate the Project and/or Purpose that is designated in writing as ?Confidential? at the time of disclosure or if disclosed orally is designated in writing as ?Confidential? within fifteen (15) days of disclosure is ?Confidential Information.? Confidential Information does not include information which:
0. was known by the receiving party prior to receipt from the disclosing party;
0. is generally available in the public domain or thereafter becomes available to the public through no act of the receiving party;
0. is independently discovered by an employee, agent, or representative of the receiving party who had no knowledge of the Confidential Information disclosed; or
0. is made available to the receiving party as a matter of lawful right by a third party.
0. The receiving party agrees to disclose Confidential Information only to their respective employees, agents, or representatives who have been determined to have a need to know and have been advised of their obligation to comply with the terms of this Agreement. To the extent allowed by the law applicable to the receiving party, the receiving party will be liable for any breach of this agreement by any of its employees, agents, affiliates or representatives that receive access to the Confidential Information.
0. The receiving party shall take such steps as may be reasonably necessary to prevent disclosure of the Confidential Information to third parties, but shall apply at least the same level of security as is afforded to the receiving party?s own confidential information.
0. The receiving party will return or destroy Confidential Information provided by the disclosing party upon termination of the Agreement. The receiving party?s designated representative may maintain one copy of all Confidential Information for the purpose of addressing any claim that may be brought under this Agreement and to comply with any other legal or recordkeeping requirements, and neither party will be obligated to destroy any Confidential Information that is stored electronically on back-up systems or computer hard drives after a file is deleted, but any such electronic information will continue to be subject to the terms of confidentiality under this Agreement.
0. Confidential Information shall not be provided in any form by the receiving party to any third party without the prior permission of the disclosing party, unless otherwise required by law. In the event receiving party is required to disclose any Confidential Information of disclosing party pursuant to any law or governmental or judicial authority, process or order, receiving party shall provide prompt notice thereof to disclosing party in order that disclosing party can assess its right to seek a protective order or injunctive relief, or otherwise contest disclosure. In the event that such a protective order or other remedy is not obtained, or the disclosing party waives their right to obtain such an order or remedy, the receiving party may furnish only such portions of Confidential Information as, pursuant to the advice of counsel, are required to be disclosed.
0. Confidential Information will be used only to evaluate the Project and/or Purpose.
0. The receiving party expressly acknowledges that the disclosing party owns the Confidential Information they disclose, and that the transmission by the disclosing party of their Confidential Information (or any third party?s Confidential Information entrusted to the disclosing party) shall not be construed to grant the receiving party any patent, know-how, copyright, trade secret, trademark, or other intellectual property rights in, or arising from, the Confidential Information disclosed. If any such rights are to be granted to the receiving party, such grant shall be expressly set forth in a separate written instrument.
0. The disclosing party represents and agrees (i) it has the right to share its Confidential Information with the receiving party, (ii) the receiving party is authorized to use Confidential Information it receives from the disclosing party for the Purpose, and (iii) to the extent allowed by the law applicable to the disclosing party, the disclosing party will be liable for any breach by the disclosing party of the representations in subparts (i) and (ii).
0. This Agreement shall be governed by and construed in accordance with the laws of the State of Wisconsin.
0. The parties agree to comply with all applicable laws and regulations including U.S. export control. The disclosing party agrees to notify the receiving party in writing prior to providing receiving party with access to any export regulated information and materials. Such notification shall include all associated classification numbers. The receiving party reserves the right to refuse receipt of any information or materials that are subject to export controls.
0. The term of this Agreement shall begin on the date of the last signature on this Agreement and expire after one (1) year, unless terminated earlier by a party with ten (10) days written notice. The obligations and restrictions of the receiving party under this Agreement shall continue for a period of three (3) years from the date of termination of this Agreement.
0. This Agreement shall supersede and prevail over any other prior arrangements, either oral or written, as to the Confidential Information received under this Agreement. This Agreement constitutes the entire agreement between the parties relative to this subject matter and shall not be amended, except in a writing signed by the parties.

However, that is not accurate either: every item is labeled "0.".

@ianjwilson Sure, lists are supported by Aspose.Words. Please see our documentation to learn how to work with lists:
https://docs.aspose.com/words/java/working-with-lists/

List item labels are not stored in the document; they are calculated on the fly. To update list labels in Aspose.Words, call the Document.updateListLabels method. After calling it, the label string of a list item can be accessed through Paragraph.getListLabel():

Document doc = new Document("C:\\Temp\\in.docx");
doc.updateListLabels();
for(Paragraph p : (Iterable<Paragraph>)doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    if(p.isListItem())
        System.out.println(p.getListLabel().getLabelString());
}
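Because labels are computed rather than stored, each one depends on every list item that precedes it. The following is a simplified, hypothetical model (decimal labels only, nine levels as in Word lists) that illustrates the idea; it is not how Aspose.Words computes labels internally, since the real computation also handles lettered and Roman styles, restarts, and per-level formats.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of multilevel numbering. It only shows why
// a label such as "2.1." depends on the earlier items, not on the paragraph alone.
public class ListLabelModel {
    // levels[i] is the zero-based list level of the i-th list paragraph.
    static List<String> labels(int[] levels) {
        int[] counters = new int[9]; // Word lists have nine levels (0-8)
        List<String> result = new ArrayList<>();
        for (int level : levels) {
            counters[level]++;
            // Starting a new item resets the counters of all deeper levels.
            for (int deeper = level + 1; deeper < counters.length; deeper++) {
                counters[deeper] = 0;
            }
            // Join the counters of this level and all parent levels: "2.1."
            StringBuilder label = new StringBuilder();
            for (int i = 0; i <= level; i++) {
                label.append(counters[i]).append('.');
            }
            result.add(label.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [1., 1.1., 1.2., 2.]
        System.out.println(labels(new int[] {0, 1, 1, 0}));
    }
}
```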

If you simply need to get text representation of the document you can use the following code:

Document doc = new Document("C:\\Temp\\in.docx");
doc.updateListLabels();
String documentText = doc.toString(SaveFormat.TEXT);
System.out.println(documentText);
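Building on the two snippets above, here is a sketch that keeps the hierarchy visible by indenting each list item by its level. It assumes Java 11+ (for String.repeat) and uses ListFormat.getListLevelNumber() for the level; the class name, the four-space indent, and the file path are illustrative choices, not part of the original answer.

```java
import com.aspose.words.Document;
import com.aspose.words.NodeType;
import com.aspose.words.Paragraph;

public class ListOutline {
    // Returns the document text with list labels, indenting each list item
    // by its list level (0 = top level) so nested items stay distinguishable.
    static String outline(Document doc) throws Exception {
        // Labels are not stored in the file, so compute them first.
        doc.updateListLabels();

        StringBuilder out = new StringBuilder();
        for (Object node : doc.getChildNodes(NodeType.PARAGRAPH, true)) {
            Paragraph para = (Paragraph) node;
            if (para.isListItem()) {
                int level = para.getListFormat().getListLevelNumber();
                out.append("    ".repeat(level))
                   .append(para.getListLabel().getLabelString())
                   .append(' ');
            }
            // getText() includes the trailing paragraph mark; trim() drops it.
            out.append(para.getText().trim()).append(System.lineSeparator());
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = new Document("NDA 01.docx"); // adjust the path as needed
        System.out.println(outline(doc));
    }
}
```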

Sure, I will try that out. Thanks.

Yes, that works. I guess I went too deep into the docs.
