I see there might be some issue with recognizing numbered lists in Aspose.Words for Java.
For example, I have attached this NDA:
NDA 01.docx (24.3 KB)
If you open this document, you will see that it contains numbered lists, and some of them even have a hierarchy for a non-numbered ordered list. However, Aspose.Words does not handle either kind of list and instead returns ? for the missing item. I added a warning callback to see what was actually happening, and here are the logs:
WARNING: 65536 source:9
Description: Import of element 'shapedefaults' is not supported in Docx format by Aspose.Words.
WARNING: 65536 source:9
Description: Import of element 'shapedefaults' is not supported in Docx format by Aspose.Words.
WARNING: 65536 source:9
Description: Import of element 'extraClrSchemeLst' is not supported in Docx format by Aspose.Words.
This makes me wonder whether this is actually not supported by Aspose.Words or whether I am missing some configuration. I have tried all sorts of load options with different MS Word versions so far, and none of them seem to work. I also tried re-saving the document in a newer version of Word with Save As, and that did not work either.
Here is the extracted text:
This Non-Disclosure Agreement (Agreement) is entered into on the date of the last signature set forth below (the ?Effective Date?), between [Company Name], [company address] and WHEREAS, the parties wish to enter into discussions for the purpose [insert description of purpose, such as: of initiating a collaboration and developing research projects of mutWHEREAS, each party may have proprietary interests such as patentable subject matter not yet covered by a patent application, other intellectual property, or other interests whicWHEREAS, in connection with the Project, each party may disclose to the other certain proprietary technical, procedural, or business information which the disclosing party desireAll information disclosed by one party to the other to evaluate the Project and/or Purpose that is designated in writing as ?Confidential? at the time of disclosure or if disclosed orally is designated in writing as ?Confidential? within fifteen (15) days of disclosure is ?Confidential Information.? Confidential Information does not include information The receiving party agrees to disclose Confidential Information only to their respective employees, agents, or representatives who have been determined to have a need to know and have been advised of their obligation to comply with the terms of this Agreement. To the extent allowed by the law applicable to the receiving party, the receiving party will bThe receiving party shall take such steps as may be reasonably necessary to prevent disclosure of the Confidential Information to third parties, but shall apply at least the sameThe receiving party will return or destroy Confidential Information provided by the disclosing party upon termination of the Agreement. 
The receiving party?s designated representative may maintain one copy of all Confidential Information for the purpose of addressing any claim that may be brought under this Agreement and to comply with any other legal or recordkeeping requirements, and neither party will be obligated to destroy any Confidential Information that is stored electronically on back-up systems or computer hard drivesConfidential Information shall not be provided in any form by the receiving party to any third party without the prior permission of the disclosing party, unless otherwise required by law. In the event receiving party is required to disclose any Confidential Information of disclosing party pursuant to any law or governmental or judicial authority, process or order, receiving party shall provide prompt notice thereof to disclosing party in order that disclosing party can assess its right to seek a protective order or injunctive relief, or otherwise contest disclosure. In the event that such a protective order or other remedy is not obtained, or the disclosing party waives their right to obtain such an orThe receiving party expressly acknowledges that the disclosing party owns the Confidential Information they disclose, and that the transmission by the disclosing party of their Confidential Information (or any third party?s Confidential Information entrusted to the disclosing party) shall not be construed to grant the receiving party any patent, know-how, copyright, trade secret, trademark, or other intellectual property rights in, or arising from, the Confidential Information disclosed. 
If any such rights are to be granted to tThe disclosing party represents and agrees (i) it has the right to share its Confidential Information with the receiving party, (ii) the receiving party is authorized to use Confidential Information it receives from the disclosing party for the Purpose, and (iii) to the extent allowed by the law applicable to the disclosing party, the disclosing party wiThe parties agree to comply with all applicable laws and regulations including U.S. export control. The disclosing party agrees to notify the receiving party in writing prior to providing receiving party with access to any export regulated information and materials. Such notification shall include all associated classification numbers. The receiving paThe term of this Agreement shall begin on the date of the last signature on this Agreement and expire after one (1) year, unless terminated earlier by a party with ten (10) days written notice. The obligations and restrictions of the receiving party under this Agreement shall continue for a period of three (3) years from the date of termination of this This Agreement shall supersede and prevail over any other prior arrangements, either oral or written, as to the Confidential Information received under this Agreement. This AgreDate: ] of Regents of the University of Wisconsin System on Behalf of the University of Wisconsin - Madison date listed below. writing signed by the parties.
Issues:
- Punctuation is missing.
- Numbered lists and their hierarchies are missing (but only when they are actual list items; numbering that is typed directly in the text shows up just fine, for example look at points (i) …).
- Text breaks off suddenly on multiple lines.
Here’s the code for reference. Meanwhile I will take a look at the API docs.
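import com.aspose.words.Document;
import com.aspose.words.LoadOptions;

// myDir is the folder that contains the attached document.
LoadOptions loadOptions = new LoadOptions();
// loadOptions.setMswVersion(MsWordVersion.WORD_2019);
// DocumentLoadingWarningCallback is my own callback class; it produced the warnings above.
loadOptions.setWarningCallback(new DocumentLoadingWarningCallback());
Document existingDocument = new Document(myDir + "NDA 01.docx", loadOptions);
String documentText = existingDocument.getText();
System.out.println(documentText);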
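
One more observation on the punctuation issue: the ? characters in the extracted text sit exactly where the document has curly quotes and apostrophes, so that part may be a console encoding effect on my side rather than something Aspose.Words does. Below is a minimal sketch of that assumption in plain Java, with no Aspose.Words calls. The class name, the method name, and the sample string are made up for illustration, and I have not confirmed that this is the actual cause.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of my project or of Aspose.Words.
class ConsoleEncodingDemo {

    // Encode the text to bytes in the given charset and decode it again,
    // mimicking what an output stream limited to that charset would show.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumed sample: curly quotes (U+201C, U+201D) as Word usually stores them.
        String sample = "the \u201CEffective Date\u201D";

        // US-ASCII cannot represent curly quotes, so each one is replaced by '?'.
        System.out.println(roundTrip(sample, StandardCharsets.US_ASCII));

        // UTF-8 can represent them, so the round trip leaves the text unchanged.
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample));

        // Printing through an explicitly UTF-8 stream avoids relying on the platform default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(sample);
    }
}
```

If this is what is happening, printing documentText through a UTF-8 PrintStream like utf8Out (or writing it to a UTF-8 file) should show the quotes correctly. It would not explain the missing list numbers, which look like a separate problem.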