Aspose.Words trial version for Java not giving accurate results for word count

Hi,
We are trying to evaluate Aspose.Words for getting the word count, which is critical functionality in our product.
However, when I use the Maven dependencies provided for the trial, it gives inaccurate results.

Here is the dependency and code that I have used:

<dependency>
	<groupId>com.aspose</groupId>
	<artifactId>aspose-total</artifactId>
	<version>22.12</version>
	<type>pom</type>
</dependency>

public void getCountfromAsposeDependency(@RequestParam("file") MultipartFile file) throws Exception {
    // Load the uploaded file directly from its input stream.
    Document document = new Document(file.getInputStream());
    //Document document = new Document("thinkning.doc");
    // Recalculate the statistics; passing true also updates the line count.
    document.updateWordCount(true);
    BuiltInDocumentProperties properties = document.getBuiltInDocumentProperties();

    System.out.println(" no of words in doc " + properties.getWords());
    System.out.println(properties.getLines());
    System.out.println(properties.getPages());
}

Note: we have tried aspose-words as well and it gives the same result. The word count difference seems to be huge.

@Nikhil1988 In evaluation mode Aspose.Words has a few limitations:

  • An evaluation watermark and text are injected into the document, which can affect the word count.
  • The maximum document size is limited to several hundred paragraphs, which can also affect the result.

If you would like to test Aspose.Words without the evaluation version limitations, you can request a temporary 30-day license and apply it as described here:
https://docs.aspose.com/words/java/licensing/

In your code you use BuiltInDocumentProperties.getPages(). This value is not always updated and can be incorrect. You should use the Document.getPageCount() method to get the page count calculated by the Aspose.Words layout engine.

If you still encounter problems, please attach your test document here for our reference. We will check it and provide you with more information.

Hi, thanks for the info. We are mostly interested in the word count, which is returned by properties.getWords().

@Nikhil1988 Please let us know if Aspose.Words in licensed mode also returns an incorrect word count, and attach your test document. We will check the issue and provide you with more information.

Hi @alexey.noskov, we have loaded the license and it seems to give correct results. However, we noticed that the API doesn’t consider footnotes and endnotes. Is there any way we can include these in the word count as well? This is also part of our word count requirement.

@alexey.noskov Is there any workaround or solution to include endnotes and footnotes in the word count?

@Nikhil1988 You can enable the "Include textboxes, footnotes and endnotes" option in MS Word. The value of this checkbox is stored in settings.xml as the <w:doNotIncludeSubdocsInStats/> tag (present if unchecked). Aspose.Words takes this flag into account when the Document.UpdateWordCount method is called. Unfortunately, there is no public API to get or set this flag. The feature request is logged as WORDSNET-24556. We will let you know once it is resolved.

As a temporary workaround, you can copy the footnote content into the main document’s body so that Document.UpdateWordCount counts those words.

Hi @alexey.noskov

Thanks for providing the workaround. For now, we checked the accuracy of the word count by saving the endnote and footnote setting in the Word documents, and it seems to match our requirement. However, we would need a way to enable this setting in code before we return the calculated word count.

Also, the same workaround currently doesn’t seem to work for .rtf files: even if we save the setting, it doesn’t actually get saved, and when we reopen the RTF file the checkbox is still unchecked. As a result, the Aspose library returns the word count without endnotes and footnotes for RTF documents.

Do you have any workaround for RTF files so that we can validate the accuracy for them as well?

@Nikhil1988 It looks like the RTF format cannot store this option in the document. As a temporary workaround, if you need to include footnotes, you can copy the footnote content into the main document’s body so that Document.UpdateWordCount counts those words. If you need to exclude footnotes, you can simply remove them from the document before calling Document.UpdateWordCount.
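
For readers who would rather not modify the document, a different technique from the copy-into-body workaround above is to count the note text separately and add it to the body count. The sketch below is plain Java with no Aspose dependency. The class and method names are hypothetical, the whitespace-based tokenizer is only an approximation of how MS Word or Aspose.Words count words, and it assumes the caller has already extracted each footnote and endnote as plain text. It also assumes the body count does not already include notes; otherwise they would be counted twice.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Aspose.Words.
public class FootnoteWordCount {

    // Counts whitespace-separated tokens. This is an approximation and may
    // differ from the counting rules used by MS Word or Aspose.Words.
    public static int countWords(String text) {
        if (text == null) {
            return 0;
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    // bodyWordCount: the value reported for the main body (assumed to
    //                exclude footnotes and endnotes).
    // noteTexts:     plain text of each footnote/endnote, extracted by the caller.
    public static int totalWithNotes(int bodyWordCount, List<String> noteTexts) {
        int total = bodyWordCount;
        for (String note : noteTexts) {
            total += countWords(note);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> notes = Arrays.asList("See chapter two.", "  Ibid.  ");
        // 120 body words + 3 + 1 note words
        System.out.println(totalWithNotes(120, notes)); // prints 124
    }
}
```

Because the notes are counted outside the document, this approach does not depend on the settings.xml flag and so behaves the same for DOCX and RTF input, at the cost of a tokenizer that may not match Word exactly.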

The issues you found earlier (filed as WORDSNET-24556) have been fixed in the Aspose.Words for .NET 23.2 update, which is also available on NuGet.