Hi,
we are evaluating Aspose.Words for getting word counts, which is critical functionality in our product.
However, when I use the Maven dependencies provided for the trial, it gives inaccurate results.
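import com.aspose.words.BuiltInDocumentProperties;
import com.aspose.words.Document;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.multipart.MultipartFile;

public void getCountfromAsposeDependency(@RequestParam("file") MultipartFile file) throws Exception {
    // Load the uploaded file; both getInputStream() and the Document constructor throw checked exceptions
    Document document = new Document(file.getInputStream());
    //Document document = new Document("thinkning.doc");
    // Recalculate the word and character statistics; true also recalculates the line count
    document.updateWordCount(true);
    BuiltInDocumentProperties properties = document.getBuiltInDocumentProperties();
    System.out.println(" no of words in doc " + properties.getWords());
    System.out.println(properties.getLines());
    System.out.println(properties.getPages());
}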
Note: we have tried Aspose.Words as well and it gives the same result; the difference in word count seems to be huge.
In your code you use BuiltInDocumentProperties.getPages(). This value is not always updated and can contain an incorrect value. You should use the Document.getPageCount() method to get the page count calculated by the Aspose.Words layout engine.
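A minimal Java sketch of that advice, assuming the same Aspose.Words for Java API used in the question: refresh the statistics with updateWordCount(true), read words and lines from the built-in properties, and take the page count from getPageCount() rather than the stored getPages() property. The class name PageAndWordCount and the method counts are hypothetical, chosen only for illustration.

```java
import com.aspose.words.BuiltInDocumentProperties;
import com.aspose.words.Document;

public class PageAndWordCount {

    // Returns {words, lines, pages}. Words and lines come from the built-in
    // properties after they are recalculated; pages come from the layout engine.
    public static int[] counts(Document document) throws Exception {
        // true = also recalculate the line count, not only words/characters
        document.updateWordCount(true);
        BuiltInDocumentProperties properties = document.getBuiltInDocumentProperties();
        int words = properties.getWords();
        int lines = properties.getLines();
        // Layout-based page count instead of the possibly stale stored "Pages" value
        int pages = document.getPageCount();
        return new int[] {words, lines, pages};
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of the document to inspect
        int[] c = counts(new Document(args[0]));
        System.out.println("words=" + c[0] + " lines=" + c[1] + " pages=" + c[2]);
    }
}
```

Because getPageCount() has to lay out the document, it can take noticeably longer than reading a stored property on large files.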
If you still encounter problems, please attach your test document here for our reference; we will check it and provide you with more information.
@Nikhil1988 Please let us know whether Aspose.Words in licensed mode also returns an incorrect word count, and attach your test document. We will check the issue and provide you with more information.
Hi @alexey.noskov, we have loaded the licence and it seems to give correct results. However, we noticed that the API doesn't consider footnotes and endnotes. Is there any way we can include these in the word count as well? This is also part of our word count requirement.
@Nikhil1988 You can enable the "Include textboxes, footnotes and endnotes" option in MS Word. When this checkbox is unchecked, its value is stored in settings.xml as the <w:doNotIncludeSubdocsInStats/> tag. Aspose.Words takes this flag into account when the Document.UpdateWordCount method is called. Unfortunately, there is no public API to get or set this flag. The feature request is logged as WORDSNET-24556. We will let you know once it is resolved.
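Since the flag lives in word/settings.xml, one possible stopgap for DOCX input only (not an Aspose.Words API) is to edit the package before loading it: remove the <w:doNotIncludeSubdocsInStats/> element, which corresponds to the checkbox being checked, and then pass the patched bytes to new Document(new ByteArrayInputStream(...)). This is a sketch using only the JDK zip classes. It assumes the element uses the conventional "w" prefix and that Aspose.Words honors the flag as described above; the class IncludeNotesInStats and its methods are hypothetical. It does not apply to DOC or RTF files, which are not zip packages.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class IncludeNotesInStats {
    private static final String SETTINGS_PART = "word/settings.xml";
    // Assumption: the flag is written with the "w" prefix, optionally with attributes
    private static final Pattern FLAG = Pattern.compile("<w:doNotIncludeSubdocsInStats\\b[^>]*/>");

    // True if word/settings.xml contains the flag, i.e. notes are excluded from stats
    public static boolean isNotesExcluded(byte[] docx) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(docx))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (SETTINGS_PART.equals(entry.getName())) {
                    String xml = new String(zin.readAllBytes(), StandardCharsets.UTF_8);
                    return FLAG.matcher(xml).find();
                }
            }
        }
        return false;
    }

    // Returns a copy of the package with the flag removed; all other parts are copied as-is
    public static byte[] enableNotesInStats(byte[] docx) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(docx));
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] data = zin.readAllBytes(); // reads only the current entry
                if (SETTINGS_PART.equals(entry.getName())) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = FLAG.matcher(xml).replaceAll("").getBytes(StandardCharsets.UTF_8);
                }
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        } // closing zout writes the zip central directory into out
        return out.toByteArray();
    }
}
```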
As a temporary workaround, you can copy the footnotes' content into the main document's body so that Document.UpdateWordCount counts those words.
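A sketch of this workaround with the Aspose.Words for Java API, where the .NET UpdateWordCount corresponds to updateWordCount(). It works on a deep clone so the caller's document is left untouched, and appends a copy of every footnote and endnote child (paragraphs and tables) to the end of the last section's body. It assumes Footnote nodes (NodeType.FOOTNOTE) cover both footnotes and endnotes; the class name NotesInBodyWordCount is hypothetical.

```java
import com.aspose.words.Body;
import com.aspose.words.Document;
import com.aspose.words.Footnote;
import com.aspose.words.Node;
import com.aspose.words.NodeType;

public class NotesInBodyWordCount {

    // Word count with footnote/endnote content copied into the main body of a clone
    public static int countWordsIncludingNotes(Document source) throws Exception {
        Document copy = (Document) source.deepClone(true);
        Body body = copy.getLastSection().getBody();
        // Snapshot the notes first, because the body is modified while copying
        for (Node node : copy.getChildNodes(NodeType.FOOTNOTE, true).toArray()) {
            Footnote note = (Footnote) node;
            for (Node child = note.getFirstChild(); child != null; child = child.getNextSibling()) {
                body.appendChild(child.deepClone(true));
            }
        }
        copy.updateWordCount();
        return copy.getBuiltInDocumentProperties().getWords();
    }
}
```

If a document already has the "include" option enabled, this sketch counts the note text twice, so it is only meant for documents where notes are excluded from the statistics.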
Thanks for providing the workaround. For now, we checked the accuracy of the word count by saving the endnote and footnote setting in Word documents, and the results seem to match our requirement. However, we would need a code handle to enable this setting before we return the calculated word count.
Also, the same workaround doesn't seem to work for .rtf files: even if we save the setting, it doesn't actually get saved, and when we reopen the RTF file the checkbox is still unchecked. As a result, the Aspose library returns the word count without endnotes and footnotes for RTF documents.
Do you have any workaround for RTF files so that we can validate the accuracy for them as well?
@Nikhil1988 It looks like the RTF format cannot store this option in the document. As a temporary workaround, if footnotes must be included, you can copy the footnotes' content into the main document's body so that Document.UpdateWordCount counts those words. If footnotes must be excluded, you can simply remove them from the document before calling Document.UpdateWordCount.
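For the exclusion case, a minimal Java sketch: clone the document, remove every Footnote node (assumed to cover both footnotes and endnotes), then recalculate. Because RTF cannot carry the option, the sketch also shows one way to detect RTF input via getOriginalLoadFormat(), so the caller can decide explicitly whether to include or exclude notes for those files. The class NotesExcludedWordCount and its method names are hypothetical.

```java
import com.aspose.words.Document;
import com.aspose.words.LoadFormat;
import com.aspose.words.Node;
import com.aspose.words.NodeType;

public class NotesExcludedWordCount {

    // Word count with all footnotes/endnotes removed from a clone of the document
    public static int countWordsExcludingNotes(Document source) throws Exception {
        Document copy = (Document) source.deepClone(true);
        // toArray() snapshots the nodes so removal does not disturb iteration
        for (Node note : copy.getChildNodes(NodeType.FOOTNOTE, true).toArray()) {
            note.remove();
        }
        copy.updateWordCount();
        return copy.getBuiltInDocumentProperties().getWords();
    }

    // RTF cannot store the "include footnotes and endnotes" option, so callers may
    // want to choose the include/exclude behavior explicitly for these files
    public static boolean isRtf(Document document) {
        return document.getOriginalLoadFormat() == LoadFormat.RTF;
    }
}
```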