As you probably know, Microsoft Word does not provide accurate word statistics. We pay our workers based on productivity, hence we need a reliable tool to provide word statistics.
Currently, we use PractiCount from http://www.practiline.com/
Key statistics we get are:
words
characters with spaces
characters without spaces
lines
pages
We also need the ability to count text in headers/footers, text boxes and footnotes, and to include paragraph returns but exclude non-typed characters such as bullets.
This software is not very efficient and doesn't fit into our workflow model. Is there a way to get these statistics from Aspose.Words? As stated above, we cannot rely on Microsoft counts. Thanks
Thanks for your request. Using Aspose.Words you can get the following statistics information:
words
characters with spaces
characters without spaces
pages
Unfortunately, there is currently no way to calculate the number of lines in the document. You can only read the BuiltInDocumentProperties.Lines value last stored in the document by MS Word.
For the other statistics, the following code retrieves them:
using System;
using Aspose.Words;

// Open the document
Document doc = new Document(@"Test159\in.doc");
// Recalculate the character, word and paragraph counts stored in the built-in properties
doc.UpdateWordCount();
// Print the statistics (PageCount lays the document out into pages)
Console.WriteLine("Pages: {0}", doc.PageCount);
Console.WriteLine("Words: {0}", doc.BuiltInDocumentProperties.Words);
Console.WriteLine("Characters with spaces: {0}", doc.BuiltInDocumentProperties.CharactersWithSpaces);
Console.WriteLine("Characters without spaces: {0}", doc.BuiltInDocumentProperties.Characters);
Regarding counting text in specific document elements (headers, text boxes, etc.), I think there are two ways to achieve this.
Import a particular node into an empty document and get the statistics of that document as shown above.
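A minimal sketch of the node-import approach, assuming the Aspose.Words NuGet package is referenced. The class StoryStats and the method ImportIntoEmptyDocument are hypothetical names for illustration, not part of the Aspose.Words API. For each header/footer, footnote (endnotes share the same node type) and text box, the sketch copies the story's paragraphs and tables into a new blank document and calls UpdateWordCount on it. It has not been compared against PractiCount, so how bullets and paragraph returns are counted is not verified here; also note that a text box anchored inside a header is counted both with the header and on its own.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Drawing;

// Hypothetical helper: these names are illustrative, not Aspose.Words API.
public static class StoryStats
{
    // Copies the block-level content (paragraphs and tables) of one story into a new blank document.
    public static Document ImportIntoEmptyDocument(CompositeNode story)
    {
        Document dst = new Document();
        Body body = dst.FirstSection.Body;
        body.RemoveAllChildren();
        foreach (Node child in story.GetChildNodes(NodeType.Any, false))
        {
            if (child.NodeType == NodeType.Paragraph || child.NodeType == NodeType.Table)
                body.AppendChild(dst.ImportNode(child, true));
        }
        // Make sure the body ends with a paragraph (for example after a trailing table).
        body.EnsureMinimum();
        return dst;
    }

    static void Main()
    {
        Document doc = new Document(@"Test159\in.doc");
        foreach (NodeType type in new[] { NodeType.HeaderFooter, NodeType.Footnote, NodeType.Shape })
        {
            foreach (Node node in doc.GetChildNodes(type, true))
            {
                // Among shapes, only text boxes are of interest here.
                if (node is Shape shape && shape.ShapeType != ShapeType.TextBox)
                    continue;

                Document part = ImportIntoEmptyDocument((CompositeNode)node);
                part.UpdateWordCount();
                Console.WriteLine("{0}: words={1}, chars without spaces={2}, chars with spaces={3}",
                    type,
                    part.BuiltInDocumentProperties.Words,
                    part.BuiltInDocumentProperties.Characters,
                    part.BuiltInDocumentProperties.CharactersWithSpaces);
            }
        }
    }
}
```

Importing only the child paragraphs and tables avoids inserting a HeaderFooter or Footnote node directly into a body, which the document model does not allow.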
Thanks for your response, but this solution does not work for me, since I cannot rely on Microsoft counts, which have proven to be unreliable. I will assume that, in this case, Aspose cannot meet my requirement. Thank you.
When you call the UpdateWordCount method, Aspose.Words recalculates the number of characters, words and paragraphs. Therefore, the values of these BuiltInDocumentProperties will be up to date.
When you access Document.PageCount, Aspose.Words lays out the document into pages using our own rendering engine and returns the number of pages.
Aaron, unfortunately, you are right: currently, there is no way to calculate the number of lines in the document using Aspose.Words. This is issue WORDSNET-1978 in our defect database. You will be notified as soon as it is resolved.
I have written a quick C# program to get the built-in document properties of an existing document. For now, I would be happy simply reading the document properties calculated by Microsoft Word.
Aspose does report characters and pages correctly from the existing built-in document properties, but unfortunately the line count is either 0 or an incorrect number. Is there a way I can use Aspose simply to read the current number of lines as reported by Microsoft Word?
Thanks for your request. Aspose.Words just reads the values of the document's built-in properties as is. Most likely the Lines property was not updated, which is why Aspose.Words returns 0.
You can try opening your document in MS Word, scrolling down to the end of the document, and saving it. MS Word will then update the line count, and Aspose.Words will return the correct value.
As I mentioned earlier, Aspose.Words does not calculate the line count; it just reads the value stored in the document. If you open and save your document in MS Word, the document statistics should be updated and Aspose.Words will return correct values.
I discovered that if I open the Word document in MS Word 2007, save it, and then open it again in Aspose, Aspose reports the same line count as Word. The challenge is doing this without installing MS Word 2007 on the server.
Aaron, unfortunately, I cannot suggest any workaround for this issue at the moment. I have already linked your request to the appropriate issue. You will be notified as soon as Aspose.Words is able to calculate the line count.
The requested feature is available starting from version 18.2.0 of Aspose.Words. To update the line count in the document's built-in properties, use the following code:
Document doc = new Document(fileName);
// Passing true also recalculates the Lines property (this rebuilds the page layout)
doc.UpdateWordCount(true);
Console.WriteLine(doc.BuiltInDocumentProperties.Lines);