Word Document Statistics

As you probably know, Microsoft Word does not provide accurate word statistics. We pay our workers based on productivity, so we need a reliable tool to provide word statistics.

Currently, we use PractiCount from http://www.practiline.com/

Key statistics we get are:

  1. words
  2. characters with spaces
  3. characters without spaces
  4. lines
  5. pages

We also need to be able to count text in headers/footers, text boxes and footnotes, and to include paragraph returns while excluding non-typed characters such as bullets.

This software is not very efficient and doesn't fit into our workflow model. Is there a way to get these statistics from Aspose.Words? As stated above, we cannot rely on Microsoft's counts. Thanks

Hi

Thanks for your request. Using Aspose.Words you can get the following statistics information:

  • words
  • characters with spaces
  • characters without spaces
  • pages

Unfortunately, there is no way to calculate the number of lines in the document. You can only read BuiltInDocumentProperties.Lines, which holds the value last stored in the document by MS Word.

For the other statistics, the following code retrieves them:

using System;
using Aspose.Words;

// Open the document
Document doc = new Document(@"Test159\in.doc");

// Recalculate the word and character counts stored in the built-in properties
doc.UpdateWordCount();

// Get statistics information (PageCount lays the document out into pages)
Console.WriteLine("Pages: {0}", doc.PageCount);
Console.WriteLine("Words: {0}", doc.BuiltInDocumentProperties.Words);
Console.WriteLine("Characters with spaces: {0}", doc.BuiltInDocumentProperties.CharactersWithSpaces);
Console.WriteLine("Characters without spaces: {0}", doc.BuiltInDocumentProperties.Characters);

Regarding counting text in specific document elements (headers, text boxes, etc.), I think there are two ways to achieve this.

  1. Import the particular node into an empty document and get the statistics of that document as shown above.

  2. Write your own code to calculate the statistics. For example, you can try using DocumentVisitor:
    DocumentVisitor class, Aspose.Words for .NET API Reference
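For option 2, the counting step itself can be kept independent of how the text is collected. The following is a minimal sketch in plain C# (a C# 9+ top-level program), not an Aspose.Words API: the helper `CountStats`, the bullet glyph set and the word rule (a word is a run of non-whitespace characters) are all assumptions for illustration, and the text of each part (body, headers/footers, text boxes, footnotes) is assumed to have been extracted already, for example with `Node.GetText()`. Its results will not necessarily match the counting rules of MS Word or Aspose.Words.

```csharp
using System;
using System.Linq;

// Hypothetical part texts, one per story: main body, a header and a footnote.
// '\r' is the paragraph mark; the leading \u2022 stands in for a bullet glyph.
string[] parts = { "\u2022 First item\rSecond line", "Page header", "Footnote text." };

// Count each part separately and add the results, so words from different
// parts are never joined together.
var totals = parts
    .Select(p => CountStats(p, countReturns: true))
    .Aggregate((a, b) => (a.Words + b.Words,
                          a.CharsWithSpaces + b.CharsWithSpaces,
                          a.CharsWithoutSpaces + b.CharsWithoutSpaces));

Console.WriteLine($"Words={totals.Words} WithSpaces={totals.CharsWithSpaces} WithoutSpaces={totals.CharsWithoutSpaces}");

// Hypothetical helper (not part of Aspose.Words).
// Assumptions: a word is a run of non-whitespace characters, and the glyphs
// listed below are "non-typed" characters that are not counted at all.
static (int Words, int CharsWithSpaces, int CharsWithoutSpaces) CountStats(string text, bool countReturns)
{
    char[] nonTyped = { '\u2022', '\u25AA', '\u25CF' }; // bullet glyphs: •, ▪, ●
    string cleaned = new string(text.Where(c => !nonTyped.Contains(c)).ToArray());

    // An empty separator array means "split on any whitespace", including '\r' and '\n'.
    int words = cleaned.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries).Length;

    // With spaces: every remaining character, optionally excluding paragraph returns.
    int withSpaces = cleaned.Count(c => countReturns || (c != '\r' && c != '\n'));

    // Without spaces: no whitespace of any kind (returns are whitespace too).
    int withoutSpaces = cleaned.Count(c => !char.IsWhiteSpace(c));

    return (words, withSpaces, withoutSpaces);
}
```

Run as-is, it prints `Words=8 WithSpaces=48 WithoutSpaces=42`. Counting each part separately, rather than concatenating the texts first, avoids gluing the last word of one part to the first word of the next.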

Hope this helps.

Best regards.

Thanks for your response, but this solution does not work for me since I cannot rely on Microsoft's counts, which have proven to be unreliable. I will assume that, in this case, Aspose cannot meet my requirements. Thank you

Hi Aaron,

When you call the UpdateWordCount method, Aspose.Words itself recalculates the number of characters, words and paragraphs. Therefore, the values of these BuiltInDocumentProperties are up to date and do not depend on MS Word.

When you call Document.PageCount, Aspose.Words lays out the document into pages using our own rendering engine and returns the number of pages.

Best regards.

It looks like we're 80% there, but what about calculating lines? As of now, Aspose doesn't seem to have a good solution.

Aaron, unfortunately, you are right: currently there is no way to calculate the number of lines in a document using Aspose.Words. This is logged as issue WORDSNET-1978 in our defect database. You will be notified as soon as it is resolved.

Best regards.

I have written a quick C# program to get the built-in document properties of an existing document. For now, I would be happy simply reading the document properties calculated by Microsoft Word.

Aspose does report characters and pages correctly from the existing built-in document properties, but lines is either 0 or an incorrect number. Is there a way I can use Aspose to read the current number of lines exactly as reported by Microsoft Word?

Still searching for a solution. Thanks

Hi

Thanks for your request. Aspose.Words just reads the values of the document's built-in properties as is. Most likely the Lines property was not updated by MS Word, which is why Aspose.Words returns 0.

You can try opening your document in MS Word, scrolling down to the end of the document and saving it. In this case, MS Word will update the line count and Aspose.Words will return the correct value.

Best regards.

Any help would be greatly appreciated. Thanks.

Here are the statistics reported by MS Word 2007 using the Word Count function:

Characters: 991 / Characters With Spaces: 1187 / Words: 206 / Pages: 1 / Lines: 40

Here are the statistics reported by Aspose.Words 5.2.2.0 using the code below and the attached file:

Characters: 991 / Characters With Spaces: 1187 / Words: 206 / Pages: 1 / Lines: 8

using System;
using System.IO;
using System.Windows.Forms;
using Aspose.Words;

namespace asposeCounting
{
    class countWordDocument
    {
        public static Boolean generateCounts()
        {
            // Do not process any files if the license cannot be applied
            if (!checkLicense())
                return false;
            return doCounts();
        }

        public static bool checkLicense()
        {
            try
            {
                Aspose.Words.License ThisLicense = new Aspose.Words.License();
                ThisLicense.SetLicense("Aspose.Words.lic");
                return true;
            }
            catch
            {
                MessageBox.Show("Please notify DocuMed. Aspose license has expired.", "MTWeb Internet", MessageBoxButtons.OK, MessageBoxIcon.Information);
                return false;
            }
        }

        public static Boolean doCounts()
        {
            // Open the "Archive" directory (Program.archiveDir is defined elsewhere in the application)
            DirectoryInfo EnvoyLiteArchive = new DirectoryInfo(Program.archiveDir);
            foreach (FileInfo nextFile in EnvoyLiteArchive.GetFiles())
            {
                getWordStatistics(nextFile.FullName);
            }

            return true;
        }

        public static string getWordStatistics(string wordDocPath)
        {
            String results = "No data";
            try
            {
                // Loading will fail if the document is opened (locked) elsewhere.
                Document doc = new Document(wordDocPath);

                // Lines value as stored in the file, read before UpdateWordCount
                int s5 = doc.BuiltInDocumentProperties.Lines;

                // Recalculate the word and character counts
                doc.UpdateWordCount();
                int s1 = doc.BuiltInDocumentProperties.Characters;
                int s2 = doc.BuiltInDocumentProperties.CharactersWithSpaces;
                int s3 = doc.BuiltInDocumentProperties.Words;
                int s4 = doc.BuiltInDocumentProperties.Pages;

                // Lines value read again after UpdateWordCount
                int s6 = doc.BuiltInDocumentProperties.Lines;

                results = wordDocPath + "\r\n";
                results += "Characters=" + s1 + "\r\n";
                results += "CharactersWithSpaces=" + s2 + "\r\n";
                results += "Words=" + s3 + "\r\n";
                results += "Pages=" + s4 + "\r\n";
                results += "Lines Before Reset=" + s5 + "\r\n";
                results += "Lines After Reset=" + s6 + "\r\n";
                MessageBox.Show("Word counts=" + results + " ", "asposeCounting", MessageBoxButtons.OK, MessageBoxIcon.Information);
                return results;
            }
            catch (Exception ex)
            {
                // Report why the file could not be processed instead of failing silently
                return results + " (" + ex.Message + ")";
            }
        }
    }
}

Hi Aaron,

As I mentioned earlier, Aspose.Words does not calculate the line count; it just reads the value stored in the document. If you open and save your document in MS Word, the document statistics should be updated and Aspose.Words will return the correct values.

Best regards.

I discovered that if I open the Word document in MS Word 2007, save it, and then open it again in Aspose, Aspose reports the same line count as MS Word. The challenge is how to do this without installing MS Word 2007 on the server.

Aaron, unfortunately, I cannot suggest any workaround for this issue at the moment. I have already linked your request to the appropriate issue. You will be notified as soon as Aspose.Words is able to calculate the line count.

Best regards.

The requested feature is available starting from version 18.2.0 of Aspose.Words. To update the line count in the document's built-in properties, use the following code:

Document doc = new Document(fileName);
// Passing true also recalculates BuiltInDocumentProperties.Lines
doc.UpdateWordCount(true);
Console.WriteLine(doc.BuiltInDocumentProperties.Lines);

https://reference.aspose.com/words/net/aspose.words.document/updatewordcount/methods/1