Character Count

Hi,
I am trying to do a character count using DocumentVisitor, as I need the header/footer and textbox content too.
I am encountering an issue where the header/footer is being “visited” (counted) more than once. Is this normal? How can I stop it?
*** Section Start ***
***** Header Footer Start *****
***** Field Start *****
***** Field Separator *****
***** Field End *****
***** Field Start *****
***** Field Separator *****
***** Field End *****
***** Header Footer End *****
--> It starts again here
***** Header Footer Start *****
***** Field Start *****
***** Field Separator *****
***** Field End *****
***** Header Footer End *****
***** Header Footer Start *****
***** Header Footer End *****
*** Body Started ***
*** Body Ended ***
*** Section End ***
Thanks
Leonardo

Hi Leonardo,

Thanks for your inquiry.

Most likely there is more than one type of header/footer belonging to that section. For example, a section can have a primary (odd-page) header, a first-page header and an even-page header.

Have you tried using the built-in property for counting the number of characters in the document? Please see the code below.

doc.UpdateWordCount();

int characterCount = doc.BuiltInDocumentProperties.Characters;

If you are still having trouble, could you please attach your document here for testing?

Thanks,

Adam,
Yes, I think that is the issue. I found that out using your Document Explorer sample.
Also, I can't use that property, as that count doesn't include headers/footers, textboxes, etc., and I need those counted too.
Do you have any other solution that I may not be aware of?
Thanks
Leonardo

Hi Leonardo,

Thanks for this additional information.

Ah, I missed that part, sorry. Yes, this is definitely possible. However, it's a bit hard to give advice right now, as it's not fully clear what output you would like. For example:

  • Do you want to count the characters in all headers/footers regardless of whether they are currently shown, or only in the ones shown (e.g. only the first-page header if one is in use)?
  • Do you want to count the characters in each header/footer only once, or once for every page it appears on (e.g. 10 times if you have 10 pages with the same header/footer)?

If you can answer these questions, we can help you further.

Thanks

Adam,
I want to count as they are shown in Word.
Scenario 1. A document has a regular header and a first page header. The document is only 1 page. In this case it would only count the characters in the first page header.
Scenario 2. A document has a regular header and a first page header. The document has 3 pages. In this case it would count the first page header once and the regular header 2 times.
So basically, I want to count them each time they appear on a page.
Does this make sense?
Thanks
Leonardo
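
The two scenarios above come down to page arithmetic, so a minimal sketch may help pin down the expected numbers. `HeaderAppearances` and its `Count` method are hypothetical names, not part of Aspose.Words, and the sketch assumes that odd/even is decided by the page's position in the document (a restarted page numbering is not modelled).

```csharp
using System;

// Hypothetical helper (not an Aspose.Words API): pure page arithmetic.
public static class HeaderAppearances
{
    // Returns how many pages of one section show the first-page header,
    // the primary (odd-page) header and the even-page header.
    // startPage is the 1-based document page on which the section begins.
    public static (int First, int Primary, int Even) Count(
        int startPage, int pageCount, bool differentFirst, bool oddAndEven)
    {
        int first = 0, primary = 0, even = 0;
        for (int i = 0; i < pageCount; i++)
        {
            int page = startPage + i;
            if (i == 0 && differentFirst)
                first++;      // the first-page header replaces whatever would be shown
            else if (oddAndEven && page % 2 == 0)
                even++;       // even pages use the even header only when odd/even is enabled
            else
                primary++;    // everything else shows the primary header
        }
        return (first, primary, even);
    }
}
```

For Scenario 1, `HeaderAppearances.Count(1, 1, true, false)` gives (1, 0, 0): only the first page header is counted. For Scenario 2, `HeaderAppearances.Count(1, 3, true, false)` gives (1, 2, 0): the first page header once and the regular header twice.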

Hi Leonardo,

Thanks for your inquiry.

This is a harder task to achieve when you take into account the different header and footer types. I think I got some code working that should produce the desired results.

Please see the code below:

// Usage:
CharacterCounter counter = new CharacterCounter();
doc.Accept(counter);
int characterCount = counter.CharacterCount;

// The visitor itself:
public class CharacterCounter : DocumentVisitor
{
    public override VisitorAction VisitSectionStart(Section section)
    {
        if (mBuilder == null)
            mBuilder = new DocumentBuilder((Document)section.Document);
        // We have to find out how many pages there are in the section and what pages they are in the overall document.
        mBuilder.MoveTo(section.Body.FirstParagraph);
        Field sectionPages = mBuilder.InsertField("SECTIONPAGES", null);
        mBuilder.Document.UpdatePageLayout();
        mBuilder.Document.UpdateFields();
        mCurrentSectionPageStart = mCurrentSectionPageEnd + 1;
        mCurrentSectionPageEnd = mCurrentSectionPageStart + int.Parse(sectionPages.Result) - 1;
        sectionPages.Remove();
        return VisitorAction.Continue;
    }
    public override VisitorAction VisitHeaderFooterStart(HeaderFooter headerFooter)
    {
        PageSetup parentPageSetup = headerFooter.ParentSection.PageSetup;
        // Count the text in the header footer based on the HeaderFooter type and how many pages in the section.
        int headerOrFooterLength = GetLengthOfNodeText(headerFooter);
        switch (headerFooter.HeaderFooterType)
        {
            case HeaderFooterType.HeaderFirst:
            case HeaderFooterType.FooterFirst:
                if (parentPageSetup.DifferentFirstPageHeaderFooter)
                    mCharacterCount += headerOrFooterLength;
                break;
            case HeaderFooterType.HeaderPrimary:
            case HeaderFooterType.FooterPrimary:
                int pageCount = parentPageSetup.OddAndEvenPagesHeaderFooter ? CountOddPages() : CurrentSectionPages;
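                // A different first page replaces the primary header/footer on the section's first page.
                // With odd/even headers enabled this only matters when that first page is odd;
                // otherwise the primary header/footer covers every page, so it always matters.
                if (parentPageSetup.DifferentFirstPageHeaderFooter &&
                    (!parentPageSetup.OddAndEvenPagesHeaderFooter || IsNumberOdd(mCurrentSectionPageStart)))
                    pageCount--;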
                mCharacterCount += headerOrFooterLength * pageCount;
                break;
            case HeaderFooterType.HeaderEven:
            case HeaderFooterType.FooterEven:
                if (parentPageSetup.OddAndEvenPagesHeaderFooter)
                {
                    int evenPageCount = CountEvenPages();
                    // If there is a different first page that replaced the even page then subtract one from the page count.
                    if (parentPageSetup.DifferentFirstPageHeaderFooter && !IsNumberOdd(mCurrentSectionPageStart))
                        evenPageCount--;
                    mCharacterCount += headerOrFooterLength * evenPageCount;
                }
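                // Break (not Continue) so the even header/footer's runs are skipped below
                // and are not counted a second time by VisitRun.
                break;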
        }
        return VisitorAction.SkipThisNode;
    }
    public override VisitorAction VisitRun(Run run)
    {
        // Count all text in the document body.
        mCharacterCount += GetLengthOfNodeText(run);
        return VisitorAction.Continue;
    }
    public int CharacterCount
    {
        get
        {
            return mCharacterCount;
        }
    }
    private int CurrentSectionPages
    {
        get
        {
            return mCurrentSectionPageEnd - mCurrentSectionPageStart + 1;
        }
    }
    private int GetLengthOfNodeText(Node node)
    {
        return node.ToTxt().Trim().Length;
    }
    private bool IsNumberOdd(int num)
    {
        return num % 2 != 0;
    }
    private int CountOddPages()
    {
        int count = 0;
        for (int i = mCurrentSectionPageStart; i <= mCurrentSectionPageEnd; i++)
        {
            if (IsNumberOdd(i))
                count++;
        }
        return count;
    }
    private int CountEvenPages()
    {
        int count = 0;
        for (int i = mCurrentSectionPageStart; i <= mCurrentSectionPageEnd; i++)
        {
            if (!IsNumberOdd(i))
                count++;
        }
        return count;
    }
    private int mCurrentSectionPageStart = 0;
    private int mCurrentSectionPageEnd = 0;
    private int mCharacterCount = 0;
    private DocumentBuilder mBuilder;
}

There are two limitations:

  • doc.UpdatePageLayout() is used to calculate how many pages there are in a section. Currently it is called for each section encountered, which might make the process slow for large documents. This could be sped up by calculating the page counts for all sections at once.
  • Linked headers and footers are not counted.
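
As a rough illustration of how both limitations might be approached, here is a sketch that works on plain numbers rather than on the document itself. `SectionPlanning`, `PageRanges` and `ResolveLinked` are hypothetical names, not Aspose.Words APIs. It assumes the page count of every section has already been gathered after a single layout pass, that each section starts on a new page (as the visitor above also assumes), and that a linked header/footer is represented as `null` and shows the text of the nearest earlier section that defines one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers (not Aspose.Words APIs): they operate on plain numbers.
public static class SectionPlanning
{
    // Turns per-section page counts into 1-based (start, end) page ranges in one pass,
    // mirroring the mCurrentSectionPageStart / mCurrentSectionPageEnd bookkeeping.
    public static List<(int Start, int End)> PageRanges(IReadOnlyList<int> pagesPerSection)
    {
        var ranges = new List<(int Start, int End)>(pagesPerSection.Count);
        int previousEnd = 0;
        foreach (int pages in pagesPerSection)
        {
            int start = previousEnd + 1;
            int end = start + pages - 1;
            ranges.Add((start, end));
            previousEnd = end;
        }
        return ranges;
    }

    // Resolves linked headers/footers: null means "linked to previous section".
    // A linked entry takes the length of the nearest earlier defined entry, or 0 if none.
    public static List<int> ResolveLinked(IReadOnlyList<int?> lengthPerSection)
    {
        var resolved = new List<int>(lengthPerSection.Count);
        int current = 0;
        foreach (int? length in lengthPerSection)
        {
            if (length.HasValue)
                current = length.Value;
            resolved.Add(current);
        }
        return resolved;
    }
}
```

With page counts of 1, 3 and 2, `PageRanges` returns the ranges 1-1, 2-4 and 5-6. With header lengths of 5, linked, 7, linked, `ResolveLinked` returns 5, 5, 7, 7, which can then be multiplied by the number of pages on which each header appears.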

Thanks,