Free Support Forum - aspose.com

Character Count

Hi,

I am trying to do a character count using DocumentVisitor as I need header/footer and textboxes information too.

I am encountering an issue where the header/footer is being "visited" (counted) more than once. Is this normal? How can I stop it?

*** Section Start ***

*** Header Footer Start ***

*** Field Start ***

*** Field Separator ***

*** Field End ***

*** Field Start ***

*** Field Separator ***

*** Field End ***

*** Header Footer End ***

---> Here starts again

*** Header Footer Start ***

*** Field Start ***

*** Field Separator ***

*** Field End ***

*** Header Footer End ***

*** Header Footer Start ***

*** Header Footer End ***

*** Body Started ***

*** Body Ended ***

*** Section End ***

Thanks

Leonardo

Hi Leonardo,


Thanks for your inquiry.

Most likely there is more than one type of header/footer belonging to that section. For example, a section can have a primary header, a first-page header, and an even-page header.

Have you tried using the built-in property for counting the number of characters in the document? Please see the code below.

doc.UpdateWordCount();

int characterCount = doc.BuiltInDocumentProperties.Characters;


If you are still having trouble, could you please attach your document here for testing?

Thanks,

Adam,

Yes, I think that is the issue. I found that out using your Document Explorer sample.

Also, I can't use that property, as that count doesn't include headers/footers, textboxes, etc., and I need those counted too.

Do you have any other solution that I may not be aware of?

Thanks

Leonardo

Hi Leonardo,


Thanks for this additional information.

Ah, I missed that part, sorry. Yes, this is definitely possible. However, it's a bit hard to give advice at the moment, as it's not fully clear how you would like the output to be. For example,

  • Do you want to count the characters in all headers and footers regardless of whether they are currently shown, or only in the ones shown (e.g. if a first-page header is being used)?
  • Do you want to count the characters in each header and footer only once, or once for every page on which it appears (e.g. if you have 10 pages with the same header and footer)?
If you can answer these questions, we can help you further.

Thanks

Adam,

I want to count as they are shown in Word.

Scenario 1. A document has a regular header and a first page header. The document is only 1 page. In this case it would only count the characters in the first page header.

Scenario 2. A document has a regular header and a first page header. The document has 3 pages. It would count the first page header once and the regular header 2 times.

So basically, I want to count them each time they appear on a page.
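To make the two scenarios concrete, here is a minimal sketch of the page arithmetic in plain C#, with no Aspose calls. The class and method names (HeaderPageMath, CountAppearances) are hypothetical and only for illustration. It assumes that a first-page header replaces whichever header would otherwise appear on the first page of the section, and that odd/even is decided by the page's position in the document.

```csharp
using System;

public static class HeaderPageMath
{
    // Hypothetical helper: returns how many pages of a section show each
    // header/footer type. firstPage and lastPage are 1-based page numbers
    // within the whole document.
    public static (int First, int Primary, int Even) CountAppearances(
        int firstPage, int lastPage, bool differentFirstPage, bool oddAndEvenPages)
    {
        int first = 0, primary = 0, even = 0;
        for (int page = firstPage; page <= lastPage; page++)
        {
            if (differentFirstPage && page == firstPage)
                first++;        // the first-page header replaces the regular one
            else if (oddAndEvenPages && page % 2 == 0)
                even++;         // even pages use the even header when enabled
            else
                primary++;      // every other page uses the primary header
        }
        return (first, primary, even);
    }
}
```

Scenario 1 corresponds to CountAppearances(1, 1, true, false), which counts the first page header once and the regular header zero times. Scenario 2 corresponds to CountAppearances(1, 3, true, false), which counts the first page header once and the regular header twice.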

Does this make sense?

Thanks

Leonardo

Hi Leonardo,


Thanks for your inquiry.

This is a harder task to achieve when you take into account the different header and footer types. I think I got some code working that should produce the desired results.

Please see the code below:

CharacterCounter counter = new CharacterCounter();

doc.Accept(counter);

int characterCount = counter.CharacterCount;

using Aspose.Words;
using Aspose.Words.Fields;

public class CharacterCounter : DocumentVisitor
{
    public override VisitorAction VisitSectionStart(Section section)
    {
        if (mBuilder == null)
            mBuilder = new DocumentBuilder((Document)section.Document);

        // We have to find out how many pages there are in the section and what pages they are in the overall document.
        // A temporary SECTIONPAGES field gives the page count of this section.
        mBuilder.MoveTo(section.Body.FirstParagraph);
        Field sectionPages = mBuilder.InsertField("SECTIONPAGES", null);
        mBuilder.Document.UpdatePageLayout();
        mBuilder.Document.UpdateFields();

        mCurrentSectionPageStart = mCurrentSectionPageEnd + 1;
        mCurrentSectionPageEnd = mCurrentSectionPageStart + int.Parse(sectionPages.Result) - 1;

        // Remove the temporary field so that it is not counted as text.
        sectionPages.Remove();

        return VisitorAction.Continue;
    }

    public override VisitorAction VisitHeaderFooterStart(HeaderFooter headerFooter)
    {
        PageSetup parentPageSetup = headerFooter.ParentSection.PageSetup;

        // Count the text in the header footer based on the HeaderFooter type and how many pages in the section.
        int headerOrFooterLength = GetLengthOfNodeText(headerFooter);

        switch (headerFooter.HeaderFooterType)
        {
            case HeaderFooterType.HeaderFirst:
            case HeaderFooterType.FooterFirst:
                // Shown on one page only, and only if the section uses a different first page.
                if (parentPageSetup.DifferentFirstPageHeaderFooter)
                    mCharacterCount += headerOrFooterLength;
                break;

            case HeaderFooterType.HeaderPrimary:
            case HeaderFooterType.FooterPrimary:
                int pageCount = parentPageSetup.OddAndEvenPagesHeaderFooter ? CountOddPages() : CurrentSectionPages;

                // If there is a different first page that replaced the odd page then subtract one from the page count.
                if (parentPageSetup.DifferentFirstPageHeaderFooter && IsNumberOdd(mCurrentSectionPageStart))
                    pageCount--;

                mCharacterCount += headerOrFooterLength * pageCount;
                break;

            case HeaderFooterType.HeaderEven:
            case HeaderFooterType.FooterEven:
                if (parentPageSetup.OddAndEvenPagesHeaderFooter)
                {
                    int evenPageCount = CountEvenPages();

                    // If there is a different first page that replaced the even page then subtract one from the page count.
                    if (parentPageSetup.DifferentFirstPageHeaderFooter && !IsNumberOdd(mCurrentSectionPageStart))
                        evenPageCount--;

                    mCharacterCount += headerOrFooterLength * evenPageCount;
                }
                break;
        }

        // The header or footer text has already been counted above, so skip its
        // child nodes. Otherwise VisitRun would count the same text a second time.
        return VisitorAction.SkipThisNode;
    }

    public override VisitorAction VisitRun(Run run)
    {
        // Count all text in the document body.
        mCharacterCount += GetLengthOfNodeText(run);
        return VisitorAction.Continue;
    }

    public int CharacterCount
    {
        get { return mCharacterCount; }
    }

    private int CurrentSectionPages
    {
        get { return mCurrentSectionPageEnd - mCurrentSectionPageStart + 1; }
    }

    private int GetLengthOfNodeText(Node node)
    {
        // Leading and trailing whitespace (including paragraph breaks) is not counted.
        return node.ToTxt().Trim().Length;
    }

    private bool IsNumberOdd(int num)
    {
        return num % 2 != 0;
    }

    // Number of odd-numbered pages in the current section.
    private int CountOddPages()
    {
        int count = 0;
        for (int i = mCurrentSectionPageStart; i <= mCurrentSectionPageEnd; i++)
        {
            if (IsNumberOdd(i))
                count++;
        }
        return count;
    }

    // Number of even-numbered pages in the current section.
    private int CountEvenPages()
    {
        int count = 0;
        for (int i = mCurrentSectionPageStart; i <= mCurrentSectionPageEnd; i++)
        {
            if (!IsNumberOdd(i))
                count++;
        }
        return count;
    }

    private int mCurrentSectionPageStart = 0;
    private int mCurrentSectionPageEnd = 0;
    private int mCharacterCount = 0;
    private DocumentBuilder mBuilder;
}


There are two limitations:
  • doc.UpdatePageLayout is used to calculate how many pages there are in a section. Currently this is called for each section encountered, which might make the process slow for large documents. This could be sped up by calculating the page counts for all sections at once.
  • Linked headers and footers are not counted.
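
As a rough sketch of the first point: if the page count of every section were collected up front in a single layout pass (how to obtain those counts is not shown here and is an assumption), the start and end page of each section could be derived with the same arithmetic that VisitSectionStart uses. The names SectionPageRanges and FromPageCounts are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SectionPageRanges
{
    // Hypothetical helper: converts per-section page counts into 1-based
    // (Start, End) page ranges within the whole document.
    public static List<(int Start, int End)> FromPageCounts(IEnumerable<int> pagesPerSection)
    {
        var ranges = new List<(int Start, int End)>();
        int previousEnd = 0;
        foreach (int pages in pagesPerSection)
        {
            // Same arithmetic as mCurrentSectionPageStart / mCurrentSectionPageEnd.
            int start = previousEnd + 1;
            int end = start + pages - 1;
            ranges.Add((start, end));
            previousEnd = end;
        }
        return ranges;
    }
}
```

For example, sections with 3, 2 and 4 pages would map to pages 1 to 3, 4 to 5 and 6 to 9. The visitor could then look up the range for the current section instead of updating the page layout each time.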

Thanks,