Hi,
I am trying to do a character count using DocumentVisitor, as I need the header/footer and textbox content counted too.
I am encountering an issue where the header/footer is being "visited" (counted) more than once. Is this normal? How can I stop it?
*** Section Start ***
***** Header Footer Start *****
***** Field Start *****
***** Field Separator *****
***** Field End *****
***** Field Start *****
***** Field Separator *****
***** Field End *****
***** Header Footer End *****
--> The header/footer output starts again here
***** Header Footer Start *****
***** Field Start *****
***** Field Separator *****
***** Field End *****
***** Header Footer End *****
***** Header Footer Start *****
***** Header Footer End *****
*** Body Started ***
*** Body Ended ***
*** Section End ***
Thanks
Leonardo
Hi Leonardo,
Thanks for your inquiry.
Most likely there is more than one type of header/footer belonging to that section. For example, a section can have a primary (odd-page) header, a first-page header and an even-page header.
Have you tried using the built-in property for counting the number of characters in the document? Please see the code below.
doc.UpdateWordCount();
int characterCount = doc.BuiltInDocumentProperties.Characters;
If you are still having trouble, could you please attach your document here for testing?
Thanks,
Adam,
Yes, I think that is the issue. I found that out using your Document Explorer sample.
Also, I can't use that property, as it doesn't count headers/footers, textboxes, etc., and I need those counted too.
Do you have any other solution that I may not be aware of?
Thanks
Leonardo
Hi Leonardo,
Thanks for this additional information.
Ah, I missed that part, sorry. Yes, this is definitely possible. However, it's a bit hard to give advice at the moment, as it's not fully clear how you would like the output to look. For example:
- Do you want to count the characters in all headers and footers, regardless of whether they are currently shown, or only in the ones that are shown (e.g. when a first-page header is in use)?
- Do you want to count the characters in each header/footer only once, or once for every page it appears on (e.g. 10 pages that share the same header/footer)?
If you can answer these questions, we can help you further.
Thanks
Adam,
I want to count them as they are shown in Word.
Scenario 1. A document has a regular header and a first-page header. The document is only 1 page long. In this case it would count only the characters in the first-page header.
Scenario 2. A document has a regular header and a first-page header. The document has 3 pages. In this case it would count the first-page header once and the regular header twice.
So basically, I want to count them each time they appear on a page.
Does this make sense?
Thanks
Leonardo
Hi Leonardo,
Thanks for your inquiry.
This is a harder task to achieve when you take into account the different header and footer types. I think I got some code working that should produce the desired results.
Please see the code below:
CharacterCounter counter = new CharacterCounter();
doc.Accept(counter);
int characterCount = counter.CharacterCount;
// Requires: using Aspose.Words; using Aspose.Words.Fields;
public class CharacterCounter : DocumentVisitor
{
    public override VisitorAction VisitSectionStart(Section section)
    {
        if (mBuilder == null)
            mBuilder = new DocumentBuilder((Document)section.Document);

        // We have to find out how many pages there are in the section and what pages they are in the overall document.
        mBuilder.MoveTo(section.Body.FirstParagraph);
        Field sectionPages = mBuilder.InsertField("SECTIONPAGES", null);
        mBuilder.Document.UpdatePageLayout();
        mBuilder.Document.UpdateFields();

        mCurrentSectionPageStart = mCurrentSectionPageEnd + 1;
        mCurrentSectionPageEnd = mCurrentSectionPageStart + int.Parse(sectionPages.Result) - 1;

        // Remove the temporary field so it does not affect the count.
        sectionPages.Remove();
        return VisitorAction.Continue;
    }

    public override VisitorAction VisitHeaderFooterStart(HeaderFooter headerFooter)
    {
        PageSetup parentPageSetup = headerFooter.ParentSection.PageSetup;

        // Count the text in the header footer based on the HeaderFooter type and how many pages in the section.
        int headerOrFooterLength = GetLengthOfNodeText(headerFooter);

        switch (headerFooter.HeaderFooterType)
        {
            case HeaderFooterType.HeaderFirst:
            case HeaderFooterType.FooterFirst:
                if (parentPageSetup.DifferentFirstPageHeaderFooter)
                    mCharacterCount += headerOrFooterLength;
                break;

            case HeaderFooterType.HeaderPrimary:
            case HeaderFooterType.FooterPrimary:
                int pageCount = parentPageSetup.OddAndEvenPagesHeaderFooter ? CountOddPages() : CurrentSectionPages;
                // A different first page replaces the primary header/footer on the section's first page,
                // unless odd/even headers are enabled and that first page is even.
                if (parentPageSetup.DifferentFirstPageHeaderFooter &&
                    (!parentPageSetup.OddAndEvenPagesHeaderFooter || IsNumberOdd(mCurrentSectionPageStart)))
                    pageCount--;
                mCharacterCount += headerOrFooterLength * pageCount;
                break;

            case HeaderFooterType.HeaderEven:
            case HeaderFooterType.FooterEven:
                if (parentPageSetup.OddAndEvenPagesHeaderFooter)
                {
                    int evenPageCount = CountEvenPages();
                    // If there is a different first page that replaced the even page then subtract one from the page count.
                    if (parentPageSetup.DifferentFirstPageHeaderFooter && !IsNumberOdd(mCurrentSectionPageStart))
                        evenPageCount--;
                    mCharacterCount += headerOrFooterLength * evenPageCount;
                }
                break;
        }

        // The text was counted above, so skip the child nodes; otherwise VisitRun would count them again.
        return VisitorAction.SkipThisNode;
    }

    public override VisitorAction VisitRun(Run run)
    {
        // Count all text in the document body.
        mCharacterCount += GetLengthOfNodeText(run);
        return VisitorAction.Continue;
    }

    public int CharacterCount
    {
        get { return mCharacterCount; }
    }

    private int CurrentSectionPages
    {
        get { return mCurrentSectionPageEnd - mCurrentSectionPageStart + 1; }
    }

    private int GetLengthOfNodeText(Node node)
    {
        return node.ToTxt().Trim().Length;
    }

    private bool IsNumberOdd(int num)
    {
        return num % 2 != 0;
    }

    private int CountOddPages()
    {
        int count = 0;
        for (int i = mCurrentSectionPageStart; i <= mCurrentSectionPageEnd; i++)
        {
            if (IsNumberOdd(i))
                count++;
        }
        return count;
    }

    private int CountEvenPages()
    {
        int count = 0;
        for (int i = mCurrentSectionPageStart; i <= mCurrentSectionPageEnd; i++)
        {
            if (!IsNumberOdd(i))
                count++;
        }
        return count;
    }

    private int mCurrentSectionPageStart = 0;
    private int mCurrentSectionPageEnd = 0;
    private int mCharacterCount = 0;
    private DocumentBuilder mBuilder;
}
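The page arithmetic that the switch statement above is meant to implement can be checked on its own, without loading a document. The following is only a minimal sketch: the class name HeaderFooterOccurrences and its Count method are hypothetical helpers, not part of Aspose.Words, and it assumes (as the visitor does) that odd/even is decided by the page's position in the whole document, ignoring any page-number restarts.

```csharp
using System;

public static class HeaderFooterOccurrences
{
    // Returns how many pages of one section display the first-page, primary
    // (odd-page) and even-page header, given the section's 1-based page range
    // in the document and the two page setup flags.
    public static (int First, int Primary, int Even) Count(
        int startPage, int endPage, bool differentFirstPage, bool oddAndEvenPages)
    {
        int first = 0, primary = 0, even = 0;
        for (int page = startPage; page <= endPage; page++)
        {
            if (differentFirstPage && page == startPage)
                first++;      // the first-page header replaces whatever would be shown
            else if (oddAndEvenPages && page % 2 == 0)
                even++;       // even pages use the even header only when the flag is set
            else
                primary++;    // every remaining page shows the primary header
        }
        return (first, primary, even);
    }
}
```

For Leonardo's scenarios, Count(1, 1, true, false) gives (1, 0, 0) and Count(1, 3, true, false) gives (1, 2, 0). Multiplying each number by the length of the matching header or footer text gives that section's contribution to the total.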
There are two limitations:
- doc.UpdatePageLayout() is used to calculate how many pages there are in a section. Currently it is called for each section encountered, which might make the process slow for large documents. This could be sped up by calculating the page ranges for all sections at once.
- Linked headers and footers are not counted.
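One possible way around the second limitation is to resolve a linked header/footer by walking back through the earlier sections until one actually defines it. The sketch below shows only that lookup on a simplified model, not on Aspose.Words objects: LinkedHeaderResolver is a hypothetical helper, and each list entry stands for the text of one header/footer type in one section, with null meaning "linked to previous". In a real document those values would have to come from each section's HeadersFooters collection.

```csharp
using System.Collections.Generic;

public static class LinkedHeaderResolver
{
    // headerTextBySection[i] is the text of one header/footer type in section i,
    // or null when that section is linked to the previous one for that type.
    // Returns the text that is actually displayed in the requested section.
    public static string Resolve(IReadOnlyList<string> headerTextBySection, int sectionIndex)
    {
        for (int i = sectionIndex; i >= 0; i--)
        {
            if (headerTextBySection[i] != null)
                return headerTextBySection[i];
        }
        return string.Empty; // no section up to this one defines the header/footer
    }
}
```

With the list { "Report", null, "Appendix", null }, section 1 resolves to "Report" and section 3 to "Appendix". The resolved text's length could then be multiplied by the per-section page counts in the same way as for an unlinked header.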
Thanks,