Page numbers in headers or footers are not identified correctly

Hello,

I’m looking into an issue where page numbers are not being detected as expected in the DocumentVisitor. Attached you will find a sample program and a sample document that demonstrate the issue.

When I run the application, here is the output that I get:

*Visiting a Section
HEADER_EVEN FieldCode Results:
FieldCode Result: 4

HEADER_PRIMARY FieldCode Results:
FieldCode Result: 5

FOOTER_EVEN FieldCode Results:
FieldCode Result: 4

FOOTER_PRIMARY FieldCode Results:
FieldCode Result: 5

HEADER_FIRST FieldCode Results:
FieldCode Result: 1

FOOTER_FIRST FieldCode Results:
FieldCode Result: 1*

What I expected was the following:

*Visiting a Section
HEADER_EVEN FieldCode Results:
FieldCode Result: 2

HEADER_PRIMARY FieldCode Results:
FieldCode Result: 3

FOOTER_EVEN FieldCode Results:
FieldCode Result: 2

FOOTER_PRIMARY FieldCode Results:
FieldCode Result: 3

HEADER_FIRST FieldCode Results:
FieldCode Result: 1

FOOTER_FIRST FieldCode Results:
FieldCode Result: 1*

The field code result corresponds to the page number field in each header/footer. The example document has only one section, and that section has odd, even, and first-page headers and footers. It appears that the page number is determined correctly only for the first page’s header and footer. For the EVEN headers and footers, I seem to get the largest even page number in the section. Likewise, for the ODD headers and footers, I seem to get the largest odd page number in the section.

My question is what is the correct behavior? I was assuming/expecting it to be as follows. For the ODD header/footer, I was expecting to get a value equal to the smallest ODD numbered page (excluding the first page, since that’s covered by HEADER_FIRST/FOOTER_FIRST). For the EVEN header/footer, I was expecting to get a value equal to the smallest EVEN numbered page.

Any advice, suggestions, questions are much appreciated! Please let me know if anything here is unclear and whether or not this is a defect or expected behavior.

I tested using Aspose 15.2.0.
The zip file is an Eclipse project that has an input folder with the test document that was used to describe the problem above.

Hi Chase,

Thanks for your inquiry. Aspose.Words returns the correct values for the page field. If the input document has 11 pages, the output will be as follows:

Visiting a Section
HEADER_EVEN FieldCode Results:
FieldCode Result: 10

HEADER_PRIMARY FieldCode Results:
FieldCode Result: 11

FOOTER_EVEN FieldCode Results:
FieldCode Result: 10

FOOTER_PRIMARY FieldCode Results:
FieldCode Result: 11

HEADER_FIRST FieldCode Results:
FieldCode Result: 1

FOOTER_FIRST FieldCode Results:
FieldCode Result: 1

*apatter:

What I expected was the following:
Visiting a Section
HEADER_EVEN FieldCode Results:
FieldCode Result: 2

HEADER_PRIMARY FieldCode Results:
FieldCode Result: 3

FOOTER_EVEN FieldCode Results:
FieldCode Result: 2

FOOTER_PRIMARY FieldCode Results:
FieldCode Result: 3

HEADER_FIRST FieldCode Results:
FieldCode Result: 1

FOOTER_FIRST FieldCode Results:
FieldCode Result: 1*

Could you please share more detail about the expected values? These values seem to be fixed.

If you want the output as you shared, you can hard-code these values. For HeaderFirst and FooterFirst, the page field value is 1. For HeaderEven and FooterEven, the page field value is 2. For HeaderPrimary and FooterPrimary, the page field value is 3.

Hi Tahir,

Thank you!

I was expecting that because I’m working on an application that uses the visitor to print out the contents in a new HTML format. It visits the HEADERs/FOOTERs at the beginning of the section, before the actual content, so I was expecting/hoping that the page numbers in the even and primary HEADERs/FOOTERs would correspond to the first page of that type (the first even or odd page, excluding page 1). It appears to use the values of the last pages of the owning section.

Since there is only a single HEADER/FOOTER (of each type) that appears in each section, I think whether it uses the first page’s number or last page’s number could go either way… (i.e. I don’t think there is a right or wrong answer here). I suppose it could be useful to have an API to control whether it gives me the numbers of the first pages (like I expected) or the last pages (actual result).

I’m having a little difficulty explaining this (apologies if this is confusing), so here is an example of what the application produces (roughly speaking):

This is the primary header. Page 51. (field code result)
This is the converted text of the paragraph nodes in the BODY node of the DOCUMENT node…
More content…

As you can see, it would be nice to get Page #1 so that it is roughly congruent with what a reader of the resulting document would expect, though perhaps there is an argument that it should be 51 as well.

I understand your suggestion, but what if there are multiple sections in the document? I don’t think it would work in that case. If there is a method to determine the first page number of the section, then I should be able to compute everything I need. Perhaps I could insert a page number field code as the first node in the first paragraph of each section to determine this value? I’ll give it a try.

Thanks,
Chase

Hi Chase,

Thanks for your inquiry.
Please note that an MS Word document is a flow document and does not contain any information about its layout into lines and pages. Therefore, technically there is no “Page” concept in a Word document. Pages are created by Microsoft Word on the fly.

Aspose.Words uses its own rendering engine to lay out documents into pages. The Aspose.Words.Layout namespace provides classes that give access to information such as which page a particular document element is on, and where on that page it is positioned, once the document is formatted into pages. Please read about LayoutCollector and LayoutEnumerator here:
https://reference.aspose.com/words/net/aspose.words.layout/layoutcollector/
https://reference.aspose.com/words/net/aspose.words.layout/layoutenumerator/

Yes, you can use the DocumentBuilder.InsertField method to insert a page field at a specific location and get the page number.

In your case, I suggest using the LayoutEnumerator.PageIndex property to get the 1-based index of the page that contains the current entity. Please see the following code example. Hope this helps.

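using System;
using Aspose.Words;
using Aspose.Words.Layout;

// MyDir is the folder that contains the input document.
Document doc = new Document(MyDir + "in.docx");
LayoutCollector layoutCollector = new LayoutCollector(doc);
LayoutEnumerator layoutEnumerator = new LayoutEnumerator(doc);
foreach (Section section in doc.Sections)
{
    // Move the enumerator to the layout entity of the section's first paragraph.
    var renderObject = layoutCollector.GetEntity(section.Body.FirstParagraph);
    layoutEnumerator.Current = renderObject;
    // PageIndex is the 1-based page on which the section starts.
    int page = layoutEnumerator.PageIndex;
    Console.WriteLine(page);
}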
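
Once the first and last page of a section are known, the values discussed in this thread can be derived with plain arithmetic. The following is only a sketch in Java (the language of the attached Eclipse project) and does not call Aspose.Words at all. The class name SectionPageNumbers, both method names, and the idea of passing firstPage and lastPage in as plain integers are assumptions made for illustration; they are not part of any Aspose API.

```java
// Hypothetical helper, not part of Aspose.Words. It only does arithmetic on
// page numbers that the caller has already obtained from the layout.
final class SectionPageNumbers {

    // Smallest page in [firstPage, lastPage] with the requested parity, or -1
    // if there is none. Set skipFirstPage to true when the section uses a
    // different first-page header/footer, so that page is not considered.
    static int firstPageWithParity(int firstPage, int lastPage,
                                   boolean even, boolean skipFirstPage) {
        int start = skipFirstPage ? firstPage + 1 : firstPage;
        for (int page = start; page <= lastPage; page++) {
            if ((page % 2 == 0) == even) {
                return page;
            }
        }
        return -1;
    }

    // Largest page in [firstPage, lastPage] with the requested parity, or -1
    // if there is none. This mirrors the values reported in the thread.
    static int lastPageWithParity(int firstPage, int lastPage, boolean even) {
        for (int page = lastPage; page >= firstPage; page--) {
            if ((page % 2 == 0) == even) {
                return page;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // A single section covering pages 1..5, as in the sample document.
        System.out.println(firstPageWithParity(1, 5, true, true));  // 2
        System.out.println(firstPageWithParity(1, 5, false, true)); // 3
        System.out.println(lastPageWithParity(1, 5, true));         // 4
        System.out.println(lastPageWithParity(1, 5, false));        // 5
    }
}
```

For a section spanning pages 1 to 5, the first two calls give the values Chase expected (2 and 3), while the last two give the values that were actually reported (4 and 5). In a multi-section document the same helper would be called once per section with that section's own page range.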

Hey Tahir,

Thank you for the suggestion! I was able to get what I needed from it. The issue is resolved now.

Thanks,
Chase

Hi Chase,

Thanks for your feedback. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.