I am using version 17.7 of Aspose.Words for .NET. When I call the Document.GetText() method, it does not return the text of the docx in the order in which it reads when opened in a docx editor (Word, Google Docs, LibreOffice, etc.).
Given a two-page docx file with Different First Page enabled for the header, the document reads as follows:
1st page header page 1
--------- page break ---------
2nd page header page 2
However, the GetText() method returns the string:
"2nd page header 1st page header page 1 page 2"
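To make the ordering concrete, here is a minimal Python sketch of what I suspect is happening. It is a toy model, not the Aspose.Words API: the Section class and both functions are hypothetical names, and the stored order (primary header, then first-page header, then body) is only an assumption inferred from the string returned above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    """Hypothetical stand-in for a section with Different First Page enabled."""
    primary_header: str   # header shown on page 2 onwards
    first_header: str     # header shown on page 1 only
    pages: List[str]      # body text, one entry per page


def storage_order_text(section: Section) -> str:
    # Assumed stored order: both headers first, then the whole body.
    return " ".join([section.primary_header, section.first_header, *section.pages])


def reading_order_text(section: Section) -> str:
    # Order a reader sees: each page's header followed by that page's body.
    parts = []
    for index, page in enumerate(section.pages):
        header = section.first_header if index == 0 else section.primary_header
        parts += [header, page]
    return " ".join(parts)


doc = Section("2nd page header", "1st page header", ["page 1", "page 2"])
print(storage_order_text(doc))
print(reading_order_text(doc))
```

The first function reproduces the string I get from GetText(); the second produces the order I expected.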
This issue also affects the IReplacingCallback interface when searching for a substring that has multiple occurrences. E.g., in the sample docx above, I want to replace only the first occurrence of the word "header", which appears on page 1; instead, the replacement is made on page 2.
The issue also exists for documents whose headers have Link to Previous set to false.
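The effect on a "replace the first match only" operation can be shown with another small Python sketch. Again, this is a hypothetical model rather than Aspose.Words code: replace_first and the two lists are names made up for illustration, and the "storage" list assumes the same node order as the GetText() output above.

```python
import re
from typing import List


def replace_first(parts: List[str], pattern: str, replacement: str) -> List[str]:
    """Replace only the first match, visiting the parts in the order given."""
    result, done = [], False
    for text in parts:
        if not done:
            # count=1 limits the substitution to a single match in this part.
            text, count = re.subn(pattern, replacement, text, count=1)
            done = count > 0
        result.append(text)
    return result


# Assumed stored order versus the order a reader sees on the pages.
storage = ["2nd page header", "1st page header", "page 1", "page 2"]
reading = ["1st page header", "page 1", "2nd page header", "page 2"]

print(replace_first(storage, "header", "HEADER"))
print(replace_first(reading, "header", "HEADER"))
```

Walking the stored order changes the page 2 header, which is the behaviour I am seeing; walking the reading order changes the page 1 header, which is what I want.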
An excerpt of the OOXML standard for rendering headers/footers can be found here.