.NET Words - Document.GetText string returns headers/body/footers out of order

Using version 17.7 of Aspose.Words for .NET, calling the Document.GetText() method does not return the text of the docx in the order in which it reads when opened in a docx editor (Word, Google Docs, LibreOffice, etc.).

Given a two-page docx file with Different First Page Header set to true, it reads as follows:

1st page header

page 1

--------- page break ---------

2nd page header

page 2

The GetText() method returns the string "2nd page header 1st page header page 1 page 2".

This issue also affects the IReplacingCallback interface when searching for a substring that occurs multiple times. For example, in the sample docx above, I want to replace only the first occurrence of the word "header", which appears on page 1; instead, the replacement is made on page 2.

The issue also exists for documents in which headers may have Link to Previous set to false.

An excerpt of the OOXML standard for rendering headers/footers can be found here.

@gmcintyre,

Thanks for your inquiry. Please ZIP and attach here the sample Word document that shows this problem, so we can test it. We will investigate the issue on our end and provide you with more information.

Best regards,
Awais Hafeez

headers.zip (11.5 KB)

Awais,

The .docx is attached in the uploaded .zip file.

I’ve also tried the ToString() and ToTxt() methods, which give the same result.

Thank you

@gmcintyre,

Thanks for your inquiry. We have logged this problem in our issue tracking system. The ID of this issue is WORDSNET-15624. Our product team will look further into the details of this problem, and we will keep you updated on the status of this issue. We apologize for the inconvenience.

Best regards,
Awais Hafeez

I see this issue was closed with WORDSNET-15624. Is this fix now available in a newer version of Aspose?

@gmcintyre,

Yes, the issue is now fixed. We will include the fix in the next version of Aspose.Words, 17.11, which we plan to release by the end of this week. You will be notified via this thread as soon as the version containing the fix is available.

@gmcintyre,

The issues you found earlier (filed as WORDSNET-15616) have been fixed in the Aspose.Words for .NET 17.11 update and the Aspose.Words for Java 17.11 update.

@gmcintyre,

Regarding WORDSNET-15624, please note that we have changed the behavior of the Range.Replace methods starting with the 17.11 versions of the Aspose.Words for .NET and Aspose.Words for Java APIs.

Now headers and footers of a section in Word document are processed in the following order:

  • If Section.PageSetup.DifferentFirstPageHeaderFooter is ‘True’:
    • First header
    • First footer
    • Even header
    • Even footer
    • Primary header
    • Primary footer
  • If Section.PageSetup.DifferentFirstPageHeaderFooter is ‘False’:
    • Primary header
    • Primary footer
    • Even header
    • Even footer
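
To make the order above concrete, here is a minimal sketch in plain Java that models it. It does not call Aspose.Words at all: the Kind enum, the processingOrder method and the class name are hypothetical names introduced only for illustration, and the boolean parameter stands in for Section.PageSetup.DifferentFirstPageHeaderFooter. The position of the body text relative to the headers and footers is not stated in this thread, so it is left out.

```java
import java.util.List;

public class HeaderFooterOrderSketch {

    // Hypothetical stand-in for the header/footer kinds named in the list above.
    // This is NOT the Aspose.Words enum, only an illustration.
    public enum Kind {
        FIRST_HEADER, FIRST_FOOTER,
        EVEN_HEADER, EVEN_FOOTER,
        PRIMARY_HEADER, PRIMARY_FOOTER
    }

    // Returns the order described in the post above for one section.
    // differentFirstPage models Section.PageSetup.DifferentFirstPageHeaderFooter.
    public static List<Kind> processingOrder(boolean differentFirstPage) {
        if (differentFirstPage) {
            return List.of(
                Kind.FIRST_HEADER, Kind.FIRST_FOOTER,
                Kind.EVEN_HEADER, Kind.EVEN_FOOTER,
                Kind.PRIMARY_HEADER, Kind.PRIMARY_FOOTER);
        }
        // Without a different first page, first-page parts are not listed at all.
        return List.of(
            Kind.PRIMARY_HEADER, Kind.PRIMARY_FOOTER,
            Kind.EVEN_HEADER, Kind.EVEN_FOOTER);
    }

    public static void main(String[] args) {
        // Print both orders so they can be compared with the list above.
        System.out.println(processingOrder(true));
        System.out.println(processingOrder(false));
    }
}
```

For the sample document in this thread (Different First Page Header set to true), this order puts the first-page header ahead of the primary header, so under this model the first match of "header" would be the one in "1st page header".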