UpdatePageLayout vs UpdatePageNumbers

Hi,

We noticed an issue with the UpdatePageLayout method in some scenarios. We usually open a Word document and modify it, inserting headers, footers and similar content. Because we once had an issue with wrong page numbers in the table of contents, we added a call to UpdatePageLayout before rendering to PDF. I really can't remember whether this was suggested in another forum topic we found via Google or whether we worked it out ourselves from the docs.

This method formats a document into pages and updates the page number related fields in the document such as PAGE, PAGES, PAGEREF and REF. The up-to-date page layout information is required for a correct rendering of the document to fixed-page formats.

However, the behaviour is inconsistent. I have attached an example .docx file that already contains headers and footers. It has two pages of content. When it is converted to PDF in Word, it is somehow reduced to one page, both in the PDF and in the document itself after the conversion. I made a video of this behaviour.

If you render the document to PDF with Aspose, the result is the same (which is good, as we need the PDF output to be as close as possible to Word's PDF output).
If you render the document to PDF with Aspose, then call UpdatePageLayout and render it again, both output files still look the same.
But if you call UpdatePageLayout before the first rendering to PDF, the output differs from Word's because the PDF is not “reduced” to one page.

At first I thought it might be logical that something changes, but then I noticed the second of the cases above and was confused about why its second output is still correct. Then I read the docs in more detail:

This method is automatically invoked when you first convert a document to PDF, XPS, image or print it. However, if you modify the document after rendering and then attempt to render it again - Aspose.Words will not update the page layout automatically. In this case you should call UpdatePageLayout() before rendering again.

I am confused now and have the following questions, and possibly bugs (or are the docs just wrong?):

  1. Why is it necessary to call UpdatePageLayout manually to update the table of contents if it is invoked automatically the first time a Word file is rendered to PDF?
  2. Why does calling UpdatePageLayout manually before the first rendering make a difference if it is invoked automatically anyway?

I have attached some input documents, example code and the video mentioned above.
files.zip (3.2 MB)

Kind regards,
Daniel

@Serraniel,

Thanks for your inquiry. Please use the latest version of Aspose.Words for .NET (18.8) to get the correct output.

You do not need to call the Document.UpdatePageLayout method. This method is automatically invoked when you first convert a document to PDF, XPS or an image, or print it. However, if you modify the document after rendering and then attempt to render it again, Aspose.Words will not update the page layout automatically. In this case you should call UpdatePageLayout before rendering again.
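The render, modify, re-render sequence described above can be sketched as follows. This is a minimal example assuming Aspose.Words for .NET and C# 8 or later; the class and method names are hypothetical, and the appended paragraph only stands in for whatever changes your own code makes (for example, inserting headers and footers). Output goes to memory streams so the sketch needs no input files.

```csharp
using System.IO;
using Aspose.Words;

public static class RenderTwiceSketch
{
    // Hypothetical helper: renders to PDF, modifies the document, refreshes
    // the layout and renders again. Returns the byte sizes of both PDFs.
    public static (long First, long Second) RenderModifyRender(Document doc)
    {
        using var first = new MemoryStream();
        // First rendering: the page layout is built automatically here.
        doc.Save(first, SaveFormat.Pdf);

        // Modify the document after it has already been rendered once.
        var builder = new DocumentBuilder(doc);
        builder.MoveToDocumentEnd();
        builder.Writeln("Paragraph added after the first rendering.");

        // The layout is not rebuilt automatically after the modification,
        // so refresh it explicitly before rendering again.
        doc.UpdatePageLayout();

        using var second = new MemoryStream();
        doc.Save(second, SaveFormat.Pdf);
        return (first.Length, second.Length);
    }
}
```

Usage: `var sizes = RenderTwiceSketch.RenderModifyRender(new Document(@"in.docx"));` (the file path is a placeholder).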

The Document.UpdatePageLayout method does not update the TOC field. To update this field, please call the Document.UpdateFields method.

Hi @tahir.manzoor,

I have another question regarding your last sentence:

Document.UpdatePageLayout method does not update the TOC field. To update this field, please call Document.UpdateFields method.

Is there a way to avoid updating the chapter names in the TOC? In Word itself, you have the option to update only the page numbers when you right-click the TOC and click Update Field.

Scenario: One of our customers has a document with chapter names that are longer than one line. They manually adjust the formatting of the TOC by adding line breaks to the chapter names. Our software then updates the headers and footers, so the document changes and we have to update the page numbers in the TOC. Calling UpdateFields also resets the names to the default, without those manual line breaks. How would you update the document's fields while keeping the manual formatting of the table of contents?

Again, I prepared a small scenario to demonstrate this:

  1. in.docx -> The document without manual changes in the TOC
  2. in.pdf -> in.docx converted to PDF using Word 2016
  3. in_modified.docx -> The document with manual changes in the TOC (line breaks and tabs)
  4. in_modified.pdf -> in_modified.docx converted to PDF using Word 2016
  5. out_modified.pdf -> in_modified.docx converted to PDF using Aspose
var doc = new Document(@"S:\in_modified.docx");
doc.Save(@"S:\out_modified.pdf");
  6. out_modified_updated.pdf -> in_modified.docx converted to PDF using Aspose after calling UpdateFields
doc = new Document(@"S:\in_modified.docx");
doc.UpdateFields();
doc.Save(@"S:\out_modified_updated.pdf");

The GIF demonstrates the Word feature that updates the TOC by refreshing only the page numbers, not the chapter captions. How can we achieve this behaviour, updating the page numbers in the TOC (and other fields in the inserted headers/footers) while keeping the manual formatting in the TOC?

Kind regards,
Daniel
example.zip (1.4 MB)

@Serraniel,

Thanks for your inquiry. You need to get the TOC field and call the FieldToc.UpdatePageNumbers method. This method updates the page numbers for the items in the table of contents.

To update the fields inside headers/footers, please get them using the HeaderFooter.Range.Fields property, iterate over the fields and update each one.
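A minimal sketch of that approach, assuming Aspose.Words for .NET. The helper name UpdateHeaderFooterFields is hypothetical, not part of the library. It visits every HeaderFooter node, snapshots its Range.Fields into a list and calls Field.Update on each entry, leaving fields in the main body (including the TOC) untouched.

```csharp
using System.Linq;
using Aspose.Words;
using Aspose.Words.Fields;

public static class HeaderFooterFieldsSketch
{
    // Hypothetical helper: updates every field located in any header or
    // footer of the document and returns how many fields were updated.
    public static int UpdateHeaderFooterFields(Document doc)
    {
        int updated = 0;
        foreach (HeaderFooter headerFooter in doc.GetChildNodes(NodeType.HeaderFooter, true))
        {
            // Copy the fields first so updates cannot disturb the enumeration.
            var fields = headerFooter.Range.Fields.Cast<Field>().ToList();
            foreach (Field field in fields)
            {
                field.Update();
                updated++;
            }
        }
        return updated;
    }
}
```

Copying the fields into a list before updating is a defensive choice: any node changes an update makes cannot interfere with the loop.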

Thanks for your reply. I guess I got it working:

foreach (var fs in doc.GetChildNodes(NodeType.FieldStart, true).AsEnumerable().Cast<FieldStart>().Where(fs => fs.FieldType == FieldType.FieldTOC))
    ((FieldToc) fs.GetField()).UpdatePageNumbers();

@Serraniel,

Yes, you can use this code to get the desired output. You may also use the following line of code to update the TOC page numbers:

// Requires: using System.Linq; using Aspose.Words.Fields;
doc.Range.Fields.Cast<Field>().Where(f => f.Type == FieldType.FieldTOC).ToList().ForEach(f => ((FieldToc)f).UpdatePageNumbers());

Please feel free to ask if you have any question about Aspose.Words, we will be happy to help you.