Problem with working with tables and page numbers of DOC / DOCX documents in ASPOSE.WORDS for JAVA

beralex · January 10, 2020, 5:37pm

Good afternoon!
There was a problem with working with headers and footers.

1. Vertical alignment
Objective: determine the vertical alignment of the text in a table cell of the footer that is used from the second page of the document onward (for example, the text in the third cell of the footer).
Problem: in the document the cell text is vertically centered, but ASPOSE.WORDS for JAVA returns VerticalAlignment = “TOP” (see VERTICAL_ALIGNMENT_INCORRECT.jpg).
Question: what could be causing this?

2. Current page number
Objective: check that the current page number shown in the third cell of the footer is correct, starting from the second page of the document.
Problem:
- When the file is checked as-is (the file itself must not be modified by ASPOSE.WORDS for JAVA), the MERGEFORMAT field appears and the total number of pages is always returned as the current page number, for every page in the document (see DOC_WITH_MERGEFORMAT.jpg).
- If the document is re-saved with ASPOSE.WORDS for JAVA, the MERGEFORMAT field disappears and the current page number is returned correctly (see DOC_WITHOUT_MERGEFORMAT).
Question: how can we work around the MERGEFORMAT problem without changing the source file itself?

Source document: EXAMPLE_DOC.docx

Please help with the solution of these problems. All documents are attached in the archive Example_files.zip (138.0 KB)

awais.hafeez · January 11, 2020, 7:13am

We tested the scenario with the latest version of Aspose.Words for Java (20.1) and were able to reproduce the same problem on our end. We have logged it in our issue tracking system as WORDSJAVA-2291. We will look further into the details of this problem and keep you updated on the status of the fix. We apologize for the inconvenience.

Regarding the second scenario you mentioned, we performed the following steps on our end.

Re-saved “EXAMPLE_DOC.docx” by using the following code:

Java Code:

Document doc = new Document("E:\\Example_files1\\EXAMPLE_DOC.docx");
doc.save("E:\\Example_files1\\awjava-20.1.docx");

Executed the following code on both the original “EXAMPLE_DOC.docx” document and the output “awjava-20.1.docx” document:

Java Code:

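// Load the re-saved document (the same code was also run against the original file)
Document doc = new Document("E:\\Temp\\Example_files1\\awjava-20.1.docx");
// Get the primary footer of the first section
HeaderFooter primaryFooter = doc.getFirstSection().getHeadersFooters().getByHeaderFooterType(HeaderFooterType.FOOTER_PRIMARY);
// Take the first table in the footer and the last cell of its first row
Table targetTable = primaryFooter.getTables().get(0);
Cell targetCell = targetTable.getFirstRow().getLastCell();
// Print the text of that cell
System.out.println(targetCell.getText());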

In both cases, we got the following output printed on the Java console:

[console output not preserved]

So it seems that Aspose.Words for Java preserves all the content when re-saving your DOCX file. If you see different behavior, please provide the complete steps that we can follow on our end to observe it. Thanks for your cooperation.

beralex · January 13, 2020, 5:02pm

@awais.hafeez,
Good afternoon!

We are interested in how to get the correct current page number. In your example, the current page number is 10 (out of 10 pages), and the same is true for every page. That is, the current page number is always 10 (the total number of pages in the document).

That is, we want to get the text exactly as it is shown in the right cell of the footer, in the form: “Current page number” / “Total number of pages”

In this case, for each page, the “Current Page Number” always returns the Total Number of Pages.

We need to get the text for the page that is indicated in the right cell of the footer as “Current page number”.

Can you help us with this problem? Perhaps the problem lies precisely in Mergeformat.

awais.hafeez · January 14, 2020, 5:49am

@beralex,

When you open a Word document in MS Word and it starts rendering the pages in its editor, the values of the PAGE and NUMPAGES fields are calculated by MS Word on the fly. So it will show you the values of the PAGE field (which is in the footer story) as 2, 3, 4 … 10, while the NUMPAGES field value is displayed as 10 on all pages.

Now, when you convert your DOCX to PDF format, for example, by using Aspose.Words, the numbering remains correct, i.e. 2/10, 3/10, 4/10 … 10/10. So there does not seem to be an issue in Aspose.Words here. Please let me know if I can be of any further assistance.

beralex · January 14, 2020, 10:06am

@awais.hafeez,
Good afternoon!

Converting to PDF does not interest us. It is important for us to obtain the information from a file in Word format in order to solve the tasks described above. Converting to PDF will not help us in this case, because it is too costly to run.

You said that the “PAGE” field returns the values 1, 2, 3, …, 10 depending on the current page. But the problem is that in our case this (PAGE) field ALWAYS returns the value 10 (regardless of the current page).

Why is this happening? How can we get the correct current page number without converting the document to PDF?
Please help us solve these problems.

awais.hafeez · January 15, 2020, 5:08am

@beralex,

I have copied the table containing the PAGE and NUMPAGES fields from the footer story of “EXAMPLE_DOC.docx” into the body story of a new Word document five times, once on each of five separate pages, and attached the document here for your reference:

in.zip (10.1 KB)

Now, when you run the following code, it will print the numbers 1 through 5.

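// Load the attached test document (adjust the path for your machine)
Document doc = new Document("E:\\Temp\\Example_files\\in.docx");

// Visit every field in the document and handle only the PAGE fields
for (Field field : doc.getRange().getFields()){
    if (field.getType() == FieldType.FIELD_PAGE){
        FieldPage fieldPage = (FieldPage) field;
        // Recalculate the field before reading its result
        fieldPage.update();
        System.out.println(fieldPage.getResult());
    }
}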

In your original document, the fields are in the footer story, so MS Word repeats the whole content of the footer on every page while laying out the document, calculating the field values on the fly.

You can also calculate the page number of any node in the body story by using the LayoutCollector.getStartPageIndex and LayoutCollector.getEndPageIndex methods.
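
To illustrate why reading the footer text directly can give the same number for every page, here is a toy model in plain Java. It is not Aspose.Words code: the FooterStoryModel class and its methods are hypothetical, and the idea that the stored footer keeps only the most recently calculated PAGE result is an assumption made for illustration, not a statement about Aspose.Words or MS Word internals.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not Aspose.Words): one footer stored once, repeated on every page. */
class FooterStoryModel {
    // Assumption: the footer exists once, so its PAGE field holds a single cached result.
    private String cachedPageResult = "";
    private final int pageCount;

    FooterStoryModel(int pageCount) {
        this.pageCount = pageCount;
    }

    /** Lays out the pages in order and returns the footer text shown on each page. */
    List<String> layoutAllPages() {
        List<String> rendered = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            // The single cached result is overwritten for every page that is laid out.
            cachedPageResult = String.valueOf(page);
            rendered.add(cachedPageResult + "/" + pageCount);
        }
        return rendered;
    }

    /** Reads the stored footer text without any layout: only the last cached result is left. */
    String storedFooterText() {
        return cachedPageResult + "/" + pageCount;
    }

    public static void main(String[] args) {
        FooterStoryModel model = new FooterStoryModel(10);
        // Per-page values exist only as a product of layout.
        System.out.println(model.layoutAllPages());
        // The stored footer has one value, whatever page you have in mind.
        System.out.println(model.storedFooterText());
    }
}
```

In this model the per-page values are available only from the layout pass, which is consistent with the advice above to rely on layout information rather than on the text stored in the footer.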

awais.hafeez · January 28, 2020, 8:54am

@beralex,

Regarding WORDSJAVA-2291, we have completed the work on your issue and will most likely close it with “Not a Bug” status. Please see the analysis details below:

The issue with vertical alignment is actually not a bug. You are retrieving the numerical value of the cell's vertical alignment correctly, e.g.

cell.getCellFormat().getVerticalAlignment()

You then convert the integer alignment value into a human-readable name with the aid of VerticalAlignment.getName(). However, the VerticalAlignment class specifies the vertical alignment of a floating shape, text frame, or floating table. It also includes values such as INSIDE and OUTSIDE, which are not applicable to text inside a table cell.

Please use CellVerticalAlignment class instead, e.g.:

for (Cell cell : targetTable.getFirstRow().getCells())
    System.out.println(CellVerticalAlignment.getName(cell.getCellFormat().getVerticalAlignment()));

After the change we get the following output:

CENTER
CENTER
CENTER

We also edited the DOCX file in MS Word and tried different vertical alignments directly in the document. Everything works fine on our end.
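
The mix-up described above can be sketched in plain Java without the library. The AlignmentNameDemo class below is a hypothetical stand-in, not the Aspose.Words classes, and the integer values in it are assumptions chosen to show how one stored number can decode to "TOP" under one set of constants and to "CENTER" under another. Check the actual constants in the API reference before relying on them.

```java
/** Hypothetical stand-in for two constant classes that overlap in their int values. */
final class AlignmentNameDemo {

    // Assumed numbering for cell alignment: TOP=0, CENTER=1, BOTTOM=2.
    static String cellAlignmentName(int value) {
        switch (value) {
            case 0: return "TOP";
            case 1: return "CENTER";
            case 2: return "BOTTOM";
            default: return "Unknown cell alignment: " + value;
        }
    }

    // Assumed numbering for floating objects: NONE=0, TOP=1, CENTER=2, BOTTOM=3, INSIDE=4, OUTSIDE=5.
    static String shapeAlignmentName(int value) {
        switch (value) {
            case 0: return "NONE";
            case 1: return "TOP";
            case 2: return "CENTER";
            case 3: return "BOTTOM";
            case 4: return "INSIDE";
            case 5: return "OUTSIDE";
            default: return "Unknown shape alignment: " + value;
        }
    }

    public static void main(String[] args) {
        int storedValue = 1; // what a centered cell reports under the assumed numbering
        System.out.println(shapeAlignmentName(storedValue)); // prints "TOP"
        System.out.println(cellAlignmentName(storedValue));  // prints "CENTER"
    }
}
```

The stored value is the same in both calls. Only the lookup table used to name it differs, which is why switching to the cell-specific class changes the printed name without any change to the document.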