Convert Save As DOCX to PDF using Java & Mimic MS Word 2016 behavior to Insert Page Breaks Correctly

Hello!

I use Aspose.Words 20.2 to convert DOCX files to PDF:

import com.aspose.words.Document;

Document document = new Document("input.docx");
document.save("output.pdf");

pageBreak.zip (54.5 KB)

In the attached zip you can see that in the resulting PDF the text “Page 2 part 1” is now on page one.
In contrast, the resulting PDF is fine if I use Microsoft Word (2016) for the conversion.

@dvtdaten,

We tested the scenario and have managed to reproduce the same problem on our end. We have logged this problem in our issue tracking system with the ID WORDSNET-20090. We will look further into the details of this problem and will keep you updated on the status of the fix. We apologize for the inconvenience.

Can you please also “Save As” this Word document to PDF and XPS formats by using MS Word 2016 on your end, ZIP and attach them here for further testing?

@dvtdaten,

Please also convert your “input.docx” document to PDF and XPS formats by using the Save As command of MS Word 2016 on your end, then ZIP and attach the PDF & XPS files here for further testing. Thanks for your cooperation.

Hello, here it is: pageBreak.zip (255.0 KB)

@dvtdaten,

Thanks for sharing these documents with us. We will keep you posted on any further updates and let you know when this issue is resolved.

@dvtdaten,

The mismatch between the MS Word PDF/XPS output on your side and the Aspose.Words layout occurs because, in your output, MS Word does not respect the cell hideMark property for some reason.
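To check whether a document contains such cells at all, one can look for the w:hideMark element in the WordprocessingML markup. The following is only a rough sketch that uses the JDK's ZIP and DOM classes rather than any Aspose.Words API; the class name HideMarkScan and its methods are hypothetical, and it assumes the main body is stored in word/document.xml.

```java
import java.io.InputStream;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: counts w:hideMark elements that are switched on.
class HideMarkScan {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static int countHideMarkCells(InputStream documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the namespace lookup below
            Document dom = factory.newDocumentBuilder().parse(documentXml);
            NodeList marks = dom.getElementsByTagNameNS(W_NS, "hideMark");
            int count = 0;
            for (int i = 0; i < marks.getLength(); i++) {
                String val = ((Element) marks.item(i)).getAttributeNS(W_NS, "val");
                // On/off element: a missing w:val attribute means "on".
                if (val.isEmpty() || val.equals("1")
                        || val.equals("true") || val.equals("on")) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java HideMarkScan input.docx
        try (ZipFile docx = new ZipFile(args[0]);
             InputStream in = docx.getInputStream(docx.getEntry("word/document.xml"))) {
            System.out.println("hideMark cells: " + countHideMarkCells(in));
        }
    }
}
```

A non-zero count only says that the property is present somewhere in the body; it does not say which row is affected or how any renderer lays it out.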

The row above the one with the “Page 2 part 1” text is laid out differently in the Aspose.Words output and in the MS Word output you provided. In that row, the cell in column 2 has five empty paragraphs and has the hideMark property set. When the property is present, Aspose.Words collapses the last empty paragraph, which corresponds to the cell break, so the cell height in Aspose.Words is computed from four empty lines. In your XPS output, five empty lines are present in the canvas corresponding to the problematic cell. Because of that, the row before “Page 2 part 1” is taller in your document’s layout and the next row is pushed to page 2.
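The effect of that one extra line can be illustrated with a toy model. This is a simplified sketch and not the Aspose.Words or MS Word layout code: the class HideMarkRowModel, its methods, and the assumed five lines of space left on page 1 are all hypothetical. Only the four-versus-five line count comes from the analysis above.

```java
// Toy model of how collapsing the last empty paragraph changes pagination.
class HideMarkRowModel {

    // Lines that count toward the cell height. When hideMark is applied and
    // the last paragraph is empty, that paragraph is collapsed.
    static int countedLines(int paragraphs, boolean lastParagraphEmpty,
                            boolean hideMarkApplied) {
        boolean collapse = hideMarkApplied && lastParagraphEmpty && paragraphs > 0;
        return collapse ? paragraphs - 1 : paragraphs;
    }

    // Page on which the following row lands, measured in whole lines.
    static int pageOfNextRow(int linesLeftOnPage1, int rowLines, int nextRowLines) {
        return rowLines + nextRowLines <= linesLeftOnPage1 ? 1 : 2;
    }

    public static void main(String[] args) {
        int linesLeft = 5; // hypothetical space remaining on page 1

        int collapsed = countedLines(5, true, true);  // hideMark respected: 4 lines
        int kept = countedLines(5, true, false);      // hideMark ignored: 5 lines

        System.out.println(pageOfNextRow(linesLeft, collapsed, 1)); // prints 1
        System.out.println(pageOfNextRow(linesLeft, kept, 1));      // prints 2
    }
}
```

With the collapsed paragraph the next row still fits on page 1; with all five lines counted it moves to page 2, which is the difference seen between the two outputs.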

The Aspose.Words layout, however, matches the MS Word 2019 layout on my machine, both in the editor and in PDF/XPS export and print preview. So far, the only way to reproduce your MS Word output on our end is to place the cursor on the problematic cell break right after opening the document. In that case, MS Word probably considers that the last paragraph is being edited and hence is not empty. If the document is saved to PDF from MS Word in that state, the output matches the PDF/XPS you provided.

We have seen similar behavior in MS Word before. When the cursor is placed in the collapsed line, MS Word may indeed keep the line visible. However, we regard this as unexpected behavior, as it allows two different outputs for the same document. Aspose.Words is a server component and has no concept of a “cursor”, so a cursor position should not affect its output.

We have now completed the work on this issue and decided to close it as “Won’t fix”. We will not make any changes to the Aspose.Words API, as it mimics MS Word 2019.


Thank you for the detailed explanation of the problem!