GetStartPageIndex does not work right

I have the docx file abc.docx (36.8 KB)

var originalDoc = new Aspose.Words.Document(dataDir + "abc.docx");
// Take the first table in the last section of the same document the collector is attached to.
var table = (Aspose.Words.Tables.Table)originalDoc.LastSection.GetChild(NodeType.Table, 0, true);
LayoutCollector collector = new LayoutCollector(originalDoc);
// Page indexes returned by LayoutCollector are 1-based.
int firstPage = collector.GetStartPageIndex(table.FirstRow);
int secondPage = collector.GetStartPageIndex(table.LastRow);

secondPage should be 1, because the last row of the table is still on the first page.
But when I debug, secondPage is 2.
image.png (41.7 KB)

@TanPham
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-25275

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Aspose.Words.Saving.PdfSaveOptions pso = new Aspose.Words.Saving.PdfSaveOptions();
pso.Compliance = PdfCompliance.Pdf17;
originalDoc.Save(dataDir + "abc.pdf", pso);

Then, when I save it as a PDF file, the last 6 rows are moved to the second page: abc.pdf (54.0 KB)

@TanPham Thank you for the additional information. Yes, we have noticed this as well.
Please note that the same layout engine is used by LayoutCollector and for rendering documents to fixed page formats, such as PDF.

Actually, the problem I am facing is this:
I tried to get the page count. The data I render in the output should take 1 page in total, but I don't know why Document.PageCount is always 2.
I think something is wrongly affecting the structure of the document.

@TanPham Also, the same layout engine is used to calculate the number of pages in the document. So it is expected that the page count returned for your document is 2.
By the way, MS Word on my side also shows two pages in your document.

Right, but in my code I do not render a line break on the second page, so I am not sure why it is there. I just render the last row of the table and then save the file.
I also don't know why converting to PDF moves the last 8 rows to the next page.

This is the PDF file I got by converting the document on the website Word to PDF | Convert Your Doc to PDF Online for Free
image.png (182.0 KB)

It does not move the last 8 rows to the next page.

@TanPham There is no concept of a page in MS Word documents, since they are flow documents. Consumer applications, like MS Word or OpenOffice, reflow the document content into pages on the fly. The document layout depends, for example, on the fonts available to the consumer application, since different fonts can have different metrics, which can affect the layout.
In your document, if you open it in MS Word, an empty paragraph is pushed to the second page.

Note that in MS Word documents a table cannot be the last node; there is always a paragraph after the table. If you reduce the size of this empty paragraph to zero, the document is rendered as expected:

Document doc = new Document(@"C:\Temp\in.docx");
doc.LastSection.Body.LastParagraph.ParagraphBreakFont.Size = 0;
doc.Save(@"C:\Temp\out.pdf");
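
To confirm that the trailing empty paragraph is what produces the second page, a check along these lines might help. It is only a sketch: it assumes the same C:\Temp\in.docx path as the snippet above, the class name is made up for illustration, and the output will depend on the fonts installed on the machine.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Layout;

class TrailingParagraphCheck
{
    static void Main()
    {
        // Assumed input path, same as in the snippet above.
        Document doc = new Document(@"C:\Temp\in.docx");

        // The paragraph that follows the last table in the body.
        Paragraph last = doc.LastSection.Body.LastParagraph;

        // The collector maps document nodes to pages (page indexes are 1-based).
        LayoutCollector collector = new LayoutCollector(doc);
        Console.WriteLine("Page count before: " + doc.PageCount);
        Console.WriteLine("Trailing paragraph starts on page: " + collector.GetStartPageIndex(last));

        // Shrink the paragraph break, then rebuild the layout so the change is reflected.
        last.ParagraphBreakFont.Size = 0;
        doc.UpdatePageLayout();
        Console.WriteLine("Page count after: " + doc.PageCount);
    }
}
```

If the trailing paragraph is reported on page 2 and the page count drops after the change, the extra page came from that paragraph rather than from the table rows.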

Then, about converting DOCX to PDF: why do Aspose and the other tool produce different output, and why is the last row of the table on page 2? Please help me investigate this, thank you.

@TanPham Because MS Word documents are flow documents and do not contain any information about the document layout. Each tool has its own layout engine that reflows the document into pages. This causes the difference.

What about the function GetStartPageIndex(table.LastRow)? The result is 2. Is that right?

@TanPham As I already mentioned, the same layout engine is used by LayoutCollector and for rendering documents to fixed page formats, such as PDF. So if the last row is rendered on the second page when you convert your document to PDF using Aspose.Words, it is expected that GetStartPageIndex for this row will return 2 too.
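
To see exactly where the layout engine breaks the table, the page of every row can be printed and compared with the PDF output. This is a rough sketch, not a definitive diagnostic: the input path and class name are hypothetical, and it assumes the table of interest is the first table in the last section, as in the original question.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Layout;
using Aspose.Words.Tables;

class RowPageReport
{
    static void Main()
    {
        // Hypothetical path; replace with the real document location.
        Document doc = new Document(@"C:\Temp\in.docx");

        // First table in the last section (assumption taken from the original question).
        Table table = (Table)doc.LastSection.GetChild(NodeType.Table, 0, true);
        if (table == null)
        {
            Console.WriteLine("No table found in the last section.");
            return;
        }

        LayoutCollector collector = new LayoutCollector(doc);

        int index = 0;
        foreach (Row row in table.Rows)
        {
            // Start and end page differ only when a row is split across pages.
            Console.WriteLine("Row " + index + ": page " + collector.GetStartPageIndex(row)
                + " to " + collector.GetEndPageIndex(row));
            index++;
        }

        Console.WriteLine("Total pages: " + doc.PageCount);
    }
}
```

The first row reported on page 2 should match the first row that appears on the second page of the PDF produced by Aspose.Words, since both come from the same layout engine.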

The issue has been solved. Thank you for your support, @alexey.noskov
