Get Remaining Page Size when Last Section in Word Document Spans Multiple Pages using C# .NET

@akoundal,

I’ve used the above function to get the remaining page size, and it works great. However, when the last section spans multiple pages, it fails to calculate the remaining page space accurately. Is there a way to adapt this function to use the last page instead of the last section?

@zackforbing,

Unfortunately, your question is not clear enough. Please elaborate on your inquiry by providing complete details of your use case or scenario, along with screenshot(s) and a sample Word document for our reference. This will help us understand your scenario and put us in a better position to address your concerns.

Sure. From the previous article, using this function (adapted from our own usage):

double getRemainingPageHeight(DocumentBuilder builder) throws Exception {
  Document doc = builder.getDocument();
  LayoutCollector lc = new LayoutCollector(doc);
  LayoutEnumerator le = new LayoutEnumerator(doc);
  // every paragraph in the last section, regardless of which page it is on
  ParagraphCollection paragraphs = doc.getLastSection().getBody().getParagraphs();
  double height = 0;
  for (Paragraph para : paragraphs) {
    le.setCurrent(lc.getEntity(para));
    double paragraphHeight = le.getRectangle().getHeight();
    height += paragraphHeight;
  }
  return Math.ceil(height);
}

When I create a document, I’d expect this function to give the same number for every page that has the same amount of content, but this example shows that it doesn’t:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.writeln("first page");
String offset1 = Double.toString(getRemainingPageHeight(builder));
builder.writeln(offset1);
builder.insertBreak(BreakType.PAGE_BREAK);
builder.writeln("second page");
String offset2 = Double.toString(getRemainingPageHeight(builder));
builder.writeln(offset2);
builder.insertBreak(BreakType.PAGE_BREAK);
builder.writeln("third page");
String offset3 = Double.toString(getRemainingPageHeight(builder));
builder.writeln(offset3);

doc.save("offsets should be the same.doc");

But when I open the doc, I can see that the first offset is 38.0, the second is 66.0, and the third is 93.0. I could use BreakType.SECTION_BREAK_NEW_PAGE to create a new Section, but we are using sections for another purpose in our implementation. Is there a way to explicitly find the remaining height on the page without each page being a discrete Section?

Also, apologies, but I’m looking for a solution for Aspose.Words for Java. I couldn’t find any posts about this issue in Java.

In addition, if you could explain how to use this to calculate table heights as well, I’d be grateful.

@zackforbing,

Please check the following Java code, which should calculate and print the total height of all the paragraphs on each page:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.writeln("first page");
builder.insertBreak(BreakType.PAGE_BREAK);

builder.writeln("second page");
builder.writeln("second page");
builder.writeln("second page");
builder.insertBreak(BreakType.PAGE_BREAK);

builder.writeln("third page");
builder.writeln("third page");
builder.writeln("third page");

LayoutCollector lc = new LayoutCollector(doc);
LayoutEnumerator le = new LayoutEnumerator(doc);

ParagraphCollection paragraphs = doc.getLastSection().getBody().getParagraphs();
for(int pageIndex = 1; pageIndex <= doc.getPageCount(); pageIndex++) {
    double height = 0;
    for (Paragraph para : paragraphs) {
        if (lc.getStartPageIndex(para) == pageIndex) {
            le.setCurrent(lc.getEntity(para));
            double paragraphHeight = le.getRectangle().getHeight();
            height += paragraphHeight;
        }
    }
    System.out.println(Math.ceil(height));
}
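
The per-page totals above are heights already used, not space left. Here is a minimal sketch of the remaining step in plain Java, with no Aspose.Words calls. `RemainingHeightDemo`, `usedOnPage`, and `remaining` are hypothetical names, and the page size, margins, and paragraph heights are assumed values (in Aspose.Words they would presumably come from the section's page setup and the `LayoutEnumerator` rectangles).

```java
final class RemainingHeightDemo {
    // Sum the heights of the paragraphs that start on the given page.
    // startPages[i] and heights[i] describe paragraph i (made-up layout data).
    static double usedOnPage(int[] startPages, double[] heights, int page) {
        double used = 0;
        for (int i = 0; i < startPages.length; i++) {
            if (startPages[i] == page) {
                used += heights[i];
            }
        }
        return used;
    }

    // Usable body height minus what is already used, never below zero.
    static double remaining(double pageHeight, double topMargin, double bottomMargin, double used) {
        return Math.max(0, pageHeight - topMargin - bottomMargin - used);
    }

    public static void main(String[] args) {
        // Assumed values in points: a 792-point-high page with 72-point margins.
        int[] startPages = {1, 2, 2, 2, 3, 3, 3};
        double[] heights = {14, 14, 14, 14, 14, 14, 14};
        int lastPage = 3;
        double used = usedOnPage(startPages, heights, lastPage);
        System.out.println(used);                         // 42.0
        System.out.println(remaining(792, 72, 72, used)); // 606.0
    }
}
```

This sketch does not account for headers, footers, or anything else that takes up space outside the summed paragraphs, so treat the result as an estimate.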

@zackforbing,

I think you can meet the table height requirement by using the following Java code:

Document doc = new Document("E:\\Temp\\Table.docx");

Table table = doc.getLastSection().getBody().getTables().get(0);
Cell first = table.getFirstRow().getFirstCell();
Cell last = table.getLastRow().getLastCell(); // Use the Cell which has most Paragraphs

LayoutCollector lc = new LayoutCollector(doc);
LayoutEnumerator le = new LayoutEnumerator(doc);

// Y coordinate of the last paragraph in the first cell (top reference)
le.setCurrent(lc.getEntity(first.getLastParagraph()));
double top = le.getRectangle().getY();

// Y coordinate of the last paragraph in the last cell (bottom reference)
le.setCurrent(lc.getEntity(last.getLastParagraph()));
double bottom = le.getRectangle().getY();

System.out.println(lc.getNumPagesSpanned(table));
if (lc.getNumPagesSpanned(table) == 0) // 0 means the whole table is on one page
    System.out.println(Math.ceil(bottom - top));
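
Note what `bottom - top` measures here: both values are Y coordinates, so the result is the distance between the tops of the two rectangles and leaves out the height of the last one. Here is a small plain-Java sketch of that arithmetic. The names (`TableSpanDemo`, `topToTop`, `topToBottom`) are hypothetical, and the coordinates are made up rather than taken from real layout data.

```java
final class TableSpanDemo {
    // Distance between the tops of the first and last rectangles,
    // which is what subtracting two Y coordinates gives.
    static double topToTop(double firstY, double lastY) {
        return lastY - firstY;
    }

    // Same span, extended to the bottom edge of the last rectangle.
    static double topToBottom(double firstY, double lastY, double lastHeight) {
        return (lastY + lastHeight) - firstY;
    }

    public static void main(String[] args) {
        // Made-up values in points: first rectangle at y=100, last at y=160 and 14 high.
        System.out.println(topToTop(100, 160));        // 60.0
        System.out.println(topToBottom(100, 160, 14)); // 74.0
    }
}
```

Whether the extra term is wanted depends on what the measurement is used for. The sketch does not model cell padding or borders.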

@awais.hafeez,

Thanks for your replies! I think this takes care of the issue.

@awais.hafeez,

Thanks for all your assistance so far! I’m getting some strange behavior that doesn’t account for titles very well.

The title is added with the following code:

ParagraphFormat format = builder.getParagraphFormat();
format.clearFormatting();
format.setStyle(builder.getDocument().getStyles().get(headerStyle));
builder.writeln(sectionName);

and I adapted the code for the offset calculator function like so:

private double getParagraphOffsetModifier(DocumentBuilderExtension builder) throws Exception {
  Document doc = builder.getDocument();
  LayoutCollector lc = new LayoutCollector(doc);
  LayoutEnumerator le = new LayoutEnumerator(doc);
  builder.getDocument().updatePageLayout();
  // grab every paragraph in the current section
  ParagraphCollection paragraphs = doc.getLastSection().getBody().getParagraphs();
  double offset = 0;
  for (int i = 1; i <= doc.getPageCount(); i++) {
    for (Paragraph para : paragraphs) {
      // if paragraph is on the current page and has content, add its height to offset
      if (lc.getStartPageIndex(para) == doc.getPageCount()
          && !(para.toString(SaveFormat.TEXT).trim().equals("")
              || para.getChildNodes().getCount() == 0)) {
        le.setCurrent(lc.getEntity(para));
        double paragraphHeight = le.getRectangle().getHeight();
        offset += paragraphHeight;
      }
    }
  }
  return Math.ceil(offset);
}

It accounts just fine for one-line titles. But when the title extends onto a second line, you can see in the PDF that the image is too large and flows off the edge of the page. Is there anything in this code that can be changed so that two-line titles behave like one-line titles?
one line title.pdf (275.3 KB)
two line title.pdf (275.8 KB)

@zackforbing,

The PDF documents “one line title.pdf” and “two line title.pdf” that you attached in your previous post were generated using the old 19.6 version of Aspose.Words for Java. To verify, open your PDF files in Adobe Acrobat Reader, go to File | Properties, and check that the value of “PDF Producer” under the “Advanced” section is Aspose.Words for Java 19.6.

Please upgrade to the latest (20.6) version of Aspose.Words for Java and see how it goes on your end.

In case the problem still remains, then please ZIP and attach the following resources here for testing:

  • The simplified input Word document from which you generated these PDF files
  • A simple standalone Java application (source code without compilation errors) that helps us reproduce this problem on our end. Please do not include the Aspose.Words JAR files, to reduce the file size.
  • Any other resources that you think are required to reproduce the same issue on our end.

As soon as you get these pieces of information ready, we will start further investigation into your scenario and provide you more information.

We have updated to the most recent version of Aspose.Words and it doesn’t seem to solve the problem. I am in the process of making a standalone Java app that reproduces this issue. I’ll comment again when that is finished.

Edit: also, we are building these not from Word documents, but from streams of HTML/CSS. I will capture the stream that both of these documents are produced from, if that helps. We also convert these streams to Word docs, and I can send those as well, if need be.

@zackforbing,

Yes, please provide all the necessary resources and steps that you think are required to reproduce/observe the same problem on our end.

Hi there,

I figured out what the problem was while attempting to recreate the issue. We run this function as we are building the document, so the only page we care about is the last. The problem was the outer for loop over pages: since the if statement already screens out paragraphs that aren’t on the last page, the loop caused each last-page paragraph to be added to the offset once per page. The final function has coalesced to this, for anyone trying to achieve a similar result:

private double getParagraphOffsetModifier(DocumentBuilderExtension builder) throws Exception {
  Document doc = builder.getDocument();
  LayoutCollector lc = new LayoutCollector(doc);
  LayoutEnumerator le = new LayoutEnumerator(doc);
  builder.getDocument().updatePageLayout();
  // grab every paragraph in the current section
  ParagraphCollection paragraphs = doc.getLastSection().getBody().getParagraphs();
  double offset = 0;
  for (Paragraph para : paragraphs) {
    // if paragraph is on the current page and has content, add its height to offset
    if (lc.getStartPageIndex(para) == doc.getPageCount()
        && !(para.toString(SaveFormat.TEXT).trim().equals("")
            || para.getChildNodes().getCount() == 0)) {
      le.setCurrent(lc.getEntity(para));
      double paragraphHeight = le.getRectangle().height;
      offset += paragraphHeight;
    }
  }
  return Math.ceil(offset);
}
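
To see why the outer page loop inflated the result, here is a minimal sketch in plain Java with no Aspose.Words calls. `OffsetLoopDemo`, `Para`, `withPageLoop`, and `withoutPageLoop` are hypothetical names, and the page numbers and heights are made-up values standing in for what `LayoutCollector` and `LayoutEnumerator` would report.

```java
import java.util.Arrays;
import java.util.List;

final class OffsetLoopDemo {
    // Hypothetical stand-in for a laid-out paragraph: start page and height in points.
    static final class Para {
        final int page;
        final double height;

        Para(int page, double height) {
            this.page = page;
            this.height = height;
        }
    }

    // Mirrors the earlier version: the filter never uses pageIndex, so every
    // last-page paragraph is added once per page in the document.
    static double withPageLoop(List<Para> paras, int pageCount) {
        double offset = 0;
        for (int pageIndex = 1; pageIndex <= pageCount; pageIndex++) {
            for (Para p : paras) {
                if (p.page == pageCount) {
                    offset += p.height;
                }
            }
        }
        return offset;
    }

    // Mirrors the final version: one pass, last-page paragraphs counted once.
    static double withoutPageLoop(List<Para> paras, int pageCount) {
        double offset = 0;
        for (Para p : paras) {
            if (p.page == pageCount) {
                offset += p.height;
            }
        }
        return offset;
    }

    public static void main(String[] args) {
        // Made-up layout: three pages, two paragraphs on the last page.
        List<Para> paras = Arrays.asList(
                new Para(1, 14.0), new Para(2, 14.0), new Para(3, 14.0), new Para(3, 28.0));
        System.out.println(withPageLoop(paras, 3));    // 126.0
        System.out.println(withoutPageLoop(paras, 3)); // 42.0
    }
}
```

On a one-page document the two versions agree, which is consistent with the problem only showing up as the document grew.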

Thanks for all your help, @awais.hafeez!

@zackforbing,

It is great that you were able to find what you were looking for, and thanks for sharing the solution, which may also help other developers facing the same problem.

Hi, I’ve gotten more figured out, but I’m noticing an odd interaction in the logic of the above function. When using builder.writeln for a string that wraps onto a second line, the paragraph height is calculated the same as if it didn’t wrap. And as I was trying to build an example, I noticed that on subsequent pages, the paragraph offset calculation is zero.

Below is the current function used to get the height:

private double getParagraphOffsetModifier(DocumentBuilderExtension builder) throws Exception {
  Document doc = builder.getDocument();
  doc.updatePageLayout();
  LayoutCollector lc = new LayoutCollector(doc);
  LayoutEnumerator le = new LayoutEnumerator(doc);
  ParagraphCollection paragraphs = doc.getLastSection().getBody().getParagraphs();
  double offset = 0;
  for (Paragraph para : paragraphs) {
    if (lc.getStartPageIndex(para) == doc.getPageCount()
        && !(para.toString(SaveFormat.TEXT).trim().equals("")
            || para.getChildNodes().getCount() == 0)) {
      le.setCurrent(lc.getEntity(para));
      double paragraphHeight = le.getRectangle().getHeight();
      offset += paragraphHeight;
    }
  }
  return Math.ceil(offset);
}

and the example to run:

builder.writeln("suuuuuuuuuuuuper long line like really long this should probably wrap onto a new line but let's keep typing to make sure. and it should have a bigger paragraph height than other pages");
String offset1 = Double.toString(getParagraphOffsetModifier(builder));
builder.writeln(String.format("paragraph size: %s", offset1));
builder.insertBreak(BreakType.PAGE_BREAK);
builder.writeln("second page");
String offset2 = Double.toString(getParagraphOffsetModifier(builder));
builder.writeln(String.format("paragraph size: %s", offset2));
builder.insertBreak(BreakType.PAGE_BREAK);
builder.writeln("third page");
String offset3 = Double.toString(getParagraphOffsetModifier(builder));
builder.writeln(String.format("paragraph size: %s", offset3));

Document doc = builder.getDocument();
doc.save("paragraphHeight.docx", SaveFormat.DOCX);

I’m attaching two output PDF files that show both of these issues. The first paragraph of each PDF shouldn’t be the same height, and the subsequent pages in both PDFs give a paragraph height of zero.
paragraphHeight.pdf (24.8 KB)
paragraphHeightLonger.pdf (26.9 KB)

@zackforbing,

We first used the following Java code to produce this document: paragraphHeight.zip (6.6 KB)

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.writeln("suuuuuuuuuuuuper long line like really long this should probably wrap onto a new line but let's keep typing to make sure. and it should have a bigger paragraph height than other pages");
builder.insertBreak(BreakType.PAGE_BREAK);
builder.writeln("second page");
builder.insertBreak(BreakType.PAGE_BREAK);
builder.writeln("third page");
doc.save("C:\\Temp\\paragraphHeight.docx"); 

After that we noticed that the following code returns negative height values for two non-empty paragraphs:

Document doc = new Document("C:\\Temp\\paragraphHeight.docx");
DocumentBuilder builder = new DocumentBuilder(doc);

int counter = 1;
for (Paragraph para : (Iterable<Paragraph>) doc.getFirstSection().getBody().getChildNodes(NodeType.PARAGRAPH, true)) {
    if (!para.toString(SaveFormat.TEXT).trim().equals("")) {
        builder.moveTo(para);
        BookmarkStart start = builder.startBookmark("bm_" + counter);
        builder.endBookmark("bm_" + counter++);
        if (start != para.getFirstChild())
            para.insertBefore(start, para.getFirstChild());
    }
}

doc.updatePageLayout();

LayoutCollector lc = new LayoutCollector(doc);
LayoutEnumerator le = new LayoutEnumerator(doc);
for (Paragraph para : (Iterable<Paragraph>) doc.getFirstSection().getBody().getChildNodes(NodeType.PARAGRAPH, true)) {
    if (!para.toString(SaveFormat.TEXT).trim().equals("")) {
        le.setCurrent(lc.getEntity(para.getFirstChild()));
        double topOfParaStart = le.getRectangle().getY();

        le.setCurrent(lc.getEntity(para.getLastChild()));
        double bottomOfParaEnd = le.getRectangle().getY() + le.getRectangle().getHeight();

        System.out.println("Actual height of Para = " + (bottomOfParaEnd - topOfParaStart));
    }
}

We have logged the above problem in our issue tracking system for correction. The ID of this issue is WORDSNET-20930. We will look further into the details of this problem and keep you updated on the status of this issue. We apologize for the inconvenience.

Thank you! Is there any ETA on this?

@zackforbing,

I am afraid WORDSNET-20930 is currently pending analysis and there is no ETA available at the moment. We will inform you via this forum thread as soon as this issue is resolved or an estimate becomes available. We apologize for any inconvenience.

@zackforbing,

Please see this Word DOCX document (paragraphHeight.zip (6.6 KB)) and try running the following Java code that calculates heights of all the individual Paragraphs:

Document doc = new Document("C:\\Temp\\paragraphHeight.docx");
LayoutCollector lc = new LayoutCollector(doc);
LayoutEnumerator le = new LayoutEnumerator(doc);

for (Paragraph para : (Iterable<Paragraph>) doc.getFirstSection().getBody().getChildNodes(NodeType.PARAGRAPH, true)) {
    if (!para.toString(SaveFormat.TEXT).trim().equals("")) {
        le.setCurrent(lc.getEntity(para)); // this will position on paragraph break span
        le.moveParent(); // move to container line
        double bottom = le.getRectangle().getY() + le.getRectangle().getHeight();
        while (le.movePrevious()) {
        }  // move to the first line
        double top = le.getRectangle().getY();

        System.out.println("Actual height of Para = " + (bottom - top));
    }
}

@awais.hafeez,

Thanks for the solution. Does this need the paragraph for loop? I adapted this to our function and it seems to be overestimating the paragraph height when there are multiple paragraphs on the page, possibly by counting paragraphs multiple times due to the loop.
Let me know if I’m getting this right:
le.setCurrent() sets the LE to the current paragraph
le.moveParent() moves to the paragraph container
le.movePrevious() would then move backwards over the paragraph containers until there aren’t any more?

If that’s true, this functionality would be sufficient to calculate the height of all paragraphs that currently exist on the page without using the for (para : paragraphs) loop. Is that correct?

EDIT: apparently it isn’t, or I’m doing something wrong. Is there supposed to be anything in the while loop?

@zackforbing,

Inside the paragraphHeight.docx document, the first paragraph on the first page consists of two lines. The following line of code moves the cursor to the hidden ‘paragraph break character’ at the very end of the last line.

le.setCurrent(lc.getEntity(para));

The following line of code moves the cursor to the last line. We calculate the bottom of the line at this point.

le.moveParent();

The while loop then walks backward through the layout line by line until it reaches the first line; its body is empty because each movePrevious() call does the moving. When the cursor reaches the first line, we read the top of the paragraph. So this essentially calculates the height of an individual paragraph. Please see the screenshot for more details:

[Screenshot: Calculate Paragraph Height Word DOCX Document C# .NET]
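
As a rough illustration of the same walk without Aspose.Words, the sketch below models a paragraph as a list of lines and a cursor that starts on the last one. `ParagraphHeightDemo`, `Line`, and `LineCursor` are hypothetical names, and the coordinates are made-up values in points.

```java
import java.util.Arrays;
import java.util.List;

final class ParagraphHeightDemo {
    // Hypothetical stand-in for one laid-out line: top edge and height in points.
    static final class Line {
        final double y;
        final double height;

        Line(double y, double height) {
            this.y = y;
            this.height = height;
        }
    }

    // Hypothetical cursor that starts on the last line, the way the
    // enumerator sits on the last line after moveParent().
    static final class LineCursor {
        private final List<Line> lines;
        private int index;

        LineCursor(List<Line> lines) {
            this.lines = lines;
            this.index = lines.size() - 1;
        }

        // Steps one line up; returns false once the first line is reached.
        boolean movePrevious() {
            if (index == 0) {
                return false;
            }
            index--;
            return true;
        }

        Line current() {
            return lines.get(index);
        }
    }

    static double paragraphHeight(List<Line> lines) {
        LineCursor cursor = new LineCursor(lines);
        double bottom = cursor.current().y + cursor.current().height;
        while (cursor.movePrevious()) {
            // Empty on purpose: the call itself does the moving.
        }
        double top = cursor.current().y;
        return bottom - top;
    }

    public static void main(String[] args) {
        // Made-up two-line paragraph: lines at y=72 and y=86, each 14 points high.
        List<Line> lines = Arrays.asList(new Line(72, 14), new Line(86, 14));
        System.out.println(paragraphHeight(lines)); // 28.0
    }
}
```

Because the cursor only ever moves within one paragraph's lines in this model, the result is the height of that single paragraph; totals for a page would still need a sum over its paragraphs.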
