Determining a Page's starting and ending character position

Hello,
Is it possible to get the starting and ending character position of each page of a Word document after it has been loaded? We use a Selection in the Word API to retrieve the positions of all page breaks. We then build up a list of pages with their starting and ending character positions. Is it possible to do something similar in Aspose? Thanks!
Brad Back

Hi Brad,

Thanks for your inquiry. All text of the document is stored in runs of text. You can get a list of all Run nodes in the document with the document.GetChildNodes(NodeType.Run, true) method, then iterate through the collection and record the page numbers of every Run. Please see the following methods:

LayoutCollector.GetStartPageIndex method
LayoutCollector.GetEndPageIndex method
LayoutCollector.GetNumPagesSpanned method
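
A minimal sketch of the approach described above, in case it helps. The class name RunPageMap and the running offset are illustrative assumptions, not part of Aspose.Words. The offset counts Run text only, so it skips paragraph marks and may not match the character positions reported by Word's Selection object.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Layout;

// Hypothetical helper: maps every Run to the pages it starts and ends on.
public static class RunPageMap
{
    public static List<(int Offset, int Length, int StartPage, int EndPage)> Build(Document doc)
    {
        var result = new List<(int Offset, int Length, int StartPage, int EndPage)>();
        var collector = new LayoutCollector(doc);

        // Running offset over Run text only (an assumption of this sketch).
        int offset = 0;

        foreach (Run run in doc.GetChildNodes(NodeType.Run, true))
        {
            // Page indices are expected to be 1-based; a value of 0 should mean
            // the node could not be mapped to a page.
            int startPage = collector.GetStartPageIndex(run);
            int endPage = collector.GetEndPageIndex(run);

            result.Add((offset, run.Text.Length, startPage, endPage));
            offset += run.Text.Length;
        }

        return result;
    }
}
```

Entries where EndPage is greater than StartPage are the runs that cross a page boundary. GetNumPagesSpanned returns that difference, so a result of 0 means the run sits on a single page.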

I hope this helps.

Best regards,

Is it possible for a single run of text to actually span more than 1 page?

Hi Brad,

Thanks for your inquiry. Yes, it is possible for a single run of text to actually span more than 1 page.

Best regards,

Thanks! So I can get the starting and ending character position of every run via the LayoutCollector. However, is it possible to find the character position of the page break within the run? (Sorry if this is a dumb question, but I’m pretty new to this.) If I can determine where the page break is within the run, I will be able to determine the position of the page break within the document. Basically, what I’m ultimately trying to get is the character position of every page break within the document. We’re building up a PageCollection object, and each page has a starting character position and an ending character position. I can’t figure out how to determine the starting and ending character position of every page in Aspose. We’re using a “Selection” object in the Word DOM today to accomplish this.

I’m running a test process to pull out the text of the runs that span multiple pages, and they are not lining up with what I’m seeing in the opened Word document. Is there a known issue, or am I doing something wrong? I iterate through doc.Sections, then over each section’s paragraphs, and finally over each paragraph’s runs. I then call the layoutCollector.GetNumPagesSpanned(run) method on each run. Seems simple enough, but the runs that return anything greater than 0 don’t appear to be correct. The runs that we create using the Word DOM (and page break character positions) seem to match what I’m seeing in the open document. Does anyone know what I’m doing wrong, or is this just not working for some reason?

Thanks!

-Brad

Hi Brad,

Thanks for your inquiry. You can look for a Run that contains a page break character (please see the ControlChar.PageBreak constant). Also, please make sure that you are using the latest version of Aspose.Words:
https://releases.aspose.com/words/net
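
For illustration, a rough sketch of scanning runs for the page break character. The class name HardPageBreaks and the offset bookkeeping are assumptions made for this example. It finds only explicit page breaks; pages that end simply because the text flows onto the next page will not show up here.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;

// Hypothetical helper: collects offsets of explicit page break characters.
public static class HardPageBreaks
{
    public static List<int> FindOffsets(Document doc)
    {
        var offsets = new List<int>();

        // Running offset over Run text only; paragraph marks are not counted
        // (an assumption of this sketch).
        int offset = 0;

        foreach (Run run in doc.GetChildNodes(NodeType.Run, true))
        {
            string text = run.Text;

            // A run may contain more than one page break character.
            int index = text.IndexOf(ControlChar.PageBreakChar);
            while (index >= 0)
            {
                offsets.Add(offset + index);
                index = text.IndexOf(ControlChar.PageBreakChar, index + 1);
            }

            offset += text.Length;
        }

        return offsets;
    }
}
```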

In case the problem still remains, please attach your input Word document and source code here for testing. We will investigate the issue on our end and provide you more information.

Best regards,

This just made me realize there are no hard page breaks in the document. Even though there are no actual page breaks, the layoutCollector still recognizes 11 pages in my document, and I can find the runs that span more than one page. However, I am still unable to figure out the character position within the run at which the break occurs (the first character within the run that actually resides on the next page). Is this at all possible? TIA,

Brad

Hi Brad,

Thanks for your inquiry. I have attached a couple of classes to this post. You can loop through the pages and get the text of the spans, for example the text of the first span on each page, as follows:

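// RenderedDocument, RenderedPage, RenderedColumn, RenderedLine and RenderedSpan
// come from the classes attached to this post.
// MyDir is a placeholder for the folder that holds the input file.
Document doc = new Document(MyDir + @"in.docx");
RenderedDocument layoutDoc = new RenderedDocument(doc);

// Walk the layout from pages down to individual spans of text.
foreach (RenderedPage page in layoutDoc.Pages)
{
    foreach (RenderedColumn column in page.Columns)
    {
        foreach (RenderedLine line in column.Lines)
        {
            foreach (RenderedSpan span in line.Spans)
            {
                // Work with the span here, for example read its text.
            }
        }
    }
}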

I hope that this way you will be able to get the starting text of each page.

Best regards,

I’m not actually looking for the text, but rather the character position of the first and last character on the page. I’m not familiar with RenderedDocument, but will take a look. I really just need to know the character positions of the first and last character of every page. It doesn’t seem like this should be that hard, but I can’t figure out how to do it in Aspose. I tried a DocumentPageSplitter, but after creating the splitter, my document changed from 11 pages to 16. I wasn’t expecting the number of pages to change, so I’m a little confused. I figure it has to somehow split up the sections/paragraphs/runs that span pages, but I didn’t think that would cause the number of pages in the document to change. I’ve attached a document I’ve been testing with. If anyone could explain how to get the POSITION of the last character on each page, I’d appreciate it (so if a run spans two pages, I just need to know the position of the last character of the run that resides on the starting page index). Thanks!

Brad

Hi Brad,

Thanks for the additional information. We are checking this scenario and will get back to you soon.

Best regards,

Hi Brad,

Generally, the LayoutEnumerator class should be able to accomplish this task.

Basically, you need to create an instance of this class and then use the MoveFirstChild() and MoveLastChild() methods to navigate to the first and last span on the page. MoveNext() can be used to navigate between pages.

The only trick is getting the horizontal position of the last character, because the layout does not store character positions but rather the positions of spans (generally sequences of characters). So if you want the exact position of a character, you can use the Text property to get the rendered text of the last span on the page and then approximate the X coordinate of the character from the widths of the characters that make up the span. It is not possible to get the exact horizontal position because Aspose.Words uses internal font metrics that are not exposed in the public API, but GDI+ metrics would be almost the same (with antialiasing turned on).
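
As a rough sketch of the navigation described above, the following walks each page and sums the length of the rendered text of its spans to derive a first and last character offset per page. The class name PageCharRanges, the decision to skip header and footer content, and the offset convention are assumptions of this example. The offsets are counted in rendered characters, which can include layout-only spans, so they may not match the positions that Word's Selection object reports.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Layout;

// Hypothetical helper: one (Start, End) offset pair per page.
public static class PageCharRanges
{
    // An empty page yields End == Start - 1.
    public static List<(int Start, int End)> Compute(Document doc)
    {
        var ranges = new List<(int Start, int End)>();
        var enumerator = new LayoutEnumerator(doc);
        enumerator.Reset(); // start from the first page

        int offset = 0;
        do
        {
            int start = offset;
            if (enumerator.MoveFirstChild())
            {
                offset += CountSpanChars(enumerator);
                enumerator.MoveParent(); // back up to the page entity
            }
            ranges.Add((start, offset - 1));
        } while (enumerator.MoveNext()); // next page

        return ranges;
    }

    // Sums the text length of all spans under the current entity and its siblings.
    private static int CountSpanChars(LayoutEnumerator e)
    {
        int total = 0;
        do
        {
            // Assumption: header and footer text should not count toward page offsets.
            if (e.Type == LayoutEntityType.HeaderFooter)
                continue;

            if (e.Type == LayoutEntityType.Span)
            {
                total += (e.Text ?? string.Empty).Length;
            }
            else if (e.MoveFirstChild())
            {
                total += CountSpanChars(e);
                e.MoveParent();
            }
        } while (e.MoveNext());

        return total;
    }
}
```

Calling PageCharRanges.Compute(doc) should return one entry per page, and the Start of each page is the End of the previous page plus one.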

I hope this helps.

Best regards,