How to get rectangles bounding Table of contents in a Word Document

oraspose · March 1, 2017, 6:56am

Hi,

We need to get the rectangles that surround the table of contents in a Word document.

Could you please guide us on how this can be done using the Aspose APIs?

Thanks

Manisha

oraspose · March 2, 2017, 4:22am

Hi,

I need a java.awt.geom.Rectangle2D.Float object describing the portion of the document that contains the automatic table of contents.

Please send the sample code in Java only.

Thanks

Manisha

awais.hafeez · March 3, 2017, 5:35am

Hi Manisha,

Thanks for your inquiry.

You will find a “Table of Contents” field in this sample TOC.docx document. It is enclosed inside a content control. Please also refer to the attached screenshot.

When you open this document with MS Word 2016, it says that

(x, y) is at (0, 0)
Width of this TOC is 6.5 inches and
Height is a little more than 2.5 inches.

Do you want Aspose.Words for Java code that calculates the parameters above, or are you looking for some other rectangle?

What rectangle values do you expect when the TOC spans multiple pages? For this case, please explain with a sample Word document and screenshot(s).

Thanks for your cooperation.

Best regards,

oraspose · March 6, 2017, 9:17am

Hi,

We need the following information for the TOC:

Page number on which the TOC is located

X coordinate of the top-left corner of the rectangle

Y coordinate of the top-left corner of the rectangle

w: width of the rectangle

h: height of the rectangle

If the TOC spans multiple pages, we need the rectangle information for each page.

Basically, we need to know where the TOC is located in the document.

Any help on this will be highly appreciated.

Looking forward to your reply.

Thanks

Manisha

awais.hafeez · March 7, 2017, 11:09pm

Hi Manisha,

Thanks for the additional information. We have logged your requirement in our issue tracking system. Your ticket number is WORDSNET-15006. Our product team will look further into the details of this problem, and we will keep you posted on any updates. We apologize for the inconvenience.

Best regards,

awais.hafeez · March 21, 2017, 4:31am

Hi Manisha,

Thanks for being patient.

A table of contents (TOC) in Word is usually generated by a special TOC field. It may or may not be wrapped in a structured document tag (SDT). When a TOC is inserted through the UI in MS Word 2016, it is wrapped in an SDT by default. A TOC can also be typed in manually, that is, hard-coded, in which case it is indistinguishable from the surrounding text in the document.

The TOC itself is a set of paragraphs formatted with the TOC styles. TOC content is just like any other content in the document: paragraphs of text, tables, etc. To know where the TOC appears visually on the pages, it is necessary to get the bounds of all document nodes that belong to the TOC content.
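
To illustrate the idea of combining node bounds, here is a minimal sketch that uses only the JDK. It assumes you already have one bounding box per line of TOC content on a page; the coordinates below are made up for illustration, and the names `BoundsUnion` and `unionAll` are hypothetical, not part of Aspose.Words.

```java
import java.awt.geom.Rectangle2D;
import java.util.Arrays;
import java.util.List;

class BoundsUnion {
    // Returns the smallest rectangle that contains every rectangle in the list,
    // or null if the list is empty.
    static Rectangle2D unionAll(List<? extends Rectangle2D> rects) {
        Rectangle2D result = null;
        for (Rectangle2D r : rects) {
            result = (result == null) ? (Rectangle2D) r.clone() : result.createUnion(r);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical line boxes in points, relative to the top-left corner of the page.
        List<Rectangle2D.Float> lines = Arrays.asList(
                new Rectangle2D.Float(72f, 72f, 468f, 14f),
                new Rectangle2D.Float(72f, 86f, 468f, 14f),
                new Rectangle2D.Float(83f, 100f, 457f, 14f));

        Rectangle2D toc = unionAll(lines);
        System.out.println("x=" + toc.getX() + " y=" + toc.getY()
                + " w=" + toc.getWidth() + " h=" + toc.getHeight());
    }
}
```

The resulting rectangle carries the x, y, width and height values requested earlier in this thread; the page number has to be tracked separately.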

If the TOC is wrapped in an SDT, one can assume that whatever appears inside that SDT is TOC content. If it is not, special logic must be used to find where the TOC content starts and ends. Specifically, since the TOC is generated by a TOC field, one needs to find the result of this field and assume that all content inside the result is TOC content. Note that a document may have multiple TOCs, and they could be positioned in tables, inline stories or headers/footers. It is also possible to insert content manually into a generated TOC.

Let us assume we are only interested in the first TOC in the document and that it is positioned in a single-column section in the main text. In this case we would find the first TOC field in the document, navigate to its field separator and iterate over all nodes between that separator and the field end.
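
Because the TOC result contains nested fields, "everything between the separator and the field end" needs a little care. The sketch below is a simplified, JDK-only model of that walk, not the Aspose.Words API: a field is modeled as start, code, separator, result, end, and the names `FieldResultModel`, `Token` and `visibleText` are hypothetical. It collects only text that sits in a field result at every nesting level, which is the part that takes up space on the page.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class FieldResultModel {
    enum Kind { START, SEPARATOR, END, TEXT }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Returns the text inside the result of the field that starts at startIndex,
    // skipping field codes of the field itself and of any nested fields.
    static List<String> visibleText(List<Token> tokens, int startIndex) {
        if (tokens.get(startIndex).kind != Kind.START) {
            throw new IllegalArgumentException("startIndex must point at a field start");
        }
        Deque<Boolean> stillInCode = new ArrayDeque<>(); // one entry per open field
        int codeDepth = 0;                               // open fields still in their code part
        List<String> visible = new ArrayList<>();
        for (int i = startIndex; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            switch (t.kind) {
                case START:
                    stillInCode.push(true);
                    codeDepth++;
                    break;
                case SEPARATOR:
                    if (stillInCode.pop()) codeDepth--;
                    stillInCode.push(false);
                    break;
                case END:
                    if (stillInCode.pop()) codeDepth--;
                    if (stillInCode.isEmpty()) return visible; // matching end of the outer field
                    break;
                default: // TEXT
                    if (codeDepth == 0) visible.add(t.text);
            }
        }
        throw new IllegalArgumentException("field end not found");
    }

    public static void main(String[] args) {
        // Models { TOC | { HYPERLINK | First heading { PAGEREF | 3 } } }
        List<Token> tokens = Arrays.asList(
                new Token(Kind.START, null), new Token(Kind.TEXT, "TOC"),
                new Token(Kind.SEPARATOR, null),
                new Token(Kind.START, null), new Token(Kind.TEXT, "HYPERLINK"),
                new Token(Kind.SEPARATOR, null), new Token(Kind.TEXT, "First heading"),
                new Token(Kind.START, null), new Token(Kind.TEXT, "PAGEREF"),
                new Token(Kind.SEPARATOR, null), new Token(Kind.TEXT, "3"),
                new Token(Kind.END, null), new Token(Kind.END, null),
                new Token(Kind.END, null));
        System.out.println(visibleText(tokens, 0));
    }
}
```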

The following code is an example. It works for simple cases. You will need to add extra logic to support advanced scenarios.

import com.aspose.words.*;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// The statements below are meant to run inside a method declared with "throws Exception".
Document doc = new Document("D:\\temp\\TOC.docx");

LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator it = new LayoutEnumerator(doc);

// Field start of first TOC in the document
FieldStart fieldStart = null;
for (FieldStart start : (Iterable<FieldStart>) doc.getChildNodes(NodeType.FIELD_START, true)) {
    if (start.getFieldType() == FieldType.FIELD_TOC) {
        fieldStart = start;
        break;
    }
}

// The TOC can be anywhere in the document, including headers and table cells, or it may be missing.
// We are only interested in a TOC that sits in the main text story and not inside a table.
// Also note that the document can be invalid; for example, the field end may be missing or located in a different story.
// In that case the code will crash, but we cannot assume anything about an invalid document anyway.
if (fieldStart == null ||
        fieldStart.getAncestor(NodeType.CELL) != null ||
        fieldStart.getAncestor(NodeType.BODY) == null)
    return;

Field field = fieldStart.getField();
FieldSeparator separatorNode = field.getSeparator();
FieldEnd endNode = field.getEnd();

// Now we are interested in the content between the separator and the end. The field result sits there.
// The TOC has fields in its result too, and those fields have fields in their results as well.
// { TOC | { HYPERLINK | First heading {PAGE} }¶{ HYPERLINK | Second heading {PAGE} }¶ … }
// Complicated, but we do not care: field codes are not rendered, and field characters do not take up space on the page either.

// Navigating document nodes in search of the TOC content would be tricky, so we navigate the layout model instead.
Object separator = collector.getEntity(separatorNode);
Object end = collector.getEntity(endNode);

// Maps page number to the list of line bounding boxes relative to the page.
Hashtable<Integer, List<Rectangle2D>> pageMap = new Hashtable<Integer, List<Rectangle2D>>();

// Remember the line container of the field end. We need it to stop searching for TOC content.
it.setCurrent(end);
it.moveParent();
Object lastLine = it.getCurrent();

// Move to the line container of the separator
it.setCurrent(separator);
it.moveParent();

// Collect rectangles of lines/rows inside TOC content
while (true)
{
    Integer pageIndex = it.getPageIndex();
    List<Rectangle2D> rectList = (List<Rectangle2D>) pageMap.get(pageIndex);

    if (rectList == null)
    {
        rectList = new ArrayList<Rectangle2D>();
        pageMap.put(pageIndex, rectList);
    }
    rectList.add(it.getRectangle());

    // Do not go beyond the last line of TOC content
    if (it.getCurrent() == lastLine)
        break;

    // Move to the next line in the paragraph or the next row in a table. If that fails, find a logical link to the next entity.
    // This implements moving along the logical order of the lines and rows in the document across page boundaries.
    // Works for main text positions in a column; does not work within a cell, header, footer, etc.
    if (!it.moveNextLogical())
    {
        // Move to the last span in a line or the last cell in a row
        it.moveLastChild();
        // If we moved to a cell, descend to its last span
        if (it.getType() == LayoutEntityType.CELL)
        {
            // Note that the last container in a cell is always a line.
            it.moveLastChild();
            it.moveLastChild();
        }
        // Since all spans are linked through the story, we can move to the next logical span in the document.
        it.moveNextLogical();
        // Ascend to the line
        it.moveParent(LayoutEntityType.LINE);
        // If inside a cell, ascend to the topmost row
        while (it.moveParent(LayoutEntityType.ROW)) { }
    }
}

// For every page, union the line rectangles to get the area covered by the TOC.
// Note that if the TOC spans multiple columns on a page, you would need to union rectangles per column rather than per page.
for (Object key : pageMap.keySet()) {
    Integer pageIndex = (Integer) key;
    List<Rectangle2D> rects = (List<Rectangle2D>) pageMap.get(key);

    Rectangle2D rect = rects.get(0);
    for (Rectangle2D r : rects)
        rect = rect.createUnion(r);

    System.out.println("Page: " + pageIndex + " & Rect: " + rect);
}
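
For the multi-column case mentioned in the last code comment, one possible approach is to group the line rectangles by column before taking the union. The sketch below uses only the JDK and assumes the left edge of each column on the page is already known and sorted in ascending order; `ColumnUnion`, `columnIndex`, `unionPerColumn` and the sample coordinates are hypothetical and not part of Aspose.Words.

```java
import java.awt.geom.Rectangle2D;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ColumnUnion {
    // Index of the last column whose left edge is at or before x.
    // columnStarts must be sorted in ascending order.
    static int columnIndex(double x, double[] columnStarts) {
        int index = 0;
        for (int i = 0; i < columnStarts.length; i++) {
            if (x >= columnStarts[i]) index = i;
        }
        return index;
    }

    // Unions the line rectangles of one page separately for each column.
    static Map<Integer, Rectangle2D> unionPerColumn(List<? extends Rectangle2D> lines,
                                                    double[] columnStarts) {
        Map<Integer, Rectangle2D> result = new TreeMap<>();
        for (Rectangle2D line : lines) {
            int column = columnIndex(line.getX(), columnStarts);
            Rectangle2D soFar = result.get(column);
            result.put(column, soFar == null ? (Rectangle2D) line.clone() : soFar.createUnion(line));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical page with two columns starting at x = 72 and x = 324 (points).
        double[] columnStarts = {72, 324};
        List<Rectangle2D.Float> lines = Arrays.asList(
                new Rectangle2D.Float(72f, 700f, 216f, 14f),   // bottom of first column
                new Rectangle2D.Float(72f, 714f, 216f, 14f),
                new Rectangle2D.Float(324f, 72f, 216f, 14f),   // top of second column
                new Rectangle2D.Float(335f, 86f, 205f, 14f));

        for (Map.Entry<Integer, Rectangle2D> e : unionPerColumn(lines, columnStarts).entrySet()) {
            System.out.println("Column: " + e.getKey() + " & Rect: " + e.getValue());
        }
    }
}
```

A per-page union of the same four lines would produce one tall rectangle covering both columns, including areas that hold no TOC content, which is why grouping by column first gives tighter bounds.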

Hope this helps.

Best regards,