Sections and DocumentVisitor

I am using the example shown in the DocumentVisitor help to write out the text of a Word document. It works okay, but how can I determine whether a Run object is part of an actual Word section heading (i.e. has a style such as Heading 1 or Heading 2 in the Word document)? Note that this is not an Aspose section object.

thanks

You can use the following simple method to check whether a run belongs to a paragraph that has the Heading1 or Heading2 style:

private bool CheckIsHeader(Run run)
{
    // Find the paragraph that contains this run.
    Paragraph para = (Paragraph)run.GetAncestor(typeof(Aspose.Words.Paragraph));
    return para != null && (para.ParagraphFormat.StyleIdentifier == StyleIdentifier.Heading1 || para.ParagraphFormat.StyleIdentifier == StyleIdentifier.Heading2);
}

Best regards,

Great that works well.

Another related question.

When parsing a document how do you detect if you are in:-

1) Ordered or unordered list (I would like to export it)

2) Table of contents (I would skip over the whole TOC)

Thanks

To detect whether you are in a list, you can use the Paragraph.IsListItem property.

To detect whether you are inside a TOC, you can use the following approach:

private bool isInsideToc = false;

public override VisitorAction VisitFieldStart(FieldStart fieldStart)
{
    // Entering a TOC field.
    if (fieldStart.FieldType == FieldType.FieldTOC)
        isInsideToc = true;
    return VisitorAction.Continue;
}

public override VisitorAction VisitFieldEnd(FieldEnd fieldEnd)
{
    // Leaving a TOC field.
    if (fieldEnd.FieldType == FieldType.FieldTOC)
        isInsideToc = false;
    return VisitorAction.Continue;
}

Best regards,

That's great. Works well, but still one last question (actually two)!

How do I know whether I am in an ordered or an unordered list, and what number each item has in an ordered list? I could not find properties which would give me this information.

Thanks

If the paragraph belongs to a numbered or bulleted list, use the following code to tell the two apart:

if (paragraph.ListFormat.ListLevel.NumberStyle == NumberStyle.Bullet)
{
    // The paragraph is a bulleted list item.
}

Finding the number of the current list item is a tricky task. You need to iterate through all lists in the document, keeping count of the list items that belong to each list and list level, and taking the list formatting properties into account (list item numbering can be multilevel and can be restarted). All in all, it requires rather complex code.
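As a rough illustration of the counting described above, here is a minimal sketch in plain C#. ListItemCounter is a hypothetical helper, not part of Aspose.Words. It assumes that every level starts at 1, that a level restarts whenever an item appears on a higher level, and that the list id and zero-based level of each paragraph are supplied by the caller (for example from the paragraph's ListFormat). Custom start values and explicit restart settings are ignored.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keeps one counter per list and per list level.
public class ListItemCounter
{
    private const int MaxLevels = 9;

    // Key: list id. Value: current item count for each level of that list.
    private readonly Dictionary<int, int[]> counters = new Dictionary<int, int[]>();

    // Returns the number of the next item in the given list at the given
    // zero-based level. Assumes numbering starts at 1 on every level.
    public int Next(int listId, int level)
    {
        if (level < 0 || level >= MaxLevels)
            throw new ArgumentOutOfRangeException(nameof(level));

        int[] levels;
        if (!counters.TryGetValue(listId, out levels))
        {
            levels = new int[MaxLevels];
            counters[listId] = levels;
        }

        levels[level]++;

        // An item on this level restarts the numbering of all deeper levels.
        for (int i = level + 1; i < MaxLevels; i++)
            levels[i] = 0;

        return levels[level];
    }
}
```

Call Next once for each list paragraph, in document order. Paragraphs that are not list items should simply be skipped.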

Best regards,

This is fine for this application.

Thanks.

This worked well for the first few documents, where the section number appears in the header. In the latest document, however, the section number is not in the header but does appear in the document. How can I get the section number from a section? Can I send an example?

thanks

Well, if you are using the DocumentVisitor, you can just keep counting sections in the VisitSectionStart method. The other, more straightforward way of getting the section number is the following (note that the returned index is zero-based):

int sectionNumber = doc.Sections.IndexOf(section);

Best regards,

I am using the document visitor model, but maybe section is the wrong word. There is only one Aspose section in the document. I am referring to the table of contents and the titles. As suggested, I am using the following code to detect each Word section heading (which works well):

private bool CheckIsHeader(Run run)
{
    Paragraph para = (Paragraph)run.GetAncestor(typeof(Aspose.Words.Paragraph));
    return para != null && (para.ParagraphFormat.StyleIdentifier == StyleIdentifier.Heading1 || para.ParagraphFormat.StyleIdentifier == StyleIdentifier.Heading2);
}

But how do I get the Word section heading number, e.g. 1, 1.1, 1.2, 1.2.1 etc.?

I am afraid it is impossible in the current version of Aspose.Words. You see, these numbers are not stored in the document file but are calculated by MS Word 'on the fly' when the document is rendered. We don't have a rendering engine of our own at the moment, but we plan to implement one in a future version.

The only thing you can do is try to calculate such numbers manually, based on the assumptions that you can make about the numbering in your document.
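A minimal sketch of such a manual calculation follows, in plain C#. HeadingNumberer is a hypothetical helper, not part of Aspose.Words. It assumes that the heading level can be derived from the paragraph style (1 for Heading 1, 2 for Heading 2, and so on), that numbering is plain decimal and starts at 1, and that a heading restarts the numbering of all deeper levels. Documents with custom numbering formats or restarts will not match.

```csharp
using System;
using System.Linq;

// Hypothetical helper: keeps one counter per heading level.
public class HeadingNumberer
{
    // Word provides the built-in styles Heading 1 to Heading 9.
    private readonly int[] counters = new int[9];

    // level is 1 for Heading 1, 2 for Heading 2, and so on.
    // Returns the multilevel number of the heading, such as "1.2.1".
    public string Next(int level)
    {
        if (level < 1 || level > counters.Length)
            throw new ArgumentOutOfRangeException(nameof(level));

        counters[level - 1]++;

        // A new heading restarts the numbering of every deeper level.
        for (int i = level; i < counters.Length; i++)
            counters[i] = 0;

        // Join the counters of levels 1..level; a skipped level shows as 0.
        return string.Join(".", counters.Take(level));
    }
}
```

Calling Next with the levels 1, 2, 2, 3, 2, 1 in document order gives 1, 1.1, 1.2, 1.2.1, 1.3 and 2. In a visitor, the level would be worked out from para.ParagraphFormat.StyleIdentifier, for example with an explicit switch over Heading1, Heading2 and so on.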

I am not sure how to do this, as you don't know which level you are at, e.g. after 1. and 1.1 the next heading could be either 1.1.1 or 1.2. How can you tell?

Can the numbers be obtained from the TOC? The numbers are there between the HYPERLINK and PAGEREF fields. Is there a way to extract them, e.g. by assuming that the number is the second run after the HYPERLINK field, or is this variable and not a good idea?

Technically it is possible, but it can require some complex and time-consuming code. I suggest you use our DocumentExplorer source code demo to check the actual node structure of your documents and find out whether this functionality can be implemented for them. DocumentExplorer provides a very nice visual tree view of all nodes inside the document.

Best regards,