Aspose.Words: headings and full text

Hello,

We are currently working on a project, initially developed with a lot of Word automation, that we now need to move to the server side.

I am currently evaluating different vendors, and I just wanted to make sure that the following features are available in Aspose.Words (I think they are, but I prefer to write a short post rather than a long prototype in case they are not).

Basically, we need to do some pattern matching and, once a pattern (text identified with a regular expression) has been found, we need to find the section in which it is located and retrieve the section's title and number. “Section” is the word used by the functional expert; in Word terms, it is not a section but the closest line with a “heading” style when reading the document backwards from the match.

As far as I remember (I do not have the code at the moment), there is an automation expression to navigate between the headings, and we would like to know whether an equivalent exists in Aspose too.

Thanks in advance,

philippe

Hi
Thanks for your inquiry. Yes, you can use regular expressions to find text in the document. See the following link for more information:
https://reference.aspose.com/words/net/aspose.words/range/replace/
Then you can get the paragraph that contains the matched text (using ReplaceEvaluator) and search backwards from it for the nearest paragraph with a heading style.
If you need, I can create demo code for you. If so, please provide a sample document and the regex that should be used. (Also, which language are you using: C#, VB or Java?)
Best regards.

*If you need I can create demo code for

That would be great!

Actually, I do not have a sample document available (I am sending this message from home, and it is 00:30 in Europe), but if you have some time, the sample doc would simply be:

(font=heading 1)1 title

(font=heading 1)2 title 2
blabla
(font=heading 2)2.1 subtitle 2

blabla

And we need to get the following data (in an object):
id=001, title=title, section=1

id=002, title=subtitle 2, section=2.1

Technically, the regex is already done; my question was more about a clean way to identify the text and the numbering of each heading, or to navigate through them.

I will try to look into the API tomorrow (for me…) and, if I am stuck, I will probably post another message here (12 hours offset…)

By the way, we are developing in C#.

Anyway, thanks for your answer; at least I know it is feasible.*

Hi
Thank you for the additional information. I created sample code for you. This code iterates through all headings (levels 1, 2 and 3) and builds a string that contains each heading's number and text.

// Open document
Document doc = new Document(@"Test180\in.doc");
// Get first node in the document
Node currentNode = doc.FirstSection.Body.FirstChild;
// To demonstrate the technique, build a string that contains all headings and subheadings
string output = string.Empty;
// There is no way to get the numbers of list items, like 1, 1.2, 1.3.1, etc.
// So I create three counters, one for each level of heading
// You can use more than 3 levels
int firstLevel = 0;
int secondLevel = 0;
int thirdLevel = 0;
// Format templates for each heading level
string firstFormat = "{0} {1}\n";
string secondFormat = "{0}.{1} {2}\n";
string thirdFormat = "{0}.{1}.{2} {3}\n";
// Loop through all nodes in the document and search for headings
while (currentNode != null)
{
    // Check whether current node is Paragraph
    if (currentNode.NodeType == NodeType.Paragraph)
    {
        Paragraph currPar = (Paragraph)currentNode;
        // Check whether paragraph is title of section
        switch (currPar.ParagraphFormat.StyleIdentifier)
        {
            case StyleIdentifier.Heading1:
                // Increment the first level and reset both lower levels
                ++firstLevel;
                secondLevel = 0;
                thirdLevel = 0;
                output += String.Format(firstFormat, firstLevel, currPar.ToTxt());
                break;
            case StyleIdentifier.Heading2:
                // Increment the second level and reset the third
                ++secondLevel;
                thirdLevel = 0;
                output += String.Format(secondFormat, firstLevel, secondLevel, currPar.ToTxt());
                break;
            case StyleIdentifier.Heading3:
                // Increment the third level
                ++thirdLevel;
                output += String.Format(thirdFormat, firstLevel, secondLevel, thirdLevel, currPar.ToTxt());
                break;
        }
    }
    if (currentNode.NextSibling == null)
    {
        Node currSect = currentNode.GetAncestor(NodeType.Section);
        // If there is one more section then move to next section
        if (currSect.NextSibling != null)
            currentNode = (currSect.NextSibling as Section).Body.FirstChild;
        else
            currentNode = null;
    }
    else
    {
        // Move to next node
        currentNode = currentNode.NextSibling;
    }
}
Console.Write(output);

Also see attached file.
Best regards.
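
To connect the counters above with the original requirement (for each regex match, report the nearest preceding heading and its number), here is a minimal library-free sketch. It is not Aspose.Words code: Para, Hit and HeadingLocator are hypothetical names, Level stands in for the Heading 1 to 3 style check, and the text is assumed to have been extracted from the paragraphs already. A match found inside a heading is attributed to that heading, which is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for a document paragraph (not an Aspose.Words type):
// Level is 1-3 for Heading 1-3 and 0 for any other style.
public record Para(int Level, string Text);

// Hypothetical result type: one entry per regex match.
public record Hit(string Id, string Title, string Section);

public static class HeadingLocator
{
    // Walks the paragraphs once, keeps the heading counters up to date and,
    // for every regex match, records the most recent heading seen so far.
    public static List<Hit> Locate(IEnumerable<Para> paragraphs, Regex pattern)
    {
        var hits = new List<Hit>();
        var counters = new int[3];   // counters for heading levels 1..3
        string title = "";           // stays empty until the first heading
        string section = "";

        foreach (Para p in paragraphs)
        {
            if (p.Level >= 1 && p.Level <= 3)
            {
                counters[p.Level - 1]++;
                // A new heading restarts the numbering of all deeper levels.
                for (int i = p.Level; i < counters.Length; i++)
                    counters[i] = 0;

                title = p.Text.Trim();
                // Join only the counters down to this heading's level, e.g. "2.1".
                section = string.Join(".", counters.Take(p.Level));
            }

            // Attribute every match in this paragraph to the current heading.
            foreach (Match m in pattern.Matches(p.Text))
                hits.Add(new Hit(m.Value, title, section));
        }
        return hits;
    }
}
```

With paragraphs shaped like the sample document from earlier in this thread and a made-up pattern such as REQ-\d+, a match under "title" would be reported with section 1 and a match under "subtitle 2" with section 2.1. In a real document, the list of Para values would be filled while walking the paragraph nodes as in the loop above.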