Can we split a Word document by index, e.g. 2.1 as one document and 2.2 as another?

somasekhar.pyla · September 4, 2017, 5:57am

Can we split a Word document by index, e.g. 2.1 as one document and 2.2 as another? Is it possible to split it like this?

tahir.manzoor · September 4, 2017, 6:55am

@somasekhar.pyla,

Thanks for your inquiry. In your case, we suggest you extract the desired content from the document and save it to a new document. Please refer to the following article:
How to Extract Selected Content Between Nodes in a Document

If you still face problems, please ZIP and attach your input and expected output documents here for our reference. We will then provide more information on this.

somasekhar.pyla · September 4, 2017, 10:57am

Hi, thanks for the reply.

I need to split the document into separate documents based on the table of contents, i.e. one file per entry, such as 1.1.docx, 1.2.docx, and so on.

tahir.manzoor · September 5, 2017, 4:11am

@somasekhar.pyla,

Thanks for your inquiry. Please use the following code example to extract the document’s contents based on the table of contents. Hope this helps you.

// Requires: using System.Collections; using Aspose.Words; using Aspose.Words.Fields;
// Extract_Contents.Common is assumed to be the helper class from the article linked above.
Document doc = new Document(MyDir + "in.docx");
DocumentBuilder builder = new DocumentBuilder(doc);

// Mark the end of the document so that the last TOC section has an end node.
builder.MoveToDocumentEnd();
builder.StartBookmark("_TocEnd");
builder.EndBookmark("_TocEnd");

NodeCollection nodes = doc.GetChildNodes(NodeType.FieldStart, true);

// Get list of bookmarks listed in TOC
ArrayList tocitems = new ArrayList();

foreach (FieldStart fstart in nodes)
{
    // Each TOC entry carries a PAGEREF field that points to a _Toc bookmark.
    if (fstart.FieldType == Aspose.Words.Fields.FieldType.FieldPageRef)
    {
        String fieldText = fstart.GetField().GetFieldCode();

        if (fieldText.Contains("_Toc"))
        {
            // Keep only the bookmark name, e.g. " PAGEREF _Toc123 \h " becomes "_Toc123".
            fieldText = fieldText.Substring(fieldText.IndexOf("_Toc")).Replace("\\h", "").Trim();
            tocitems.Add(fieldText);
        }
    }
}

// Use the end-of-document bookmark as the end of the last section;
// without it the loop below would skip the last TOC entry.
tocitems.Add("_TocEnd");

for (int i = 0; i < tocitems.Count - 1; i++)
{
    // Each section runs from its own bookmark to the bookmark of the next TOC entry.
    BookmarkStart bookmarkStart = doc.Range.Bookmarks[tocitems[i].ToString()].BookmarkStart;
    BookmarkStart bookmarkEnd = doc.Range.Bookmarks[tocitems[i + 1].ToString()].BookmarkStart;

    // Firstly extract the content between these nodes including the bookmark.
    ArrayList extractedNodes = Extract_Contents.Common.ExtractContent(bookmarkStart, bookmarkEnd, false);

    Document doc2 = Extract_Contents.Common.GenerateDocument(doc, extractedNodes);
    doc2.Save(MyDir + tocitems[i] + "Out.docx");
}
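The loop above pulls the bookmark name out of each PAGEREF field code with Substring and Replace, which only strips the \h switch. As a hedged alternative, here is a minimal Java sketch that uses a regular expression instead, so other switches (for example \p) do not end up in the bookmark name. The class name TocFieldCode, the method bookmarkName, and the sample field codes are made up for illustration; nothing here calls Aspose.Words.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TocFieldCode {
    // Matches a bookmark name such as "_Toc492385741" anywhere in a field code.
    private static final Pattern TOC_BOOKMARK = Pattern.compile("_Toc\\w+");

    /** Returns the _Toc bookmark name found in the field code, or null when there is none. */
    static String bookmarkName(String fieldCode) {
        Matcher m = TOC_BOOKMARK.matcher(fieldCode);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Sample field codes are hypothetical; real ones come from getFieldCode().
        System.out.println(bookmarkName(" PAGEREF _Toc492385741 \\h "));      // _Toc492385741
        System.out.println(bookmarkName(" PAGEREF _Toc492385741 \\h \\p "));  // _Toc492385741
        System.out.println(bookmarkName(" HYPERLINK \"https://example.com\" ")); // null
    }
}
```

A null result can be used to skip fields that do not belong to the table of contents.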

somasekhar.pyla · September 5, 2017, 9:46am

Hi, thanks for your reply.
The above code works for top-level headings only, i.e. for table of contents entries such as 1, 2, 3, etc.
But my requirement is different: section 1 also has subsections.

1. Introduction
1.1 my self
1.2 my life

Here I need to split at 1.1 and 1.2 as well. The above code works for 1. Introduction only.

The above code also generates an error at

"FieldStart fstart in nodes" in the foreach loop:

foreach (FieldStart fstart in nodes)
{
    if (fstart.FieldType == Aspose.Words.Fields.FieldType.FieldPageRef)
    {
        String fieldText = fstart.GetField().GetFieldCode();

        if (fieldText.Contains("_Toc"))
        {
            fieldText = fieldText.Substring(fieldText.IndexOf("_Toc"), fieldText.Length - fieldText.IndexOf("_Toc")).Replace("\\h", "").Trim();
            tocitems.Add(fieldText);
        }
    }
}

Thank you.

tahir.manzoor · September 5, 2017, 3:31pm

@somasekhar.pyla,

Thanks for your inquiry. Please ZIP and attach your input and expected output Word documents here for our reference. We will then provide you more information on this along with code.

somasekhar.pyla · September 6, 2017, 3:38am

NewPlan1.java.zip (3.4 KB)

My Word document contains a table of contents like this:

1. heading
1.1 heading
1.2 heading
2. heading
2.1 heading
2.2 heading
2.2.1 heading
2.2.2 heading
3. heading
3.1 heading
etc.

Using the code in the attached ZIP, I am extracting the output as:
1.html for 1. heading
2.html for 1.1 heading
3.html for 1.2 heading
4.html for 2. heading
5.html for 2.1 heading
6.html for 2.2 heading
7.html for 2.2.1 heading
8.html for 2.2.2 heading
9.html for 3. heading
10.html for 3.1 heading
etc.

But my requirement is:

1.html for 1. heading
2.html for 1.1 heading
3.html for 1.2 heading
4.html for 2. heading
5.html for 2.1 heading
6.html for 2.2 heading (this must contain the 2.2, 2.2.1 and 2.2.2 headings)
7.html for 3. heading
8.html for 3.1 heading
etc.

Thank you.
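The requirement in this post amounts to a depth rule: a heading with at most two number parts starts a new file, and deeper headings stay in the file of their parent. Below is a minimal Java sketch of that rule on plain section numbers. The class TocGrouping and its methods are hypothetical names for illustration, the section numbers are taken from the list in the post, and no Aspose.Words call is involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TocGrouping {
    /** Depth of a dotted section number: "2" -> 1, "2.2" -> 2, "2.2.1" -> 3. */
    static int depth(String number) {
        return number.split("\\.").length;
    }

    /**
     * Assigns each section number to an output file. A section whose depth is
     * at most maxDepth starts a new file; deeper sections join the current file.
     */
    static Map<String, List<String>> group(List<String> numbers, int maxDepth) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        List<String> current = null;
        for (String number : numbers) {
            if (current == null || depth(number) <= maxDepth) {
                current = new ArrayList<>();
                files.put((files.size() + 1) + ".html", current);
            }
            current.add(number);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> toc = List.of("1", "1.1", "1.2", "2", "2.1", "2.2", "2.2.1", "2.2.2", "3", "3.1");
        // Prints eight lines; the sixth is "6.html <- [2.2, 2.2.1, 2.2.2]".
        group(toc, 2).forEach((file, sections) -> System.out.println(file + " <- " + sections));
    }
}
```

With maxDepth set to 2 the ten headings map to eight files, which matches the numbering asked for in the post.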

somasekhar.pyla · September 6, 2017, 10:47am

Hi, I need code to extract content by table of contents.
My requirement is: if I give table of contents numbers from 3 to 6,
the output should be split into one HTML file for 3, then 3.1, 3.2 … 4, 4.1, 4.2 … up to 6.

So can I have code for this?
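Selecting the entries "from 3 to 6" needs a comparison that treats section numbers as lists of integers rather than as text, otherwise 3.10 would sort before 3.2. Here is a minimal Java sketch of such a filter. The class TocRange and its methods are hypothetical, the sample numbers are invented, and the sketch assumes that subsections of the end number (such as 6.1) are not wanted, which the post does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

class TocRange {
    /** Compares dotted section numbers part by part, so "3.10" sorts after "3.2". */
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.min(x.length, y.length); i++) {
            int c = Integer.compare(Integer.parseInt(x[i]), Integer.parseInt(y[i]));
            if (c != 0) return c;
        }
        // Equal prefix: the shorter number (the parent) comes first.
        return Integer.compare(x.length, y.length);
    }

    /** Keeps the numbers n with from <= n <= to; subsections of 'to' are left out. */
    static List<String> select(List<String> numbers, String from, String to) {
        List<String> result = new ArrayList<>();
        for (String n : numbers) {
            if (compare(n, from) >= 0 && compare(n, to) <= 0) result.add(n);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> toc = List.of("1", "2", "2.1", "3", "3.1", "3.2", "3.10", "4", "4.1", "5", "6", "6.1", "7");
        System.out.println(select(toc, "3", "6")); // [3, 3.1, 3.2, 3.10, 4, 4.1, 5, 6]
    }
}
```

Each selected number would then become one output file, using whatever extraction code is applied per entry.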

somasekhar.pyla · September 6, 2017, 3:42am

NewPlan1.java.zip (3.4 KB)

My Word document contains a table of contents like this:

1. heading
1.1 heading
1.2 heading
2. heading
2.1 heading
2.2 heading
2.2.1 heading
2.2.2 heading
3. heading
3.1 heading
etc.

Using the code in the attached ZIP, I am extracting the output as:
1.html for 1. heading
2.html for 1.1 heading
3.html for 1.2 heading
4.html for 2. heading
5.html for 2.1 heading
6.html for 2.2 heading
7.html for 2.2.1 heading
8.html for 2.2.2 heading
9.html for 3. heading
10.html for 3.1 heading
etc.

But my requirement is to split based on headings.
For example, if I give 2. heading and 3.1 heading, then the document must be split from 2. heading to 3.1 heading,
etc.

Thank you.

tahir.manzoor · September 6, 2017, 10:51am

@somasekhar.pyla,

Thanks for your inquiry. Please share the requested documents. This will help us to understand your scenario, and we will be in a better position to address your concerns accordingly. Thanks for your cooperation.

somasekhar.pyla · September 6, 2017, 10:58am

Hi, sorry, I cannot share the document with you.

So can you share the code for the above requirement?

tahir.manzoor · September 6, 2017, 11:22am

@somasekhar.pyla,

You can extract the content from the document using the code example shared in the following article.
How to Extract Selected Content Between Nodes

You do not need to share your original document. Please create a test document with some test data using MS Word and share it here for our reference.

somasekhar.pyla · September 6, 2017, 11:39am

new.docx.zip (17.4 KB)
Hi, I need splitting based on the TOC, like:

1 Introduction
1.1 Scope
1.2 Functional Scope and Objectives
1.3 Key Assumptions
2 Traceability Matrix
2.1 User Comments
2.2 Additional Documents Matrix
2.3 Interface Matrix
2.4 Glossary of terms
3 Product Setup
3.1 Product Categories
3.2 Products
3.3 Loan Types
3.3.1.1 Basic details
3.3.1.2 Grace Period Schedule Details
3.3.1.3 Payment Schedule Details
3.3.1.4 Accounting Setup
3.3.1.5 Extended Details
3.3.1.6 Features
3.4 Promotions
3.5 Step-up & Step-Down

But when it comes to 3.3 Loan Types, everything up to 3.4 should be one document.

So can you send me the code?

Thank you.

tahir.manzoor · September 6, 2017, 6:09pm

@somasekhar.pyla,

Thanks for sharing the details. We are working on your query and will get back to you soon.

tahir.manzoor · September 7, 2017, 7:39am

@somasekhar.pyla,

Thanks for your patience. Please use the following code example to get the content between two TOC items. Hope this helps you.

// Requires: import com.aspose.words.*; import java.util.ArrayList;
// ExtractContents is assumed to be the helper class from the article linked above.

// TOC entries (number, tab, title) that mark the start and the end of the content to extract.
String stratBookmrk = "3.3\tLoan Types";
String endBookmrk = "3.4\tPromotions";

Document doc = new Document(MyDir + "new.docx");

String start = GetTOCBookark(stratBookmrk, doc);
String end = GetTOCBookark(endBookmrk, doc);

// Use isEmpty() here: in Java, != on strings compares references, not contents.
if (!start.isEmpty() && !end.isEmpty())
{
    BookmarkStart bookmarkStart = doc.getRange().getBookmarks().get(start).getBookmarkStart();
    BookmarkStart bookmarkEnd = doc.getRange().getBookmarks().get(end).getBookmarkStart();

    // Extract everything between the two headings, excluding the end bookmark itself.
    ArrayList extractedNodes = ExtractContents.extractContent(bookmarkStart, bookmarkEnd, false);
    Document dstDoc = ExtractContents.generateDocument(doc, extractedNodes);
    dstDoc.save(MyDir + "output.html");

    System.out.println("successful");
}

// Returns the name of the _Toc bookmark that the given TOC entry points to,
// or "" when no TOC entry contains the given text.
public static String GetTOCBookark(String text, Document doc) throws Exception
{
    NodeCollection<Node> nodes = doc.getChildNodes(NodeType.FIELD_START, true);

    for (Node fstart : nodes) {
        // Each TOC entry carries a PAGEREF field that points to a _Toc bookmark.
        if (((FieldChar) fstart).getFieldType() == FieldType.FIELD_PAGE_REF) {
            String fieldText = ((FieldChar) fstart).getField().getFieldCode();

            // Match the TOC paragraph by its visible text, then keep only the bookmark name.
            if (fieldText.contains("_Toc") && fstart.getParentNode().getText().contains(text)) {
                fieldText = fieldText
                        .substring(fieldText.indexOf("_Toc"))
                        .replace("\\h", "").trim();
                return fieldText;
            }
        }
    }
    return "";
}
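The example above handles one hand-picked pair of TOC entries. To cover the whole table of contents, the start and end entries could be generated from the list of TOC lines, as in the following Java sketch. It assumes each line has the form number, tab, title (the form used for stratBookmrk and endBookmrk), and it treats entries with at most two number parts as split points. The class TocPairs and its methods are hypothetical, the lines come from the TOC posted earlier in this thread, and no Aspose.Words call is involved.

```java
import java.util.ArrayList;
import java.util.List;

class TocPairs {
    /** Depth of a TOC line such as "3.3\tLoan Types"; assumes a tab follows the number. */
    static int depth(String tocLine) {
        String number = tocLine.substring(0, tocLine.indexOf('\t'));
        return number.split("\\.").length;
    }

    /**
     * Returns {start, end} pairs. Every entry with depth <= maxDepth is a start;
     * its end is the next such entry, or null for the last one (end of document).
     */
    static List<String[]> pairs(List<String> tocLines, int maxDepth) {
        List<String> starts = new ArrayList<>();
        for (String line : tocLines) {
            if (depth(line) <= maxDepth) starts.add(line);
        }
        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            String end = i + 1 < starts.size() ? starts.get(i + 1) : null;
            result.add(new String[] {starts.get(i), end});
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> toc = List.of(
                "3.2\tProducts", "3.3\tLoan Types", "3.3.1.1\tBasic details",
                "3.3.1.2\tGrace Period Schedule Details", "3.4\tPromotions", "3.5\tStep-up & Step-Down");
        for (String[] p : pairs(toc, 2)) {
            String end = p[1] == null ? "(end of document)" : p[1].replace('\t', ' ');
            System.out.println(p[0].replace('\t', ' ') + " -> " + end);
        }
    }
}
```

Each pair could then be passed to GetTOCBookark as the start and end text. The last pair has no end entry, so it would need separate handling, for example a bookmark placed at the end of the document.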

somasekhar.pyla · September 7, 2017, 8:48am

Thank you.