Can we split a Word document by heading index, i.e. section 2.1 as one document and 2.2 as another? Is it possible to split it like this?
Thanks for your inquiry. In your case, we suggest you extract the desired content from the document and save it to a new document. Please refer to the following article:
How to Extract Selected Content Between Nodes in a Document
If you still face problems, please ZIP and attach your input and expected output documents here for our reference. We will then provide you with more information on this.
Hi, thanks for the reply.
I need to split the document into separate documents based on the table of contents, i.e. each TOC entry should go into its own document, such as 1.1.docx, 1.2.docx, and so on.
Thanks for your inquiry. Please use the following code example to extract the document's contents based on the table of contents. Hope this helps you.
Document doc = new Document(MyDir + "in.docx");
DocumentBuilder builder = new DocumentBuilder(doc);

// Mark the end of the document so the last TOC item has an end boundary.
builder.MoveToDocumentEnd();
builder.StartBookmark("_TocEnd");
builder.EndBookmark("_TocEnd");

NodeCollection nodes = doc.GetChildNodes(NodeType.FieldStart, true);

// Get list of bookmarks listed in TOC
ArrayList tocitems = new ArrayList();
foreach (FieldStart fstart in nodes)
{
    if (fstart.FieldType == Aspose.Words.Fields.FieldType.FieldPageRef)
    {
        String fieldText = fstart.GetField().GetFieldCode();
        if (fieldText.Contains("_Toc"))
        {
            // Keep only the bookmark name, e.g. "PAGEREF _Toc... \h" becomes "_Toc...".
            fieldText = fieldText.Substring(fieldText.IndexOf("_Toc")).Replace("\\h", "").Trim();
            tocitems.Add(fieldText);
        }
    }
}

// Use the end-of-document bookmark as the end boundary of the last TOC item.
tocitems.Add("_TocEnd");

for (int i = 0; i < tocitems.Count - 1; i++)
{
    BookmarkStart bookmarkStart = doc.Range.Bookmarks[tocitems[i].ToString()].BookmarkStart;
    BookmarkStart bookmarkEnd = doc.Range.Bookmarks[tocitems[i + 1].ToString()].BookmarkStart;

    // Firstly extract the content between these nodes including the bookmark.
    ArrayList extractedNodes = Extract_Contents.Common.ExtractContent(bookmarkStart, bookmarkEnd, false);
    Document doc2 = Extract_Contents.Common.GenerateDocument(doc, extractedNodes);
    doc2.Save(MyDir + tocitems[i] + "Out.docx");
}
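The step that turns a PAGEREF field code into a bookmark name can also be tried on its own, without a document. Below is a minimal sketch in plain Java, assuming the field code looks like `PAGEREF _Toc... \h`; the class name, method name, and the sample bookmark value are hypothetical and not part of Aspose.Words. It uses a regular expression in place of the Substring/Replace/Trim chain above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
final class TocFieldCode {
    // Matches a bookmark name such as _Toc256000001 inside a field code.
    private static final Pattern TOC_BOOKMARK = Pattern.compile("_Toc\\w+");

    // Returns the first _Toc bookmark name found in the field code, if any.
    static Optional<String> bookmarkName(String fieldCode) {
        Matcher m = TOC_BOOKMARK.matcher(fieldCode);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample field code; the bookmark value is made up for illustration.
        String fieldCode = " PAGEREF _Toc256000001 \\h ";
        System.out.println(bookmarkName(fieldCode).orElse("(none)")); // prints _Toc256000001
    }
}
```

Returning an Optional makes the "no bookmark in this field" case explicit, so callers do not need to compare against an empty string.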
Hi, thanks for your reply.
The above code works for top-level headings only, i.e. for TOC entries like 1, 2, 3, etc.
But my requirement is that section 1 also has subsections:
1. introduction
1.1 my self
1.2 my life
Here I need to split at 1.1 and 1.2 as well. The above code works for 1. introduction only.
Also, the above code generates an error at
FieldStart fstart in nodes in the foreach loop:
foreach (FieldStart fstart in nodes)
{
    if (fstart.FieldType == Aspose.Words.Fields.FieldType.FieldPageRef)
    {
        String fieldText = fstart.GetField().GetFieldCode();
        if (fieldText.Contains("_Toc"))
        {
            fieldText = fieldText.Substring(fieldText.IndexOf("_Toc"), fieldText.Length - fieldText.IndexOf("_Toc")).Replace("\\h", "").Trim();
            tocitems.Add(fieldText);
        }
    }
}
Thank you
Thanks for your inquiry. Please ZIP and attach your input and expected output Word documents here for our reference. We will then provide you more information on this along with code.
NewPlan1.java.zip (3.4 KB)
My Word document contains a table of contents like this:
1. heading
1.1 heading
1.2 heading
2. heading
2.1 heading
2.2 heading
2.2.1 heading
2.2.2 heading
3.heading
3.1 heading
etc
Using the code in the ZIP file, I am getting the following output:
1.html for 1.heading
2.html for 1.1 heading
3.html for 1.2 heading
4.html for 2 heading
5.html for 2.1 heading
6.html for 2.2 heading
7.html for 2.2.1 heading
8.html for 2.2.2 heading
9.html for 3.heading
10.html for 3.1 heading
etc
But my requirement is
1.html for 1.heading
2.html for 1.1 heading
3.html for 1.2 heading
4.html for 2 heading
5.html for 2.1 heading
6.html for 2.2 heading (this must contain the 2.2, 2.2.1, and 2.2.2 headings)
7.html for 3.heading
8.html for 3.1 heading
etc
Thank you
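One way to get the numbering described above is to start a new file only at TOC entries of depth 1 or 2, so that deeper entries such as 2.2.1 stay inside their 2.2 parent. A minimal sketch in plain Java, assuming the TOC numbers have already been read into a list of strings; the class and method names are hypothetical and not part of Aspose.Words:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Aspose.Words.
final class TocDepthFilter {
    // Depth of a heading number: "2" -> 1, "2.2" -> 2, "2.2.1" -> 3.
    static int depth(String number) {
        // Tolerate a trailing dot such as "3."
        String trimmed = number.endsWith(".") ? number.substring(0, number.length() - 1) : number;
        return trimmed.split("\\.").length;
    }

    // Keeps only the entries at which a new output file should start.
    static List<String> splitPoints(List<String> tocNumbers, int maxDepth) {
        List<String> result = new ArrayList<>();
        for (String number : tocNumbers) {
            if (depth(number) <= maxDepth) {
                result.add(number);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> toc = Arrays.asList(
                "1", "1.1", "1.2", "2", "2.1", "2.2", "2.2.1", "2.2.2", "3", "3.1");
        // 2.2.1 and 2.2.2 are dropped, so their content stays in the 2.2 file.
        System.out.println(splitPoints(toc, 2));
    }
}
```

The remaining entries would then be used as the start and end bookmarks for extraction, in the same way as consecutive TOC items are used in the code earlier in this thread.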
Hi, I need code to extract content based on the table of contents.
My requirement is: if I give a range of table-of-contents numbers, say from 3 to 6,
each entry should be saved as a separate HTML file: 3, 3.1, 3.2, ..., 4, 4.1, 4.2, ..., up to 6.
Can I have code for this?
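Selecting the TOC entries that fall in such a range can be sketched separately from the extraction itself. The following plain Java example assumes the TOC numbers are available as strings and that the range is inclusive (so 6, 6.1, ... are kept); both the class and its methods are hypothetical names, not Aspose.Words API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Aspose.Words.
final class TocRange {
    // Top-level section of a heading number: "4.2" -> 4, "6" -> 6.
    static int topLevel(String number) {
        int dot = number.indexOf('.');
        return Integer.parseInt(dot < 0 ? number : number.substring(0, dot));
    }

    // Keeps entries whose top-level section lies in [first, last], in TOC order.
    static List<String> between(List<String> tocNumbers, int first, int last) {
        List<String> result = new ArrayList<>();
        for (String number : tocNumbers) {
            int top = topLevel(number);
            if (top >= first && top <= last) {
                result.add(number);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> toc = Arrays.asList(
                "2", "2.1", "3", "3.1", "3.2", "4", "4.1", "5", "6", "6.1", "7");
        System.out.println(between(toc, 3, 6));
    }
}
```

Each selected entry would then become the start of one HTML file, with the next selected entry (or the first entry after the range) as its end boundary.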
Thanks for your inquiry. Please ZIP and attach your input and expected output Word documents here for our reference. We will then provide you more information on this along with code.
NewPlan1.java.zip (3.4 KB)
My Word document contains a table of contents like this:
1. heading
1.1 heading
1.2 heading
2. heading
2.1 heading
2.2 heading
2.2.1 heading
2.2.2 heading
3.heading
3.1 heading
etc
Using the code in the ZIP file, I am getting the following output:
1.html for 1.heading
2.html for 1.1 heading
3.html for 1.2 heading
4.html for 2 heading
5.html for 2.1 heading
6.html for 2.2 heading
7.html for 2.2.1 heading
8.html for 2.2.2 heading
9.html for 3.heading
10.html for 3.1 heading
etc
But my requirement is to split based on the headings I specify.
For example, if I give 2. heading and 3.1 heading, then the document must be split from 2. heading to 3.1 heading.
etc
Thank you
Thanks for your inquiry. Please share the requested documents. This will help us to understand your scenario, and we will be in a better position to address your concerns accordingly. Thanks for your cooperation.
Hi, sorry, I cannot share the document with you.
Can you share code for the above requirement?
You can extract the content from the document using the code example shared in the following article.
How to Extract Selected Content Between Nodes
You do not need to share your original document. Please create a test document with some test data using MS Word and share it here for our reference.
new.docx.zip (17.4 KB)
Hi, I need splitting based on a TOC like this:
1 Introduction
1.1 Scope
1.2 Functional Scope and Objectives
1.3 Key Assumptions
2 Traceability Matrix
2.1 User Comments
2.2 Additional Documents Matrix
2.3 Interface Matrix
2.4 Glossary of terms
3 Product Setup
3.1 Product Categories
3.2 Products
3.3 Loan Types
3.3.1.1 Basic details
3.3.1.2 Grace Period Schedule Details
3.3.1.3 Payment Schedule Details
3.3.1.4 Accounting Setup
3.3.1.5 Extended Details
3.3.1.6 Features
3.4 Promotions
3.5 Step-up & Step-Down
But when it comes to 3.3 Loan Types, everything up to 3.4 should go into one single document.
Can you send me the code?
Thank you
Thanks for sharing the details. We are working on your query and will get back to you soon.
Thanks for your patience. Please use the following code example to get the content between two TOC items. Hope this helps you.
// Requires: import com.aspose.words.*; import java.util.ArrayList;
// ExtractContents is assumed to be the helper class from the article referenced above.
String startTitle = "3.3\tLoan Types";
String endTitle = "3.4\tPromotions";
Document doc = new Document(MyDir + "new.docx");
String start = getTocBookmark(startTitle, doc);
String end = getTocBookmark(endTitle, doc);
// Compare string contents rather than references.
if (!start.isEmpty() && !end.isEmpty())
{
    BookmarkStart bookmarkStart = doc.getRange().getBookmarks().get(start).getBookmarkStart();
    BookmarkStart bookmarkEnd = doc.getRange().getBookmarks().get(end).getBookmarkStart();
    ArrayList extractedNodes = ExtractContents.extractContent(bookmarkStart, bookmarkEnd, false);
    Document dstDoc = ExtractContents.generateDocument(doc, extractedNodes);
    dstDoc.save(MyDir + "output.html");
    System.out.println("successful");
}

// Returns the _Toc bookmark name of the TOC entry whose text contains the given text,
// or an empty string if no such entry is found.
public static String getTocBookmark(String text, Document doc) throws Exception
{
    NodeCollection<Node> nodes = doc.getChildNodes(NodeType.FIELD_START, true);
    for (Node fstart : nodes) {
        FieldChar fieldStart = (FieldChar) fstart;
        if (fieldStart.getFieldType() == FieldType.FIELD_PAGE_REF) {
            String fieldText = fieldStart.getField().getFieldCode();
            if (fieldText.contains("_Toc") && fstart.getParentNode().getText().contains(text)) {
                // Keep only the bookmark name from the PAGEREF field code.
                return fieldText
                        .substring(fieldText.indexOf("_Toc"))
                        .replace("\\h", "").trim();
            }
        }
    }
    return "";
}
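The end entry ("3.4 Promotions") is hard-coded above. If the TOC numbers are available as a list, the end of a section could instead be derived as the next entry that is at the same depth or shallower than the start entry. A minimal sketch in plain Java under that assumption; the class and method names are hypothetical and not part of Aspose.Words:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Aspose.Words.
final class TocSectionEnd {
    // Depth of a heading number: "3.3" -> 2, "3.3.1.1" -> 4.
    static int depth(String number) {
        return number.split("\\.").length;
    }

    // Returns the first later entry that is not nested under 'start'.
    // An empty result means the section runs to the end of the document.
    static Optional<String> endOf(List<String> tocNumbers, String start) {
        int startIndex = tocNumbers.indexOf(start);
        if (startIndex < 0) {
            throw new IllegalArgumentException("Not in TOC: " + start);
        }
        int startDepth = depth(start);
        for (int i = startIndex + 1; i < tocNumbers.size(); i++) {
            if (depth(tocNumbers.get(i)) <= startDepth) {
                return Optional.of(tocNumbers.get(i));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> toc = Arrays.asList(
                "3", "3.1", "3.2", "3.3", "3.3.1.1", "3.3.1.2", "3.3.1.3",
                "3.3.1.4", "3.3.1.5", "3.3.1.6", "3.4", "3.5");
        System.out.println(endOf(toc, "3.3").orElse("(end of document)")); // prints 3.4
    }
}
```

When no end entry exists, as for the last TOC item, an end-of-document bookmark like the `_TocEnd` one used earlier in this thread could serve as the boundary.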
Thank you