Extract document content and save it to another document using C#

Can you please point me in the right direction on the following request?

I am trying to split a Word document into other documents. For example, my document has different sections: INTRODUCTION, REVIEW and CONCLUSION. I want to split the Review section into a new document. The document may look like this:

Introduction

this is my introduction

Review

this is my review

Conclusion

this is my conclusion

My new document will only have:

Review

this is my review

Keep in mind that the location of the Review section may vary based on the document. Some documents may have multiple sections, so I never know exactly where the Review section will be. Maybe search for the word "Review"? I am attaching a test document. Please advise. Thank you ahead of time.

Hi

Thanks for your inquiry. Please use the following code snippet to save each section to a separate document.

using Aspose.Words;

// Open the source document.
Document doc = new Document("c:/temp/DocToSplit.docx");

// Loop through all sections.
for (int i = 0; i < doc.Sections.Count; i++)
{
    Section section = doc.Sections[i];

    // Create a new document and remove its default content so it is empty.
    Document subDoc = new Document();
    subDoc.RemoveAllChildren();

    // Import the section into the new document and append it.
    subDoc.AppendChild(subDoc.ImportNode(section, true, ImportFormatMode.KeepSourceFormatting));

    // Save the sub-document as DOCX.
    subDoc.Save("c:/temp/DocToSplit" + i + ".docx");
}

In case of any ambiguity, please let me know.

Hi

Moreover, please note that DocumentExplorer is a very useful tool which easily enables us to see the entire document structure. You can find DocumentExplorer in the folder where you installed Aspose.Words e.g. C:\Program Files (x86)\Aspose\Aspose.Words for .NET\Demos\CSharp\DocumentExplorer\bin\DocumentExplorer.exe. Below is the DOM structure of your document as viewed with DocumentExplorer:

DocumentExplorer shows that Introduction, Review and Conclusion are all within a single section. I have attached the output documents as well.

Can I split it even if it is all in the same section? The documents are created by users, so I never know whether everything will be in one section or in multiple sections.

Hi

Thanks for your inquiry. You can extract the content between two bookmarks and save it in a separate Word document. I placed two bookmarks, named “start” and “end”, in the document.

Please see the following code snippet:

using Aspose.Words;

private void ExtractContent(Document srcDoc, string startBookmark, string endBookmark, string outputFile)
{
    // Get the start and end bookmarks from the source document.
    Bookmark start = srcDoc.Range.Bookmarks[startBookmark];
    Bookmark end = srcDoc.Range.Bookmarks[endBookmark];

    // If the start or end bookmark does not exist in the document, exit the function.
    if (start == null || end == null)
        return;

    // Get the first node in the selection: climb up to the direct child of the Body.
    Node startNode = start.BookmarkStart.ParentNode;
    while (startNode.ParentNode.NodeType != NodeType.Body)
        startNode = startNode.ParentNode;

    // Get the last node in the selection in the same way.
    Node endNode = end.BookmarkStart.ParentNode;
    while (endNode.ParentNode.NodeType != NodeType.Body)
        endNode = endNode.ParentNode;

    // Create a new document.
    Document dstDoc = new Document();

    // Copy content from startNode up to, but not including, endNode.
    Node currNode = startNode;
    while (currNode != null && !currNode.Equals(endNode))
    {
        Node dstNode = dstDoc.ImportNode(currNode, true, ImportFormatMode.KeepSourceFormatting);
        dstDoc.FirstSection.Body.AppendChild(dstNode);

        if (currNode.NextSibling == null)
        {
            // No more siblings: move to the first node of the next section, if there is one.
            Section nextSection = (Section)currNode.GetAncestor(NodeType.Section).NextSibling;
            currNode = nextSection != null ? nextSection.Body.FirstChild : null;
        }
        else
        {
            // Move to the next node.
            currNode = currNode.NextSibling;
        }
    }

    // Save the output document.
    dstDoc.Save(outputFile);
}

I have attached input/output documents. In case of any ambiguity, please let me know.
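
Regarding the original idea of searching for the word "Review": if the headings in a document can be recognised, the range to extract is the matching heading plus everything up to the next heading. The sketch below shows only that selection logic on a simplified model, not on Aspose.Words nodes. Block, HeadingRange and Extract are hypothetical names introduced for illustration, and how a heading is recognised in a real document (for example from the paragraph style) is left as an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a body-level paragraph: its text and whether it is a heading.
public sealed class Block
{
    public string Text { get; }
    public bool IsHeading { get; }

    public Block(string text, bool isHeading)
    {
        Text = text;
        IsHeading = isHeading;
    }
}

public static class HeadingRange
{
    // Returns the heading whose text matches `title` (case-insensitive, trimmed)
    // plus every block after it up to, but not including, the next heading.
    // Returns an empty list when no such heading exists.
    public static List<Block> Extract(IReadOnlyList<Block> blocks, string title)
    {
        var result = new List<Block>();
        bool inside = false;

        foreach (Block block in blocks)
        {
            if (block.IsHeading)
            {
                // A heading met while inside the range closes it.
                if (inside)
                    break;

                inside = string.Equals(block.Text.Trim(), title.Trim(),
                                       StringComparison.OrdinalIgnoreCase);
            }

            if (inside)
                result.Add(block);
        }

        return result;
    }
}
```

With the sample document from the question, calling HeadingRange.Extract(blocks, "review") selects the "Review" heading and the paragraph "this is my review". In terms of ExtractContent, the matched heading would play the role of startNode and the next heading the role of endNode, which is not copied.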