Free Support Forum - aspose.com

Splitting Word document into different files

Hello.

We need to split a Word document into separate files, one file for each entry of the TOC. Can this be done with your framework?
We are using Java as our programming language.

Thanks a lot.


This message was posted using Page2Forum from Work with Ranges - Aspose.Words for .NET and Java

Hi,

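Thanks for your request. Yes, I think you can achieve this using Aspose.Words. Word TOC entries are usually generated from heading styles, so the code below creates a new document for each Heading 1 paragraph and copies everything up to the next Heading 1 into it. See the attached document and the following code: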

import com.aspose.words.*;

// Open the source document
Document doc = new Document("in.doc");

// Get all paragraphs in the document
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
int docIndex = 0;

// Loop through all paragraphs in the document
for (int parIndex = 0; parIndex < paragraphs.getCount(); parIndex++)
{
    Paragraph par = (Paragraph)paragraphs.get(parIndex);

    // If the paragraph style is Heading 1, copy it and the following content to a new document
    if (par.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1)
    {
        // Create a new document and remove the empty paragraph a blank document contains
        Document outDoc = new Document();
        outDoc.getFirstSection().getBody().removeAllChildren();

        Node currentNode = par;
        while (currentNode != null)
        {
            // Import the node and insert it into the new document
            Node importedNode = outDoc.importNode(currentNode, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            outDoc.getFirstSection().getBody().appendChild(importedNode);

            if (currentNode.getNextSibling() == null)
            {
                // End of the current section: continue with the first node of the next section
                Section currentSection = (Section)currentNode.getAncestor(NodeType.SECTION).getNextSibling();
                if (currentSection == null)
                    break; // end of the document
                currentNode = currentSection.getBody().getFirstChild();
            }
            else
            {
                // Move to the next node
                currentNode = currentNode.getNextSibling();
            }

            // Stop at the next Heading 1 paragraph; set parIndex so that the for loop's
            // parIndex++ resumes exactly at that heading
            if (currentNode != null
                    && currentNode.getNodeType() == NodeType.PARAGRAPH
                    && ((Paragraph)currentNode).getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1)
            {
                parIndex = paragraphs.indexOf(currentNode) - 1;
                break;
            }
        }

        // Save the output document and increase the file index
        outDoc.save("Section_" + docIndex + ".doc");
        docIndex++;
    }
}
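To make the control flow of the loop easier to follow, here is a simplified, library-free model of the same idea. It is not Aspose.Words code: `Para` and the `"Heading 1"` style name are hypothetical stand-ins for `Paragraph` and `StyleIdentifier.HEADING_1`, and the model only handles a flat list of paragraphs (no sections or tables). Each chunk starts at a Heading 1 paragraph and ends just before the next one, which is what `parIndex = paragraphs.indexOf(currentNode) - 1` achieves in the real code.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the Heading 1 splitting loop; not Aspose.Words code.
public class HeadingSplitter {

    // Hypothetical style name standing in for StyleIdentifier.HEADING_1
    public static final String HEADING_1 = "Heading 1";

    // Hypothetical stand-in for an Aspose.Words Paragraph
    public static final class Para {
        public final String style;
        public final String text;

        public Para(String style, String text) {
            this.style = style;
            this.text = text;
        }
    }

    // Each chunk starts at a Heading 1 paragraph and ends just before the next one.
    // Paragraphs before the first heading belong to no chunk.
    public static List<List<Para>> split(List<Para> paragraphs) {
        List<List<Para>> chunks = new ArrayList<>();
        List<Para> current = null;
        for (Para p : paragraphs) {
            if (HEADING_1.equals(p.style)) {
                current = new ArrayList<>(); // a new output "document"
                chunks.add(current);
            }
            if (current != null) {
                current.add(p);
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Para> doc = List.of(
                new Para("Normal", "Preface"),
                new Para(HEADING_1, "Chapter 1"),
                new Para("Normal", "Text 1"),
                new Para(HEADING_1, "Chapter 2"),
                new Para("Normal", "Text 2a"),
                new Para("Normal", "Text 2b"));
        List<List<Para>> chunks = split(doc);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println("Section_" + i + ".doc: " + chunks.get(i).size() + " paragraphs");
        }
    }
}
```

Running `main` prints `Section_0.doc: 2 paragraphs` and `Section_1.doc: 3 paragraphs`; the "Preface" paragraph before the first heading is not written to any file, just as in the code above.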

I hope this helps.

Best regards.

Hi.

Thanks for the fast reply. I tried the above code and it works fine for small documents. However, we have a very large document (about 130 MB) that has to be split into several documents. When I load that document, it seems that only the first part of it is parsed. When I copy part of the document into a new Word document and process that one, it works fine again.

Is this a restriction of the evaluation version?

Thanks.

Thomas

Hi,

Thanks for your request. In evaluation mode, Aspose.Words limits the maximum document size to several hundred paragraphs, so only the first part of a large document is processed. Please see the following link to learn more about the limitations.

http://www.aspose.com/documentation/file-format-components/aspose.words-for-.net-and-java/evaluate-aspose-words.html

If you want to test Aspose.Words without the evaluation version limitations, you can also request a 30-day temporary license. See the following link.

http://www.aspose.com/corporate/temporary-license.aspx

Best regards.

Ok. With the temporary license it works fine.

Now there is one more open point: how can I copy the page layout (e.g. landscape orientation), the page style, and the background image to the split documents?

Thanks.

Thomas

Hi,

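Thanks for your request. Yes, of course you can achieve this. The main difference from the previous code is that the source section is imported into each new document (and another section is added whenever the content continues into the next source section), so the output keeps the page setup and headers/footers of the original sections. Please try using the following code: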

import com.aspose.words.*;

// Open the source document
Document doc = new Document("in.doc");

// Get all paragraphs in the document
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
int docIndex = 0;

// Loop through all paragraphs in the document
for (int parIndex = 0; parIndex < paragraphs.getCount(); parIndex++)
{
    Paragraph par = (Paragraph)paragraphs.get(parIndex);

    // If the paragraph style is Heading 1, copy it and the following content to a new document
    if (par.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1)
    {
        // Create a new document and remove its default section
        Document outDoc = new Document();
        outDoc.removeAllChildren();

        Node currentNode = par;

        // Import the source section together with its children (page setup and
        // headers/footers come along), then clear its body
        Section srcSect = (Section)outDoc.importNode(currentNode.getAncestor(NodeType.SECTION), true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        outDoc.appendChild(srcSect);
        srcSect.getBody().removeAllChildren();

        while (currentNode != null)
        {
            // Import the node and append it to the last section of the new document
            Node importedNode = outDoc.importNode(currentNode, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            outDoc.getLastSection().getBody().appendChild(importedNode);

            if (currentNode.getNextSibling() == null)
            {
                // End of the current section: add a copy of the next source section
                // (with an empty body) and continue with its first node
                Section currentSection = (Section)currentNode.getAncestor(NodeType.SECTION).getNextSibling();
                if (currentSection == null)
                    break; // end of the document
                Section newSect = (Section)outDoc.importNode(currentSection, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
                outDoc.appendChild(newSect);
                newSect.getBody().removeAllChildren();
                currentNode = currentSection.getBody().getFirstChild();
            }
            else
            {
                // Move to the next node
                currentNode = currentNode.getNextSibling();
            }

            // Stop at the next Heading 1 paragraph; set parIndex so that the for loop's
            // parIndex++ resumes exactly at that heading
            if (currentNode != null
                    && currentNode.getNodeType() == NodeType.PARAGRAPH
                    && ((Paragraph)currentNode).getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1)
            {
                parIndex = paragraphs.indexOf(currentNode) - 1;
                break;
            }
        }

        // Save the output document and increase the file index
        outDoc.save("Section_" + docIndex + ".doc");
        docIndex++;
    }
}
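The essential idea here is that every output section copies the page setup of the source section its content came from. The following library-free sketch models only that idea; `Sect`, `Para`, and the `orientation` field are hypothetical stand-ins for `Section`, `Paragraph`, and the section's page setup, not Aspose.Words types. One deliberate difference: the sketch creates an output section only when a paragraph is actually added to it, whereas the loop above appends the next section before checking whether its first paragraph is already the next Heading 1, which can leave an empty section at the end of an output document.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of section-aware splitting; not Aspose.Words code.
public class SectionAwareSplitter {

    // Hypothetical style name standing in for StyleIdentifier.HEADING_1
    public static final String HEADING_1 = "Heading 1";

    // Hypothetical stand-in for a Paragraph
    public static final class Para {
        public final String style;
        public final String text;

        public Para(String style, String text) {
            this.style = style;
            this.text = text;
        }
    }

    // Hypothetical stand-in for a Section: page setup (only orientation here) plus body
    public static final class Sect {
        public final String orientation;
        public final List<Para> body;

        public Sect(String orientation, List<Para> body) {
            this.orientation = orientation;
            this.body = body;
        }
    }

    // Returns one list of sections per output document. Every output section copies
    // the page setup of the source section its paragraphs came from.
    public static List<List<Sect>> split(List<Sect> source) {
        List<List<Sect>> docs = new ArrayList<>();
        List<Sect> currentDoc = null;
        for (Sect src : source) {
            Sect out = null; // output section mirroring src in the current document
            for (Para p : src.body) {
                if (HEADING_1.equals(p.style)) {
                    currentDoc = new ArrayList<>(); // start a new output document
                    docs.add(currentDoc);
                    out = null;
                }
                if (currentDoc == null) {
                    continue; // content before the first heading is skipped
                }
                if (out == null) {
                    // Created lazily, so no output document ends with an empty section
                    out = new Sect(src.orientation, new ArrayList<>());
                    currentDoc.add(out);
                }
                out.body.add(p);
            }
        }
        return docs;
    }
}
```

For a portrait section followed by a landscape section, a chapter that starts in the first and continues into the second yields an output document with one portrait and one landscape section, mirroring what the `importNode`/`removeAllChildren` steps above do with real sections.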

I hope this helps.

Best regards.

Thanks! That did it.

But there is another problem: at the beginning of each split document there is a page break. How can I remove it when creating the splits? Should I just remove the first node/paragraph?

//Thomas

Hi,

Thanks for your inquiry. This may occur because there are page breaks between sections in your document. Could you attach your document, or a part of it, for testing? I will investigate the issue and try to help you.

Best regards.

Ok. I created a document that shows the problem.

When splitting the document using the above code, the first split has a carriage return at the beginning and the second one has a page break.

Can this be solved?

Thanks.

Thomas Ospelt

Hi,

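Thank you for the additional information. Please try the following code at the end of the Heading 1 block in the loop above, in place of the existing save lines; it removes page break characters from the first paragraph of each output document before saving it.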

// Get the first paragraph of the output document
Paragraph firstPar = outDoc.getFirstSection().getBody().getFirstParagraph();

// Remove page breaks ("\f" is the page break character) from the runs of the first paragraph
if (firstPar != null)
{
    for (int runIndex = 0; runIndex < firstPar.getRuns().getCount(); runIndex++)
    {
        Run run = firstPar.getRuns().get(runIndex);
        run.setText(run.getText().replace("\f", ""));
    }
}

// Save the output document and increase the file index
outDoc.save("Section_" + docIndex + ".doc");
docIndex++;
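The code above only strips `\f` characters. If the page break sits in a paragraph of its own, that paragraph may remain as an empty line at the top of the output. Below is a library-free sketch of a slightly more thorough cleanup: strip page breaks from the first paragraph and, if no visible text remains, drop that paragraph and repeat with the next one. Paragraphs are modeled as lists of run texts; this is a hypothetical model, not the Aspose.Words API. In real code the removal step would presumably check `firstPar.getText()` and call `firstPar.remove()`, but be careful with paragraphs that have no text yet contain an inline image or other shape, which a text-only check treats as empty.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of cleaning the start of an output document; not Aspose.Words code.
// Each paragraph is modeled as a list of run texts (hypothetical stand-in for Paragraph/Run).
public class LeadingBreakCleaner {

    public static List<List<String>> clean(List<List<String>> paragraphs) {
        // Work on a mutable copy so the input is left untouched
        List<List<String>> result = new ArrayList<>();
        for (List<String> runs : paragraphs) {
            result.add(new ArrayList<>(runs));
        }
        while (!result.isEmpty()) {
            List<String> first = result.get(0);
            // Same step as the code above: remove page break characters from every run
            for (int i = 0; i < first.size(); i++) {
                first.set(i, first.get(i).replace("\f", ""));
            }
            // A paragraph with no visible text left would show up as an empty line: drop it
            if (String.join("", first).trim().isEmpty()) {
                result.remove(0);
            } else {
                break; // the first paragraph now has real content
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> doc = List.of(List.of("\f"), List.of("Chapter 2"), List.of("Some text"));
        System.out.println(clean(doc).size() + " paragraphs after cleanup");
    }
}
```

Only the leading paragraphs are touched; page breaks later in the document are kept.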

Hope this helps.

Best regards.