Hello.
We need to split a Word document into different files. For each entry of the TOC there must be a separate file. Can this be done with your framework?
We are using Java as our programming language.
Thanks a lot.
Hi
Thanks for your request. Yes, I think you can achieve this using Aspose.Words. For example, see the attached document and the following code:
//Requires: import com.aspose.words.*;
//Open document
Document doc = new Document("in.doc");
//Get collection of Paragraphs
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
Paragraph par = null;
int docIndex = 0;
//Loop through all paragraphs in the document
for (int parIndex = 0; parIndex < paragraphs.getCount(); parIndex++)
{
    par = (Paragraph)paragraphs.get(parIndex);
    //If Paragraph style = HEADING_1 then copy content to the new document
    if (par.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1)
    {
        //Create new document
        Document outDoc = new Document();
        Node currentNode = par;
        while (currentNode != null)
        {
            //Import Node
            Node importedNode = outDoc.importNode(currentNode, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            //Insert node into the new document
            outDoc.getFirstSection().getBody().appendChild(importedNode);
            //If next node = null then move to the next section
            if (currentNode.getNextSibling() == null)
            {
                //Get next section
                Section nextSection = (Section)currentNode.getAncestor(NodeType.SECTION).getNextSibling();
                //If next section != null then get its first child
                if (nextSection != null)
                    currentNode = nextSection.getBody().getFirstChild();
                else
                    break; //else exit from while
            }
            else
            {
                //Get next node
                currentNode = currentNode.getNextSibling();
            }
            //Check if current node is a paragraph (it is null if the next section has an empty body)
            if (currentNode != null && currentNode.getNodeType() == NodeType.PARAGRAPH)
            {
                //Check if its style is HEADING_1
                if (((Paragraph)currentNode).getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1)
                {
                    //If so then set par index and exit while
                    parIndex = paragraphs.indexOf(currentNode) - 1;
                    break;
                }
            }
        }
        //Save output document
        outDoc.save("Section_" + String.valueOf(docIndex) + ".doc");
        //Increase docIndex
        docIndex++;
    }
}
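In case the control flow above is hard to follow, here is a minimal sketch of the same grouping idea in plain Java, without Aspose.Words. The `Para` class and the `splitByHeading` method are hypothetical stand-ins for illustration only, not part of the Aspose.Words API. Each Heading 1 starts a new chunk, and anything before the first heading is skipped, just as in the loop above.

```java
import java.util.ArrayList;
import java.util.List;

public class HeadingSplitSketch {

    /** Hypothetical stand-in for a paragraph: its text and whether it is styled as Heading 1. */
    static final class Para {
        final String text;
        final boolean isHeading1;

        Para(String text, boolean isHeading1) {
            this.text = text;
            this.isHeading1 = isHeading1;
        }
    }

    /**
     * Groups paragraphs into chunks, starting a new chunk at every Heading 1.
     * Paragraphs that appear before the first heading are not copied anywhere.
     */
    static List<List<Para>> splitByHeading(List<Para> paragraphs) {
        List<List<Para>> chunks = new ArrayList<>();
        List<Para> current = null;
        for (Para p : paragraphs) {
            if (p.isHeading1) {
                // A heading closes the previous chunk and opens a new one
                current = new ArrayList<>();
                chunks.add(current);
            }
            if (current != null) {
                current.add(p);
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Para> doc = new ArrayList<>();
        doc.add(new Para("Preface text", false));
        doc.add(new Para("Chapter 1", true));
        doc.add(new Para("Body of chapter 1", false));
        doc.add(new Para("Chapter 2", true));
        doc.add(new Para("Body of chapter 2", false));

        List<List<Para>> chunks = splitByHeading(doc);
        for (int i = 0; i < chunks.size(); i++) {
            // Mirrors the "Section_<n>.doc" naming used above
            System.out.println("Section_" + i + ".doc starts with: " + chunks.get(i).get(0).text);
        }
    }
}
```

In the real code each chunk corresponds to one output document, and the nodes are imported into it with importNode instead of being added to a list.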
I hope this helps.
Best regards.
Hi.
Thanks for the fast reply. I tried the above code and it works fine for small documents. We have a very large document (about 130 MB) that has to be split in several documents. When I load that document it seems that only the first part of the document is parsed. When I copy a part of the document into a new Word doc and parse this one, it is working fine again.
Is this a restriction of the evaluation version?
Thanks.
Thomas
Hi
Thanks for your request. Aspose.Words in evaluation mode limits the maximum document size to several hundred paragraphs. Please see the following link to learn more about limitations.
https://docs.aspose.com/words/net/licensing/
If you want to test Aspose.Words without evaluation version limitations, you can also request a 30-Day Temporary License. See the following link.
https://purchase.aspose.com/temporary-license
Best regards.
Ok. With the temporary license it works fine.
Now there is one more open point. How can I copy the page layout (e.g. landscape), the page style and the background image to the split documents?
Thanks.
Thomas
Hi
Thanks for your request. Yes, of course you can achieve this. Please try using the following code:
//Requires: import com.aspose.words.*;
//Open document
Document doc = new Document("in.doc");
//Get collection of Paragraphs
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
Paragraph par = null;
int docIndex = 0;
//Loop through all paragraphs in the document
for (int parIndex = 0; parIndex < paragraphs.getCount(); parIndex++)
{
    par = (Paragraph)paragraphs.get(parIndex);
    //If Paragraph style = HEADING_1 then copy content to the new document
    if (par.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1)
    {
        //Create new document
        Document outDoc = new Document();
        //Remove sections from document
        outDoc.removeAllChildren();
        Node currentNode = par;
        //Import the source section (this keeps its page setup), then empty its body
        Section srcSect = (Section)outDoc.importNode(currentNode.getAncestor(NodeType.SECTION), true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        outDoc.appendChild(srcSect);
        srcSect.getBody().removeAllChildren();
        while (currentNode != null)
        {
            //Import Node
            Node importedNode = outDoc.importNode(currentNode, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            //Insert node into the last section of the new document
            outDoc.getLastSection().getBody().appendChild(importedNode);
            //If next node = null then move to the next section
            if (currentNode.getNextSibling() == null)
            {
                //Get next section
                Section nextSection = (Section)currentNode.getAncestor(NodeType.SECTION).getNextSibling();
                //If next section != null then import it, empty its body and get its first child
                if (nextSection != null)
                {
                    Section newSect = (Section)outDoc.importNode(nextSection, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
                    outDoc.appendChild(newSect);
                    newSect.getBody().removeAllChildren();
                    currentNode = nextSection.getBody().getFirstChild();
                }
                else
                {
                    break; //else exit from while
                }
            }
            else
            {
                //Get next node
                currentNode = currentNode.getNextSibling();
            }
            //Check if current node is a paragraph (it is null if the next section has an empty body)
            if (currentNode != null && currentNode.getNodeType() == NodeType.PARAGRAPH)
            {
                //Check if its style is HEADING_1
                if (((Paragraph)currentNode).getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1)
                {
                    //If so then set par index and exit while
                    parIndex = paragraphs.indexOf(currentNode) - 1;
                    break;
                }
            }
        }
        //Save output document
        outDoc.save("Section_" + String.valueOf(docIndex) + ".doc");
        //Increase docIndex
        docIndex++;
    }
}
I hope this helps.
Best regards.
Thanks! That did it.
But there is another problem. At the beginning of each split document there is a page break. How can I remove this when creating the splits? Should I just remove the first node/paragraph node?
//Thomas
Hi
Thanks for your inquiry. Maybe this occurs because there are page breaks between sections in your document. Could you attach your document or part of the document for testing? I will investigate this issue and try to help you.
Best regards.
Ok. I created a document that shows the problem.
When splitting the document using the above code, the first split document has a carriage return at the beginning and the second one a page break.
Can this be solved?
Thanks.
Thomas Ospelt
Hi
Thank you for additional information. I think that you can try using the following code to resolve this problem.
//Get first Paragraph
Paragraph firstPar = outDoc.getFirstSection().getBody().getFirstParagraph();
//Remove PageBreaks in the first paragraph
for (int runIndex = 0; runIndex < firstPar.getRuns().getCount(); runIndex++)
{
    firstPar.getRuns().get(runIndex).setText(firstPar.getRuns().get(runIndex).getText().replace("\f", ""));
}
//Save output document
outDoc.save("Section_" + String.valueOf(docIndex) + ".doc");
//increase docIndex
docIndex++;
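The replacement itself is plain string handling: an explicit page break is stored in the run text as the form feed character "\f". Here is a minimal sketch of that step on its own, in plain Java without Aspose.Words. The class and method names are hypothetical and only for illustration.

```java
public class PageBreakStripSketch {

    /** The form feed character, which is what the snippet above replaces in each run. */
    static final String PAGE_BREAK = "\f";

    /** Returns the run text with every explicit page break character removed. */
    static String stripPageBreaks(String runText) {
        return runText.replace(PAGE_BREAK, "");
    }

    public static void main(String[] args) {
        // A run that starts with a page break, as in the second split document
        String firstRun = "\fChapter 2";
        System.out.println(stripPageBreaks(firstRun));
    }
}
```

Please note that this only removes explicit break characters in the text. If the break comes from paragraph formatting or from a section break, I would expect it to need different handling.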
Hope this helps.
Best regards.