Split an existing document

Hi,

I would like to split a document into several parts. First, I tried to do it by adding the sections of the original document to the new documents (like copy + paste). This did not work because Aspose.Word found only one section in the original file, so I need something more fine-grained.

My second idea was to use the IDocumentVisitor interface. In the RunOfText method I get the text to write to the new document, along with the Font to use. Unfortunately, the Font property of the DocumentBuilder class is read-only, so I have to set each member of the Font class separately:

m_builder.Font.Size = font.Size;
m_builder.Font.Bold = font.Bold;
m_builder.Font.Color = font.Color;
m_builder.Font.Name = font.Name;
m_builder.Font.Underline = font.Underline;
m_builder.Font.Scaling = font.Scaling;
m_builder.Font.Spacing = font.Spacing;
m_builder.Font.StyleIdentifier = font.StyleIdentifier;
m_builder.Font.StyleName = font.StyleName;

The result is not bad, but it is still not good enough: the text is not aligned as in the original (centered), and the line spacing differs from the original document.

What do you advise to get a result closer to the original?

Thanks,

Attila

Hi Attila,

Thank you for considering Aspose.

You could make the builder a class member (the m_ prefix suggests you've already done this) and, in the IDocumentVisitor.ParagraphStart method, set the properties of DocumentBuilder.ParagraphFormat from those of the paragraphFormat parameter passed in, just as you are doing for DocumentBuilder.Font inside IDocumentVisitor.RunOfText. That way the current paragraph's properties, including ParagraphFormat.Alignment, are set up correctly.
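A minimal sketch of that suggestion, assuming ParagraphStart receives a ParagraphFormat as described above and that m_builder is the DocumentBuilder writing the new document. The class name SplitVisitor is hypothetical, and the spacing property names (LineSpacingRule, LineSpacing, SpaceBefore, SpaceAfter) are assumptions borrowed from the later Aspose.Words API; they may be named differently, or be missing, in this version. Only Alignment is confirmed by the reply itself.

```csharp
using Aspose.Word;

// Hypothetical visitor class: only the members relevant to paragraph
// formatting are shown. Declare it as implementing IDocumentVisitor once
// the remaining interface methods (RunOfText, Image, RowStart, ...) are added.
public class SplitVisitor
{
    // Builder that writes the new document, kept as a class member.
    private DocumentBuilder m_builder;

    public SplitVisitor(DocumentBuilder builder)
    {
        m_builder = builder;
    }

    public void ParagraphStart(ParagraphFormat paragraphFormat)
    {
        // Copy formatting member by member, as done for Font in RunOfText,
        // since the builder's format object is presumably not replaceable.
        m_builder.ParagraphFormat.Alignment = paragraphFormat.Alignment;

        // ASSUMED property names (later Aspose.Words API); adjust to this version.
        m_builder.ParagraphFormat.LineSpacingRule = paragraphFormat.LineSpacingRule;
        m_builder.ParagraphFormat.LineSpacing = paragraphFormat.LineSpacing;
        m_builder.ParagraphFormat.SpaceBefore = paragraphFormat.SpaceBefore;
        m_builder.ParagraphFormat.SpaceAfter = paragraphFormat.SpaceAfter;
    }
}
```

Copying the spacing properties alongside Alignment is what should address the line-spacing difference mentioned in the question, provided those members exist in this API version.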

However, I think using sections would be a simpler solution if the original document is already split into sections as appropriate. I haven't fully understood why "Aspose.Word found only one section in the original file". Is that because the original document is not supposed to contain predefined sections?

Hi,

Thank you for your answer.

After opening an existing Word document, Aspose.Word found only one section in it. From your answer I get the feeling that it should find more. Perhaps it is my fault.

I tried it using the following code.

// Verbatim strings (@"...") so the backslashes are not treated as escape sequences
Aspose.Word.Document doc = new Aspose.Word.Document(@"d:\sample.doc");
Aspose.Word.Document newDoc = new Aspose.Word.Document();
// at this point doc.Sections.Count was always 1
while (doc.Sections.Count > 0)
{
    Aspose.Word.Section mySection = doc.Sections[0];
    doc.Sections.RemoveAt(0);
    newDoc.Sections.Add(mySection);
}
newDoc.Save(@"d:\newdoc.doc");

So does the source document actually have multiple sections, not just one? Have you inserted section breaks so that each section would become a separate document after running the code? If so, please attach the document so we can figure out the cause of the problem.

Hi DmitryV,

The source document was created with MS Word, and Aspose.Word finds one section in it that contains the whole document. I guess this is normal since the document was not created with Aspose.Word (however, it would be extremely helpful for me if it could identify each paragraph, image object, table, etc. as a separate section).

I tried to improve the IDocumentVisitor approach to the problem and encountered some problems:

  1. If a paragraph is left-aligned in the source document, it appears right-aligned in the result document.

  2. If I insert a picture in the source document and resize it, it appears at its full size in the result document. I have the following code in my Visitor class:

public void Image(byte[] imageBytes)
{
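    // Image.FromStream requires the stream to stay open for the lifetime of the Image,
    // so it is deliberately not disposed here.
    System.IO.MemoryStream stream = new System.IO.MemoryStream(imageBytes);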
    System.Drawing.Image image = System.Drawing.Image.FromStream(stream);
    m_builder.InsertImage(image);
}

I guess the solution would be to call the following overload to insert the image: public void InsertImage(Image, double, double, PictureFormat);

But I don't know how to obtain the missing parameters (the width and height of the image in the source document, and the PictureFormat).

I guess setting the PictureFormat would also make the image appear exactly as it does in the source document. Right now, things like text wrapping make it look different.

  3. How do I make WordArt objects from the source document appear in the result document?

  4. If the height of a table row is given as 0.0 in the source document (the public void RowStart(RowFormat rowFormat) method shows this), DocumentBuilder uses a default value for it, which is not the same as in the original. How can I make the row the same height?

Thank you,

Attila

I'm glad to report that the forthcoming Aspose.Word v3 API provides a new object model similar to XmlDocument, in which all the necessary classes such as Paragraph, Table, Run, etc. are exposed!

  1. Very strange. Please attach the source document and specify which paragraph is problematic.

  2. We've just added this to the v3 API.

  3. We’re working on it.

  4. Please try to set RowFormat.HeightRule = HeightRule.Exactly.