File Split and Merge Help Needed!

Hi,

I am evaluating Aspose.Word for server-side .doc creation.

My current aim is to achieve the following:

Scenario 1:

A large Word document contains some tags like and I need to find all such text in the file, remove it, and replace it with the contents of “some file name.doc”. The file to be inserted can have a different layout, such as “Portrait” or “Landscape”, so I need to preserve the formatting and the page orientation of both the original and the inserted document.

Right now we merge on the client side with the Word library: we find the tag text, select from that location to the end of the file, and cut/paste that content (with its formatting) into a temporary document. We then append the file to the end of the truncated document using the Word OLE API, and finally append the lower portion back so that orientation and formatting are preserved. (We tried inserting in the middle instead, but that does not preserve formatting and orientation in every case.)

I searched the forum for a way to split the existing document into two parts so that I can follow the approach above and save memory on very large files. Some posts talk about breaking the document at the section level or by using nodes, but in my case the tag can be anywhere in the middle, and I want to break at exactly that location.

Q1. Does Aspose have a method to break the file at an arbitrary location?

Q2. How can I achieve the task described above? A blog or forum reference would help, if one exists.

We are eager to move these document operations to the server side.

Scenario 2:

I went through some samples and tried the following:

Aspose.Word.Document objDocSource = new Document(@"c:\template\appendix.doc");
Aspose.Word.Document objDocTarget = new Document();
foreach (Aspose.Word.Section DocumentSection in objDocSource.Sections)
{
    // append all sections to the resulting document
    Aspose.Word.Section NewSection = (objDocTarget.ImportNode(DocumentSection, true) as Aspose.Word.Section);
    objDocTarget.Sections.Add(NewSection);
}
objDocTarget.Save(@"c:\template\final.doc", SaveFormat.FormatDocument);

The source file used the Arial font, but in the target document the font changed to Times New Roman.

Please let me know if I am doing something wrong.

Thanks in advance,

Sikandar Kumar

Hi,

Thank you for your interest in Aspose.Word.

  1. The Aspose.Word API does not provide a dedicated method to split a document, so you need to implement it manually. The best way is probably to insert a section break at the split point using DocumentBuilder and then create new documents by copying sections into them. I will try to create a code sample for you shortly, perhaps as a Wiki article.

  2. You seem to be doing everything correctly; simply use the other ImportNode overload, which accepts an ImportFormatMode argument, and specify ImportFormatMode.KeepSourceFormatting.
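Applied to the Scenario 2 loop, answer 2 is a one-argument change. The sketch below assumes the `Aspose.Word` namespace used in this thread (newer releases call it `Aspose.Words`); `SectionAppender` and `AppendAll` are hypothetical names. It uses only the three-argument `ImportNode` overload and `Sections.Add`, both of which appear in this thread.

```csharp
using Aspose.Word; // assumption: the namespace used in this thread; newer releases use Aspose.Words

static class SectionAppender
{
    // Copies every section of 'source' to the end of a new document, keeping the
    // source styles (e.g. an Arial "Normal") instead of the target's same-named styles.
    public static Document AppendAll(Document source)
    {
        Document target = new Document();
        foreach (Section documentSection in source.Sections)
        {
            // Three-argument overload with ImportFormatMode.KeepSourceFormatting.
            Section newSection = target.ImportNode(
                documentSection, true, ImportFormatMode.KeepSourceFormatting) as Section;
            target.Sections.Add(newSection);
        }
        return target;
    }
}
```

Usage, with the paths from the question: `SectionAppender.AppendAll(new Document(@"c:\template\appendix.doc")).Save(@"c:\template\final.doc", SaveFormat.FormatDocument);`. Note that `new Document()` already contains one empty section, so the result starts with it unless you remove it.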

Hi Dmitry,

Thank you very much for looking into file splitting and for planning to post a sample soon.

As suggested, I tried to split the document at the location of .

To do so, I did the following:

  1. Found the first occurrence of the tag string and replaced it with dummy_string.

  2. Used a regular expression matching any characters up to dummy_string and replaced the match with an empty string.

  3. Saved the modified document under a new name.

But this approach fails with the following error:
The match includes one or more special or break characters and cannot be replaced.
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.NotSupportedException: The match includes one or more special or break characters and cannot be replaced.

Source Error:

Line 90:
Line 91: bFound = false;
Line 92: objDocSource.Range.Replace(new Regex(strPattern),new ReplaceEvaluator(ReplaceDummyString),false);
Line 93:
Line 94: objDocSource.Save(strFileName+"_tmp",SaveFormat.FormatDocument);

I debugged the code and found that the Aspose Range.Replace method does not allow special characters in a match. However, looking at e.Match.Value, the only such characters present are new line characters.

Code used is:

private bool SplitFile(string strFileName)
{
    Aspose.Word.Document objDocSource = new Document(strFileName);
    bFound = false;
    strMatch = "";
    string strPattern = ".*" + strDummy;
    objDocSource.Range.Replace(new Regex(@"<include\sname=[^<>=]*/>"), new ReplaceEvaluator(ReplaceDummyString), false);
    objDocSource.Range.Replace(new Regex(strPattern), "");
    objDocSource.Save(strFileName + "_tmp", SaveFormat.FormatDocument);
    return bFound;
}

private ReplaceAction ReplaceDummyString(object sender, ReplaceEvaluatorArgs e)
{
    if (!bFound)
    {
        strMatch = e.Match.Value;
        bFound = true;
        e.Replacement = strDummy;
        return ReplaceAction.Replace;
    }
    else
        return ReplaceAction.Stop;
}
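Why the exception appears even though only "new lines" are visible: in .NET regular expressions `.` matches every character except `\n`, while Aspose.Word, as far as I know, represents a paragraph end as `\r`, a cell end as `\a` and a page/section break as `\f`. So `".*" + strDummy` runs straight across paragraph breaks. The plain .NET sketch below (no Aspose needed) shows this; `DUMMY_SPLIT` is a hypothetical stand-in for `strDummy`, and the sample text is made up. A pattern confined to one paragraph avoids the break characters, which also means `Range.Replace` alone cannot delete text spanning several paragraphs; that part has to be done through the node model.

```csharp
using System.Text.RegularExpressions;

static class SplitPatternDemo
{
    // Hypothetical stand-in for strDummy in the question.
    public const string Dummy = "DUMMY_SPLIT";

    // Pattern from the question: '.' excludes only '\n', so the match can
    // swallow '\r' paragraph breaks that precede the dummy text.
    public static readonly Regex AnyText = new Regex(".*" + Dummy);

    // Confined to one paragraph: excludes '\r', '\n', '\v' (line break),
    // '\f' (page/section break) and '\a' (cell end).
    public static readonly Regex SameParagraph = new Regex(@"[^\r\n\v\f\a]*" + Dummy);

    // True if the text matched by 'pattern' contains any of those break characters.
    public static bool MatchContainsBreak(Regex pattern, string text)
    {
        Match m = pattern.Match(text);
        return m.Success && m.Value.IndexOfAny(new[] { '\r', '\n', '\v', '\f', '\a' }) >= 0;
    }
}
```

For the text `"First paragraph\rSecond DUMMY_SPLIT rest"`, `AnyText` matches from the start of the string across the `\r`, while `SameParagraph` matches only `"Second DUMMY_SPLIT"`.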

Please help me find a manual way to split the file. I tried the Range.Replace method and it does not work.

Thanks,

Sikandar

Wiki code would be appreciated!
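Answer 1 above suggests splitting manually by inserting a section break with DocumentBuilder and then copying sections. A minimal sketch of that idea follows. It reuses calls that appear in this thread (`MoveTo`, `InsertBreak`, `Sections.IndexOf`, the three-argument `ImportNode`, `Sections.Add`, the `Document` setter on DocumentBuilder) plus `Sections.Clear()`, which is an assumption about your version (it exists in current Aspose.Words). `DocumentSplitter` and `SplitAt` are hypothetical names. The split node could be, for example, `e.MatchNode` from a ReplaceEvaluator.

```csharp
using Aspose.Word; // assumption: the namespace used in this thread; newer releases use Aspose.Words

static class DocumentSplitter
{
    // Splits 'source' at 'splitNode' into two new documents: everything before
    // the node, and everything from the node onwards. Note that 'source' itself
    // is modified, because a section break is inserted into it.
    public static Document[] SplitAt(Document source, Node splitNode)
    {
        DocumentBuilder builder = new DocumentBuilder();
        builder.Document = source;
        builder.MoveTo(splitNode);

        // Only valid in the main text; throws if the node is in a header/footer.
        builder.InsertBreak(BreakType.SectionBreakContinuous);

        // After the break the builder sits in the first section of the second half.
        int splitIndex = source.Sections.IndexOf(builder.CurrentParagraph.ParentSection);

        Document head = new Document();
        Document tail = new Document();
        head.Sections.Clear(); // assumption: removes the default empty section
        tail.Sections.Clear();

        for (int i = 0; i < source.Sections.Count; i++)
        {
            Document target = i < splitIndex ? head : tail;
            Section copy = target.ImportNode(
                source.Sections[i], true, ImportFormatMode.KeepSourceFormatting) as Section;
            target.Sections.Add(copy);
        }
        return new[] { head, tail };
    }
}
```

Each section keeps its own page setup when copied, so portrait and landscape parts stay as they were. If the split node is in the very first paragraph, `head` can end up with no sections at all, which you may want to guard against.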

Thanks to the flexibility of the Aspose.Word object model, you can replace your custom tags with the contents of other documents without first splitting the document or doing the other intermediate operations you described. The formatting and page orientation will be retained.

So I decided to write a code sample specially for you. It uses the replace functionality you tried; if you use the latest version of Aspose.Word, line break characters should be supported.

Note that this is just a sample. Update the code to match your requirements; for instance, the expected tags are currently (without quotation marks), the path to the documents is D:\Work, etc.

Let us know if something doesn’t work for you.

using System;
using System.Text.RegularExpressions;
using Aspose.Word;
using NUnit.Framework;

[TestFixture]
public class ExampleInsertContent
{
    [Test]
    public void TestInsertContent()
    {
        Document doc = new Document(@"D:\Work\Main.doc");
        DoInsertion(doc);
        doc.Save(@"D:\Work\Result.doc");
    }

    public void DoInsertion(Document doc)
    {
        mDocument = doc;
        mBuilder.Document = doc;
        doc.Range.Replace(new Regex(@"\<include\sname=(?<name>[^<>=]+)/\>"), new ReplaceEvaluator(HandleReplace), false);
    }

    private ReplaceAction HandleReplace(object sender, ReplaceEvaluatorArgs e)
    {
        e.Replacement = String.Empty;
        string srcDocName = System.IO.Path.Combine(@"D:\Work", e.Match.Result("${name}"));
        Document srcDoc = new Document(srcDocName);
        mBuilder.MoveTo(e.MatchNode);
        mBuilder.InsertBreak(BreakType.SectionBreakContinuous);
        int dstSectionIndex = mDocument.Sections.IndexOf(mBuilder.CurrentParagraph.ParentSection);
        for (int srcSectionIndex = 0; srcSectionIndex < srcDoc.Sections.Count; srcSectionIndex++)
        {
            Section srcSection = srcDoc.Sections[srcSectionIndex];
            Section newSection = mDocument.ImportNode(srcSection, true, ImportFormatMode.KeepSourceFormatting) as Section;
            mDocument.Sections.Insert(dstSectionIndex + srcSectionIndex, newSection);
        }
        return ReplaceAction.Replace;
    }
    private Document mDocument;
    private DocumentBuilder mBuilder = new DocumentBuilder();
}
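The tag pattern in `DoInsertion` can be tried on its own, without Aspose, to see which tag text it accepts. This plain .NET sketch uses the same regular expression and the same `Match.Result("${name}")` call as `HandleReplace`; `IncludeTagDemo` is a hypothetical name and the sample strings are made up. Note that `[^<>=]+` accepts quotation marks and spaces, so a quoted file name would be captured together with its quotes.

```csharp
using System.Text.RegularExpressions;

static class IncludeTagDemo
{
    // Same pattern as in DoInsertion: <include name=FILE/> with FILE captured as "name".
    public static readonly Regex Pattern = new Regex(@"\<include\sname=(?<name>[^<>=]+)/\>");

    // Returns the captured file name, or null when the text contains no tag.
    public static string FileNameOf(string text)
    {
        Match m = Pattern.Match(text);
        return m.Success ? m.Result("${name}") : null;
    }
}
```

For example, `FileNameOf("See <include name=insert.doc/> here")` returns `insert.doc`, while `<include name="insert.doc"/>` yields `"insert.doc"` including the quotes, which `Path.Combine` would then turn into a file name that does not exist.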

Hi Dmitry,

Thank you for the provided code.

I tried it and found the following issues:

  1. It fails with a NullReferenceException (“Object reference not set to an instance of an object”) after a varying number of insertions, usually on the 2nd, 3rd or 4th.

  2. Setting the boolean Replace parameter to true makes it fail already on the 2nd run.

  3. I thought it might be caused by creating the objects once at the beginning, so I wrote a function that opens the source, finds just the first occurrence of the tag, and replaces it with the content of the file to be inserted, using newly created mBuilder and mDocument objects each time. (I know this is a bad approach, but I did it to investigate the null reference issue.)

This time it works fine until the 4th or 5th iteration and then fails with the following error:

Cannot insert the requested break outside of the main story.

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.InvalidOperationException: Cannot insert the requested break outside of the main story.

Source Error:

Line 90: Document srcDoc = new Document(srcDocName);
Line 91: mBuilder.MoveTo(e.MatchNode);
Line 92: mBuilder.InsertBreak(BreakType.SectionBreakContinuous);
Line 93: int dstSectionIndex = mDocument.Sections.IndexOf(mBuilder.CurrentParagraph.ParentSection);
Line 94: for (int srcSectionIndex = 0; srcSectionIndex < srcDoc.Sections.Count; srcSectionIndex++)

Source File: c:\program files\aspose\aspose.word\demos\aspose.word.demos\doctest.aspx.cs Line: 92

I am using the latest Aspose.Word component (with a trial license).

Due to the confidential nature of the documents, I am not attaching the two sample Word documents. Please let me know if you need them; we will then try to scramble the content and send them.

Details of the documents:

appendix.doc is 28 KB with around 20 occurrences of the tag.

insert.doc is 205 KB and contains a lot of tables. (Real documents can also contain images, but my test document does not.)

I am inserting insert.doc into appendix.doc at every placeholder of

Issue 2:

----------

I know that the trial version inserts junk characters.

Purchasing the product is not an issue for our company, but it is difficult to convince upper management to buy it before they see actual results. I have no problem providing a contact person in our company.

So is there any way Aspose can provide a temporary full license for around two weeks, for internal demo purposes only and not for commercial use? Once we confirm that it handles the file merging well for us, we will contact you right away.

Your quick responses so far are highly appreciated.

Please take a look at the source file we are using for the file merge.

Thanks,

Sikandar Kumar

  1. It would be great if you could attach your documents here. That would let us fully reproduce the issue and possibly modify the provided code so it works for you. Don’t worry about privacy: attachments are visible only to you and Aspose, nobody else.

  2. Yes, it is possible. Please read this:
    https://docs.aspose.com/words/net/licensing/#temporary-license

Hi Dmitry,

Here are the templates I am using for the merge. I have already attached the source file Doctest.aspx.cs in a previous post.

I open appendix.doc and insert the content of insert.doc at every occurrence of the tag.

I was unable to upload the files earlier because the site was returning an error (even after you replied that I should now be able to :( ).

Thanks and Warm Regards,

Sikandar Kumar

I’m answering this:

Cannot insert the requested break outside of the main story.

Line 90: Document srcDoc = new Document(srcDocName);
Line 91: mBuilder.MoveTo(e.MatchNode);
Line 92: mBuilder.InsertBreak(BreakType.SectionBreakContinuous);

This error happens when you try to insert a section break, page break or a column break in a header or footer. This is not possible in a Word document. Such breaks can only be inserted in the main text of a section.

Apparently your search operation finds a match in a header or footer. You need to either avoid searching headers and footers, or detect this case and not attempt to insert section breaks there.

If you only want to search the main text, you can call Section.Body.Range.Replace.

If you want to detect whether a matching node is inside a header or footer:

if (e.MatchNode.GetAncestor(typeof(Body)) != null)
{
    //The text is inside the main text, inserting section break is possible.
}
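Putting this check into the earlier insertion sample gives something like the following. It is a sketch: it assumes your version's ReplaceAction enum has a Skip member (current Aspose.Words does; only Replace and Stop appear in this thread), and `GuardedInsertion` is a hypothetical class name. Tags found in headers or footers are left in place; tags in the main text are handled exactly as before.

```csharp
using System;
using System.Text.RegularExpressions;
using Aspose.Word; // assumption: the namespace used in this thread; newer releases use Aspose.Words

class GuardedInsertion
{
    private Document mDocument;
    private DocumentBuilder mBuilder = new DocumentBuilder();

    public void DoInsertion(Document doc)
    {
        mDocument = doc;
        mBuilder.Document = doc;
        doc.Range.Replace(new Regex(@"\<include\sname=(?<name>[^<>=]+)/\>"),
            new ReplaceEvaluator(HandleReplace), false);
    }

    private ReplaceAction HandleReplace(object sender, ReplaceEvaluatorArgs e)
    {
        // Section breaks cannot go into headers/footers: leave such tags untouched.
        if (e.MatchNode.GetAncestor(typeof(Body)) == null)
            return ReplaceAction.Skip; // assumption: Skip exists in your version

        e.Replacement = String.Empty;
        string srcDocName = System.IO.Path.Combine(@"D:\Work", e.Match.Result("${name}"));
        Document srcDoc = new Document(srcDocName);

        mBuilder.MoveTo(e.MatchNode);
        mBuilder.InsertBreak(BreakType.SectionBreakContinuous);
        int dstSectionIndex = mDocument.Sections.IndexOf(mBuilder.CurrentParagraph.ParentSection);
        for (int i = 0; i < srcDoc.Sections.Count; i++)
        {
            Section newSection = mDocument.ImportNode(srcDoc.Sections[i], true,
                ImportFormatMode.KeepSourceFormatting) as Section;
            mDocument.Sections.Insert(dstSectionIndex + i, newSection);
        }
        return ReplaceAction.Replace;
    }
}
```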