Merging a variable number of Word documents together


#1

Hello everyone,

I'm trying to merge a variable number of Word documents and send the result to the client's browser. The documents to merge are the targets of hyperlinks in yet another Word document: I iterate through its fields, grab the location of each linked file, and add it to an array if the extension is ".doc". Now I'm a little confused about how to cycle through the array and use the section-moving technique to combine the documents. Can anyone offer any advice? I am using VB/ASP.NET.



Dim refHyperLinks(100) As String ' sized up front; trimmed after the loop
Dim i As Integer = 0
Dim fieldBegin As FieldStart
Dim fieldStarts As NodeList = doc.SelectNodes("//FieldStart")

For Each fieldBegin In fieldStarts
    If fieldBegin.FieldType = FieldType.FieldHyperlink Then
        ' The field is a hyperlink field; use the "facade" class to help deal with the field.
        Dim hyperlink As Hyperlink = New Hyperlink(fieldBegin)
        ' The extension is at the end of the target, so test the rightmost 4 characters.
        If Right(hyperlink.Target, 4) = ".doc" Then
            refHyperLinks(i) = hyperlink.Target
            i = i + 1
        End If
    End If
Next
ReDim Preserve refHyperLinks(i - 1) ' drop the unused slots

' Create nodes and sections for each document in the array and combine them
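A fixed-size array has to be sized before the loop and trimmed afterwards. A growable list avoids both steps. The sketch below isolates the filtering step so it can run without Aspose.Word: it takes the hyperlink targets as plain strings (in the page they would come from hyperlink.Target inside the loop) and keeps those ending in ".doc", ignoring case. The module and function names are hypothetical, and it assumes .NET 2.0 or later for List(Of T); on .NET 1.1, System.Collections.ArrayList plays the same role.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module DocLinkFilter
    ' Returns only the targets that end in ".doc" (case-insensitive), in their
    ' original order. ".docx" targets are not matched.
    Public Function FilterDocTargets(ByVal targets As IEnumerable(Of String)) As String()
        Dim result As New List(Of String)()
        For Each target As String In targets
            If target IsNot Nothing AndAlso target.EndsWith(".doc", StringComparison.OrdinalIgnoreCase) Then
                result.Add(target)
            End If
        Next
        ' ToArray returns exactly as many elements as were collected, so no ReDim is needed.
        Return result.ToArray()
    End Function
End Module
```

The resulting array can be passed straight to a function that opens and appends each file.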



#2

Hi,

Thank you for considering Aspose.

If you have an array of the document file names, just iterate through the array, open each document, and append its sections to the destination document:

Public Function CombineDocuments(ByVal refHyperLinks() As String) As Document
    ' Start with a new blank document and append every source document to it.
    Dim destinationDocument As Document = New Document()

    Dim fileName As String
    For Each fileName In refHyperLinks
        Dim sourceDocument As Document = New Document(fileName)
        AppendDocument(destinationDocument, sourceDocument)
    Next

    Return destinationDocument
End Function

Public Sub AppendDocument(ByVal destinationDocument As Document, ByVal sourceDocument As Document)
    Dim sourceSection As Section
    For Each sourceSection In sourceDocument.Sections
        ' A node from another document must be imported before it can be added here.
        Dim newSection As Section = CType(destinationDocument.ImportNode(sourceSection, True), Section)
        destinationDocument.Sections.Add(newSection)
    Next
End Sub

Note, however, that documents currently cannot be opened from a URI; only a regular file path is allowed. If the hyperlinks point to local files, simply remove the file:/// prefix; if they point to remote files, download them first with one of the .NET classes, for example WebClient.
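A minimal sketch of that advice using only the .NET Framework: a file URI is turned back into an ordinary path with Uri.LocalPath (which also decodes escapes such as %20), and a remote target is downloaded to a uniquely named temporary file with WebClient before it is opened. ResolveToLocalPath is a hypothetical name, and relative targets are returned unchanged on the assumption that the caller maps them itself (for example with Server.MapPath).

```vbnet
Imports System
Imports System.IO
Imports System.Net

Public Module LinkResolver
    ' Hypothetical helper: turns a hyperlink target into a plain file path
    ' that can be passed to New Document(fileName).
    Public Function ResolveToLocalPath(ByVal target As String) As String
        Dim parsed As Uri = Nothing
        If Not Uri.TryCreate(target, UriKind.Absolute, parsed) Then
            ' Relative target: leave it for the caller to map.
            Return target
        End If

        If parsed.IsFile Then
            ' On Windows, file:///C:/Docs/a.doc becomes C:\Docs\a.doc.
            Return parsed.LocalPath
        End If

        ' Remote target (http/https): download it to a uniquely named temp file,
        ' keeping the original extension.
        Dim tempPath As String = Path.Combine(Path.GetTempPath(), _
            Guid.NewGuid().ToString("N") & Path.GetExtension(parsed.AbsolutePath))
        Using client As New WebClient()
            client.DownloadFile(parsed, tempPath)
        End Using
        Return tempPath
    End Function
End Module
```

Downloaded copies are not cleaned up by this sketch; the caller would delete them after the merged document has been saved.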


#3

Thanks Dmitry!

This solution seems to fit perfectly; however, I'm having an issue with the code I posted previously. I'm trying to convert it from the C# wiki example on replacing hyperlinks (with some modifications, of course). Right now it is in the following state:

Dim refHyperLinks(100) As String ' sized up front; trimmed after the loop
Dim i As Integer = 0
Dim fieldBegin As FieldStart
Dim fieldStarts As NodeList = doc.SelectNodes("//FieldStart")

For Each fieldBegin In fieldStarts
    If fieldBegin.FieldType = FieldType.FieldHyperlink Then
        ' The field is a hyperlink field; use the "facade" class to help deal with the field.
        Dim hyperlink As Hyperlink
        hyperlink = New Hyperlink(fieldBegin)
        If Right(hyperlink.Target, 4) = ".doc" Then
            refHyperLinks(i) = Server.MapPath(Replace(Replace(Replace(hyperlink.Target, "http://apps.laticrete.com/", ""), "AG2.0/", ""), "AG/", ""))
            i = i + 1
        End If
    End If
Next
ReDim Preserve refHyperLinks(i - 1) ' drop the unused slots

The line "hyperlink = New Hyperlink(fieldBegin)" is throwing a compile error: Too many arguments to 'Public Sub New()'.

I'm unsure how to go about assigning the field to the hyperlink variable. I'm thinking this has something to do with VB assuming that I mean the ASP.NET HyperLink control rather than the Aspose.Word one?

Thanks for the help!
Ken
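The chained Replace calls above map a site URL to a path for Server.MapPath by deleting fixed substrings. A sketch of the same mapping built on System.Uri, assuming .NET 2.0 or later: it drops whatever scheme and host the link names, decodes escapes such as %20, and strips an application folder only when it is the first path segment (so "AG/" inside, say, "TAG/" is left alone). ToAppRelativePath and the appFolders parameter are hypothetical; "AG2.0" and "AG" are the folder names used in the code above.

```vbnet
Imports System

Public Module TargetMapper
    ' Hypothetical helper: turns a hyperlink target into a path relative to the
    ' site root, suitable for Server.MapPath.
    Public Function ToAppRelativePath(ByVal target As String, ByVal appFolders() As String) As String
        Dim relative As String = target
        Dim parsed As Uri = Nothing
        If Uri.TryCreate(target, UriKind.Absolute, parsed) Then
            ' AbsolutePath drops the scheme and host but keeps escapes such as %20.
            relative = parsed.AbsolutePath
        End If

        ' Decode %20 and any other escapes, then drop the leading slash.
        relative = Uri.UnescapeDataString(relative).TrimStart("/"c)

        ' Strip the first application folder that prefixes the path (whole segment only).
        For Each folder As String In appFolders
            Dim prefix As String = folder.TrimEnd("/"c) & "/"
            If relative.StartsWith(prefix, StringComparison.OrdinalIgnoreCase) Then
                Return relative.Substring(prefix.Length)
            End If
        Next
        Return relative
    End Function
End Module
```

In the page this would be used as Server.MapPath(ToAppRelativePath(hyperlink.Target, New String() {"AG2.0", "AG"})).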


#4

Just try renaming the Hyperlink class to something like MyHyperlink. Does that work?


#5

Dmitry,

No, I'm afraid not. It's still giving me the "Too many arguments to 'Public Sub New()'" error.

It seems to think that New is a subroutine... This is VB and I'm converting from C#; because I don't understand what the example is doing at this point in the code, I can't figure out how to convert it to VB.

Currently it looks like:



Dim refHyperLinks(100) As String ' sized up front; trimmed after the loop
Dim i As Integer = 0
Dim fieldBegin As FieldStart
Dim fieldStarts As NodeList = doc.SelectNodes("//FieldStart")

For Each fieldBegin In fieldStarts
    If fieldBegin.FieldType = FieldType.FieldHyperlink Then
        ' The field is a hyperlink field; use the "facade" class to help deal with the field.
        Dim myHyperlink As Hyperlink
        myHyperlink = New Hyperlink(fieldBegin)
        If Right(myHyperlink.Target, 4) = ".doc" Then
            refHyperLinks(i) = Server.MapPath(Replace(Replace(Replace(myHyperlink.Target, "http://apps.laticrete.com/", ""), "AG2.0/", ""), "AG/", ""))
            i = i + 1
        End If
    End If
Next
doc = Nothing
ReDim Preserve refHyperLinks(i - 1) ' drop the unused slots

' Combine the documents in the array
CombineDocuments(refHyperLinks).Save("SubmittalPackage-ES-" & ES_No & ".doc", SaveFormat.FormatDocument, SaveType.OpenInWord, Response)



Thanks for the help!


#6
1. I have added a VB.NET version of the Hyperlink example:

   http://www.aspose.com/Wiki/default.aspx/Aspose.Word/ReplacingHyperlinksExample.html

2. The Hyperlink class you are trying to use is the custom class from that example, not the ASP.NET HyperLink control, so don't forget to add its code.

3. If Visual Studio does not "see" your Hyperlink class, or the reference is ambiguous (VS is not sure which class the Hyperlink identifier actually points to), put the class in a separate namespace (e.g. Namespace My) and use fully qualified names (Dim hyperlink As My.Hyperlink). Another approach is to rename the Hyperlink class itself, not the variable as you did (Dim hyperlink As RenamedHyperlink). Sorry if my suggestion was not clear enough.
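A minimal sketch of the namespace approach from item 3. WordHelpers is a hypothetical namespace name, and the class is only a placeholder for the wiki's Hyperlink facade: the real class takes an Aspose.Word FieldStart, while this one takes the target string so the sketch compiles on its own. Because VB identifiers are case-insensitive, an unqualified Hyperlink can resolve to System.Web.UI.WebControls.HyperLink, whose only constructor takes no arguments; the fully qualified name removes that ambiguity. Note that a Namespace block cannot be declared inside a page's script runat="server" block (that code is compiled into the page class), so it would live in a separate .vb source file compiled with the site.

```vbnet
Imports System

' WordHelpers is a hypothetical namespace name chosen for this sketch.
Namespace WordHelpers
    ' Placeholder for the wiki's Hyperlink facade class.
    Friend Class Hyperlink
        Private ReadOnly mTarget As String

        Friend Sub New(ByVal target As String)
            mTarget = target
        End Sub

        Friend ReadOnly Property Target() As String
            Get
                Return mTarget
            End Get
        End Property
    End Class
End Namespace

Module NamespaceDemo
    Function Describe(ByVal target As String) As String
        ' Fully qualified: there is no doubt which Hyperlink class is meant.
        Dim link As WordHelpers.Hyperlink = New WordHelpers.Hyperlink(target)
        Return link.Target
    End Function
End Module
```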

#7

Hi Dmitry,

Thank you! The code is working now; however, I'm getting an error from one of the lines in the class you provided:
System.InvalidCastException: Specified cast is not valid.

On the line:
Dim fieldSeparator As FieldSeparator = CType(mFieldCode.NextSibling, FieldSeparator)

Is it possible that the field separator does not immediately follow the field code? These are standard external links created in MS Word (not originally created with Aspose).

Thanks for the help,
Ken


#8

I've fixed the VB code in the example; however, it will still throw in your case unless the field separator immediately follows the field code. Please attach your document and we will test it.


#9

Hi Dmitry,

I've attached a sample document. I tried your updated class and it threw the "could not find field separator" error. I looked at my document in an ASCII viewer and didn't notice anything odd; perhaps you can discover more. All of these documents are built the same way, so hopefully I can modify the code to accept this type of hyperlink.

http://apps.laticrete.com/ES-B411_body.doc

Thanks again!
Ken Tarwood



#10

Thanks for the report, Ken. The point is that the hyperlink field code and the field result may each consist of more than one text run. We will fix the sample code shortly.
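To see why a cast on the field code's NextSibling fails, here is a simplified model of the node sequence for a hyperlink whose field code Word split into two runs. It uses plain .NET (assuming 2.0 or later), not Aspose.Word: SimpleNode, SimpleNodeType and GetFieldCode are hypothetical stand-ins, and the updated class does the equivalent on real nodes with FindNextSibling and GetTextSameParent. Instead of assuming the separator sits right after a single code run, the function walks forward and concatenates every node's text until it reaches the separator.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Public Module FieldScanDemo
    ' Hypothetical, simplified stand-ins for the Aspose.Word node types.
    Public Enum SimpleNodeType
        FieldStart
        Run
        FieldSeparator
        FieldEnd
    End Enum

    Public Structure SimpleNode
        Public Kind As SimpleNodeType
        Public Text As String

        Public Sub New(ByVal kind As SimpleNodeType, ByVal text As String)
            Me.Kind = kind
            Me.Text = text
        End Sub
    End Structure

    ' Concatenates the text of every node after the FieldStart at startIndex up to,
    ' but not including, the next FieldSeparator. Returns Nothing if no separator follows.
    Public Function GetFieldCode(ByVal nodes As IList(Of SimpleNode), ByVal startIndex As Integer) As String
        Dim code As New StringBuilder()
        For i As Integer = startIndex + 1 To nodes.Count - 1
            If nodes(i).Kind = SimpleNodeType.FieldSeparator Then
                Return code.ToString()
            End If
            code.Append(nodes(i).Text)
        Next
        Return Nothing
    End Function
End Module
```

For the sequence FieldStart, Run(" HYPERLINK "), Run("""http://...""), FieldSeparator, Run, FieldEnd, the separator is two nodes away from the first code run, which is exactly the case a single NextSibling cast cannot handle.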


#11

Thanks again!


#12

Hi Ken,

We’ve updated the sample code:

http://www.aspose.com/Wiki/default.aspx/Aspose.Word/ReplacingHyperlinksExample.html


#13

Hi Dmitry,

Thank you for your continued efforts. The new code seems to solve the original problem; however, now I am receiving a 'System.ArgumentOutOfRangeException: Ticks must be between DateTime.MinValue.Ticks and DateTime.MaxValue.Ticks. Parameter name: ticks'
error on the line:
CombineDocuments(refHyperLinks).Save("SubmittalPackage-ES-" & ES_No & ".doc", SaveFormat.FormatDocument, SaveType.OpenInWord, Response)

I've seen this error before when DateTime variables are uninitialized or otherwise hold out-of-range values; however, as far as I know, I am not using any date functions or types in the program. Can you think of any reason why this exception would be thrown? I'll continue to test.

Thanks again,
Ken


#14

Here is the stack trace from the above error:

[ArgumentOutOfRangeException: Ticks must be between DateTime.MinValue.Ticks and DateTime.MaxValue.Ticks.
Parameter name: ticks]
System.DateTime…ctor(Int64 ticks) +101
System.CurrentSystemTimeZone.ToLocalTime(DateTime time) +49
System.DateTime.ToLocalTime() +25
?.?.?(Document ?, BinaryWriter ?) +2631
?.?.?() +826
?.?.?() +308
?.?.Save(Document document, Stream stream, String fileName) +36
Aspose.Word.Document.Save(String fileName, SaveFormat fileFormat, SaveType saveType, HttpResponse response) +208
ASP.render_sp_aspx.Page_Load(Object sender, EventArgs e) in F:\websites\AG2.0\render_sp.aspx:231
System.Web.UI.Control.OnLoad(EventArgs e) +67
System.Web.UI.Control.LoadRecursive() +35
System.Web.UI.Page.ProcessRequestMain() +750

The Arabic characters next to BinaryWriter have me confused; I've never seen a trace like this before. I tried setting the system clock on the server to the exact time, with no change.


#15

The Arabic characters are the result of obfuscation. Could you please attach the document you are trying to save so I can reproduce the error?


#16

Hi Dmitry,

Oh! That makes sense. A variety of documents are involved, since the hyperlinks are pulled from numerous Word documents. It seems to happen with all of them, regardless of which documents are linked to; however, I will attach the one that caused the initial error.

Below is the code that I’m using:

<%@Import NameSpace="Aspose.Word" %>
<%@Import NameSpace="System.Data" %>
<%@Import NameSpace="System.Data.SqlClient" %>

<script language="vb" runat="server">
' This "facade" class makes it easier to work with a hyperlink field in a Word document.
'
' A hyperlink is represented by a HYPERLINK field in a Word document. A field in Aspose.Word
' consists of several nodes and it might be difficult to work with all those nodes directly.
' Note this is a simple implementation and will work only if the hyperlink code and name
' each consist of one Run only.
'
' [FieldStart][Run - field code][FieldSeparator][Run - field result][FieldEnd]
'
' The field code contains a string in one of these formats:
' HYPERLINK "url"
' HYPERLINK \l "bookmark name"
'
' The field result contains text that is displayed to the user.
Friend Class Hyperlink
    Friend Sub New(ByVal fieldStart As FieldStart)
        If fieldStart Is Nothing Then
            Throw New ArgumentNullException("fieldStart")
        End If
        If fieldStart.FieldType <> FieldType.FieldHyperlink Then
            Throw New ArgumentException("Field start type must be FieldHyperlink.")
        End If

        mFieldStart = fieldStart

        'Find field separator node.
        mFieldSeparator = FindNextSibling(mFieldStart, NodeType.FieldSeparator)
        If mFieldSeparator Is Nothing Then
            Throw New Exception("Cannot find field separator.")
        End If

        'Find field end node. Normally field end will always be found, but in the example document
        'there happens to be a paragraph break included in the hyperlink and this puts the field end
        'in the next paragraph. It will be much more complicated to handle fields which span several
        'paragraphs correctly, but in this case allowing field end to be null is enough for our purposes.
        mFieldEnd = FindNextSibling(mFieldSeparator, NodeType.FieldEnd)

        'Field code looks something like [ HYPERLINK "http:\\www.myurl.com" ], but it can consist of several runs.
        Dim fieldCode As String = GetTextSameParent(mFieldStart.NextSibling, mFieldSeparator)
        Dim match As Match = gRegex.Match(fieldCode.Trim())
        mIsLocal = (match.Groups(1).Length > 0) 'The link is local if \l is present in the field code.
        mTarget = match.Groups(2).Value
    End Sub

    ' Gets or sets the display name of the hyperlink.
    Friend Property Name() As String
        Get
            'Start after the separator so its control character is not part of the name.
            Return GetTextSameParent(mFieldSeparator.NextSibling, mFieldEnd)
        End Get
        Set(ByVal Value As String)
            'Hyperlink display name is stored in the field result which is a Run
            'node between field separator and field end.
            Dim fieldResult As Run = CType(mFieldSeparator.NextSibling, Run)
            fieldResult.Text = Value

            'But sometimes the field result can consist of more than one run, delete these runs.
            RemoveSameParent(fieldResult.NextSibling, mFieldEnd)
        End Set
    End Property

    ' Gets or sets the target url or bookmark name of the hyperlink.
    Friend Property Target() As String
        Get
            Return mTarget
        End Get
        Set(ByVal Value As String)
            mTarget = Value
            UpdateFieldCode()
        End Set
    End Property

    ' True if the hyperlink's target is a bookmark inside the document. False if the hyperlink is a url.
    Friend Property IsLocal() As Boolean
        Get
            Return mIsLocal
        End Get
        Set(ByVal Value As Boolean)
            mIsLocal = Value
            UpdateFieldCode()
        End Set
    End Property

    Private Sub UpdateFieldCode()
        'Field code is stored in a Run node between field start and field separator.
        Dim fieldCode As Run = CType(mFieldStart.NextSibling, Run)
        Dim sb As StringBuilder = New StringBuilder
        sb.Append("HYPERLINK ")
        If mIsLocal Then
            sb.Append("\l ")
        End If
        sb.Append("""")
        sb.Append(mTarget)
        sb.Append("""")
        fieldCode.Text = sb.ToString

        'But sometimes the field code can consist of more than one run, delete these runs.
        RemoveSameParent(fieldCode.NextSibling, mFieldSeparator)
    End Sub

    ' Goes through siblings starting from the start node until it finds a node of the specified type or null.
    Private Shared Function FindNextSibling(ByVal startNode As Node, ByVal nodeType As NodeType) As Node
        Dim node As Node = startNode

        While Not node Is Nothing
            If node.NodeType = nodeType Then
                Return node
            End If
            node = node.NextSibling
        End While
        Return Nothing
    End Function

    ' Retrieves text from start up to but not including the end node.
    Private Shared Function GetTextSameParent(ByVal startNode As Node, ByVal endNode As Node) As String
        If Not endNode Is Nothing Then
            If Not startNode.ParentNode Is endNode.ParentNode Then
                Throw New ArgumentException("Start and end nodes are expected to have the same parent.")
            End If
        End If

        Dim builder As StringBuilder = New StringBuilder
        Dim child As Node = startNode
        While Not child Is endNode
            builder.Append(child.GetText())
            child = child.NextSibling
        End While
        Return builder.ToString()
    End Function

    ' Removes nodes from start up to but not including the end node.
    ' Start and end are assumed to have the same parent.
    Private Shared Sub RemoveSameParent(ByVal startNode As Node, ByVal endNode As Node)
        If Not endNode Is Nothing Then
            If Not startNode.ParentNode Is endNode.ParentNode Then
                Throw New ArgumentException("Start and end nodes are expected to have the same parent.")
            End If
        End If

        Dim curChild As Node = startNode
        While Not curChild Is endNode
            Dim nextChild As Node = curChild.NextSibling
            curChild.Remove()
            curChild = nextChild
        End While
    End Sub

    Private mFieldStart As Node
    Private mFieldSeparator As Node
    Private mFieldEnd As Node
    Private mTarget As String
    Private mIsLocal As Boolean
    ' VB string literals do not treat backslashes as escapes, so \\l (not the C#-escaped \\\\l)
    ' is what matches the \l switch in the field code.
    Private Shared ReadOnly gRegex As Regex = New Regex("\S+\s+(?:""""\s+)?(\\l\s+)?""([^""]+)""")
End Class

Public Function CombineDocuments(ByVal refHyperLinks() As String) As Document
    Dim destinationDocument As Document = New Document()
    Dim fileName As String
    For Each fileName In refHyperLinks
        Dim sourceDocument As Document = New Document(fileName)
        AppendDocument(destinationDocument, sourceDocument)
    Next
    Return destinationDocument
End Function

Public Sub AppendDocument(ByVal destinationDocument As Document, ByVal sourceDocument As Document)
    Dim sourceSection As Section
    For Each sourceSection In sourceDocument.Sections
        Dim NewSection As Section = CType(destinationDocument.ImportNode(sourceSection, True), Section)
        destinationDocument.Sections.Add(NewSection)
    Next
End Sub

Sub Page_Load(ByVal sender As System.Object, ByVal e As System.EventArgs)
    Dim ES_id As Integer
    Dim ES_Body As String
    Dim ES_No As String

    If Not IsNothing(Request.QueryString("id")) Then
        ES_id = Request.QueryString("id")
        ' Grab all pertinent information about the ES
        Dim conn As SqlConnection

        conn = New SqlConnection(*CONNECTION STRING SNIPPED*)
        conn.Open()

        Dim ESCommand As SqlCommand
        ESCommand = conn.CreateCommand
        ESCommand.CommandText = "SELECT id, ES_Body, ES_No FROM ES_DATA WHERE id = " & ES_id

        Dim ESReader As SqlDataReader

        ESReader = ESCommand.ExecuteReader()
        If ESReader.Read() Then
            ES_id = ESReader("id")
            ES_Body = ESReader("ES_Body")
            ES_No = ESReader("ES_No")
        End If
        ' Release the reader and connection once the row has been read.
        ESReader.Close()
        conn.Close()

        ' Insert Aspose API Code Here
        Dim doc As Document
        doc = New Document(Server.MapPath("ES/" & ES_Body))

        ' Next add in the various datasheets and such
        Dim refHyperLinks(100) As String
        Dim i As Integer
        i = 0
        Dim fieldBegin As FieldStart
        Dim fieldStarts As NodeList
        fieldStarts = doc.SelectNodes("//FieldStart")

        For Each fieldBegin In fieldStarts
            If fieldBegin.FieldType = FieldType.FieldHyperlink Then
                ' The field is a hyperlink field, use the "facade" class to help to deal with the field.
                Dim hyperlink As Hyperlink
                hyperlink = New Hyperlink(fieldBegin)
                If Right(hyperlink.Target, 4) = ".doc" Then
                    refHyperLinks(i) = Server.MapPath(RegEx.Replace(RegEx.Replace(RegEx.Replace(RegEx.Replace(hyperlink.Target, "http://apps.laticrete.com/", ""), "AG2.0/", ""), "AG/", ""), "%20", " "))
                    i = i + 1
                End If
            End If
        Next
        doc = Nothing
        ReDim Preserve refHyperLinks(i - 1)

        ' Create nodes and sections for each document in the array and combine together
        CombineDocuments(refHyperLinks).Save("SubmittalPackage-ES-" & ES_No & ".doc", SaveFormat.FormatDocument, SaveType.OpenInWord, Response)
    Else
        Response.Clear()
        Response.Redirect("auth.aspx")
        Response.End()
    End If
End Sub
</script>
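The Hyperlink class parses the field code with a regular expression. In a VB string literal only "" is special (it stands for one quote character); backslashes are taken literally. A pattern copied from a C# string therefore needs its \\ escapes halved: \\l in the VB literal is what matches Word's single-backslash \l switch. A self-contained sketch of just the parsing step, with a hypothetical ParseHyperlinkCode wrapper around the same pattern (written with \\l):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Module FieldCodeParser
    ' After VB unescaping, the regex is:  \S+\s+(?:""\s+)?(\\l\s+)?"([^"]+)"
    ' Group 1 captures the \l switch, group 2 the quoted target.
    Private ReadOnly gRegex As New Regex("\S+\s+(?:""""\s+)?(\\l\s+)?""([^""]+)""")

    ' Returns the hyperlink target (or Nothing if the code does not match)
    ' and sets isLocal when the \l switch is present.
    Public Function ParseHyperlinkCode(ByVal fieldCode As String, ByRef isLocal As Boolean) As String
        Dim m As Match = gRegex.Match(fieldCode.Trim())
        If Not m.Success Then
            isLocal = False
            Return Nothing
        End If
        isLocal = m.Groups(1).Length > 0
        Return m.Groups(2).Value
    End Function
End Module
```

For example, HYPERLINK "http://apps.laticrete.com/a.doc" yields the URL with isLocal = False, and HYPERLINK \l "Section2" yields Section2 with isLocal = True.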


#17

Thanks for the post; we will investigate the issue as soon as possible.


#18

Hi,

We have released Aspose.Word 3.1.3.

  • Fixed a DateTime exception on document save.
  • http://www.aspose.com/Blogs/Roman.Korchagin/


#19

Dmitry,

You guys rule! Great support! Just in time for our launch as well - kudos!

Cheers,
Ken


#20

Dmitry -

One thing I've noticed with your demo code: the resulting merged document includes a blank page at the beginning. Is this caused by this line:

Dim destinationDocument As Document = New Document()

in the following code, where there is no existing destination document?

Public Function CombineDocuments(ByVal refHyperLinks() As String) As Document
    Dim destinationDocument As Document = New Document()
    Dim fileName As String
    For Each fileName In refHyperLinks
        Dim sourceDocument As Document = New Document(fileName)
        AppendDocument(destinationDocument, sourceDocument)
    Next
    Return destinationDocument
End Function

Please advise - thanks!
Ken