Merging variable number of word docs together

Hello Everyone,

I’m attempting to merge a variable number of Word docs and send the result to the client’s browser. The docs to merge are found through hyperlinks in yet another Word doc: I iterate through its fields, grab the locations of the linked files, and add them to an array if the extension is “.doc”. Now I’m a little confused as to how I would cycle through the array and use the section-moving technique to combine the documents. Can anyone offer any advice? I am using VB/ASP.NET.

Dim refHyperLinks() As String
Dim i As Integer
i = 0
Dim fieldStarts As NodeList
fieldStarts = doc.SelectNodes("//FieldStart")

For Each fieldStart In fieldStarts
    If fieldStart.FieldType = FieldType.FieldHyperlink Then
        ' The field is a hyperlink field, use the "facade" class to help to deal with the field.
        Dim hyperlink As Hyperlink
        hyperlink = New Hyperlink(fieldStart)
        ' The extension is at the end of the target, so compare the last four characters.
        If Right(hyperlink.Target, 4) = ".doc" Then
            refHyperLinks(i) = hyperlink.Target
            i = i + 1
        End If
    End If
Next

Hi,

Thank you for considering Aspose.

If you have an array of the document file names, just iterate through the array, open the documents and append their sections to the destination document:

Public Function CombineDocuments(ByVal refHyperLinks() As String) As Document
Dim destinationDocument As Document = New Document() 

Dim fileName As String
For Each fileName In refHyperLinks
Dim sourceDocument As Document = New Document(fileName) 
AppendDocument(destinationDocument, sourceDocument)
Next

Return destinationDocument
End Function

Public Sub AppendDocument(ByVal destinationDocument As Document, ByVal sourceDocument As Document)
Dim sourceSection As Section
For Each sourceSection In sourceDocument.Sections
Dim NewSection As Section = CType(destinationDocument.ImportNode(sourceSection,True), Section)
destinationDocument.Sections.Add(NewSection)
Next
End Sub

Note, however, that at the moment documents cannot be opened from a URI; only a regular file path is allowed. If the hyperlinks point to local files, simply remove the file:/// prefix. If they point to remote files, download them first using one of the .NET classes, for example WebClient.
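The two cases described above (local file:/// links and remote links) could be handled along these lines. This is only a sketch: `ResolveToLocalPath` and its `tempFolder` parameter are hypothetical names, not part of Aspose.Word, and the code relies only on the standard `System.Uri` and `System.Net.WebClient` classes. It assumes the hyperlink target is an absolute URI; `New Uri(target)` throws for relative targets.

```vb
Imports System
Imports System.IO
Imports System.Net

Module HyperlinkPaths
    ' Hypothetical helper: turns a hyperlink target into a regular file path
    ' that can then be passed to New Document(fileName).
    Function ResolveToLocalPath(ByVal target As String, ByVal tempFolder As String) As String
        Dim targetUri As Uri = New Uri(target)

        If targetUri.IsFile Then
            ' Local link: LocalPath drops the file:/// prefix and unescapes %20 etc.
            Return targetUri.LocalPath
        End If

        ' Remote link: download a copy first, then open the copy.
        Dim localCopy As String = Path.Combine(tempFolder, Path.GetFileName(targetUri.LocalPath))
        Using client As New WebClient()
            client.DownloadFile(targetUri.AbsoluteUri, localCopy)
        End Using
        Return localCopy
    End Function
End Module
```

Each entry of the hyperlink array could be passed through such a helper before the documents are opened and appended.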

Thanks Dmitry!

This solution seems to fit perfectly; however, I’m having an issue with the code I posted previously. I’m trying to convert it from the C# wiki example on replacing hyperlinks (with some modifications, of course). Right now it is in the following state:

Dim refHyperLinks() As String
Dim i As Integer
i = 0
Dim fieldBegin As fieldStart
Dim fieldStarts As NodeList
fieldStarts = doc.SelectNodes("//FieldStart")

for each fieldBegin in fieldStarts
if fieldBegin.FieldType = FieldType.FieldHyperlink then
’ The field is a hyperlink field, use the "facade" class to help to deal with the field.
Dim hyperlink As Hyperlink
**hyperlink = New Hyperlink(fieldBegin)**
if Right(hyperlink.Target, 4) = ".doc" then
refHyperLinks(i) = Server.MapPath(Replace(Replace(Replace(hyperlink.Target, "http://apps.laticrete.com/", ""), "AG2.0/", ""), "AG/", ""))
i = i + 1
end if
end if
next 

The line that is bolded is throwing an error: Too many arguments to ‘Public Sub New()’.

I’m unsure as to how I would go about assigning the field to the hyperlink var. I’m thinking that this has something to do with vb assuming that I am talking about the asp hyperlink object rather than the Aspose.Word one?

Thanks for the help!
Ken

Just try to rename the Hyperlink class to something like say MyHyperlink. Will it work?

Dmitry,

No, I’m afraid not. It’s still giving me the Too many arguments to ‘Public Sub New()’ error.

It seems to think that New is a subroutine… this is VB and I’m trying to convert from C# - because I don’t understand what they are trying to do in the example I can’t figure out how to convert to VB at this point in the code.

Currently it looks like:

Dim refHyperLinks() As String
Dim i As Integer
i = 0
Dim fieldBegin As fieldStart
Dim fieldStarts As NodeList
fieldStarts = doc.SelectNodes("//FieldStart")

For Each fieldBegin In fieldStarts
    If fieldBegin.FieldType = FieldType.FieldHyperlink Then
        ' The field is a hyperlink field, use the "facade" class to help to deal with the field.
        Dim myHyperlink As Hyperlink
        myHyperlink = New Hyperlink(fieldBegin)
        If Right(myHyperlink.Target, 4) = ".doc" Then
            refHyperLinks(i) = Server.MapPath(Replace(Replace(Replace(myHyperlink.Target, "http://apps.laticrete.com/", ""), "AG2.0/", ""), "AG/", ""))
            i = i + 1
        End If
    End If
Next
doc = Nothing

' Combine documents in array together
CombineDocuments(refHyperLinks).Save("SubmittalPackage-ES-" & ES_No & ".doc", SaveFormat.FormatDocument, SaveType.OpenInWord, Response)

Thanks for the help!

  1. I have added a VB .NET version of the Hyperlink example:

https://docs.aspose.com/words/net/find-and-replace/#customize-find-and-replace-operation

  2. You are trying to use the custom Hyperlink class, not the ASP.NET HyperLink control, so don’t forget to add its code.

  3. If VS does not “see” your Hyperlink class, or there are ambiguous references (VS is not sure which class the Hyperlink identifier actually points to), just put it in a separate namespace (e.g. Namespace My) and use fully qualified names (Dim hyperlink As My.Hyperlink). Another approach is to rename your Hyperlink class itself, not the variable as you did (Dim hyperlink As RenamedHyperlink). Sorry if my suggestion was not clear enough.
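As an illustration of the namespace approach, here is a minimal sketch. The namespace name `DocTools` is a made-up example, and the class body is only a stand-in for the real Hyperlink facade class, so that the snippet compiles without Aspose.Word.

```vb
Imports System

Namespace DocTools
    ' Stand-in for the custom Hyperlink facade class; only the naming matters here.
    Public Class Hyperlink
        Private mTarget As String

        Public Sub New(ByVal target As String)
            mTarget = target
        End Sub

        Public ReadOnly Property Target() As String
            Get
                Return mTarget
            End Get
        End Property
    End Class
End Namespace

Module NamespaceDemo
    Sub Main()
        ' The qualified name cannot be confused with the ASP.NET
        ' System.Web.UI.WebControls.HyperLink control.
        Dim link As DocTools.Hyperlink = New DocTools.Hyperlink("specs/datasheet.doc")
        Console.WriteLine(link.Target)
    End Sub
End Module
```

With the qualified name, the compiler binds `New` to the custom class constructor that takes an argument, rather than to a parameterless constructor of another class with the same simple name.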

Hi Dmitry,

Thank you! The code is working now, however I’m getting an error from one of the lines in the class you provided:

System.InvalidCastException: Specified cast is not valid.

On the line:

Dim fieldSeparator As FieldSeparator = CType(mFieldCode.NextSibling, FieldSeparator)

Is there a possibility that the field separator is not immediately following the field code? These are standard external links created in MS Word (not Aspose originally).

Thanks for the help,
Ken

I’ve fixed the VB code in the example; however, in your case it will still throw unless the field separator immediately follows the field code. Please attach your document and we will test it.

Hi Dmitry,

I’ve attached a sample document. I tried your updated class and it threw the “Cannot find field separator” error. I looked at my doc in an ASCII viewer and didn’t notice anything odd; perhaps you can discover more. All of these documents are essentially crafted in the same way, so hopefully I can modify the code to accept this type of hyperlink.

Thanks again!
Ken Tarwood

Thanks for the report Ken, the point is that the hyperlink field code and result may consist of more than one text run… We will fix the sample code shortly.

Thanks again!

Hi Ken,

We’ve updated the sample code:

https://docs.aspose.com/words/net/find-and-replace/#customize-find-and-replace-operation

Hi Dmitry,

Thank you for your continued efforts. The new code seems to solve the original problem, however now I am receiving a ‘System.ArgumentOutOfRangeException: Ticks must be between DateTime.MinValue.Ticks and DateTime.MaxValue.Ticks. Parameter name: ticks’ error on the line:

CombineDocuments(refHyperLinks).Save("SubmittalPackage-ES-" & ES_No & ".doc", SaveFormat.FormatDocument, SaveType.OpenInWord, Response)

I’ve seen this error before when DateTime variables are uninitialized and such (.NET thinks the variable is set to prior to 1640 or so), however I am not using any date functions or types in the program as far as I know. Is there any reason you can think of why this exception would be thrown? I’ll continue to test.

Thanks again,
Ken

Here is the stack trace from the above error:

[ArgumentOutOfRangeException: Ticks must be between DateTime.MinValue.Ticks and DateTime.MaxValue.Ticks.
Parameter name: ticks]
System.DateTime..ctor(Int64 ticks) +101
System.CurrentSystemTimeZone.ToLocalTime(DateTime time) +49
System.DateTime.ToLocalTime() +25
?.?.?(Document ?, BinaryWriter ?) +2631
?.?.?() +826
?.?.?() +308
?.?.Save(Document document, Stream stream, String fileName) +36
Aspose.Word.Document.Save(String fileName, SaveFormat fileFormat, SaveType saveType, HttpResponse response) +208
ASP.render_sp_aspx.Page_Load(Object sender, EventArgs e) in F:\websites\AG2.0\render_sp.aspx:231
System.Web.UI.Control.OnLoad(EventArgs e) +67
System.Web.UI.Control.LoadRecursive() +35
System.Web.UI.Page.ProcessRequestMain() +750

The Arabic character next to BinaryWriter has me confused. I’ve never seen a trace like this before. I tried setting the system clock on the server to the exact time, with no change.

The Arabic characters are the result of obfuscation. Could you please attach the document you are trying to save so that I can reproduce the error?

Hi Dmitry,

Oh! That makes sense. There are a variety of documents involved as the hyperlinks are being pulled from numerous word documents. It seems to happen with all documents regardless of the documents that are linked to, however I will attach the one that caused the initial error.

Below is the code that I’m using:

 <%@Import NameSpace="Aspose.Word" %>
 <%@Import NameSpace="System.Data" %>
 <%@Import NameSpace="System.Data.SqlClient" %>
 
 <script language="vb" runat="server">
 ' This "facade" class makes it easier to work with a hyperlink field in a Word document. 
 ' 
 ' A hyperlink is represented by a HYPERLINK field in a Word document. A field in Aspose.Word 
 ' consists of several nodes and it might be difficult to work with all those nodes directly. 
 ' Note this is a simple implementation and will work only if the hyperlink code and name 
 ' each consist of one Run only.
 ' 
 ' [FieldStart][Run - field code][FieldSeparator][Run - field result][FieldEnd]
 ' 
 ' The field code contains a string in one of these formats:
 ' HYPERLINK "url"
 ' HYPERLINK \l "bookmark name"
 ' 
 ' The field result contains text that is displayed to the user.
 Friend Class Hyperlink
 Friend Sub New(ByVal fieldStart As FieldStart)
 If fieldStart Is Nothing Then
 Throw New ArgumentNullException("fieldStart")
 End If
 If fieldStart.FieldType <> FieldType.FieldHyperlink Then
 Throw New ArgumentException("Field start type must be FieldHyperlink.")
 End If
 
 mFieldStart = fieldStart
 
 'Find field separator node.
 mFieldSeparator = FindNextSibling(mFieldStart, NodeType.FieldSeparator)
 If mFieldSeparator Is Nothing Then
 Throw New Exception("Cannot find field separator.")
 End If
 
 'Find field end node. Normally field end will always be found, but in the example document 
 'there happens to be a paragraph break included in the hyperlink and this puts the field end 
 'in the next paragraph. It will be much more complicated to handle fields which span several 
 'paragraphs correctly, but in this case allowing field end to be null is enough for our purposes.
 mFieldEnd = FindNextSibling(mFieldSeparator, NodeType.FieldEnd)
 
 'Field code looks something like [ HYPERLINK "http:\\www.myurl.com" ], but it can consist of several runs.
 Dim fieldCode As String = GetTextSameParent(mFieldStart.NextSibling, mFieldSeparator)
 Dim match As Match = gRegex.Match(fieldCode.Trim())
 mIsLocal = (match.Groups(1).Length > 0) 'The link is local if \l is present in the field code.
 mTarget = match.Groups(2).Value
 End Sub
 
 ' Gets or sets the display name of the hyperlink.
 Friend Property Name() As String
 Get
 Return GetTextSameParent(mFieldSeparator, mFieldEnd)
 End Get
 Set(ByVal Value As String)
 'Hyperlink display name is stored in the field result which is a Run 
 'node between field separator and field end.
 Dim fieldResult As Run = CType(mFieldSeparator.NextSibling, Run)
 fieldResult.Text = Value
 
 'But sometimes the field result can consist of more than one run, delete these runs.
 RemoveSameParent(fieldResult.NextSibling, mFieldEnd)
 End Set
 End Property
 
 ' Gets or sets the target url or bookmark name of the hyperlink.
 Friend Property Target() As String
 Get
 Return mTarget
 End Get
 Set(ByVal Value As String)
 mTarget = Value
 UpdateFieldCode()
 End Set
 End Property
 
 ' True if the hyperlink's target is a bookmark inside the document. False if the hyperlink is a url.
 Friend Property IsLocal() As Boolean
 Get
 Return mIsLocal
 End Get
 Set(ByVal Value As Boolean)
 mIsLocal = Value
 UpdateFieldCode()
 End Set
 End Property
 
 Private Sub UpdateFieldCode()
 'Field code is stored in a Run node between field start and field separator.
 Dim fieldCode As Run = CType(mFieldStart.NextSibling, Run)
 Dim sb As StringBuilder = New StringBuilder
 sb.Append("HYPERLINK ")
 If mIsLocal Then
 sb.Append("\l ")
 End If
 sb.Append("""")
 sb.Append(mTarget)
 sb.Append("""")
 fieldCode.Text = sb.ToString
 
 'But sometimes the field code can consist of more than one run, delete these runs.
 RemoveSameParent(fieldCode.NextSibling, mFieldSeparator)
 End Sub
 
 ' Goes through siblings starting from the start node until it finds a node of the specified type or null.
 Private Shared Function FindNextSibling(ByVal startNode As Node, ByVal nodeType As NodeType) As Node
 Dim node As Node = startNode
 
 While Not node Is Nothing
 If node.NodeType = nodeType Then
 Return node
 End If
 node = node.NextSibling
 End While
 Return Nothing
 End Function
 
 ' Retrieves text from start up to but not including the end node.
 Private Shared Function GetTextSameParent(ByVal startNode As Node, ByVal endNode As Node) As String
 If Not endNode Is Nothing Then
 If Not startNode.ParentNode Is endNode.ParentNode Then
 Throw New ArgumentException("Start and end nodes are expected to have the same parent.")
 End If
 End If
 
 Dim builder As StringBuilder = New StringBuilder
 Dim child As Node = startNode
 While Not child Is endNode
 builder.Append(child.GetText())
 child = child.NextSibling
 End While
 Return builder.ToString()
 End Function
 
 ' Removes nodes from start up to but not including the end node.
 ' Start and end are assumed to have the same parent.
 Private Shared Sub RemoveSameParent(ByVal startNode As Node, ByVal endNode As Node)
 If Not endNode Is Nothing Then
 If Not startNode.ParentNode Is endNode.ParentNode Then
 Throw New ArgumentException("Start and end nodes are expected to have the same parent.")
 End If
 End If
 
 Dim curChild As Node = startNode
 While Not curChild Is endNode
 Dim nextChild As Node = curChild.NextSibling
 curChild.Remove()
 curChild = nextChild
 End While
 End Sub
 
 Private mFieldStart As Node
 Private mFieldSeparator As Node
 Private mFieldEnd As Node
 Private mTarget As String
 Private mIsLocal As Boolean
 Private Shared ReadOnly gRegex As Regex = New Regex("\S+\s+(?:""""\s+)?(\\l\s+)?""([^""]+)""")
 End Class
 
 Public Function CombineDocuments(ByVal refHyperLinks() As String) As Document
 Dim destinationDocument As Document = New Document()
 Dim fileName As String
 For Each fileName In refHyperLinks
 Dim sourceDocument As Document = New Document(fileName)
 AppendDocument(destinationDocument, sourceDocument)
 Next
 Return destinationDocument
 End Function
 
 Public Sub AppendDocument(ByVal destinationDocument As Document, ByVal sourceDocument As Document)
 Dim sourceSection As Section
 For Each sourceSection In sourceDocument.Sections
 Dim NewSection As Section = CType(destinationDocument.ImportNode(sourceSection,True), Section)
 destinationDocument.Sections.Add(NewSection)
 Next
 End Sub
 
 Sub Page_Load(ByVal sender As System.Object, ByVal e As System.EventArgs)
 Dim ES_id As Integer
 Dim ES_Body As String
 Dim ES_No As String
 
 if not isNothing(Request.QueryString("id")) then
 ES_id = Request.QueryString("id")
 ' Grab all pertinent information about the ES
 Dim conn As SqlConnection
 
 conn = New SqlConnection(*CONNECTION STRING SNIPPED*)
 conn.Open()
 
 Dim ESCommand As SqlCommand
 ESCommand = conn.CreateCommand
 ESCommand.CommandText = "SELECT id, ES_Body, ES_No FROM ES_DATA WHERE id = " & ES_id
 
 Dim ESReader As SqlDataReader
 
 ESReader = ESCommand.ExecuteReader()
 if ESReader.Read() then
 ES_id = ESReader("id")
 ES_Body = ESReader("ES_Body")
 ES_No = ESReader("ES_No")
 end if
 
 ' Insert Aspose API Code Here
 Dim doc As Document 
 doc = New Document(Server.MapPath("ES/" & ES_Body))
 
 ' Next add in the various datasheets and such
 Dim refHyperLinks(100) As String
 Dim i As Integer
 i = 0
 Dim fieldBegin As fieldStart
 Dim fieldStarts As NodeList
 fieldStarts = doc.SelectNodes("//FieldStart")
 
 for each fieldBegin in fieldStarts
 if fieldBegin.FieldType = FieldType.FieldHyperlink then
 ' The field is a hyperlink field, use the "facade" class to help to deal with the field.
 Dim hyperlink As Hyperlink 
 hyperlink = New Hyperlink(fieldBegin)
 if Right(hyperlink.Target, 4) = ".doc" then
 refHyperLinks(i) = Server.MapPath(RegEx.Replace(RegEx.Replace(RegEx.Replace(RegEx.Replace(hyperlink.Target, "http://apps.laticrete.com/", ""), "AG2.0/", ""), "AG/", ""), "%20", " "))
 i = i + 1
 end if
 end if
 next 
 doc = Nothing
 ReDim Preserve refHyperLinks(i - 1)
 
 ' Create nodes and sections for each document in the array and combine together
 CombineDocuments(refHyperLinks).Save("SubmittalPackage-ES-" & ES_No & ".doc", SaveFormat.FormatDocument, SaveType.OpenInWord, Response)
 else
 response.clear()
 response.redirect("auth.aspx")
 response.end()
 end if
 End Sub
 </script>
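To see what the field-code regular expression in the Hyperlink class extracts, it can be tried on its own outside the page. The snippet below is a sketch only: the module and function names are invented, and the sample field codes are made up. The pattern uses `\\l`, which matches the single literal backslash found in a Word field code such as HYPERLINK \l "name".

```vb
Imports System
Imports System.Text.RegularExpressions

Module FieldCodeDemo
    ' Group 1: optional \l flag (link to a bookmark). Group 2: the quoted target.
    Private ReadOnly FieldCodeRegex As Regex = New Regex("\S+\s+(?:""""\s+)?(\\l\s+)?""([^""]+)""")

    ' Returns the target and whether the link is local, as a display string.
    Function Describe(ByVal fieldCode As String) As String
        Dim m As Match = FieldCodeRegex.Match(fieldCode.Trim())
        Dim isLocal As Boolean = (m.Groups(1).Length > 0)
        Return String.Format("{0} (local: {1})", m.Groups(2).Value, isLocal)
    End Function

    Sub Main()
        ' Prints: http://www.example.com/sheet.doc (local: False)
        Console.WriteLine(Describe(" HYPERLINK ""http://www.example.com/sheet.doc"" "))
        ' Prints: Bookmark1 (local: True)
        Console.WriteLine(Describe(" HYPERLINK \l ""Bookmark1"" "))
    End Sub
End Module
```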

Thanks for the posting, we will try to investigate the issue asap.

Hi,

We have released Aspose.Word 3.1.3.

  • Fixed a DateTime exception on document save.

https://docs.aspose.com/words/net/release-notes/

Dmitry,

You guys rule! Great support! Just in time for our launch as well - kudos!

Cheers,
Ken

Dmitry -

One thing I’ve noticed with your demo code - the resultant merged document included a blank page at the beginning. Is this a result of this line:

Dim destinationDocument As Document = New Document()

In the following code when there is no current destinationDocument?:

Public Function CombineDocuments(ByVal refHyperLinks() As String) As Document
Dim destinationDocument As Document = New Document()
Dim fileName As String
For Each fileName In refHyperLinks
Dim sourceDocument As Document = New Document(fileName)
AppendDocument(destinationDocument, sourceDocument)
Next
Return destinationDocument
End Function

Please advise - thanks!
Ken
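If the blank page does come from the empty content that New Document() starts with (an assumption; this is not confirmed anywhere in the thread), one way to sidestep it would be to open the first linked file as the destination and append only the remaining files. The sketch below uses only calls that already appear in the thread (New Document(fileName), ImportNode, Sections.Add); the module name is made up, and the code needs the Aspose.Word assembly to compile.

```vb
Imports Aspose.Word

Public Module CombineSketch
    ' Variation on CombineDocuments: the first file becomes the destination,
    ' so no blank starting document ends up in the merged result.
    Public Function CombineDocuments(ByVal fileNames() As String) As Document
        If fileNames Is Nothing OrElse fileNames.Length = 0 Then
            ' Nothing to merge: fall back to an empty document.
            Return New Document()
        End If

        Dim destinationDocument As Document = New Document(fileNames(0))

        For i As Integer = 1 To fileNames.Length - 1
            Dim sourceDocument As Document = New Document(fileNames(i))
            Dim sourceSection As Section
            For Each sourceSection In sourceDocument.Sections
                ' Import each section into the destination, then append it.
                Dim newSection As Section = CType(destinationDocument.ImportNode(sourceSection, True), Section)
                destinationDocument.Sections.Add(newSection)
            Next
        Next

        Return destinationDocument
    End Function
End Module
```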