I’m attempting to merge a variable number of Word documents and send the result to the client’s browser. The documents to merge are found by following hyperlinks in yet another Word document: I iterate through its fields, grab the locations of the linked files, and add them to an array if the extension is “.doc”. Now I’m a little confused about how to cycle through the array and use the section-moving technique to combine the documents. Can anyone offer any advice? I am using VB/ASP.NET.
Dim refHyperLinks(100) As String
Dim i As Integer
i = 0
Dim fieldStarts As NodeList
fieldStarts = doc.SelectNodes("//FieldStart")
Dim fieldStart As FieldStart
For Each fieldStart In fieldStarts
    If fieldStart.FieldType = FieldType.FieldHyperlink Then
        ' The field is a hyperlink field, use the "facade" class to help to deal with the field.
        Dim hyperlink As Hyperlink
        hyperlink = New Hyperlink(fieldStart)
        ' Compare the last four characters to match the file extension.
        If Right(hyperlink.Target, 4) = ".doc" Then
            refHyperLinks(i) = hyperlink.Target
            i = i + 1
        End If
    End If
Next
If you have an array of the document file names, just iterate through the array, open the documents and append their sections to the destination document:
Public Function CombineDocuments(ByVal refHyperLinks() As String) As Document
    Dim destinationDocument As Document = New Document()
    Dim fileName As String
    For Each fileName In refHyperLinks
        Dim sourceDocument As Document = New Document(fileName)
        AppendDocument(destinationDocument, sourceDocument)
    Next
    Return destinationDocument
End Function
Public Sub AppendDocument(ByVal destinationDocument As Document, ByVal sourceDocument As Document)
    Dim sourceSection As Section
    For Each sourceSection In sourceDocument.Sections
        Dim NewSection As Section = CType(destinationDocument.ImportNode(sourceSection, True), Section)
        destinationDocument.Sections.Add(NewSection)
    Next
End Sub
Note, however, that at the moment documents cannot be opened from a URI; only a regular file path is allowed. If the hyperlinks point to local files, simply remove the file:/// prefix. If the hyperlinks point to remote files, you should download them first using one of the .NET classes, such as WebClient.
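The advice above could be sketched roughly as follows. `ResolveToLocalPath` and the module name are hypothetical, not part of Aspose.Word; the sketch assumes every hyperlink target is an absolute URI and uses only `System.Uri`, `System.IO.Path` and `System.Net.WebClient`. Remote files are downloaded to a temporary file whose path can then be passed to `New Document(...)`.

```vb
Imports System
Imports System.IO
Imports System.Net

Module HyperlinkTargets
    ' Hypothetical helper: returns a local file path for a hyperlink target.
    ' Assumes the target is an absolute URI (file:///..., http://..., etc.).
    Public Function ResolveToLocalPath(ByVal target As String) As String
        Dim targetUri As Uri = New Uri(target)

        ' file:/// links already point at a local file; just strip the scheme.
        If targetUri.IsFile Then
            Return targetUri.LocalPath
        End If

        ' Anything else is treated as remote and downloaded to a temporary .doc file.
        Dim tempPath As String = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString() & ".doc")
        Dim client As WebClient = New WebClient()
        Try
            client.DownloadFile(target, tempPath)
        Finally
            client.Dispose()
        End Try
        Return tempPath
    End Function
End Module
```

The caller would be responsible for deleting the temporary files once the merged document has been saved.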
This solution seems to fit perfectly. However, I’m having an issue with the code I posted previously. I’m trying to convert it from the C# wiki example on replacing hyperlinks (with some modifications, of course). Right now it is in the following state:
Dim refHyperLinks(100) As String
Dim i As Integer
i = 0
Dim fieldBegin As FieldStart
Dim fieldStarts As NodeList
fieldStarts = doc.SelectNodes("//FieldStart")
For Each fieldBegin In fieldStarts
    If fieldBegin.FieldType = FieldType.FieldHyperlink Then
        ' The field is a hyperlink field, use the "facade" class to help to deal with the field.
        Dim hyperlink As Hyperlink
        **hyperlink = New Hyperlink(fieldBegin)**
        ' Compare the last four characters to match the file extension.
        If Right(hyperlink.Target, 4) = ".doc" Then
            refHyperLinks(i) = Server.MapPath(Replace(Replace(Replace(hyperlink.Target, "http://apps.laticrete.com/", ""), "AG2.0/", ""), "AG/", ""))
            i = i + 1
        End If
    End If
Next
The line that is bolded is throwing an error: Too many arguments to ‘Public Sub New()’.
I’m unsure how to go about assigning the field to the hyperlink variable. I’m thinking that this has something to do with VB assuming that I am talking about the ASP.NET HyperLink control rather than the Aspose.Word one?
No, I’m afraid not. It’s still giving me the Too many arguments to ‘Public Sub New()’ error.
It seems to think that New is a subroutine. This is VB and I’m trying to convert from C#; because I don’t understand what the example is trying to do, I can’t figure out how to convert this point in the code to VB.
Currently it looks like:
Dim refHyperLinks(100) As String
Dim i As Integer
i = 0
Dim fieldBegin As FieldStart
Dim fieldStarts As NodeList
fieldStarts = doc.SelectNodes("//FieldStart")
For Each fieldBegin In fieldStarts
    If fieldBegin.FieldType = FieldType.FieldHyperlink Then
        ' The field is a hyperlink field, use the "facade" class to help to deal with the field.
        Dim myHyperlink As Hyperlink
        myHyperlink = New Hyperlink(fieldBegin)
        If Right(myHyperlink.Target, 4) = ".doc" Then
            refHyperLinks(i) = Server.MapPath(Replace(Replace(Replace(myHyperlink.Target, "http://apps.laticrete.com/", ""), "AG2.0/", ""), "AG/", ""))
            i = i + 1
        End If
    End If
Next
doc = Nothing
' Trim the array to the entries that were actually filled
ReDim Preserve refHyperLinks(i - 1)
' Combine documents in array together
CombineDocuments(refHyperLinks).Save("SubmittalPackage-ES-" & ES_No & ".doc", SaveFormat.FormatDocument, SaveType.OpenInWord, Response)
You are trying to use the custom Hyperlink class, not the ASP.NET HyperLink control, so don’t forget to add its code.
If VS does not “see” your Hyperlink class or there are ambiguous references (VS is not sure which class the Hyperlink identifier actually points to), just put it in a separate namespace (e.g. Namespace My) and use fully qualified names (Dim hyperlink As My.Hyperlink). Another approach is to rename your Hyperlink class itself, rather than the variable as you did (Dim hyperlink As RenamedHyperlink). Sorry if my suggestion was not clear enough.
Thank you! The code is working now, however I’m getting an error from one of the lines in the class you provided:
System.InvalidCastException: Specified cast is not valid.
On the line:
Dim fieldSeparator As FieldSeparator = CType(mFieldCode.NextSibling, FieldSeparator)
Is there a possibility that the field separator is not immediately following the field code? These are standard external links created in MS Word (not originally in Aspose).
I’ve fixed the VB code in the example. However, it will still throw in your case unless the field separator immediately follows the field code. Please attach your document and we will test it.
I’ve attached a sample document. I tried your updated class and it threw the “Cannot find field separator” error. I looked at my doc in an ASCII viewer and didn’t notice anything odd; perhaps you can discover more. All of these documents are crafted in essentially the same way, so hopefully I can modify the code to accept this type of hyperlink.
Thanks for the report, Ken. The point is that the hyperlink field code and the field result may each consist of more than one text run… We will fix the sample code shortly.
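To illustrate the point about multiple runs: Word may split a single field code across several Run nodes, so the text of all runs has to be joined before the target can be parsed out. A minimal sketch using plain strings in place of Run nodes; the run texts, the example URL and the module name are made up for the illustration, and the pattern is written in the spirit of the one in the Hyperlink sample class rather than copied from it.

```vb
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module FieldCodeDemo
    ' Matches: keyword, whitespace, optional \l switch, then the quoted target.
    ' VB string literals do not treat the backslash as an escape, so "\\l" is the
    ' regex for a literal \l.
    Private ReadOnly FieldCodePattern As Regex = New Regex("\S+\s+(?:""""\s+)?(\\l\s+)?""([^""]+)""")

    Sub Main()
        ' Hypothetical run texts: one field code split across three runs.
        Dim runs() As String = {" HYPERLINK ""http://www.exam", "ple.com/data", "sheet.doc"" "}

        ' Join the text of every run between the field start and the field separator.
        Dim builder As StringBuilder = New StringBuilder()
        Dim runText As String
        For Each runText In runs
            builder.Append(runText)
        Next

        Dim m As Match = FieldCodePattern.Match(builder.ToString().Trim())
        Dim isLocal As Boolean = (m.Groups(1).Length > 0)

        Console.WriteLine(m.Groups(2).Value) ' Prints: http://www.example.com/datasheet.doc
        Console.WriteLine(isLocal)           ' Prints: False
    End Sub
End Module
```

Reading only the first run would yield an unterminated string here, which is why a parser that assumes a single run fails on such documents.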
Thank you for your continued efforts. The new code seems to solve the original problem, however now I am receiving a ‘System.ArgumentOutOfRangeException: Ticks must be between DateTime.MinValue.Ticks and DateTime.MaxValue.Ticks. Parameter name: ticks’
error on the line:
I’ve seen this error before when DateTime variables are uninitialized and such (.NET thinks the variable is set to prior to 1640 or so); however, I am not using any date functions or types in the program as far as I know. Is there any reason you can think of why this exception would be thrown? I’ll continue to test.
The Arabic character next to BinaryWriter has me confused. I’ve never seen a trace like this before. I tried setting my system clock on the server to the exact time with no change.
Oh! That makes sense. There are a variety of documents involved, as the hyperlinks are being pulled from numerous Word documents. It seems to happen with all of them, regardless of which documents are linked to; however, I will attach the one that caused the initial error.
Below is the code that I’m using:
<%@Import NameSpace="Aspose.Word" %>
<%@Import NameSpace="System.Data" %>
<%@Import NameSpace="System.Data.SqlClient" %>
<script language="vb" runat="server">
' This "facade" class makes it easier to work with a hyperlink field in a Word document.
'
' A hyperlink is represented by a HYPERLINK field in a Word document. A field in Aspose.Word
' consists of several nodes and it might be difficult to work with all those nodes directly.
' The field code and the field result may each consist of more than one Run; this class
' reads the text of all runs on either side of the field separator.
'
' [FieldStart][Run - field code][FieldSeparator][Run - field result][FieldEnd]
'
' The field code contains a string in one of these formats:
' HYPERLINK "url"
' HYPERLINK \l "bookmark name"
'
' The field result contains text that is displayed to the user.
Friend Class Hyperlink
    Friend Sub New(ByVal fieldStart As FieldStart)
        If fieldStart Is Nothing Then
            Throw New ArgumentNullException("fieldStart")
        End If
        If fieldStart.FieldType <> FieldType.FieldHyperlink Then
            Throw New ArgumentException("Field start type must be FieldHyperlink.")
        End If
        mFieldStart = fieldStart
        'Find field separator node.
        mFieldSeparator = FindNextSibling(mFieldStart, NodeType.FieldSeparator)
        If mFieldSeparator Is Nothing Then
            Throw New Exception("Cannot find field separator.")
        End If
        'Find field end node. Normally field end will always be found, but in the example document
        'there happens to be a paragraph break included in the hyperlink and this puts the field end
        'in the next paragraph. It will be much more complicated to handle fields which span several
        'paragraphs correctly, but in this case allowing field end to be null is enough for our purposes.
        mFieldEnd = FindNextSibling(mFieldSeparator, NodeType.FieldEnd)
        'Field code looks something like [ HYPERLINK "http://www.myurl.com" ], but it can consist of several runs.
        Dim fieldCode As String = GetTextSameParent(mFieldStart.NextSibling, mFieldSeparator)
        Dim match As Match = gRegex.Match(fieldCode.Trim())
        mIsLocal = (match.Groups(1).Length > 0) 'The link is local if \l is present in the field code.
        mTarget = match.Groups(2).Value
    End Sub
    ' Gets or sets the display name of the hyperlink.
    Friend Property Name() As String
        Get
            Return GetTextSameParent(mFieldSeparator, mFieldEnd)
        End Get
        Set(ByVal Value As String)
            'Hyperlink display name is stored in the field result which is a Run
            'node between field separator and field end.
            Dim fieldResult As Run = CType(mFieldSeparator.NextSibling, Run)
            fieldResult.Text = Value
            'But sometimes the field result can consist of more than one run, delete these runs.
            RemoveSameParent(fieldResult.NextSibling, mFieldEnd)
        End Set
    End Property
    ' Gets or sets the target url or bookmark name of the hyperlink.
    Friend Property Target() As String
        Get
            Return mTarget
        End Get
        Set(ByVal Value As String)
            mTarget = Value
            UpdateFieldCode()
        End Set
    End Property
    ' True if the hyperlink's target is a bookmark inside the document. False if the hyperlink is a url.
    Friend Property IsLocal() As Boolean
        Get
            Return mIsLocal
        End Get
        Set(ByVal Value As Boolean)
            mIsLocal = Value
            UpdateFieldCode()
        End Set
    End Property
    Private Sub UpdateFieldCode()
        'Field code is stored in a Run node between field start and field separator.
        Dim fieldCode As Run = CType(mFieldStart.NextSibling, Run)
        Dim sb As StringBuilder = New StringBuilder
        sb.Append("HYPERLINK ")
        If mIsLocal Then
            sb.Append("\l ")
        End If
        sb.Append("""")
        sb.Append(mTarget)
        sb.Append("""")
        fieldCode.Text = sb.ToString
        'But sometimes the field code can consist of more than one run, delete these runs.
        RemoveSameParent(fieldCode.NextSibling, mFieldSeparator)
    End Sub
    ' Goes through siblings starting from the start node until it finds a node of the specified type or null.
    Private Shared Function FindNextSibling(ByVal startNode As Node, ByVal nodeType As NodeType) As Node
        Dim node As Node = startNode
        While Not node Is Nothing
            If node.NodeType = nodeType Then
                Return node
            End If
            node = node.NextSibling
        End While
        Return Nothing
    End Function
    ' Retrieves text from start up to but not including the end node.
    Private Shared Function GetTextSameParent(ByVal startNode As Node, ByVal endNode As Node) As String
        If Not endNode Is Nothing Then
            If Not startNode.ParentNode Is endNode.ParentNode Then
                Throw New ArgumentException("Start and end nodes are expected to have the same parent.")
            End If
        End If
        Dim builder As StringBuilder = New StringBuilder
        Dim child As Node = startNode
        While Not child Is endNode
            builder.Append(child.GetText())
            child = child.NextSibling
        End While
        Return builder.ToString()
    End Function
    ' Removes nodes from start up to but not including the end node.
    ' Start and end are assumed to have the same parent.
    Private Shared Sub RemoveSameParent(ByVal startNode As Node, ByVal endNode As Node)
        If Not endNode Is Nothing Then
            If Not startNode.ParentNode Is endNode.ParentNode Then
                Throw New ArgumentException("Start and end nodes are expected to have the same parent.")
            End If
        End If
        Dim curChild As Node = startNode
        While Not curChild Is endNode
            Dim nextChild As Node = curChild.NextSibling
            curChild.Remove()
            curChild = nextChild
        End While
    End Sub
    Private mFieldStart As Node
    Private mFieldSeparator As Node
    Private mFieldEnd As Node
    Private mTarget As String
    Private mIsLocal As Boolean
    ' VB string literals do not treat the backslash as an escape character, so "\\l" is the
    ' regex for a literal \l switch (the C# original needed four backslashes in a regular string).
    Private Shared ReadOnly gRegex As Regex = New Regex("\S+\s+(?:""""\s+)?(\\l\s+)?""([^""]+)""")
End Class
Public Function CombineDocuments(ByVal refHyperLinks() As String) As Document
    Dim destinationDocument As Document = New Document()
    Dim fileName As String
    For Each fileName In refHyperLinks
        Dim sourceDocument As Document = New Document(fileName)
        AppendDocument(destinationDocument, sourceDocument)
    Next
    Return destinationDocument
End Function
Public Sub AppendDocument(ByVal destinationDocument As Document, ByVal sourceDocument As Document)
    Dim sourceSection As Section
    For Each sourceSection In sourceDocument.Sections
        Dim NewSection As Section = CType(destinationDocument.ImportNode(sourceSection, True), Section)
        destinationDocument.Sections.Add(NewSection)
    Next
End Sub
Sub Page_Load(ByVal sender As System.Object, ByVal e As System.EventArgs)
    Dim ES_id As Integer
    Dim ES_Body As String
    Dim ES_No As String
    If Not IsNothing(Request.QueryString("id")) Then
        ES_id = Request.QueryString("id")
        ' Grab all pertinent information about the ES
        Dim conn As SqlConnection
        conn = New SqlConnection(*CONNECTION STRING SNIPPED*)
        conn.Open()
        Dim ESCommand As SqlCommand
        ESCommand = conn.CreateCommand
        ESCommand.CommandText = "SELECT id, ES_Body, ES_No FROM ES_DATA WHERE id = " & ES_id
        Dim ESReader As SqlDataReader
        ESReader = ESCommand.ExecuteReader()
        If ESReader.Read() Then
            ES_id = ESReader("id")
            ES_Body = ESReader("ES_Body")
            ES_No = ESReader("ES_No")
        End If
        ' Release the reader and the connection once the row has been read
        ESReader.Close()
        conn.Close()
        ' Insert Aspose API Code Here
        Dim doc As Document
        doc = New Document(Server.MapPath("ES/" & ES_Body))
        ' Next add in the various datasheets and such
        Dim refHyperLinks(100) As String
        Dim i As Integer
        i = 0
        Dim fieldBegin As FieldStart
        Dim fieldStarts As NodeList
        fieldStarts = doc.SelectNodes("//FieldStart")
        For Each fieldBegin In fieldStarts
            If fieldBegin.FieldType = FieldType.FieldHyperlink Then
                ' The field is a hyperlink field, use the "facade" class to help to deal with the field.
                Dim hyperlink As Hyperlink
                hyperlink = New Hyperlink(fieldBegin)
                If Right(hyperlink.Target, 4) = ".doc" Then
                    refHyperLinks(i) = Server.MapPath(RegEx.Replace(RegEx.Replace(RegEx.Replace(RegEx.Replace(hyperlink.Target, "http://apps.laticrete.com/", ""), "AG2.0/", ""), "AG/", ""), "%20", " "))
                    i = i + 1
                End If
            End If
        Next
        doc = Nothing
        ' Trim the array to the entries that were actually filled
        ReDim Preserve refHyperLinks(i - 1)
        ' Create nodes and sections for each document in the array and combine together
        CombineDocuments(refHyperLinks).Save("SubmittalPackage-ES-" & ES_No & ".doc", SaveFormat.FormatDocument, SaveType.OpenInWord, Response)
    Else
        Response.Clear()
        Response.Redirect("auth.aspx")
        Response.End()
    End If
End Sub
</script>
One thing I’ve noticed with your demo code: the resulting merged document includes a blank page at the beginning. Is this a result of this line:
Dim destinationDocument As Document = New Document()
in the following code, which creates a new, empty destination document before anything is appended?
Public Function CombineDocuments(ByVal refHyperLinks() As String) As Document
    Dim destinationDocument As Document = New Document()
    Dim fileName As String
    For Each fileName In refHyperLinks
        Dim sourceDocument As Document = New Document(fileName)
        AppendDocument(destinationDocument, sourceDocument)
    Next
    Return destinationDocument
End Function
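One possible explanation, which has not been verified against this version of Aspose.Word, is that New Document() starts out with one empty section, so every appended section lands after it. A minimal sketch that sidesteps the question by opening the first file as the destination and appending only the remaining files. It relies on the AppendDocument routine from the listing above and uses only Aspose.Word calls already shown in this thread; returning an empty document for an empty array is an assumption about the desired behaviour.

```vb
' Requires the Aspose.Word namespace import and the AppendDocument routine shown earlier.
Public Function CombineDocuments(ByVal refHyperLinks() As String) As Document
    ' Nothing to merge: fall back to an empty document (assumed behaviour).
    If refHyperLinks Is Nothing OrElse refHyperLinks.Length = 0 Then
        Return New Document()
    End If

    ' Use the first source document as the destination so that no
    ' initially empty document precedes the merged content.
    Dim destinationDocument As Document = New Document(refHyperLinks(0))

    ' Append the sections of every remaining document.
    Dim i As Integer
    For i = 1 To refHyperLinks.Length - 1
        Dim sourceDocument As Document = New Document(refHyperLinks(i))
        AppendDocument(destinationDocument, sourceDocument)
    Next
    Return destinationDocument
End Function
```

The call site stays the same: CombineDocuments(refHyperLinks).Save(...).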