I have a document created in Word 2007 (but saved in .doc format) which contains a table and a picture, and cross-references (links) to both.
After conversion via Aspose.Words, there are named anchors present in the table and picture markup, but the links to these items have lost their anchors and are now just text.
I have attached my .doc.
The resulting HTML looks like this:
Picture 1
Link to Picture 1
Is this a problem with Aspose.Words, or on my side?
Hello Laura!
Thank you for asking this.
I tried exporting the same document to HTML using MS Word, and these links were not preserved there either. In theory, an HTML link can be placed on any element and can refer to any other element. It is an open question why the MS Word developers did not implement such links in their HTML export, but it is also a reason for us to be cautious about implementing them. A workaround may be possible here if it is really needed.
Regards,
The problem with this approach is that a workaround is not really something we can perform: we have no control over the business users who create the Word documents and cannot hand-correct the data. They will use standard Word features such as cross-references to tables and images and expect them to work.
I would have thought that this would be a very popular feature for anyone exporting to HTML (where hyperlinking is the standard form of navigation) from Word?
I don't quite understand what is so hard about this. You already create an <a name> element for the anchor (although putting an ID on a span/image/table would also be acceptable). All we are asking is that the cross-reference text (to a table, image, heading, or bookmark) is wrapped in an <a href> element.
I take your point about the difficulty of supporting HTML IMPORT, where any link/target can be on any element, but this is not the case for EXPORT, and export is the key feature that we need, as most business users around the world author their content in Word.
Hi
Thanks for your inquiry. I created some code for you. It replaces REF fields with HYPERLINK fields. In this case the cross-references work fine.
// Namespaces used below (FieldStart and FieldType live in Aspose.Words.Fields in recent versions)
using System.Collections;
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Fields;

// Open the document and create a DocumentBuilder
Document doc = new Document(@"Test107\in.doc");
DocumentBuilder builder = new DocumentBuilder(doc);
// Get the collection of field starts from the document
NodeCollection starts = doc.GetChildNodes(NodeType.FieldStart, true);
// Nodes that belong to REF fields are collected here and removed at the end
ArrayList removeList = new ArrayList();
// Loop through all field starts
foreach (FieldStart start in starts)
{
    // Check whether the current field start belongs to a REF field
    if (start.FieldType == FieldType.FieldRef)
    {
        // We need both the field code and the field value (the displayed text)
        string fieldCode = string.Empty;
        string fieldValue = string.Empty;
        Node currentNode = start;
        // Get the field code: the runs between the field start and the field separator
        while (currentNode.NodeType != NodeType.FieldSeparator)
        {
            if (currentNode.NodeType == NodeType.Run)
                fieldCode += (currentNode as Run).Text;
            removeList.Add(currentNode);
            currentNode = currentNode.NextSibling;
        }
        // Get the field value: the runs between the field separator and the field end
        while (currentNode.NodeType != NodeType.FieldEnd)
        {
            if (currentNode.NodeType == NodeType.Run)
                fieldValue += (currentNode as Run).Text;
            removeList.Add(currentNode);
            currentNode = currentNode.NextSibling;
        }
        removeList.Add(currentNode);
        // Get the bookmark name from the field code, which looks like " REF <bookmark> <switches>"
        Regex regex = new Regex(@"\s*(?<type>\S+)\s+(?<name>\S+)");
        Match match = regex.Match(fieldCode);
        string refName = match.Groups["name"].Value;
        // Insert a HYPERLINK field in place of the REF field:
        // move the DocumentBuilder cursor to the field start
        builder.MoveTo(start);
        // Insert a hyperlink that points to the bookmark (third argument: isBookmark = true)
        builder.InsertHyperlink(fieldValue, refName, true);
    }
}
// Now we can remove the nodes of the original REF fields
foreach (Node node in removeList)
{
    node.Remove();
}
// Save the document as HTML
doc.Save(@"Test107\out.html", SaveFormat.Html);
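The step that pulls the bookmark name out of the field code can be tried on its own, without Aspose.Words. The following is only a minimal sketch: the class name RefFieldCode, the method GetBookmarkName, and the sample bookmark name are hypothetical, and it assumes the field code has the usual form of the REF keyword followed by the bookmark name and optional switches.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Aspose.Words.
public static class RefFieldCode
{
    // Matches the REF keyword followed by the bookmark name; any switches after it are ignored.
    private static readonly Regex RefPattern =
        new Regex(@"^\s*REF\s+(?<name>\S+)", RegexOptions.IgnoreCase);

    // Returns the bookmark name, or null when the text is not a REF field code.
    public static string GetBookmarkName(string fieldCode)
    {
        Match match = RefPattern.Match(fieldCode ?? string.Empty);
        return match.Success ? match.Groups["name"].Value : null;
    }
}
```

For example, RefFieldCode.GetBookmarkName(@" REF _Ref190000000 \h ") returns _Ref190000000 (a made-up bookmark name), and a field code that is not a REF field returns null, so the caller can skip that field instead of inserting a hyperlink with an empty target.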
Thanks for this code - we can definitely use it in our application. It’s really appreciated!
Do you think this code will be included in a future version of Aspose.Words? We think that other users will find it useful, as adding cross-references to tables and pictures is a common thing to do.
Hi
Thanks for your request. I created a new issue, #6036, in our defect database, but I can’t promise that this will be added in the next release.
Best regards.