Autodetect Hyperlinks

JK_Online · August 10, 2010, 10:18am

I am using Aspose.Words to take RTF from a rich text box, insert it into a Word document, and then convert it to HTML for sending in an email.

The issue is that the rich text box auto-detects URLs and displays them as links. Those links are not part of the RTF markup, so they are not carried over into the final HTML.

Is there a way to automatically turn all URLs into hyperlinks while the content is in the Aspose.Words document?

Thanks

alexey.noskov · August 10, 2010, 11:19am

Hi John,

Thanks for your request. You can use IReplacingCallback to replace plain text with hyperlinks in your document. I created a simple code example for you:

// Requires: using System.Collections; using System.Text.RegularExpressions; using Aspose.Words;
// Open the document.
Document doc = new Document(@"Test001\in.doc");
doc.Range.Replace(new Regex("http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w- ./?%&=]*)?"),
    new ReplaceWithHyperlinkEvaluator(), false);
// Save the modified document.
doc.Save(@"Test001\out.doc");

===============================================================

private class ReplaceWithHyperlinkEvaluator : IReplacingCallback
{
    public ReplaceAction Replacing(ReplacingArgs args)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = args.MatchNode;
        // The first (and possibly the only) run can contain text before the match;
        // in that case the run has to be split.
        if (args.MatchOffset > 0)
            currentNode = SplitRun((Run) currentNode, args.MatchOffset);
        // This list stores all nodes of the match so they can be removed later.
        ArrayList runs = new ArrayList();
        // Get url.
        string url = args.Match.Value;
        // Find all runs that contain parts of the match string.
        int remainingLength = args.Match.Value.Length;
        while ((remainingLength > 0) &&
            (currentNode != null) &&
            (currentNode.GetText().Length <= remainingLength))
        {
            runs.Add(currentNode);
            remainingLength = remainingLength - currentNode.GetText().Length;
            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do {
                currentNode = currentNode.NextSibling;
            }
            while ((currentNode != null) && (currentNode.NodeType != NodeType.Run));
        }
        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            SplitRun((Run) currentNode, remainingLength);
            runs.Add(currentNode);
        }
        // Create DocumentBuilder and move its cursor to the match node.
        DocumentBuilder builder = new DocumentBuilder((Document) args.MatchNode.Document);
        builder.MoveTo((Node) runs[0]);
        // Insert the hyperlink, using the URL as both the display text and the target.
        builder.Font.StyleIdentifier = StyleIdentifier.Hyperlink;
        builder.InsertHyperlink(url, url, false);
        // Now remove all runs in the sequence.
        foreach (Run run in runs)
            run.Remove();
        // Tell the replace engine to do nothing, because the replacement has already been done here.
        return ReplaceAction.Skip;
    }
    /// <summary>
    /// Splits text of the specified run into two runs.
    /// Inserts the new run just after the specified run.
    /// </summary>
    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run) run.Clone(true);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        run.ParentNode.InsertAfter(afterRun, run);
        return afterRun;
    }
}
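
Before running the replacement on a real document, it can help to check what the pattern matches on plain strings. The following is only a minimal sketch that uses System.Text.RegularExpressions and no Aspose.Words types; the class name UrlPatternCheck, the method FindUrls, and the sample sentences are made up for illustration. Note that the character class after the first slash contains a space, so for text like the second sample the match is expected to run on past the end of the URL; removing the space from that class is one way to tighten the pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Aspose.Words: it only exercises the regex.
public static class UrlPatternCheck
{
    // Same pattern as in the Replace call above.
    private static readonly Regex UrlRegex =
        new Regex("http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w- ./?%&=]*)?");

    // Returns every substring of the text that the pattern matches.
    public static List<string> FindUrls(string text)
    {
        List<string> urls = new List<string>();
        foreach (Match match in UrlRegex.Matches(text))
            urls.Add(match.Value);
        return urls;
    }

    public static void Main()
    {
        string[] samples =
        {
            "See http://www.aspose.com for details.",
            "Docs are at https://example.com/words/net and elsewhere."
        };

        // Print each match in brackets so trailing characters are easy to spot.
        foreach (string sample in samples)
            foreach (string url in FindUrls(sample))
                Console.WriteLine("[" + url + "]");
    }
}
```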

Hope this helps. Please let me know if you need more assistance, I will be glad to help you.
Best regards.

JK_Online · August 11, 2010, 2:36am

In which namespace or DLL are IReplacingCallback and ReplacingArgs? I am currently using Aspose.Words 9.1 for .NET (though I will upgrade to the latest version soon).

Thanks

alexey.noskov · August 11, 2010, 3:00am

Hi John,

There were some breaking API changes in Aspose.Words 9.2.0, and IReplacingCallback replaced ReplaceEvaluator. You can learn more about these changes in the Aspose.Words 9.2.0 release notes:
https://releases.aspose.com/words/net
I used the latest version of Aspose.Words (9.3.0) to create this code example:
https://reference.aspose.com/words/net/aspose.words.replacing/ireplacingcallback/
Best regards,

JK_Online · August 11, 2010, 4:05am

Thanks, I have the C# sample working. However, when I converted it to VB.NET the regex became invalid. This may be beyond the scope of support, but do you have any idea why?

JK_Online · August 11, 2010, 4:07am

Actually, do not worry. I just spotted the doubled backslash escapes in the regex string.

All working now, thank you for your help.
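
For anyone hitting the same problem: in a regular C# string literal every backslash has to be doubled, whereas VB.NET string literals do not treat the backslash as an escape character, so the doubled backslashes reach the regex engine unchanged. A rough sketch of the difference, staying in C#, compares the escaped literal from the sample with a verbatim (@"...") literal that writes each backslash once; the verbatim text is the form to copy into VB.NET. The class name RegexLiteralCheck is made up for illustration.

```csharp
using System;

// Hypothetical demo class: shows that both literals produce the same pattern text.
public static class RegexLiteralCheck
{
    // Regular C# literal: each backslash is doubled, as in the sample above.
    public const string Escaped =
        "http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w- ./?%&=]*)?";

    // Verbatim C# literal: each backslash is written once.
    public const string Verbatim =
        @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";

    public static void Main()
    {
        // Both literals compile to the same string.
        Console.WriteLine(Escaped == Verbatim); // True
        // This is the pattern text the regex engine actually receives.
        Console.WriteLine(Verbatim);
    }
}
```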

alexey.noskov · August 11, 2010, 5:59am

Hi John,

It is great that you found the solution. Please let me know if you need more assistance, I will be glad to help you.
Best regards,