Issue in Saving the Hyperlink Present in a Word Document back to the Database using Aspose.Words

Rajesh123 · May 27, 2015, 9:14am

Hi Team,

We are reading text from the Word document based on the paragraph style. We are getting the text fine and saving it to the database.

However, if a hyperlink is present in that text, the value saved to the database is not the text as it appears in the document.

Please refer below for details.

Suppose the text in the document is "Failure www.google.com is testing" (here www.google.com is a hyperlink).

When saving, the text is retrieved as “Failure HYPERLINK “http://www.google.com” www.google.com is testing”, and that is what gets saved to the database.

Could you please help me in resolving this?

Please let me know if you need any more details.
Regards,
Rajesh

muhammad.ijaz · May 28, 2015, 10:27am

Hi Rajesh,
Can you please elaborate on your requirement? Do you want to extract the text as “Failure www.google.com is testing” or something else?
Best Regards,

Rajesh123 · May 29, 2015, 2:59am

Hi Muhammad,

Thanks for your reply.

I am trying to get the text between the two styles I defined in the Word document.

Suppose I entered the text “Failure www.google.com is testing” (note: www.google.com is a hyperlink).

When I fetch the text, I get “Failure HYPERLINK “http://www.google.com” www.google.com is testing” instead of “Failure www.google.com is testing”.

I hope this clarifies the requirement.

Could you please help me with this?

Regards,
Rajesh

muhammad.ijaz · June 1, 2015, 7:44am

Hi Rajesh,

You can use the code from the “Replace or Modify Hyperlinks” example to replace the hyperlinks with plain text, and then use doc.Range.Text to get the plain text.

Best Regards,

Rajesh123 · June 1, 2015, 8:31am

Hi Muhammad,

Thanks for looking into this.

I have gone through the link you shared. However, I was not able to work out how to resolve the issue in this scenario.

Let me explain the code that I am using so far. I hope this will help in this case.

We are using the code below to retrieve the text (failDesc in this scenario).

ArrayList failureDescFAArray = DAL.ParagraphsByStyleName(doc, "Failure FADescription");
ArrayList summaryFAAnalysis = DAL.ParagraphsByStyleName(doc, "Summary FAAnalysis");

ArrayList failureDescFANodes = DAL.ExtractContent((Node)failureDescFAArray[0], (Node)summaryFAAnalysis[0], false);
String failDesc = DAL.GenerateDocument(doc, failureDescFANodes).GetText();

Let us assume that the failure description in the document contains a hyperlink, for example “Failure www.google.com is testing”.

Please note: www.google.com is a hyperlink.

With the above code, the retrieved value of failDesc is as shown below, and this same value is passed to the database:

“Failure HYPERLINK “http://www.google.com” www.google.com is testing”.

However, we need to get “Failure www.google.com is testing”, as this is the correct value.

I am not sure how to apply the replace-or-modify hyperlink approach you shared to achieve this.

Could you please help me achieve this?

Regards,
Rajesh

muhammad.ijaz · June 2, 2015, 6:42am

Hi Rajesh,
I cannot see your complete code, but from your method names I understand that you are extracting paragraphs based on styles. You can use the following code to remove hyperlinks from the whole document or from any single paragraph. I think the UnlinkFieldsFromParagraph method is what you actually need.

static void Main(string[] args)
{
    Document doc = new Document("ExtractText.docx");

    // string docTextWithLinks = doc.Range.Text;
    // UnlinkFieldsFromDocument(doc);
    // string docTextWithoutLinks = doc.Range.Text;
    foreach (Paragraph para in doc.FirstSection.Body.Paragraphs)
    {
        string paragraphWithLinks = para.Range.Text;
        UnlinkFieldsFromParagraph(para);
        string paragraphWithoutLinks = para.Range.Text;
    }
}
static void UnlinkFieldsFromDocument(Document doc)
{
    // Get collection of FieldStart nodes
    NodeCollection fieldStarts = doc.GetChildNodes(NodeType.FieldStart, true);
    // Get collection of FieldSeparator nodes
    NodeCollection fieldSeparators = doc.GetChildNodes(NodeType.FieldSeparator, true);
    // And get collection of FieldEnd nodes
    NodeCollection fieldEnds = doc.GetChildNodes(NodeType.FieldEnd, true);
    // Loop through all FieldStart nodes
    foreach (FieldStart start in fieldStarts)
    {
        // Search for the FieldSeparator node; it is needed to remove the field code from the document
        Node curNode = start;
        while (curNode.NodeType != NodeType.FieldSeparator && curNode.NodeType != NodeType.FieldEnd)
        {
            curNode = curNode.NextPreOrder(doc);
            if (curNode == null)
                break;
        }
        // Remove all nodes between FieldStart and FieldSeparator (or FieldEnd, depending on the field type)
        if (curNode != null)
        {
            RemoveSequence(start, curNode);
        }
    }
    // Now we can remove FieldStart, FieldSeparator and FieldEnd nodes
    fieldStarts.Clear();
    fieldSeparators.Clear();
    fieldEnds.Clear();
}
static void UnlinkFieldsFromParagraph(Paragraph para)
{
    // Get collection of FieldStart nodes
    NodeCollection fieldStarts = para.GetChildNodes(NodeType.FieldStart, true);
    // Get collection of FieldSeparator nodes
    NodeCollection fieldSeparators = para.GetChildNodes(NodeType.FieldSeparator, true);
    // And get collection of FieldEnd nodes
    NodeCollection fieldEnds = para.GetChildNodes(NodeType.FieldEnd, true);
    // Loop through all FieldStart nodes
    foreach (FieldStart start in fieldStarts)
    {
        // Search for the FieldSeparator node; it is needed to remove the field code from the paragraph
        Node curNode = start;
        while (curNode.NodeType != NodeType.FieldSeparator && curNode.NodeType != NodeType.FieldEnd)
        {
            curNode = curNode.NextPreOrder(para);
            if (curNode == null)
                break;
        }
        // Remove all nodes between FieldStart and FieldSeparator (or FieldEnd, depending on the field type)
        if (curNode != null)
        {
            RemoveSequence(start, curNode);
        }
    }
    // Now we can remove FieldStart, FieldSeparator and FieldEnd nodes
    fieldStarts.Clear();
    fieldSeparators.Clear();
    fieldEnds.Clear();
}
/// <summary>
/// Remove all nodes between start and end nodes, except start and end nodes
/// </summary>
/// <param name="start">The start node</param>
/// <param name="end">The end node</param>
static void RemoveSequence(Node start, Node end)
{
    Node curNode = start.NextPreOrder(start.Document);
    while (curNode != null && !curNode.Equals(end))
    {
        // Move to next node
        Node nextNode = curNode.NextPreOrder(start.Document);
        // Check whether current contains end node
        if (curNode.IsComposite)
        {
            if (!(curNode as CompositeNode).GetChildNodes(NodeType.Any, true).Contains(end) &&
            !(curNode as CompositeNode).GetChildNodes(NodeType.Any, true).Contains(start))
            {
                nextNode = curNode.NextSibling;
                curNode.Remove();
            }
        }
        else
        {
            curNode.Remove();
        }
        curNode = nextNode;
    }
}
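
If changing the document nodes is not convenient, the same result can probably be reached on the extracted string instead. The following is only a minimal sketch, not part of the Aspose.Words API: the class name FieldCodeStripper and the method StripFieldCodes are hypothetical. It assumes the string returned by GetText() still contains the field control characters U+0013 (field start), U+0014 (field separator) and U+0015 (field end), and it drops everything between a field start and its separator, so only the visible field result remains.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not an Aspose.Words class.
public static class FieldCodeStripper
{
    // Control characters that delimit a field in the raw document text.
    private const char FieldStart = '\u0013';
    private const char FieldSeparator = '\u0014';
    private const char FieldEnd = '\u0015';

    // Returns the text with field codes and field control characters removed,
    // keeping only the visible field results.
    public static string StripFieldCodes(string text)
    {
        var result = new StringBuilder(text.Length);
        // One entry per open field: true while we are inside its field code.
        var inCode = new Stack<bool>();
        // Number of open fields whose code part we are currently inside.
        int codeDepth = 0;

        foreach (char c in text)
        {
            switch (c)
            {
                case FieldStart:
                    inCode.Push(true);
                    codeDepth++;
                    break;
                case FieldSeparator:
                    // Switch the innermost field from "code" to "result".
                    if (inCode.Count > 0 && inCode.Peek())
                    {
                        inCode.Pop();
                        inCode.Push(false);
                        codeDepth--;
                    }
                    break;
                case FieldEnd:
                    // A field without a separator ends while still in its code.
                    if (inCode.Count > 0 && inCode.Pop())
                    {
                        codeDepth--;
                    }
                    break;
                default:
                    // Keep the character only when no enclosing field code is open.
                    if (codeDepth == 0)
                    {
                        result.Append(c);
                    }
                    break;
            }
        }
        return result.ToString();
    }
}
```

With the code from your post, this would be called as FieldCodeStripper.StripFieldCodes(failDesc) before passing the value to the database. Note that the spacing around the link depends on where the spaces sit relative to the field characters in your document, so you may still want to normalize whitespace afterwards.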

Please feel free to contact us in case you have further comments or questions.
Best Regards,

Rajesh123 · June 4, 2015, 11:17am

Hi Muhammad,

Thanks for looking into this.

Your suggestion helped us resolve the issue.

We achieved the requirement by referring to the code snippet you provided.

Thanks a lot.

Regards,
Rajesh

muhammad.ijaz · June 5, 2015, 7:57am

Hi Rajesh,
Thanks for the confirmation and good to know that the issue has been resolved.
Best Regards,