Get paragraph text, exclude hyperlink text


#1

Hi,

I need to parse all the text in a document in chunks, perhaps paragraph by paragraph, while making sure I exclude hyperlink display text.

In other words, I want to do find-and-replace with my own code, in small chunks, but I want to make sure that hyperlink text is never changed.

How do I get paragraph text without hyperlink text? Or, how can I identify hyperlink text within text objects?

The products I am interested in working with are Aspose.Words and Aspose.Slides.

Thanks!


#2

@JohnGrahamLT,

Regarding Aspose.Words, you can build logic on the following snippets to meet your requirement:

Document doc = new Document("E:\\temp\\in.docx");

// Code 1
foreach (Field field in doc.Range.Fields)
{
    if (field.Type == FieldType.FieldHyperlink)
    {
        FieldHyperlink link = (FieldHyperlink)field;
        Console.WriteLine("Hyperlink Text: " + link.DisplayResult);
    }
}

//// Code 2
//NodeCollection runNodes = doc.GetChildNodes(NodeType.Run, true);
//foreach (Run run in runNodes)
//{
//    foreach (Field field in doc.Range.Fields)
//    {
//        if (field.Type == FieldType.FieldHyperlink)
//        {
//            Node currentNode = field.Start;
//            bool isInside = false;
//            while (currentNode != field.End && !isInside)
//            {
//                if (currentNode.NodeType == NodeType.Run)
//                {
//                    if (currentNode.Equals(run))
//                    {
//                        isInside = true;
//                        if (currentNode.ToString(SaveFormat.Text).Equals(field.Result))
//                        {
//                            Console.WriteLine("Hyperlink Text: " + run.Text);
//                        }
//                    }
//                }

//                Node nextNode = currentNode.NextPreOrder(currentNode.Document);
//                currentNode = nextNode;
//            }                        
//        }
//    }
//}
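
The snippets above locate the hyperlink text; the inverse (paragraph text with the hyperlink parts left out) could be sketched as below. This is only a rough sketch, not a tested solution: the helper name GetTextOutsideHyperlinks is hypothetical, and it assumes each HYPERLINK field starts and ends inside one paragraph and that the runs are direct children of the paragraph. The codes of other field types (for example PAGE) are still included in the returned text.

```csharp
using System;
using System.Text;
using Aspose.Words;
using Aspose.Words.Fields;

class ParagraphTextWithoutHyperlinks
{
    // Hypothetical helper: concatenates the text of the paragraph's direct
    // child runs, skipping every run that lies between the FieldStart and
    // FieldEnd of a HYPERLINK field (both the field code and the display text).
    static string GetTextOutsideHyperlinks(Paragraph paragraph)
    {
        StringBuilder text = new StringBuilder();
        int hyperlinkDepth = 0; // > 0 while inside a HYPERLINK field

        foreach (Node node in paragraph.GetChildNodes(NodeType.Any, false))
        {
            if (node.NodeType == NodeType.FieldStart &&
                ((FieldStart)node).FieldType == FieldType.FieldHyperlink)
            {
                hyperlinkDepth++;
            }
            else if (node.NodeType == NodeType.FieldEnd &&
                     ((FieldEnd)node).FieldType == FieldType.FieldHyperlink)
            {
                hyperlinkDepth--;
            }
            else if (node.NodeType == NodeType.Run && hyperlinkDepth == 0)
            {
                text.Append(((Run)node).Text);
            }
        }

        return text.ToString();
    }

    static void Main()
    {
        // Same sample path as in the snippet above.
        Document doc = new Document("E:\\temp\\in.docx");

        foreach (Paragraph paragraph in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            Console.WriteLine(GetTextOutsideHyperlinks(paragraph));
        }
    }
}
```

For find-and-replace, the same depth check can be used to decide which Run nodes are safe to modify, instead of collecting their text.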

#3

@JohnGrahamLT,

Please check the following code snippet for Aspose.Slides. It should help you meet your requirements.

using (Presentation pres = new Presentation(path + "source.pptx"))
{
    foreach (ISlide slide in pres.Slides)
    {
        foreach (IShape shape in slide.Shapes)
        {
            AutoShape autoShape = shape as AutoShape;
            // Skip shapes that are not AutoShapes or that have no text frame.
            if (autoShape != null && autoShape.TextFrame != null)
            {
                foreach (IParagraph paragraph in autoShape.TextFrame.Paragraphs)
                {
                    // Keep only portions that carry no click hyperlink (needs "using System.Linq;").
                    IEnumerable<IPortion> noHyperlinks = paragraph.Portions
                        .Where(portion => portion.PortionFormat.HyperlinkClick == null);

                    foreach (IPortion portion in noHyperlinks)
                    {
                        // Example edit: upper-case the non-hyperlink text.
                        portion.Text = portion.Text.ToUpper();
                    }
                }
            }
        }
    }

    pres.Save(path + "Text.pptx", SaveFormat.Pptx);
}