Failed to Retrieve Hyperlinks in C# in the Same Visual Order as Available in PPT File

@lpappachen,
The SlideUtil.GetAllTextBoxes method returns text boxes in the order in which the shapes are stored in the slide, not in their visual order. If you unzip the “QA testing links PPT.pptx” file and open the internal “ppt/slides/slide1.xml” file, you will see the text boxes and hyperlinks in that same order.

You can also see the order of text boxes as follows:

using var presentation = new Presentation("QA testing links PPT.pptx");
var slide = presentation.Slides[0];

foreach (var shape in slide.Shapes)
{
    if (shape is IAutoShape autoShape)
    {
        Console.WriteLine(autoShape.Name);
    }
}

Output:

Footer Placeholder 1
Title 2
TextBox 3

Thanks for the information. Is there a temporary workaround to read the links and their names in visual order for the time being?

@lpappachen,
You can try reading the positions of the text boxes and sorting them according to your requirements.

Can you share sample code? You can use the attached PPT as a reference if needed.

Would it impact performance, given that we are dealing with a lot of slides?

@lpappachen,
I will prepare the sample code and will get back to you as soon as possible.

@lpappachen,
Please look at the following code snippets.

using var presentation = new Presentation("QA testing links PPT.pptx");

var slide = presentation.Slides[0];

// A collection for shapes with text to be sorted.
var textShapes = new List<IShape>();

foreach (var shape in slide.Shapes)
{
    if (shape is IAutoShape)
    {
        textShapes.Add(shape);
    }
}

textShapes.Sort(new ShapeComparer());

class ShapeComparer : IComparer<IShape>
{
    public int Compare(IShape shape1, IShape shape2)
    {
        // The shape that is higher on the slide comes first.
        if (shape1.Y < shape2.Y) return -1;

        // The shape that is lower on the slide comes later.
        if (shape1.Y > shape2.Y) return 1;

        // At the same height, the shape further to the left comes first.
        if (shape1.X < shape2.X) return -1;

        // At the same height, the shape further to the right comes later.
        if (shape1.X > shape2.X) return 1;

        // The shapes are in the same position.
        return 0;
    }
}

The code

foreach (var shape in textShapes)
{
    if (shape is IAutoShape autoShape)
    {
        Console.WriteLine(autoShape.Name);
    }
}

gives the following result:

TextBox 3
Title 2
Footer Placeholder 1
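
As a side note, the same top-to-bottom, left-to-right ordering can be written with LINQ instead of a custom comparer. Here is a minimal sketch, assuming C# 9 or later; the Box record and the ShapeOrdering class are hypothetical stand-ins for shapes so that the example runs without Aspose.Slides, and the coordinates in any usage are made up.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a shape: only the name and top-left position matter here.
public record Box(string Name, float X, float Y);

public static class ShapeOrdering
{
    // Orders boxes top to bottom, then left to right when Y is equal.
    public static List<Box> TopToBottom(IEnumerable<Box> boxes) =>
        boxes.OrderBy(b => b.Y).ThenBy(b => b.X).ToList();
}
```

With the real shapes collected above, the equivalent call would be textShapes.OrderBy(s => s.Y).ThenBy(s => s.X).ToList(), since the comparer relies only on the X and Y properties of IShape.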

We are having two issues with the above approach.

It is very slow when we have multiple slides, and on top of that we have to do other operations as well.

I was trying to get the text frame from the shape and extract the hyperlink and its name, but had no luck; the code is below. Please let me know if you have another way to achieve this. I really appreciate your help with this.

foreach (var shape in textShapes)
{
    if (shape is IAutoShape autoShape)
    {
        var textFrame = shape.HyperlinkClick;
                        
        Console.WriteLine(autoShape.Name);
    }
}

@lpappachen,
Could you please describe more details about the issue with the hyperlinks you mentioned above?

Apologies for not being clear in my previous response. We were able to use the code snippet below and got the output you described.

foreach (var shape in textShapes)
{
    if (shape is IAutoShape autoShape)
    {
        Console.WriteLine(autoShape.Name);
    }
} 

Output:

TextBox 3
Title 2
Footer Placeholder 1

However, after getting the above result, we are not able to extract the hyperlinks and their names in visual order, as we expect. If you could share sample code, that would be great.

Note: I’m not aware of any predefined methods that take a shape or text shape as input and return its hyperlinks and their names. If there are any, please let me know.

Thank you for your support.

@lpappachen,
Thank you for the details. I will get back to you as soon as possible.

@lpappachen,
Unfortunately, I can only offer the approach described above.

Can you give us an ETA on a permanent solution for this, so that we can plan our deliverables accordingly?

@lpappachen,
We have not logged any issue related to Aspose.Slides. Our library returns results according to the structure of the sample documents you provided. The task of obtaining hyperlinks in visual order is algorithmic. Please let us know if there is anything else we can help you with.

Can you explain what you mean by algorithmic? Is it something Aspose will not do, so that we need to write our own code for it?

@lpappachen,
Yes, you need to implement retrieving hyperlinks in visual order yourself.

That does not make sense. We use the Aspose library for Word. The function below was used, and it works perfectly:

Aspose.Words.Document asposeWordDocument = new Aspose.Words.Document(dataDir + agendaFileName);
NodeList fieldStartNodeList = asposeWordDocument.SelectNodes("//FieldStart");

foreach (FieldStart item in fieldStartNodeList)
{
    if (item.FieldType == Aspose.Words.Fields.FieldType.FieldHyperlink) 

We expect the methods for PPT to work the same way. The method you gave us returns the text boxes in visual order, but it does not extract the name and hyperlink within each text box. How can we create our own methods if everything depends on Aspose reading the file and extracting its content, i.e. the text boxes and the content within them?

@lpappachen,
You can get text and hyperlinks from text portions like this:

foreach (var paragraph in autoShape.TextFrame.Paragraphs)
{
    foreach (var portion in paragraph.Portions)
    {
        var text = portion.Text;
        // HyperlinkClick is null when the portion has no hyperlink, so url may be null.
        var url = portion.PortionFormat.HyperlinkClick?.ExternalUrl;

        //...
    }
}

Documents: Manage Hyperlinks

Please let us know if you have some difficulties with this.

The above code works fine for a few scenarios; however, it does not work for some special cases, such as a combination of text regions and content (a sample document for one such case is attached).
Result.zip (93.2 KB)

Below is the full code we are using; the output screenshot is attached as well.

Please suggest what we can do here to retrieve the links and names in visual order.

public static Dictionary<string, string> ExtractHTMlLink(string PPTFilePath)
{
    Presentation pptxPresentation = new Presentation(PPTFilePath);

    Dictionary<string, string> HypLinks = new Dictionary<string, string>();

    for (int slideNo = 0; slideNo < pptxPresentation.Slides.Count; slideNo++)
    {
        var slide = pptxPresentation.Slides[slideNo];

        // A collection for shapes with text to be sorted.
        var textShapes = new List<IShape>();

        foreach (var shape in slide.Shapes)
        {
            if (shape is IAutoShape)
            {
                textShapes.Add(shape);
            }
        }

        textShapes.Sort(new ShapeComparer());
        textShapes.Reverse();

        foreach (var shape in textShapes)
        {
            if (shape is IAutoShape autoShape)
            {
                foreach (var paragraph in autoShape.TextFrame.Paragraphs)
                {
                    foreach (var port in paragraph.Portions)
                    {
                        if (port.PortionFormat.AsIHyperlinkContainer.HyperlinkClick != null
                            && port.PortionFormat.AsIHyperlinkContainer.HyperlinkClick.ExternalUrl != null)
                        {
                            {
                                if (!string.IsNullOrEmpty(port.Text.Trim()))
                                {
                                    HypLinks.Add(port.Text?.Trim(), port.PortionFormat.AsIHyperlinkContainer.HyperlinkClick.ExternalUrl);
                                }
                            }
                        }
                    }
                }
            }
        }
    }

    return HypLinks;
}

@lpappachen,
Thank you for the case description. I am working on the issue and will get back to you soon.

@lpappachen,
Please try using the following code examples:

using var presentation = new Presentation("QA testing links PPT.pptx");

// Collect all hyperlinks in shapes.
var allHyperlinks = new List<Tuple<IShape, IPortion>>();
foreach (var slide in presentation.Slides)
{
    var slideHyperlinks = ExtractHyperlinks(slide);
    slideHyperlinks.Sort(new PortionComparer());
    allHyperlinks.AddRange(slideHyperlinks);
}

// The sorted list of portions with hyperlinks.
var portionList = allHyperlinks.Select(tuple => tuple.Item2).ToList();

// Print all hyperlinks.
foreach (var portion in portionList)
{
    var text = portion.Text.Trim();
    var url = portion.PortionFormat.HyperlinkClick.ExternalUrl;

    Console.WriteLine("Text: " + text);
    Console.WriteLine("URL: " + url + "\r\n");
}

private static List<Tuple<IShape, IPortion>> ExtractHyperlinks(ISlide slide)
{
    var hyperlinks = new List<Tuple<IShape, IPortion>>();

    foreach (var shape in slide.Shapes)
    {
        if (shape is IAutoShape autoShape)
        {
            foreach (var paragraph in autoShape.TextFrame.Paragraphs)
            {
                foreach (var portion in paragraph.Portions)
                {
                    var hyperlink = portion.PortionFormat.HyperlinkClick;
                    if (hyperlink != null && hyperlink.ExternalUrl != null)
                    {
                        var text = portion.Text.Trim();
                        if (!string.IsNullOrEmpty(text))
                        {
                            hyperlinks.Add(Tuple.Create(shape, portion));
                        }
                    }
                }
            }
        }
    }

    return hyperlinks;
}

private class PortionComparer : IComparer<Tuple<IShape, IPortion>>
{
    public int Compare(Tuple<IShape, IPortion> pair1, Tuple<IShape, IPortion> pair2)
    {
        var rect1 = pair1.Item2.GetCoordinates();
        var rect2 = pair2.Item2.GetCoordinates();

        // Coordinates of the first portion relative to the slide.
        var x1 = rect1.X + pair1.Item1.X;
        var y1 = rect1.Y + pair1.Item1.Y;

        // Coordinates of the second portion relative to the slide.
        var x2 = rect2.X + pair2.Item1.X;
        var y2 = rect2.Y + pair2.Item1.Y;

        // The portion that is higher on the slide comes first.
        if (y1 < y2) return -1;

        // The portion that is lower on the slide comes later.
        if (y1 > y2) return 1;

        // At the same height, the portion further to the left comes first.
        if (x1 < x2) return -1;

        // At the same height, the portion further to the right comes later.
        if (x1 > x2) return 1;

        // The portions are in the same position.
        return 0;
    }
}

Output:

Text: Link1
URL: https://forum.aspose.com/t/read-hyperlinks/97484/2

Text: Link2
URL: https://forum.aspose.com/t/read-hyperlinks/97484/2

Text: Link3
URL: https://forum.aspose.com/t/read-hyperlinks/97484/2

Text: Link4
URL: https://forum.aspose.com/t/read-hyperlinks/97484/2

Text: Link5
URL: https://forum.aspose.com/t/read-hyperlinks/97484/2
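
One caveat with the comparer above: it compares Y values exactly, so two links that look like they sit on the same line but differ by a fraction of a point will be ordered by that small Y difference rather than left to right. If that matters for your slides, one possible refinement is to group items into rows with a tolerance first and then read each row left to right. The following is only a sketch under assumptions, not part of Aspose.Slides: the LinkBox record, the VisualOrder class, and the rowTolerance parameter are hypothetical names, and X and Y are assumed to be the slide-relative coordinates computed as in PortionComparer. It assumes C# 9 or later.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a hyperlink portion with its slide-relative position.
public record LinkBox(string Text, string Url, float X, float Y);

public static class VisualOrder
{
    // Walks the items from top to bottom. An item joins the current row while its Y
    // is within rowTolerance of the first item of that row; otherwise the row is
    // emitted left to right and a new row starts.
    public static List<LinkBox> Sort(IEnumerable<LinkBox> items, float rowTolerance)
    {
        var result = new List<LinkBox>();
        var row = new List<LinkBox>();

        foreach (var item in items.OrderBy(i => i.Y))
        {
            if (row.Count > 0 && item.Y - row[0].Y > rowTolerance)
            {
                result.AddRange(row.OrderBy(i => i.X));
                row.Clear();
            }

            row.Add(item);
        }

        // Emit the last row.
        result.AddRange(row.OrderBy(i => i.X));
        return result;
    }
}
```

The sketch returns a List rather than a Dictionary<string, string> so that the order stays explicit: the enumeration order of a Dictionary is not guaranteed, and Dictionary.Add throws if the same link text appears twice. A suitable tolerance depends on the font sizes in your slides and has to be chosen by experiment.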