Failed to Retrieve Hyperlinks in C# in the Same Visual Order as Available in PPT File

Hi Team,

We need to read PPT files that contain hyperlinks to documents, and we want to retrieve all the hyperlinks and their names in the same order in which they appear in the PPT file. For this we are using the code snippet below.

Presentation presentation = new Presentation(PPTFilePath);

ITextFrame[] textFrames = SlideUtil.GetAllTextBoxes(presentation.Slides[index]);

// Loop through every TextFrame on the slide
foreach (ITextFrame textFrame in textFrames)
{
    // Loop through paragraphs in the current TextFrame
    foreach (IParagraph para in textFrame.Paragraphs)
    {
        // Loop through portions in the current Paragraph
        foreach (IPortion port in para.Portions)
        {
            // .. logic to extract links
        }
    }
}

With this approach we get all the links and their names, but not in the same visual order as in the PPT. They are returned in what looks like a random order.

We have also tried the function below, but it does not return the links in the proper order either.

presentation.HyperlinkQueries.GetAnyHyperlinks()

Please note:

  • We are using Aspose libraries version 23.5 on .NET Framework 4.8.
  • The hyperlinks can be anywhere in the PPT file, such as the header, footer, or body.

Could you please share how to retrieve the hyperlinks and their names in exactly the same order as they appear in the PPT file?

Thanks.

@lpappachen,
Thank you for describing the issue.

Please share the following:

  • input sample presentation file
  • your output list of the hyperlinks and their names produced by your code

We will then investigate the case and help you as soon as possible.

I am attaching 2 sample PPT files with which we are having issues.
I have also attached the output our code produced.

Note: Please consider the points below and provide a solution accordingly.

Our expectation: We are trying to read the hyperlinks and their names in visual order. In the example below we need to read the links in this order: Link1, Link2, Link3, Link4, Link5, Link6, Link7, Link8, Link9.

Link1 Link2 Link3
Link4 Link5 Link6
Link7 Link8 Link9

  • But both SlideUtil.GetAllTextBoxes(pptxPresentation.Slides[slideNo]) and pptxPresentation.HyperlinkQueries.GetAnyHyperlinks() return the links in random order.

  • The attached files are only for your reference to debug the issue; we are having many issues with the visual order in which links are retrieved. We cannot share the actual files because of data sensitivity, and when we try to replicate the same format and content in test (sample) files we lose some of the formatting and styles.

  • Links can be anywhere on the slide, in any number, and we are using different layouts / styles / regions in the PPT.

  • Can you also tell us in which order SlideUtil.GetAllTextBoxes() reads the links? Are there any other constraints that we need to be aware of?

We really appreciate your quick response on this.

Thanks in advance.

QA testing links PPT.zip (176.6 KB)

@lpappachen,
Thank you for the details. I am working on the issue and will get back to you soon.

@lpappachen,
The SlideUtil.GetAllTextBoxes method processes the contents of the slides in the order of the objects on the slide. If you unzip the “QA testing links PPT.pptx” file and open the internal “ppt/slides/slide1.xml” file, you will see the same order of text boxes and hyperlinks accordingly.

You can also see the order of text boxes as follows:

using var presentation = new Presentation("QA testing links PPT.pptx");
var slide = presentation.Slides[0];

foreach (var shape in slide.Shapes)
{
    if (shape is IAutoShape autoShape)
    {
        Console.WriteLine(autoShape.Name);
    }
}

Output:

Footer Placeholder 1
Title 2
TextBox 3

Thanks for the information. Is there a temporary workaround to read the links and their names in visual order for the time being?

@lpappachen,
You can try to parse the position of the text boxes and sort them according to your requirements.

Can you share sample code? You can take the attached PPT as a reference if needed.

Would it impact performance, as we are dealing with lots of slides?

@lpappachen,
I will prepare the sample code and will get back to you as soon as possible.

@lpappachen,
Please look at the following code snippets.

using var presentation = new Presentation("QA testing links PPT.pptx");

var slide = presentation.Slides[0];

// A collection for shapes with text to be sorted.
var textShapes = new List<IShape>();

foreach (var shape in slide.Shapes)
{
    if (shape is IAutoShape)
    {
        textShapes.Add(shape);
    }
}

textShapes.Sort(new ShapeComparer());

class ShapeComparer : IComparer<IShape>
{
    public int Compare(IShape shape1, IShape shape2)
    {
        // The higher shape (smaller Y) comes first.
        if (shape1.Y < shape2.Y) return -1;
        if (shape1.Y > shape2.Y) return 1;

        // At the same height, the shape further left (smaller X) comes first.
        if (shape1.X < shape2.X) return -1;
        if (shape1.X > shape2.X) return 1;

        // The shapes are in the same position.
        return 0;
    }
}

The code

foreach (var shape in textShapes)
{
    if (shape is IAutoShape autoShape)
    {
        Console.WriteLine(autoShape.Name);
    }
}

gives the following result:

TextBox 3
Title 2
Footer Placeholder 1
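
One caveat with comparing exact coordinates: shapes that look like they sit on the same row rarely have identical Y values, so a box that is a fraction of a point higher sorts ahead of everything to its left. Putting a tolerance inside the comparer itself is not safe either, because such a comparison is no longer transitive. A minimal sketch of one alternative, grouping boxes into rows first and then ordering each row by X, is shown below. The `Box` class, the `VisualOrder` helper, and the `rowTolerance` parameter are hypothetical names used only for illustration and are not part of Aspose.Slides; in practice the coordinates would come from `IShape.X` and `IShape.Y`.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical holder for a text box position (not an Aspose.Slides type).
public sealed class Box
{
    public Box(string name, float x, float y) { Name = name; X = x; Y = y; }
    public string Name { get; }
    public float X { get; }
    public float Y { get; }
}

public static class VisualOrder
{
    // Orders boxes row by row (top to bottom), and left to right within a row.
    // A box joins the current row when its top edge is within rowTolerance
    // of the first (topmost) box in that row; otherwise it starts a new row.
    public static List<Box> Sort(IEnumerable<Box> boxes, float rowTolerance)
    {
        var rows = new List<List<Box>>();

        foreach (Box box in boxes.OrderBy(b => b.Y))
        {
            List<Box> row = rows.Count > 0 ? rows[rows.Count - 1] : null;
            if (row == null || box.Y - row[0].Y > rowTolerance)
            {
                row = new List<Box>();
                rows.Add(row);
            }
            row.Add(box);
        }

        // Flatten the rows, ordering each one by its X coordinate.
        return rows.SelectMany(r => r.OrderBy(b => b.X)).ToList();
    }
}
```

The right tolerance depends on the slide layout; something on the order of the height of a typical text box is a reasonable starting guess, but that is an assumption to tune against real files. The cost is one sort per slide, so it should be small compared with loading the presentation.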

We are having 2 issues with the above approach.

It is very slow when we have multiple slides, and on top of that we have to do other operations as well.

I was trying to convert the shape to a text frame and extract the hyperlink and its name, but had no luck. Below is the code. Please let me know if you have another way to achieve this. We really appreciate your help with this.

foreach (var shape in textShapes)
{
    if (shape is IAutoShape autoShape)
    {
        var textFrame = shape.HyperlinkClick;
                        
        Console.WriteLine(autoShape.Name);
    }
}

@lpappachen,
Could you please describe more details about the issue with the hyperlinks you mentioned above?

Apologies for not being clear in my previous response. We were able to use the code snippet below and got the result you stated.

foreach (var shape in textShapes)
{
    if (shape is IAutoShape autoShape)
    {
        Console.WriteLine(autoShape.Name);
    }
} 

Output:

TextBox 3
Title 2
Footer Placeholder 1

However, after getting the above result we are not able to extract the hyperlinks and their names in visual order (as we are expecting). If you could share sample code, that would be great.

Note: I’m not aware of any predefined methods that take a shape / text shape as input and process it to get the hyperlinks and their names. If there are any, please let me know.

Thank you for your support.

@lpappachen,
Thank you for the details. I will get back to you as soon as possible.

@lpappachen,
Unfortunately, I can only offer the approach described above.

Can you give us an ETA on a permanent solution for this, so that we can plan our deliverables accordingly?

@lpappachen,
We have not logged any issue related to Aspose.Slides. Our library returns results according to the structure of the sample documents you provided. The task of obtaining hyperlinks in visual order is algorithmic. Please let us know if there is anything else we can help you with.

Can you explain what you mean by algorithmic? Is it something Aspose will not do, so that we need to do our own coding for it?

@lpappachen,
Yes, you should implement getting hyperlinks in visual order yourself.

That does not make sense. We use the Aspose library for Word. The function below was used and it works perfectly:

Aspose.Words.Document asposeWordDocument = new Aspose.Words.Document(dataDir + agendaFileName);
NodeList fieldStartNodeList = asposeWordDocument.SelectNodes("//FieldStart");

foreach (FieldStart item in fieldStartNodeList)
{
    if (item.FieldType == Aspose.Words.Fields.FieldType.FieldHyperlink)
    {
        // .. logic to read the hyperlink
    }
}

We expect the methods for PPT to work the same way. The method you gave us gives the visual order of the text boxes, but it does not extract the name and hyperlink within each text box. How can we create our own methods if the entire structure depends on Aspose reading the file and extracting the content, i.e. the text boxes and the content within them?
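
For reference, the two pieces discussed in this thread can be combined: once the shapes are ordered by position, the same paragraph/portion loop from the first post can be run on each shape's text frame. The following is only a minimal sketch under several assumptions: it looks at `IAutoShape` objects directly on the slide (not at group shapes, tables, or layout/master placeholders), it orders by exact Y and then X as in the comparer above, and it keeps only external URLs. `HyperlinkReader` and `ReadInVisualOrder` are hypothetical names, not Aspose.Slides API; the members used on shapes, paragraphs, and portions (`TextFrame`, `Paragraphs`, `Portions`, `PortionFormat.HyperlinkClick`, `ExternalUrl`) are from Aspose.Slides.

```csharp
using System.Collections.Generic;
using System.Linq;
using Aspose.Slides;

public static class HyperlinkReader
{
    // Returns (link text, URL) pairs for one slide, visiting text shapes
    // from top to bottom and, at the same height, from left to right.
    public static List<KeyValuePair<string, string>> ReadInVisualOrder(ISlide slide)
    {
        var links = new List<KeyValuePair<string, string>>();

        IEnumerable<IAutoShape> shapes = slide.Shapes
            .OfType<IAutoShape>()
            .OrderBy(s => s.Y)
            .ThenBy(s => s.X);

        foreach (IAutoShape shape in shapes)
        {
            // Shapes without text have no text frame to inspect.
            if (shape.TextFrame == null) continue;

            foreach (IParagraph paragraph in shape.TextFrame.Paragraphs)
            {
                foreach (IPortion portion in paragraph.Portions)
                {
                    IHyperlink link = portion.PortionFormat.HyperlinkClick;

                    // Assumption: only links with an external URL are wanted.
                    if (link != null && !string.IsNullOrEmpty(link.ExternalUrl))
                    {
                        links.Add(new KeyValuePair<string, string>(portion.Text, link.ExternalUrl));
                    }
                }
            }
        }

        return links;
    }
}
```

Calling `HyperlinkReader.ReadInVisualOrder(presentation.Slides[0])` for each slide would then give the link names and URLs slide by slide. Within a single text box the portions come back in reading order, so only the order of the boxes themselves needs the position-based sort. Links placed on a shape itself (`shape.HyperlinkClick`) or inherited from the layout or master slide would need additional handling that this sketch does not attempt.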