Aspose.Slides for .NET - Text Extraction from PPTX Superscript Is Not Working

We are using Aspose.Slides in C# to extract text content from a PPTX file. The user then makes some changes, and we save the text back into the PPTX file.

The problem is that Aspose is not able to extract superscript text from the PPTX. It returns the superscript text as normal text, and when the text is saved back, it is saved as normal text. Please let us know what to do so that Aspose returns superscript text in superscript format.

@rajesh.agarawalatgmail.com,
Thank you for contacting support.

To investigate the case and help you, we need more details. Please share the following files and information:

  • sample presentation file
  • code example that reproduces the problem
  • output text or output presentation file
  • Aspose.Slides version you used

Test1.zip (234.4 KB)

The zip contains a PPTX file that shows the problem.

We are using Aspose.Slides version 20.11.0.0.

Aspose returns the text below from the slide. (Note that the superscript on the 1 has been lost; it is returned as plain text in "Union Plus1".)

Subvención para Veteranos Union Plus1
Proporcionada por Union Plus®
Permite a veteranos miembros de un sindicato que califiquen presentar una solicitud ante Union Plus para recibir una Subvención para Veteranos de $1,000 para Préstamos Hipotecarios para la compra o el refinanciamiento de una nueva vivienda principal
Debe presentar la solicitud a Union Plus en el transcurso de los 120 días posteriores a la fecha de cierre del Préstamo Hipotecario Union Plus
Podrían aplicarse términos y restricciones
Los miembros pueden visitar dddddd.org/ddddddd (en inglés) para obtener más detalles

@rajesh.agarawalatgmail.com,
Thank you for the details. Could you please also share a code example you used to extract the text and share the expected result?

The expected result is that the 1 in "Union Plus1" is returned as superscript; currently it is not. The superscript should be retained.

@rajesh.agarawalatgmail.com,
It looks like the text should be the following:

Union Plus¹

Could you please confirm?

With Aspose.Slides for .NET, you can extract text from presentations as follows:

var text1 = autoShape.TextFrame.Text;

or

var text2 = autoShape.TextFrame.Paragraphs[0].Text;

or

var text3 = autoShape.TextFrame.Paragraphs[0].Portions[0].Text;

Do you use these methods?

We use the approach below (myppDoc is of type Presentation):

Aspose.Slides.TextFrame[] textFrames = (Aspose.Slides.TextFrame[])SlideUtil.GetAllTextFrames(myppDoc, false);
for (int i = 0; i < textFrames.Length; i++)
{
    foreach (Paragraph para in textFrames[i].Paragraphs)
    {
        foreach (Portion port in para.Portions)
        {
            // Find the text to be replaced
            if (port.Text.Contains(sbDataField.ToString()))
            {
                // Replace the existing text with the new text
                string str = port.Text;
            }
        }
    }
}

@rajesh.agarawalatgmail.com,
Thank you for the code example. In my post above, I suggested what the correct output text should be for this case. Could you please confirm my suggestion or provide the expected output text?

Yes, it should be like this.

@rajesh.agarawalatgmail.com,
Thank you for the additional information. I am working on the issue and will get back to you soon.

@rajesh.agarawalatgmail.com,
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): SLIDESNET-44437

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@rajesh.agarawalatgmail.com,
Our developers have investigated the case. The escapement is the property that defines if a text portion is superscript or subscript. Therefore, you can get the text like this:

var presentation = new Presentation("test1.pptx");
var slide = presentation.Slides[0];
var shape = slide.Shapes[1] as IAutoShape;
var paragraph = shape.TextFrame.Paragraphs[0];

Console.OutputEncoding = System.Text.Encoding.Unicode;

foreach (var portion in paragraph.Portions)
{
    if (portion.PortionFormat.Escapement > 0) // superscript
    {
        Console.Write(ToSuperscript(portion.Text.ToCharArray()));
    }
    else
    {
        Console.Write(portion.Text);
    }
}
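
// Replaces ASCII digits with their Unicode superscript equivalents;
// all other characters are left unchanged.
private static string ToSuperscript(char[] chars)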
{
    const string superscript = 
        "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079";

    char[] res = new char[chars.Length];
    Array.Copy(chars, res, chars.Length);

    for (int i = 0; i < res.Length; i++)
    {
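        int idx = res[i] - '0';
        // idx is negative for characters below '0' (space, punctuation),
        // so both bounds must be checked before indexing.
        if (idx >= 0 && idx < superscript.Length)
        {
            res[i] = superscript[idx];
        }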
    }

    return new string(res);
}

Output:

Subvención para Veteranos Union Plus¹
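
Since the escapement covers both superscript and subscript, the same idea can be extended to subscript digits as well. The following is only a minimal sketch, written with C# 9 top-level statements and without any Aspose.Slides dependency. `ApplyEscapement` is a hypothetical helper name, and the escapement values 30 and -25 are illustrative assumptions; in real code the value would come from `portion.PortionFormat.Escapement`.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of Aspose.Slides): maps ASCII digits to
// Unicode superscript digits when escapement > 0 and to subscript digits
// when escapement < 0. All other characters pass through unchanged.
string ApplyEscapement(string text, float escapement)
{
    const string superDigits =
        "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079";
    const string subDigits =
        "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089";

    // A value of 0 or NaN fails both comparisons, so the text is returned as-is.
    string digits = escapement > 0 ? superDigits
                  : escapement < 0 ? subDigits
                  : null;
    if (digits == null)
    {
        return text;
    }

    var sb = new StringBuilder(text.Length);
    foreach (char c in text)
    {
        // Only '0'..'9' are mapped; this avoids negative or out-of-range indexes.
        sb.Append(c >= '0' && c <= '9' ? digits[c - '0'] : c);
    }
    return sb.ToString();
}

Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine("Union Plus" + ApplyEscapement("1", 30f));   // Union Plus¹
Console.WriteLine("H" + ApplyEscapement("2", -25f) + "O");     // H₂O
Console.WriteLine(ApplyEscapement("No. 1", float.NaN));        // No. 1
```

Note that Unicode only has a limited set of superscript and subscript characters, so this approach is reliable for digits but not for arbitrary letters.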

We hope this will help you.
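
For the save-back step mentioned in the original question, the edited string has to be split again so that the superscript digits can go into their own portions. The following is only a sketch of that splitting step, written with C# 9 top-level statements and without any Aspose.Slides dependency. `SplitRuns` is a hypothetical helper name, and it assumes the edited text still contains the Unicode superscript digits produced during extraction.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of Aspose.Slides): splits text into runs of
// (Text, IsSuperscript). Unicode superscript digits are converted back to
// ASCII digits and grouped into superscript runs.
List<(string Text, bool IsSuperscript)> SplitRuns(string text)
{
    const string superDigits =
        "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079";

    var runs = new List<(string Text, bool IsSuperscript)>();
    var current = new StringBuilder();
    bool currentIsSuper = false;

    foreach (char c in text)
    {
        int idx = superDigits.IndexOf(c);   // -1 if c is not a superscript digit
        bool isSuper = idx >= 0;

        // Close the current run when the formatting changes.
        if (current.Length > 0 && isSuper != currentIsSuper)
        {
            runs.Add((current.ToString(), currentIsSuper));
            current.Clear();
        }

        currentIsSuper = isSuper;
        current.Append(isSuper ? (char)('0' + idx) : c);
    }

    if (current.Length > 0)
    {
        runs.Add((current.ToString(), currentIsSuper));
    }
    return runs;
}

foreach (var run in SplitRuns("Union Plus\u00b9 and more"))
{
    Console.WriteLine($"[{run.Text}] superscript={run.IsSuperscript}");
}
```

Each run with IsSuperscript set would then be written to its own portion with a positive `PortionFormat.Escapement` (the exact value, for example 30, is an assumption), while the other runs keep the default formatting.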