Missing table text using Aspose code

Hi There

I’m trying to retrieve all of the text contained in a PowerPoint presentation.
I’m using the code provided in the Aspose.Slides help file under the entry "Extract entire text from a presentation." However, I’m finding that this code misses the contents of tables completely. We have a document with a word count of 4265, but the code in the help file retrieves only 553 words.

I added code to read text from tables, but when I use it together with the Aspose code the word count is 4555, so there must be duplicates in there.

The code I’m using to read the paragraphs in PowerPoint files is:

int lastSlidePosition = oDoc.Slides.LastSlidePosition;

currentSlideNumber = 0;
for (int pos = 1; pos <= lastSlidePosition; pos++)
{
    Slide sld = oDoc.GetSlideByPosition(pos);

    // Iterate over all shapes
    int shapesCount = sld.Shapes.Count;

    for (int shpIdx = 0; shpIdx < shapesCount; shpIdx++)
    {
        Shape shp = sld.Shapes[shpIdx];

        // Get the paragraphs from the text holder or text frame
        Paragraphs paras = null;

        // Check whether the shape holds a text holder
        if (shp.Placeholder != null && shp.IsTextHolder == true)
        {
            TextHolder thld = (TextHolder)shp.Placeholder;
            paras = thld.Paragraphs;
        }
        else if (shp is GroupShape)
        {
            GroupShape gs = (GroupShape)shp;
            if (gs is Table)
            {
                Table table = (Table)gs;
                for (int i = 0; i < table.RowsNumber; i++)
                {
                    for (int j = 0; j < table.ColumnsNumber; j++)
                    {
                        Cell cell = table.GetCell(j, i);
                        TextFrame cellTextFrame = cell.TextFrame;
                        if (cellTextFrame != null)
                            paras = cellTextFrame.Paragraphs;
                    }
                }
            }
        }
        else
        {
            if (shp.TextFrame != null)
            {
                paras = shp.TextFrame.Paragraphs;
            }
        }
        //Parse paragraphs…



Thanks

Eric

Hi Eric,

Can you please provide the source PPT for investigation?

Hi Muhammad

The file is attached.

Thanks

Eric

Eric, could you post the entire code for extracting text from PPT? I’m evaluating the Aspose product and need to extract text as part of the evaluation.

Hi Eric,

This code sample works correctly:

Presentation pres = new Presentation("d:\\ppt\\eric\\textppt.ppt");

for (int i = 1; i < pres.Slides.Count; i++)
{
    Slide sld = pres.GetSlideByPosition(i);

    // <br> separates the entries in the HTML output
    Response.Write("<br>Slide No " + i.ToString() + "<br>");

    foreach (Aspose.Slides.Shape shp in sld.Shapes)
    {
        if (shp.Placeholder != null && shp.IsTextHolder == true)
        {
            // Placeholder text (titles, body text)
            TextHolder thld = (TextHolder)shp.Placeholder;
            Response.Write("Text Holder : " + thld.Text + "<br>");
        }
        else if (shp is Aspose.Slides.Rectangle)
        {
            // Free-standing text box
            Aspose.Slides.Rectangle rect = (Aspose.Slides.Rectangle)shp;
            if (rect.TextFrame != null)
            {
                TextFrame tf = rect.TextFrame;
                Response.Write("Text Frame : " + tf.Text + "<br>");
            }
        }
        else if (shp is Aspose.Slides.GroupShape)
        {
            // Group shape: read the text of each shape inside it
            Aspose.Slides.GroupShape gshp = (Aspose.Slides.GroupShape)shp;
            foreach (Aspose.Slides.Shape shp1 in gshp.Shapes)
                if (shp1.Placeholder != null && shp1.IsTextHolder == true)
                {
                    TextHolder thld = (TextHolder)shp1.Placeholder;
                    Response.Write("G Text Holder : " + thld.Text + "<br>");
                }
                else if (shp1 is Aspose.Slides.Rectangle)
                {
                    Aspose.Slides.Rectangle rect = (Aspose.Slides.Rectangle)shp1;
                    if (rect.TextFrame != null)
                    {
                        TextFrame tf = rect.TextFrame;
                        Response.Write("G Text Frame : " + tf.Text + "<br>");
                    }
                }
        }
    }
}
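
The sample above looks only one level into a group shape and writes each string straight to the response. To compare the result with the expected word count, it may help to collect every string once, including shapes nested inside other groups, and count the words afterwards. The following is only a rough sketch of that idea: `ShapeNode`, `TextWalk.Collect` and `TextWalk.CountWords` are hypothetical stand-ins written for illustration and are not Aspose.Slides types, so the traversal would need to be mapped onto the real `Shape` and `GroupShape` classes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a shape; NOT an Aspose.Slides type.
class ShapeNode
{
    public string Text;                                       // null when the shape carries no text
    public List<ShapeNode> Children = new List<ShapeNode>();  // non-empty for groups and tables
}

static class TextWalk
{
    // Adds the text of a shape and of everything nested inside it.
    // Each shape is visited exactly once, so no string is added twice.
    public static void Collect(ShapeNode shape, List<string> output)
    {
        if (shape.Text != null)
            output.Add(shape.Text);
        foreach (ShapeNode child in shape.Children)
            Collect(child, output);
    }

    // Counts whitespace-separated words across all collected strings.
    public static int CountWords(IEnumerable<string> texts)
    {
        int total = 0;
        foreach (string text in texts)
            total += text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
        return total;
    }
}

class Program
{
    static void Main()
    {
        // A table-like group: one plain cell and one nested group holding another cell.
        ShapeNode nested = new ShapeNode { Children = { new ShapeNode { Text = "up 5 percent" } } };
        ShapeNode table = new ShapeNode { Children = { new ShapeNode { Text = "Quarter one" }, nested } };
        ShapeNode title = new ShapeNode { Text = "Sales summary" };

        List<string> texts = new List<string>();
        Collect(title, texts);
        Collect(table, texts);

        Console.WriteLine("Strings collected: " + texts.Count);
        Console.WriteLine("Words counted: " + TextWalk.CountWords(texts));
    }

    static void Collect(ShapeNode shape, List<string> output)
    {
        TextWalk.Collect(shape, output);
    }
}
```

If the count still comes out higher than the figure PowerPoint reports, printing the collected strings per slide should show which shapes are being read through more than one branch.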

Hi Muhammad,

I’ll give it a try, thanks!

Eric