Hi all,
We are evaluating Aspose as a replacement for our current PPT extraction system. We found that Aspose currently cannot extract text from graphs. Is this feature on your roadmap, and if so, what is the expected timeframe?
Please see the attachment for our example PPT, where not all of the text gets extracted.
Hi Zac,
Thank you for your interest in Aspose.Slides.
I have reviewed your comments. Charts contain text in the form of series, categories and legends. I have created a code sample that prints the category text of a chart on the first slide. This is how text can be extracted from chart categories:
using System;
using Aspose.Slides;
using Aspose.Slides.Charts;

Presentation pres = new Presentation(@"D:\16765.pptx");
foreach (IShape shp in pres.Slides[0].Shapes)
{
    // Only chart shapes carry category data
    if (shp is IChart chart)
    {
        for (int i = 0; i < chart.ChartData.Categories.Count; i++)
        {
            IChartCategory category = chart.ChartData.Categories[i];
            // Console.WriteLine(object) prints an empty line for a null value
            Console.WriteLine(category.Value);
        }
        // Stop after the first chart on the slide
        break;
    }
}
I hope this clarifies the concept. Please let me know if I can help you further in this regard.
Best Regards,
Muhmammad,
Thanks for the quick reply. We will test out your suggestion and let you know how it works.
-Zac
Hi again,
We tried your recommendation and found that it captured about 50% of the text data. For the PPT attached to the first message, we extracted the following text:
- Industry
- Commercial Buildings
- Residential Buildings
- Transportation
- Computers
- Cooking
- Electronics
- Wash
- Refrigeration
- Cooling
- Lights
- Water Heat
- Heating
- Cooking
- Computers
- Refrigeration
- Office Equipment
- Ventilation
- Water Heat
- Cooling
- Heating
- Lights
- 1990
- 1995
- 2000
- 2005
- 2010
- 2015
- 2020
- 2025
- 2030
It looks like three kinds of text are still missing: numbers (slide #1, for example, does not yield “21%” but does yield “Commercial Buildings”), legend text (slide #2 did not yield “Baseline” or “Buildings Decrease Energy Use by 40% by 2030”, etc.), and y-axis values (slide #2 did not yield “20000”, “40000”, etc.).
Is there a way to CC someone on this ticket?
The dev on our side who is looking into this issue should also be on this thread.
Thanks!
Hi Zac,
I have reviewed your request concerning chart data extraction. You mentioned that some data is missing when using the chart data extraction code shared by Adnan. The missing data consists of chart data labels and chart legends. Chart legends are read-only entities containing the series names, so all you need to do is iterate through the chart series and extract their names. This resolves one of the issues, and the code at the end of this post shows how to do it.
The other missing data you pointed out is the chart series data labels inside the pie chart on slide 1. Chart data labels can have predefined values, which you can also set in PowerPoint via check boxes (show value, show category name, show percentage, etc.). When a chart uses predefined data labels, you cannot extract their text, because it is managed internally by the charting engine, and Aspose.Slides is also unable to extract the predefined label text. I have added a new feature request with ID SLIDESNET-36934 in our issue tracking system to provide access to the predefined label text. However, if a chart series data label is a custom text label set using Aspose.Slides, you can access and extract its text. I hope this clarification is helpful. Please let me know if I can help you further in this regard.
using System;
using Aspose.Slides;
using Aspose.Slides.Charts;

Presentation pres = new Presentation(@"D:\16765.pptx");
foreach (ISlide slide in pres.Slides)
{
    foreach (IShape shp in slide.Shapes)
    {
        if (shp is IChart chart)
        {
            // Category text (e.g. the labels along the category axis)
            for (int i = 0; i < chart.ChartData.Categories.Count; i++)
            {
                IChartCategory category = chart.ChartData.Categories[i];
                Console.WriteLine(category.Value);
            }
            // Series names, which are what the legend displays
            for (int i = 0; i < chart.ChartData.Series.Count; i++)
            {
                IChartSeries series = chart.ChartData.Series[i];
                Console.WriteLine(series.Name.ToString());
                foreach (IChartDataPoint point in series.DataPoints)
                {
                    // Only labels with custom text have a text frame; skip the rest
                    string labelText = point.Label?.TextFrameForOverriding?.Text;
                    if (!string.IsNullOrEmpty(labelText))
                    {
                        Console.WriteLine(labelText);
                    }
                }
            }
        }
    }
}
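Until the predefined label text is accessible, one possible workaround for percentage labels on a pie chart is to recompute them from the raw slice values. The following is only a rough sketch under stated assumptions: it uses no Aspose.Slides API at all, `PieLabelSketch` and `ToPercentLabels` are hypothetical names, the input values are made up for illustration, and it assumes you have already read the numeric values of the series yourself. The number format PowerPoint applies to the label may differ from the simple whole-number rounding used here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of Aspose.Slides.
public static class PieLabelSketch
{
    // Converts raw pie slice values into whole-number percentage strings.
    public static List<string> ToPercentLabels(IReadOnlyList<double> values)
    {
        double total = values.Sum();
        var labels = new List<string>(values.Count);
        foreach (double v in values)
        {
            // Guard against an empty or all-zero series
            double share = total == 0 ? 0 : v / total;
            double percent = Math.Round(share * 100, MidpointRounding.AwayFromZero);
            labels.Add(percent.ToString(CultureInfo.InvariantCulture) + "%");
        }
        return labels;
    }
}

public static class Program
{
    public static void Main()
    {
        // Illustrative values only, not taken from the attached presentation
        var values = new double[] { 1, 1, 2 };
        foreach (string label in PieLabelSketch.ToPercentLabels(values))
        {
            Console.WriteLine(label); // prints 25%, 25%, 50% on separate lines
        }
    }
}
```

Because each slice is rounded independently, the recomputed labels are not guaranteed to add up to exactly 100%.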
Many Thanks,
The issues you have found earlier (filed as SLIDESNET-36934) have been fixed in this update.