Extract text from ppt and pptx

Hey guys, feature request for you. I’ve noticed a fair number of people on the forum asking how to extract text from PowerPoint files. I know that it can be done, but only by manually looping through various collections in your API.

So, the feature request: add a method called GetText to Presentation and PresentationEx (just like the method in the Words dll), and have it return a string. Nice and simple, and it will end all of the support questions for you! You may want to give people the option of passing in a separator string that you put between each text fragment.
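
To make the request concrete, here is a minimal sketch of what such a method could look like for pptx files only. It does not use Aspose.Slides at all: it relies on the fact that a pptx file is a zip archive whose slide XML stores text in DrawingML `a:t` elements. The function name `get_text` and its `separator` parameter are assumptions for illustration, and slides are ordered by file number, which is usually but not always the presentation order. Notes, comments and legacy ppt files are not handled.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; text runs in slide XML are stored in <a:t> elements.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
SLIDE_PATTERN = re.compile(r"ppt/slides/slide(\d+)\.xml")


def get_text(pptx_source, separator="\n"):
    """Return the text of every slide in a .pptx, joined by `separator`.

    `pptx_source` may be a file path or a binary file-like object.
    Hypothetical helper, not part of any Aspose API.
    """
    fragments = []
    with zipfile.ZipFile(pptx_source) as archive:
        # Collect (slide number, archive member name) pairs for slide parts.
        slides = []
        for name in archive.namelist():
            match = SLIDE_PATTERN.fullmatch(name)
            if match:
                slides.append((int(match.group(1)), name))

        # Sort numerically so slide10 comes after slide2.
        for _, name in sorted(slides):
            root = ET.fromstring(archive.read(name))
            for node in root.iter(f"{{{A_NS}}}t"):
                if node.text:
                    fragments.append(node.text)
    return separator.join(fragments)
```

Calling `get_text("deck.pptx", separator=" | ")` would return one string with every text fragment separated by ` | `, which is the shape of API being asked for here.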

Thanks for your consideration.
Brian.

Dear Brian,

We will consider your request to add this feature. Please also see the recently added technical tip below; you may be able to adapt it to your own specific needs.

http://www.aspose.com/documentation/file-format-components/aspose.slides-for-.net-and-java/extract-entire-text-from-a-presentation.html

Great, thanks. This tip is great for ppt files, but the API for pptx is different. It would be nice to see this article expanded to show the code for pptx files.

Hello Brian,

I posted a pptx code example for you in this thread two days ago.

Yes, thank you for that. What I had found with extracting text from ppt is that there are several different posts on your forums showing how to do it, and until I saw that technical article I wasn’t sure which was the “right” way. I even remember a comment from an Aspose staff member saying something like “we don’t provide a method to extract text because it is very simple”. It’s not simple.

For pptx it also looks like there are already a few different ways to do it. You yourself posted two very different ways in the very thread you just linked to.

Hopefully you can see why it’s so confusing. To me it is a no-brainer that there should be a single method for extracting text. The only thing I’m using the Slides component for is to extract the text, and I have to maintain quite a bit of code to do what should be very simple. This is one small part of my application, and I’ve spent way too much time on it. I don’t have time to learn the API or the ins and outs of the differences between ppt and pptx. I just want the text.

Sorry for sounding a bit negative. I know you guys like feedback, so this is it. Your support response times are outstanding and I do appreciate it.

Brian.

Same comment.

I have to extract text from multiple formats, and the syntax is not common across them.
I have to extract document properties from multiple formats, and the syntax is not common across them either.

Ideally there would be an interface, Office, that has common methods and properties.

Still, an ugly way to extract text is better than no way.
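
As a rough illustration of the common-interface idea, the sketch below defines one set of methods that every format-specific extractor would implement, plus a lookup by file extension. Everything in it is hypothetical: `OfficeDocument`, `open_document` and the toy `PlainTextDocument` are made-up names that exist in no Aspose product, and the plain-text class merely stands in for real ppt, xls or doc adapters.

```python
from pathlib import Path
from typing import Callable, Dict, Protocol


class OfficeDocument(Protocol):
    """Hypothetical common interface shared by all document formats."""

    def get_text(self, separator: str = "\n") -> str: ...

    def get_properties(self) -> Dict[str, str]: ...


class PlainTextDocument:
    """Toy stand-in for a real format adapter (assumption, for illustration)."""

    def __init__(self, path: Path) -> None:
        self._path = Path(path)

    def get_text(self, separator: str = "\n") -> str:
        # Join the file's lines with the caller's separator.
        lines = self._path.read_text(encoding="utf-8").splitlines()
        return separator.join(lines)

    def get_properties(self) -> Dict[str, str]:
        return {
            "name": self._path.name,
            "size": str(self._path.stat().st_size),
        }


# Registry mapping a lowercase extension to the adapter that handles it.
# Real adapters for .ppt, .pptx, .xls and so on would be registered here.
_LOADERS: Dict[str, Callable[[Path], OfficeDocument]] = {
    ".txt": PlainTextDocument,
}


def open_document(path) -> OfficeDocument:
    """Pick an adapter by file extension; callers see one interface."""
    path = Path(path)
    try:
        loader = _LOADERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"no extractor registered for {path.suffix!r}") from None
    return loader(path)
```

With something like this in place, calling code would be `open_document(path).get_text()` regardless of format, and the per-format differences would live in one adapter each.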

Hi,


Thanks for inquiring about Aspose.Slides.

Can you please share the details of the issue occurring on your end? I have not been able to completely understand it. Please visit the following articles for your convenience, using the latest version of Aspose.Slides.


If there is still an issue, please share a sample application, the source presentation and a description of the problem occurring on your end. I will try to help you further in this regard.

Many Thanks,

I got the text to extract.


I am extracting text and properties from:
ppt
pps
pptx
xls
xlsx
doc
docx

4 different syntaxes

If a common GetText() was supported it would have been easier.

But it appears to be working.
Good code samples.

Hi,


Thanks for your feedback. I would like to share that your requirements relate to three different APIs, so the way to extract text is slightly different for each of the three. If there is anything else I may do for you, please share it with us.

Many Thanks,