Convert Slides to Text


#1

I am using .NET 4.7 (C#) and Aspose.Slides version 19.9.

I am trying to convert PPT(X) files to text (i.e., text extraction). I can get the raw text by iterating through the Shapes property, but I would also like to extract the bullet and spacing formatting. I searched the forums and Google but could not find any documentation or examples showing how to do this.

In other words, I am looking to convert slides to text in the same way they can be converted to PDF.

Your help would be greatly appreciated.

Thanks


#2

@reisvm,

I have reviewed your requirements. Aspose.Slides allows you to iterate over the shapes and extract the text along with the other properties associated with it. Properties such as bullets and spacing are set at the paragraph level, so if you want to retain the formatting along with the text, you need to extract those properties as well on your end.
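
As a rough illustration of the paragraph-level approach described above, the following sketch reads each paragraph's bullet type and depth and writes an indented plain-text line. It is only a sketch under several assumptions: the class name SlideTextExporter and the helper FormatLine are hypothetical, the two-spaces-per-level indentation is an arbitrary choice, numbered bullets are approximated with a simple counter per shape, and only locally set format values are read (inherited values may come back as NotDefined). Spacing values such as SpaceBefore and SpaceAfter are exposed on the same ParagraphFormat object but are not mapped to text here.

```csharp
using System;
using System.Text;
using Aspose.Slides;

static class SlideTextExporter
{
    // Hypothetical helper: two spaces per depth level, then the marker (if any), then the text.
    public static string FormatLine(int depth, string marker, string text)
    {
        string indent = new string(' ', Math.Max(depth, 0) * 2);
        return string.IsNullOrEmpty(marker) ? indent + text : indent + marker + " " + text;
    }

    // Walks every auto shape on every slide and returns the text with bullets and indentation.
    public static string Export(string path)
    {
        var sb = new StringBuilder();
        using (Presentation pres = new Presentation(path))
        {
            foreach (ISlide slide in pres.Slides)
            {
                foreach (IShape shape in slide.Shapes)
                {
                    IAutoShape autoShape = shape as IAutoShape;
                    if (autoShape == null || autoShape.TextFrame == null)
                        continue; // not a text-bearing shape

                    int number = 0; // assumption: numbering restarts in each shape
                    foreach (IParagraph paragraph in autoShape.TextFrame.Paragraphs)
                    {
                        IParagraphFormat format = paragraph.ParagraphFormat;
                        string marker = "";
                        if (format.Bullet.Type == BulletType.Symbol)
                            marker = format.Bullet.Char.ToString();
                        else if (format.Bullet.Type == BulletType.Numbered)
                            marker = (++number) + ".";

                        sb.AppendLine(FormatLine(format.Depth, marker, paragraph.Text));
                    }
                    sb.AppendLine(); // blank line between shapes
                }
            }
        }
        return sb.ToString();
    }
}
```

Calling SlideTextExporter.Export("pres.pptx") would return the whole deck as a single string. Bullets defined only in the layout or master slide will not appear with this approach, so the output should be checked against a real presentation.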

We have also logged a feature request with ID SLIDESNET-41406 to support serializing paragraphs and portions of text. Once it is implemented, you will be able to serialize the text along with all of its properties as a byte array.


#3

@reisvm,

We have investigated the requirements on our end. You can serialize a Paragraph to JSON, which captures the text together with its properties, and then use or store that JSON on your end. You can also read it back later as needed. The following code, which uses Json.NET (Newtonsoft.Json), should be helpful. Please let us know whether it meets your requirements.

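using System;
using Aspose.Slides;
using Newtonsoft.Json;

// Ignore reference loops so that circular object references do not abort serialization.
JsonSerializerSettings settings = new JsonSerializerSettings
{
    ReferenceLoopHandling = ReferenceLoopHandling.Ignore,
};

using (Presentation pres = new Presentation("pres.pptx"))
{
    foreach (ISlide slide in pres.Slides)
    {
        foreach (IShape shape in slide.Shapes)
        {
            // Only auto shapes that actually have a text frame carry paragraphs.
            AutoShape autoShape = shape as AutoShape;
            if (autoShape != null && autoShape.TextFrame != null)
            {
                foreach (IParagraph paragraph in autoShape.TextFrame.Paragraphs)
                {
                    // Serialize the paragraph, including its formatting properties, to JSON.
                    string paragraphJson = JsonConvert.SerializeObject(paragraph, settings);
                    Console.WriteLine(paragraphJson);
                }
            }
        }
    }
}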