How to Extract All Styled Text Including Comments and Notes from PPT?

Hi,
I want to extract all the text from a PPT file, but the extracted text always ends up on one line. How can I extract the text with its styling preserved?
Question 2: How can I extract annotation and comment text?

@lucy.hq,
Thank you for contacting support. Could you describe your requirements for extracting styled text in more detail, please?

Thanks for replying so quickly. We want to keep the same line breaks as in the PPT rather than having all the text on one line in the text file. We also want to extract comment and annotation text. Can you give us some suggestions?
Here is our PPT file: testfile.zip (75.4 KB)

@lucy.hq,
The text may contain portions styled differently. You can read each text portion and its formatting properties as shown below:

var slide = presentation.getSlides().get_Item(0);
for (var shape : slide.getShapes()) {
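    if (shape instanceof IAutoShape) {
        var autoShape = (IAutoShape) shape;
        // defensive check: skip shapes that have no text frame
        if (autoShape.getTextFrame() == null) continue;
        for (var paragraph : autoShape.getTextFrame().getParagraphs()) {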
            // you can read paragraph format options:
            var paragraphFormat = paragraph.getParagraphFormat().getEffective();

            for (var portion : paragraph.getPortions()) {
                System.out.println("Text portion: " + portion.getText());

                // you can read portion format options:
                var portionFormat = portion.getPortionFormat().getEffective();
                System.out.println("Font name: " + portionFormat.getLatinFont().getFontName());
                System.out.println("Font size: " + portionFormat.getFontHeight());
                // read other style properties...
                System.out.println();
            }
        }
    }
}

Documents: Manage Paragraph, Text Formatting
API Reference: IParagraphFormatEffectiveData Interface, IPortionFormatEffectiveData Interface
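
To keep the same line breaks as in the presentation, one option is to write one output line per paragraph instead of printing each portion separately. The sketch below shows only that joining logic in plain Java so that it runs without the library: each inner list stands in for the portion texts of one paragraph (what paragraph.getPortions() and portion.getText() return in the snippet above). The class and method names are hypothetical, and treating a vertical tab as a soft line break inside a paragraph is an assumption you should verify against your file.

```java
import java.util.List;

// Hypothetical helper, not part of Aspose.Slides.
// Each inner list models the portion texts of one paragraph.
class ParagraphTextJoiner {

    // Assumption: a soft line break inside a paragraph appears as a vertical tab.
    private static final char SOFT_BREAK = '\u000B';

    static String joinParagraphs(List<List<String>> paragraphs) {
        StringBuilder out = new StringBuilder();
        for (List<String> portions : paragraphs) {
            // Portions of one paragraph belong on the same line.
            for (String portionText : portions) {
                out.append(portionText.replace(SOFT_BREAK, '\n'));
            }
            // End each paragraph with a line break, as in the slide.
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<List<String>> paragraphs = List.of(
                List.of("Quarterly ", "results"),
                List.of("Revenue", "\u000B", "Costs"));
        // Prints three lines: "Quarterly results", "Revenue", "Costs"
        System.out.print(joinParagraphs(paragraphs));
    }
}
```

With Aspose.Slides, the inner lists would be filled from the portions of each paragraph in a text frame, and the resulting string could then be written to the text file.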

Presentation comments can be extracted like this:

for (var commentAuthor : presentation.getCommentAuthors()) {
    for (var comment : commentAuthor.getComments()) {
        System.out.println("Slide number: " + comment.getSlide().getSlideNumber());
        System.out.println("Comment author: " + comment.getAuthor().getName());
        System.out.println("Comment time: " + comment.getCreatedTime());
        System.out.println("Comment: " + comment.getText());
        System.out.println();
    }
}

Documents: Presentation Comments
API Reference: getCommentAuthors Method, ICommentCollection Interface

The code snippet below shows you how to extract slide notes:

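for (var slide : presentation.getSlides()) {
    // getNotesSlide() returns null when the slide has no notes
    var notesSlide = slide.getNotesSlideManager().getNotesSlide();
    if (notesSlide == null || notesSlide.getNotesTextFrame() == null) {
        continue;
    }
    System.out.println("Slide notes: " + notesSlide.getNotesTextFrame().getText());
    System.out.println();
}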

Documents: Presentation Notes
API Reference: getNotesSlideManager Method, INotesSlide Interface

Many thanks for your suggestions. We can now extract the content text, comments, and notes. We still have some questions.

  1. How can we get a master slide’s text and images?
  2. We also want to know the difference between a layout slide and a master slide. Does text in a layout slide also exist in the master slide, while text in the master slide may not exist in the layout slide?
  3. Can a master slide have comments and notes?
  4. Does presentation.getImages() return all images, including images in master slides?

@lucy.hq,
Thank you for your questions.

You can find text and images on master slides in exactly the same way as on presentation slides:

for (var masterSlide : presentation.getMasters()) {
    for (var shape : masterSlide.getShapes()) {
        // search for text frames
        if (shape instanceof IAutoShape) {
            var autoShape = (IAutoShape) shape;
            var textFrame = autoShape.getTextFrame();
            // ...
        }

        // search for images
        if (shape instanceof IPictureFrame) {
            var pictureFrame = (IPictureFrame) shape;
            // ...
        }
    }
}

Documents: Slide Master, Picture Frame
API Reference: IMasterSlide Interface, IPictureFrame Interface

Slide masters and slide layouts are components of the design model of PowerPoint documents. I would suggest familiarizing yourself with their capabilities through the official sources below. In particular, your questions may be related to placeholders.

Documents:
What is a slide master?
What is a slide layout?
Add, edit, or remove a placeholder on a slide layout

As for comments and notes on master slides: unfortunately, I have not found such capabilities in PowerPoint documents.

The presentation.getImages method returns all images contained in a presentation, including images on masters and layouts.

API Reference: Presentation.getImages Method

Thanks for such a detailed reply. I have a PPTX in which some areas still can’t be edited after I enter master view mode. Is it possible for a master slide to have its own master slide? If so, how can I extract the text from such areas?

@lucy.hq,
If the problem is related to Aspose.Slides API, please share the presentation file and code example reproducing the problem. We will be glad to help you.

As far as I know, PowerPoint documents don’t have such a capability.