We are running into two problems when extracting text from slides with Aspose:
- Garbage text is added to the extracted output.
- The text is segmented inaccurately, so line breaks appear in the middle of sentences. For example, a sentence like “The quick brown fox jumped over the fence, and chased the sheep” is segmented incorrectly as:
“The quick brown fox jumped
over the fence, and chased the
sheep”
CODE SNIPPET
public static String ExtractTextFromPptx(String pptxFilePath) {
    // Full path where the extracted text will be saved
    String fullPath = "/tmp/output_aspose.txt";
    // Loading the presentation
    try {
        Presentation pres = new Presentation("/tmp/" + pptxFilePath);
        PrintWriter writerPath = new PrintWriter(new FileWriter(fullPath));
        // Extracting all text frames from the presentation
        ITextFrame[] textFramesPPTX = SlideUtil.getAllTextFrames(pres, true);
        // Looping through the extracted text frames
        for (ITextFrame textFrame : textFramesPPTX) {
            // Looping through paragraphs in the current text frame
            for (IParagraph para : textFrame.getParagraphs()) {
                // Looping through portions in the current paragraph to get text
                for (IPortion port : para.getPortions()) {
                    // Writing extracted text to our output file
                    //System.out.println(port.getText().trim() + "\n");
                    writerPath.write(port.getText().trim() + "\n");
                }
            }
        }
        writerPath.close();
    } catch (IOException e) {
        System.err.println("Error occurred while working with files: " + e.getMessage());
        return "";
    } catch (Exception e) {
        // Handling other Exceptions
        System.err.println("An error occurred: " + e);
        return "";
    }
    // Returning the file path of the saved text file as an indication of success
    return fullPath;
}
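One possible cause of the split sentences is in the snippet itself: it trims each portion and writes a newline after it, and a single sentence can be stored as several portions (for instance when part of it has different formatting). The sketch below shows the alternative of concatenating the portions of a paragraph untrimmed and emitting one line per paragraph. It is only an illustration under assumptions: portion texts are modeled as plain strings so the example runs without Aspose, and the class ParagraphJoiner and method joinPortions are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphJoiner {
    /**
     * Hypothetical helper: each inner list holds the raw portion texts of one
     * paragraph, in order. Returns one line per non-empty paragraph.
     */
    public static List<String> joinPortions(List<List<String>> paragraphs) {
        List<String> lines = new ArrayList<>();
        for (List<String> portions : paragraphs) {
            StringBuilder sb = new StringBuilder();
            for (String portion : portions) {
                // Append as-is: no trim and no newline between portions,
                // so a sentence split across portions is rejoined.
                sb.append(portion);
            }
            // Trim only once, at paragraph level.
            String text = sb.toString().trim();
            if (!text.isEmpty()) {
                lines.add(text);
            }
        }
        return lines;
    }
}
```

In the Aspose loop, the equivalent change would be to build one string per IParagraph from its portions and write the newline after the paragraph rather than after each portion. If I read the API correctly, IParagraph also has a getText() method that returns the paragraph text directly, but I have not verified that it behaves identically here.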
PPT SAMPLE SLIDE
ppt_zip.zip (15.5 KB)
OUTPUT (TEXT EXTRACTED)
Click to edit the title text format
Click to edit the outline text format
Second Outline Level
Third Outline Level
Fourth Outline Level
Fifth Outline Level
Sixth Outline Level
Seventh Outline Level
Hello world
The quick brown fox jumped over the fence, a
nd chased the ship
. But whenever we fight we fight as community and not as an individual
However the design made for this experiment is not accurate as needed, but it is important to hang in there and try again:
Why is it important to start again?
But it is difficult then hit the honest button again.
Hence in conclusion I would like to say best of luck for your journey.
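The first lines of the output ("Click to edit the title text format" through "Seventh Outline Level") look like master-slide placeholder prompts rather than slide content. That would be consistent with the second argument of SlideUtil.getAllTextFrames being true, which, as far as I understand, also returns text frames from master slides; passing false may be enough to avoid them. If master frames are still needed, a post-filter is another option. Below is a minimal sketch, assuming the prompts follow the two patterns seen in this output; the class PlaceholderFilter and method dropMasterPrompts are hypothetical names, not part of Aspose.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PlaceholderFilter {
    // Assumed patterns, taken only from the sample output above:
    // "Click to edit ..." and "<Word> Outline Level".
    private static final Pattern MASTER_PROMPT =
            Pattern.compile("^(Click to edit .*|\\w+ Outline Level)$");

    /** Returns the input lines without those that match a master prompt pattern. */
    public static List<String> dropMasterPrompts(List<String> lines) {
        return lines.stream()
                .filter(line -> !MASTER_PROMPT.matcher(line.trim()).matches())
                .collect(Collectors.toList());
    }
}
```

Note that a text filter like this would also drop genuine slide text that happens to match the patterns, so excluding master frames at the source seems preferable where possible.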