PowerPoint file remains busy after opening


#1

Dear Support Team,

I’m using the following code to extract text from a PPT document in ASP.NET:

public string ConvertToText(string filePath) {
    Presentation pres = new Presentation(filePath);

    StringBuilder result = new StringBuilder();

    for (int i = 1; i <= pres.Slides.LastSlidePosition; i++) {
        Slide slide = pres.GetSlideByPosition(i);

        for (int j = 0; j < slide.Shapes.Count; j++) {
            Shape shape = slide.Shapes[j];
            if (!shape.IsTextHolder && shape.TextFrame != null) {
                result.Append(shape.TextFrame.Text);
                result.Append(System.Environment.NewLine);
            }
        }

        for (int j = 0; j < slide.Placeholders.Count; j++) {
            TextHolder holder = slide.Placeholders[j] as TextHolder;
            if (holder != null) {
                result.Append(holder.Text);
                result.Append(System.Environment.NewLine);
            }
        }
    }

    return result.ToString();
}

After this method exits, the PPT file remains busy, so when I open it with PowerPoint it says the file is being edited (sharing violation).

How can I avoid this situation?

Best Regards,
Mike


#2

Dear Mike,

Use a stream instead of a file name, and close it once the presentation has been opened:

FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read);
Presentation pres = new Presentation(fs);
fs.Close();
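
A variation on the snippet above, as a rough sketch only: wrapping the stream in a using block releases the file handle even if an exception is thrown while reading. The names StreamLoading and LoadIntoMemory are hypothetical helpers made up for this illustration, and FileShare.Read is an assumption about how the file should be shared. The sketch uses only standard .NET classes and buffers the file in memory, so it does not depend on when the library finishes reading the stream.

```csharp
using System;
using System.IO;

static class StreamLoading
{
    // Hypothetical helper: copies the whole file into memory and
    // releases the file handle before returning.
    public static MemoryStream LoadIntoMemory(string filePath)
    {
        // FileShare.Read (an assumption) lets other readers open the file
        // while it is being copied.
        using (FileStream fs = new FileStream(
            filePath, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            MemoryStream buffer = new MemoryStream();
            fs.CopyTo(buffer);
            buffer.Position = 0; // rewind so the consumer reads from the start
            return buffer;
        } // the file handle is closed here, even if CopyTo throws
    }
}
```

The result could then be passed to the stream constructor shown above, for example new Presentation(StreamLoading.LoadIntoMemory(filePath)).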


#3

Thank you, Alexey! This helps.

Another question that concerns me much more is the time it takes to convert a PPT to text. For example, I have a 1 MB PPT file, and extracting all the text from it takes 7 seconds. This is unacceptable for me. I suspect I am doing something wrong. Could you please review the code above and tell me whether it is the optimal way to extract text?

Thanks a lot!
Mike


#4

Text extraction should not take 7 seconds for a 1 MB file. If the file contains a huge amount of graphics
and images, then it can take time to open it. But in any case, 7 seconds is too much.
I hope you measured this time on a release build of your app and not under the debugger?
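
One possible way to take such a measurement, as a sketch using only the standard System.Diagnostics classes. ExtractionTimer and Measure are hypothetical names invented for this example, not part of any library.

```csharp
using System;
using System.Diagnostics;

static class ExtractionTimer
{
    // Hypothetical helper: runs the action once and returns the
    // elapsed wall-clock time.
    public static TimeSpan Measure(Action action)
    {
        if (Debugger.IsAttached)
        {
            // Timings taken under a debugger may be much higher than
            // those of a release build run on its own.
            Console.Error.WriteLine("Debugger attached: timings may be inflated.");
        }

        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }
}
```

It could be called as ExtractionTimer.Measure(() => ConvertToText(filePath)), ideally a few times in a row so that one-off startup costs do not dominate the result.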


#5

Alexey,

You are a keen adviser! :)
I switched off the debugger. Now it takes 2 seconds. Much better.

You were also right about the graphics. My file contains 44 pages, each with at least one image (not on the master slide). Is there any way to prevent the library from parsing and loading graphics?

Many thanks,
Mike


#6

Sorry, it is not possible to disable parsing. I have thought about “half-opened” presentations
that would allow extracting only text, graphics, presentation properties, etc., without parsing the whole
presentation. That could work very quickly for text extraction.
But there are no plans or time frames yet. It is just at the idea stage.