We are Aspose.Total users.
How can we count the words in a PPT document with Aspose.Slides?
We currently do it as follows:
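using Aspose.Slides;

// Counts the words in every text frame of the presentation.
public int GetTotalWords(string filepath)
{
    int totalwords = 0;
    using (Presentation ppt = new Presentation(filepath))
    {
        // Passing true also collects text frames from master slides.
        ITextFrame[] textFramesPPTX = Aspose.Slides.Util.SlideUtil.GetAllTextFrames(ppt, true);
        // Loop through the array of text frames
        for (int i = 0; i < textFramesPPTX.Length; i++)
        {
            // Loop through the paragraphs in the current ITextFrame
            foreach (IParagraph para in textFramesPPTX[i].Paragraphs)
            {
                var text = TrimString(para.Text);
                if (text.Length > 0)
                    totalwords += GetWordCount(text);
            }
        }
    }
    return totalwords;
}

// Removes vertical-tab and bell characters, then trims surrounding whitespace.
private string TrimString(string text)
{
    return text.Replace("\v", "").Replace("\a", "").Trim();
}

// Counts words by writing the text into an Aspose.Words document.
public static int GetWordCount(string text)
{
    Regist(); // our own helper, not shown here
    Aspose.Words.Document doc = new Aspose.Words.Document();
    Aspose.Words.DocumentBuilder builder = new Aspose.Words.DocumentBuilder(doc);
    builder.Write(text);
    doc.UpdateWordCount();
    return doc.BuiltInDocumentProperties.Words;
}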
The problem is:
The total word count differs greatly from the result we get when we count the words with MS PowerPoint. Why?
What is wrong? Any suggestions?
Hi Kai,
Thanks for posting.
I have reviewed your code. Please share the source presentation so that we can verify the issue on our end. Please also tell us the word count your code produces and the actual word count. I will investigate the issue once you have shared this information.
PS: I am also moving this thread to the Aspose.Slides forum, as it relates to Aspose.Slides alone.
Best Regards,
Hi Muhammad, Thanks!
However, we work in an R&D center and are not allowed to upload any files. Could you send me your email address so that we can send the file to you by email?
Hi Adnan, here is a PPT file we tested. The word count from Aspose.Slides is 646, while the MS PowerPoint word count is 731.
Just for your reference. Thanks!
Hi Victor,
Thanks for sharing the presentation.
I have worked with the presentation file you shared and tried extracting the word count using your code sample. There seems to be an issue with Aspose.Slides when reading the word count. An issue with ID SLIDESNET-37658 has been created in our issue tracking system to investigate and resolve it. This thread has been linked to the issue so that you will be notified automatically once it is fixed.
We are sorry for the inconvenience,
Hi Mudassir,
Thanks! Hope to see your solution soon. This is really urgent for us.
Hi There!
How is it going? We are hoping to receive your solution soon!
Thanks!
Hi Victor,
I have reviewed your comments and regret to say that the issue is still unresolved at present. I have asked our product team to share feedback on it. Please be patient until they respond and the issue is resolved.
We are sorry for the inconvenience,
Hi Victor,
Our product team has investigated the issue on their end. The counting algorithm in your code is incorrect. PowerPoint uses much more complex logic for counting words; in some cases it also counts commas, dots, brackets, etc.
An approximate algorithm follows:
using System.Diagnostics;
using Aspose.Slides;

// Counts the words on every slide and its notes slide.
public static int GetWordCount(string filepath)
{
    int totalwords = 0;
    using (Presentation pres = new Presentation(filepath))
    {
        foreach (ISlide item in pres.Slides)
        {
            totalwords += GetWordCount(item);
            totalwords += GetWordCount(item.NotesSlideManager.NotesSlide);
        }
    }
    Debug.WriteLine(totalwords);
    return totalwords;
}

// Counts the words in all text boxes of one slide (or notes slide).
private static int GetWordCount(IBaseSlide slide)
{
    if (slide == null)
        return 0;
    int wordCount = 0;
    foreach (ITextFrame textFrame in Aspose.Slides.Util.SlideUtil.GetAllTextBoxes(slide))
    {
        foreach (IParagraph para in textFrame.Paragraphs)
        {
            wordCount += CountWordsInText(para.Text.Replace("\v", "").Replace("\a", "").Trim());
        }
    }
    return wordCount;
}

// Renamed from GetWordCount(string) so it does not clash with the
// file-path overload above, which has the same signature.
public static int CountWordsInText(string text)
{
    if (string.IsNullOrEmpty(text))
        return 0;
    int wordCount = 1;
    char prevChar = char.MinValue;
    char currChar, next;
    for (int i = 0; i < text.Length; i++)
    {
        currChar = text[i];
        // Treat the end of the text as if a space followed it.
        next = i + 1 < text.Length ? text[i + 1] : ' ';
        if (IsSplitter(prevChar, currChar, next))
        {
            wordCount++;
        }
        prevChar = currChar;
    }
    return wordCount;
}

// Decides whether the current character starts a new counted word.
private static bool IsSplitter(char prevChar, char currChar, char nextChar)
{
    if (char.IsWhiteSpace(currChar))
    {
        // Only the first character in a run of whitespace counts.
        return !char.IsWhiteSpace(prevChar);
    }
    else if (currChar == '.' || currChar == ',')
    {
        return nextChar == ' ' || prevChar == ' ';
    }
    else if (!char.IsLetterOrDigit(currChar) &&
             (nextChar == ' ' || (char.IsLetterOrDigit(nextChar) && prevChar == ' ')))
    {
        return true;
    }
    return false;
}

// Usage
int count = GetWordCount("SDC_VFCacheCase+_APJ.pptx"); // 731
I hope the shared information will be helpful.
Many Thanks,
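To see how the splitter rules above treat punctuation, here is a minimal standalone sketch that restates the same rules without any Aspose dependency, so it can be run in isolation. The class name `SplitterDemo` and the sample strings are hypothetical and chosen only for illustration.

```csharp
using System;

public static class SplitterDemo
{
    // Same rules as the approximate algorithm above, restated for standalone use.
    public static int CountWords(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0;
        int count = 1;
        char prev = char.MinValue;
        for (int i = 0; i < text.Length; i++)
        {
            char curr = text[i];
            // Treat the end of the text as if a space followed it.
            char next = i + 1 < text.Length ? text[i + 1] : ' ';
            if (IsSplitter(prev, curr, next)) count++;
            prev = curr;
        }
        return count;
    }

    private static bool IsSplitter(char prev, char curr, char next)
    {
        if (char.IsWhiteSpace(curr)) return !char.IsWhiteSpace(prev);
        if (curr == '.' || curr == ',') return next == ' ' || prev == ' ';
        return !char.IsLetterOrDigit(curr)
            && (next == ' ' || (char.IsLetterOrDigit(next) && prev == ' '));
    }

    public static void Main()
    {
        Console.WriteLine(CountWords("Hello world"));   // 2
        Console.WriteLine(CountWords("Hello, world.")); // 4: the comma and the final dot each add one
    }
}
```

Under these rules, punctuation that is followed by a space adds to the count, so text with a lot of punctuation will come out higher than a plain whitespace split.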
Hi Mudassir,
Thanks. We tested your solution. However, it only worked well for "SDC_VFCacheCase+_APJ.pptx". When applied to other files, the word count still differs greatly from the MS PowerPoint result.
For example, we tested your solution on the file "SWG -Business Partner Enablement Overview_ECMv2.pptx" and the word count was 392, whereas the word count in MS PowerPoint is 279.
I have attached the files here for your information.
Hi Victor,
I have reviewed your comments. Please give us some time to investigate the issue. We will get back to you with feedback.
We are sorry for the inconvenience,
Hi Victor,
I have discussed the issue with the new presentation with our product team and have added the presentation to our issue tracking system. This thread has been linked to the issue so that you will be notified automatically once it is resolved.
Many Thanks,
Hi there,
Any progress? We are expecting solutions…
Hi Victor,
I have reviewed your comments. Our product team reports that PowerPoint does not show the real number of words in a presentation, and the algorithm in the attached text file tries to replicate its logic. To check this, remove all content from your presentation "SWG -Business Partner Enablement Overview_ECMv2.pptx" except one text box containing the text "1Q14 IBM SWG Sales Academy – ECM and ILG Sessions". PowerPoint reports only 5 words in the presentation, but by simple counting there should be at least 9. I have shared a piece of code in a text file; please see the attachments.
Best Regards,
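The "at least 9" figure for that sample text can be reproduced with a plain whitespace split. This is a minimal sketch of such simple counting, not PowerPoint's logic and not the attached code; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class NaiveCountDemo
{
    // Splits on any whitespace and counts every resulting token.
    public static int CountTokens(string text)
    {
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    // Counts only tokens that contain at least one letter or digit,
    // so a lone dash is not treated as a word.
    public static int CountWordLikeTokens(string text)
    {
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                   .Count(token => token.Any(c => char.IsLetterOrDigit(c)));
    }

    public static void Main()
    {
        string sample = "1Q14 IBM SWG Sales Academy – ECM and ILG Sessions";
        Console.WriteLine(CountTokens(sample));         // 10 (includes the dash)
        Console.WriteLine(CountWordLikeTokens(sample)); // 9
    }
}
```

Either number is well above the 5 that PowerPoint is said to report for this text box.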
Hi Adnan,
Thanks.
Where can I find the text file?
Hi Victor,
I am sorry, I forgot to attach the text file. You can find it in the attachments now.
Best Regards,
Hi Adnan,
Thanks. We tested your code on some English PPT files and it worked well. However, when we tested it on some Chinese PPT files, the word count was not accurate.
Would you please help us?
Hi Adnan,
Here is a Chinese PPT we tested. The word count should be 808. However, the result from Aspose is only 218.
Please see attachment.
Regards,
Victor
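One plausible cause, not confirmed anywhere in this thread, is that a whitespace-based splitter counts a whole run of Chinese characters as a single word, because Chinese text does not put spaces between words. The sketch below assumes that each CJK ideograph should count as one word; that rule, the Unicode range used, and the class name `CjkCountDemo` are all assumptions made for illustration, not PowerPoint's documented behavior.

```csharp
using System;

public static class CjkCountDemo
{
    // Assumption: only the CJK Unified Ideographs block (U+4E00..U+9FFF) is treated as CJK.
    private static bool IsCjk(char c)
    {
        return c >= '\u4E00' && c <= '\u9FFF';
    }

    // Counts each CJK ideograph as one word and each run of
    // other letters or digits as one word.
    public static int CountWords(string text)
    {
        int count = 0;
        bool inWord = false;
        foreach (char c in text)
        {
            if (IsCjk(c))
            {
                count++;        // every ideograph is its own word
                inWord = false;
            }
            else if (char.IsLetterOrDigit(c))
            {
                if (!inWord) count++; // first character of a non-CJK word
                inWord = true;
            }
            else
            {
                inWord = false; // whitespace or punctuation ends the word
            }
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountWords("你好世界"));        // 4
        Console.WriteLine(CountWords("Aspose 文档处理")); // 5
    }
}
```

A whitespace-only split would report 1 and 2 for the same two strings, which is the kind of undercount described in the post above.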
Hi Victor,
I have reviewed your comments. Our product team is investigating the issue in detail. We will share our feedback with you soon.
Best Regards,