Free Support Forum - aspose.com

Find number of words in a PDF

Hi,

Is there any way to find out the word count of a pdf file with the X and Y rectangle co-ordinates.

Thanks,
Surjit

Hi Surjit,


Thank you for contacting support. Please use the following source code:

[C#]
<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>System.Text.StringBuilder builder = <span class=“kwrd” style=“color: rgb(0, 0, 255); font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”>new<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”> System.Text.StringBuilder();<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>
<span class=“rem” style=“color: rgb(0, 128, 0); font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”>// Open document<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>
Aspose.Pdf.Document pdfDocument = <span class=“kwrd” style=“color: rgb(0, 0, 255); font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”>new<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”> Aspose.Pdf.Document(@“c:\pdf\test100\Input2.pdf”);<br style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”><br style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”><span class=“rem” style=“color: rgb(0, 128, 0); font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”>// Create TextAbsorber object to extract text<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>
TextAbsorber absorber = <span class=“kwrd” style=“color: rgb(0, 0, 255); font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”>new<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”> TextAbsorber();<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>
absorber.TextSearchOptions.LimitToPageBounds = <span class=“kwrd” style=“color: rgb(0, 0, 255); font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”>true<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>;<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>
absorber.TextSearchOptions.Rectangle = <span class=“kwrd” style=“color: rgb(0, 0, 255); font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”>new<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”> Aspose.Pdf.Rectangle(100, 200, 250, 350);<br style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”><span class=“rem” style=“color: rgb(0, 128, 0); font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”>// Accept the absorber for first page<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>
pdfDocument.Pages[1].Accept(absorber);<br style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”><br style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”><span class=“rem” style=“color: rgb(0, 128, 0); font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”>// Get the extracted text<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>
builder.Append(absorber.Text);<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>
IList<<span class=“kwrd” style=“color: rgb(0, 0, 255); font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”>string<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>> words = builder.ToString().Split(<span class=“str” style=“color: rgb(0, 96, 128); font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”>’ '<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>);<br style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”><span class=“rem” style=“color: rgb(0, 128, 0); font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”>// print the count of words extracted from PDF file<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>
Console.WriteLine(words.Count);
<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>