Hi,
I need to count words per page. When I extract text from a DOC file, the output automatically includes field codes such as:
HYPERLINK \l "_Toc41130032" Content1 PAGEREF _Toc41130032 \h 1
HYPERLINK "http://www.virginia.edu/registrar/forms/coursecataloginstructions.doc" CCI Instructions form
FORMCHECKBOX School/College
FORMTEXT Term/Year
My code is below. The WordsPageSplitter class is taken from this thread: https://forum.aspose.com/t/extract-text-for-each-page/204962/2
try (Writer writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(sFileName, false), StandardCharsets.UTF_8))) {
    WordsPageSplitter splitter = new WordsPageSplitter(doc);
    for (int page = 1; page <= doc.getPageCount(); page++) {
        com.aspose.words.Document pageDoc = splitter.getDocumentOfPage(page);
        // The extracted text contains the field codes shown above
        String contents = pageDoc.getText();
    }
}
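For the counting step itself, this is roughly what I have in mind. It is only a minimal sketch, assuming a "word" is any run of characters separated by whitespace. The names PageWordCounter and countWords are hypothetical helpers of my own, not part of Aspose.Words.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class PageWordCounter {
    // Matches one or more whitespace characters (space, tab, newline, form feed, ...).
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Returns the number of whitespace-separated tokens in the text of one page.
    static int countWords(String pageText) {
        if (pageText == null) {
            return 0;
        }
        // trim() drops leading/trailing whitespace and control characters,
        // so split() never yields an empty first token.
        String trimmed = pageText.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return WHITESPACE.split(trimmed).length;
    }
}
```

Calling PageWordCounter.countWords(contents) inside the page loop would give the per-page count. With this approach the first sample line above, exactly as printed, counts as 8 words because the field codes are counted as tokens too, which is why I need them excluded from the extracted text.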
regards,
Rapeepan