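Hello, I recently noticed that when reading a DOC document with Aspose.Words, the comments in the document are read out together with the body text.
After looking through the related material, I found the following approach: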
com.aspose.words.Document document = new com.aspose.words.Document(docStream);
document.getLayoutOptions().setShowComments(Boolean.FALSE);
document.acceptAllRevisions();
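This does remove the comments when the DOC is converted to PDF. However, when the text is actually read from the document, the comments are still there. Is there a way to truly delete the comments from the DOC, so that only the body text is read? Thanks.
The relevant code is attached: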
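// Imports assumed from the identifiers used below (Guava Lists, Apache Commons Lang StringUtils).
import com.aspose.words.*;
import com.google.common.collect.Lists;
import org.apache.commons.lang3.StringUtils;
import java.io.InputStream;
import java.util.List;

public List<String> readDoc(InputStream docStream) throws Exception {
    List<String> paragraphTexts = Lists.newArrayList();
    com.aspose.words.Document document = new com.aspose.words.Document(docStream);
    // Only affects layout/rendering (e.g. PDF export); comment nodes stay in the document.
    document.getLayoutOptions().setShowComments(Boolean.FALSE);
    document.acceptAllRevisions();
    removeWordHeaderFooter(document);
    // Refresh list numbering so list labels appear in the exported text.
    document.updateListLabels();
    NodeCollection childNodes = document.getChildNodes(NodeType.PARAGRAPH, true);
    for (Paragraph paragraph : (Iterable<Paragraph>) childNodes) {
        String paragraphText = paragraph.toString(SaveFormat.TEXT).replaceAll("\\r|\\n|\\t| ", StringUtils.EMPTY);
        paragraphText = StringUtils.deleteWhitespace(paragraphText);
        // Split into sentences; a bare "?" alternative is an invalid regex, so use a character class.
        List<String> list = Lists.newArrayList(paragraphText.split("[。!?!?]"));
        paragraphTexts.addAll(list);
    }
    return paragraphTexts;
}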
private void removeWordHeaderFooter(com.aspose.words.Document doc) {
    for (Section section : doc.getSections()) {
        // Clear the collection in one call instead of removing nodes while iterating over it.
        section.getHeadersFooters().clear();
    }
}
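
A side note on the sentence-splitting step in readDoc: in a Java regex, a bare `?` is a quantifier, so a pattern that lists `?` as its own alternative does not compile. The following is only a minimal sketch of that step, assuming the intent is to split on Chinese and ASCII sentence-ending punctuation. The class name SentenceSplitDemo and the helpers splitSentences and compiles are hypothetical and are not part of Aspose.Words or of the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SentenceSplitDemo {
    // Inside a character class, '?' and '!' are literals, so no escaping is needed.
    private static final Pattern SENTENCE_END = Pattern.compile("[。!?!?]");

    // Splits text at sentence-ending punctuation and drops empty pieces.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : SENTENCE_END.split(text)) {
            if (!part.isEmpty()) {
                sentences.add(part);
            }
        }
        return sentences;
    }

    // Reports whether a regex compiles, to show the problem with a bare '?'.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("。|!|?"));      // bare '?' as an alternative
        System.out.println(compiles("[。!?!?]"));  // character class
        System.out.println(splitSentences("第一句。第二句!第三句?"));
    }
}
```

Filtering out empty pieces here is a design choice of the sketch; the original code keeps whatever `split` returns.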