I'm working with a cover file that uses a table.
However, I found that my code processes the same content twice.
Here's my code:
private LinkedHashMap[] getZhEnCoverVal(Document analysedDoc, CoverFormDTO coverFormDTO) {
    // Fields: title, degree-related subtitle, institution, applicant, discipline,
    //         supervisor, supervisor title, spine title, spine applicant, date
    // var1 var2 var3 var4 var5 var6 var7
    // institution, applicant, supervisor
    LinkedHashMap[] array = new LinkedHashMap[1];
    LinkedHashMap<String, String> zhMap = new LinkedHashMap<>();
    Map<String, String> zhData = initZhData();
    // Deep traversal: returns every descendant node of the body
    Node[] nodesArray = analysedDoc.getSections().get(0).getBody().getChildNodes(NodeType.ANY, true).toArray();
    Paragraph lastParagraph = null;
    StringBuilder collectedText = new StringBuilder();
    for (Node node : nodesArray) {
        if (node instanceof Paragraph paragraph) {
            String text = paragraph.getText().trim().replace(" ", "");
            if (text.isEmpty()) {
                continue;
            }
            // TODO
            /*
             * NodeType: 5 is a table, 8 is a paragraph.
             * The sentence (申请清华大学工商管理硕士专业学位论文) goes through
             * the logic below a second time, so the thesis title, which was
             * already correct, gets overwritten.
             */
            // Collect the paragraph text here
            collectedText.append(paragraph.getText().trim()).append(ControlChar.PARAGRAPH_BREAK);
            if (isChineseApplyForThesis(text)) {
                if (lastParagraph != null) {
                    // "标题" = title, "书脊标题" = spine title
                    zhData.put("标题", collectedText.toString().trim());
                    zhData.put("书脊标题", collectedText.toString().trim());
                }
            }
            // Process the Chinese text
            processChineseText(text, zhData);
            lastParagraph = paragraph;
        } else if (node instanceof Table table) {
            for (int i = 0; i < table.getRows().getCount(); i++) {
                Row row = table.getRows().get(i);
                String text = row.getText().trim().replace(" ", "");
                if (text.isEmpty()) {
                    continue;
                }
                // The row above "申请xxxxx论文" is the thesis title
                if (isChineseApplyForThesis(text)) {
                    if (i > 0) {
                        zhData.put("标题", table.getRows().get(i - 1).getText().trim());
                        zhData.put("书脊标题", table.getRows().get(i - 1).getText().trim());
                    }
                }
                processChineseText(text, zhData);
            }
        }
    }
    List<Map> zhList = coverFormDTO.getZh();
    zhMap = fillMapWithData(zhList, zhData, zhMap);
    array[0] = zhMap;
    return array;
}
private LinkedHashMap fillMapWithData(List<Map> list, Map<String, String> data, LinkedHashMap returnMap) {
    for (Map map : list) {
        // "key" is the template variable name, "label" is the field name used in data
        String varName = (String) map.get("key");
        String name = (String) map.get("label");
        String value = data.get(name);
        returnMap.put(varName, value);
    }
    return returnMap;
}
public class CoverFormDTO {
    private List<Map> zh;
}
Here's the test cover:
cover_1.docx (38.3 KB)
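For example, the text “申请清华大学工商管理硕士专业学位论文” (roughly "Thesis submitted to Tsinghua University for the professional degree of Master of Business Administration") is processed twice by the code above: once as a paragraph and once as a table row.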
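My guess at the cause, as a self-contained sketch rather than real Aspose.Words code: a deep traversal returns the table and also every paragraph nested inside its cells, so text inside a table reaches both branches of the loop. `SketchNode`, `collectDeep`, and `process` are hypothetical names made up for this illustration. In the real code the equivalent check would presumably be something like `paragraph.getAncestor(NodeType.TABLE) != null`, but I have not verified that against my document.

```java
import java.util.ArrayList;
import java.util.List;

public class DeepTraversalSketch {

    /** Hypothetical stand-in for a document node; NOT an Aspose.Words class. */
    static final class SketchNode {
        final String kind; // "body", "table", "row", "cell" or "paragraph"
        final String text; // only paragraphs carry text in this model
        SketchNode parent;
        final List<SketchNode> children = new ArrayList<>();

        SketchNode(String kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        SketchNode add(SketchNode child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        /** True if any ancestor of this node has the given kind. */
        boolean hasAncestor(String ancestorKind) {
            for (SketchNode n = parent; n != null; n = n.parent) {
                if (n.kind.equals(ancestorKind)) {
                    return true;
                }
            }
            return false;
        }

        /** Own text plus the text of all descendants. */
        String getText() {
            StringBuilder sb = new StringBuilder(text);
            for (SketchNode child : children) {
                sb.append(child.getText());
            }
            return sb.toString();
        }
    }

    /** Pre-order list of all descendants, mimicking a deep child-node query. */
    static List<SketchNode> collectDeep(SketchNode root) {
        List<SketchNode> out = new ArrayList<>();
        for (SketchNode child : root.children) {
            out.add(child);
            out.addAll(collectDeep(child));
        }
        return out;
    }

    /** Same loop shape as my method; optionally skips paragraphs that live inside a table. */
    static List<String> process(SketchNode body, boolean skipParagraphsInTables) {
        List<String> handled = new ArrayList<>();
        for (SketchNode node : collectDeep(body)) {
            if (node.kind.equals("paragraph")) {
                if (skipParagraphsInTables && node.hasAncestor("table")) {
                    continue; // the table branch handles this text
                }
                handled.add("paragraph:" + node.text);
            } else if (node.kind.equals("table")) {
                for (SketchNode row : node.children) {
                    handled.add("row:" + row.getText());
                }
            }
        }
        return handled;
    }

    /** A body with one two-row table followed by one top-level paragraph. */
    static SketchNode sampleBody() {
        SketchNode row1 = new SketchNode("row", "").add(
                new SketchNode("cell", "").add(new SketchNode("paragraph", "Thesis title")));
        SketchNode row2 = new SketchNode("row", "").add(
                new SketchNode("cell", "").add(new SketchNode("paragraph", "Apply line")));
        SketchNode table = new SketchNode("table", "").add(row1).add(row2);
        return new SketchNode("body", "").add(table).add(new SketchNode("paragraph", "Date"));
    }

    public static void main(String[] args) {
        SketchNode body = sampleBody();
        System.out.println(process(body, false)); // table text handled twice
        System.out.println(process(body, true));  // table text handled once
    }
}
```

With `skipParagraphsInTables` set to `false`, each table text shows up once as a row and once as a paragraph, which matches what I see. With `true`, only the top-level paragraph goes through the paragraph branch.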
So how should I fix my code?