How to end the DocumentVisitor at a specific place

Here’s my code:

    @Override
    public int visitRowStart(Row row) throws Exception {

        String text = row.toString(SaveFormat.TEXT).trim().replace(" ", "");
        if (text.isBlank()) {
            return VisitorAction.CONTINUE;
        }

        if (endOfCover(text)) {
            return VisitorAction.STOP;
        }

        System.out.println("Processing Row:  " + row.toString(SaveFormat.TEXT));
        System.out.println("End of Row");

        processChineseText(text, zhData);

        processEnglishText(text, enData);

        String paraText = text.trim();

        if (isChineseApplyForThesis(text)) {
            if (lastRow != null) {
                zhData.put("标题", chCollectedText.toString().trim());
                zhData.put("书脊标题", spineTitle.toString().trim().replaceAll("\\r?\\n", ""));
            }
            chCollectedText.setLength(0);
            spineTitle.setLength(0);
        } else {
            chCollectedText.append(paraText).append(ControlChar.PARAGRAPH_BREAK);
            spineTitle.append(paraText);
        }

        if (isEnglishApplyForThesis(text)) {
            if (lastRow != null) {
                enData.put("标题", enCollectedText.toString().trim());
                enData.put("书脊标题", enCollectedText.toString().trim());
            }
            enCollectedText.setLength(0);
        } else {
            enCollectedText.append(paraText).append(ControlChar.PARAGRAPH_BREAK);
        }

        lastRow = row;

        return VisitorAction.CONTINUE;
    }

And the endOfCover method:

    private boolean endOfCover(String text) {
        return text.matches(ConverterUtils.keywordToRegex("中文摘要")) ||
                text.matches(ConverterUtils.keywordToRegex("摘要")) ||
                text.matches(ConverterUtils.keywordToRegex("学位论文公开评阅人和答辩委员会名单")) ||
                text.matches(ConverterUtils.keywordToRegex("关于学位论文使用授权的说明"));
    }

But it doesn’t seem to work. How can I fix my method so that the DocumentVisitor stops at a given place, i.e. as soon as endOfCover returns true?

@Madecho

To ensure that your DocumentVisitor stops processing at a specific point, you need to return VisitorAction.STOP from the visitRowStart method when the condition defined in your endOfCover method is met.

From your provided code, it looks like you are already checking for the end condition with the endOfCover method. However, if the visitor is not stopping as expected, you might want to ensure that the endOfCover method is correctly identifying the text that should trigger the stop action.
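One thing worth checking here is that String.matches succeeds only when the entire string matches the pattern. The sketch below is plain Java with no Aspose types. literalRegex is a hypothetical stand-in for ConverterUtils.keywordToRegex, whose real behaviour is not shown in this thread, and the sample row text is invented for illustration.

```java
import java.util.regex.Pattern;

// Standalone check of whole-string vs. partial matching; no Aspose types needed.
final class EndOfCoverMatchDemo {

    // Hypothetical stand-in for ConverterUtils.keywordToRegex:
    // it simply matches the keyword literally.
    static String literalRegex(String keyword) {
        return Pattern.quote(keyword);
    }

    // Same semantics as text.matches(regex): the WHOLE string must match.
    static boolean matchesWhole(String text, String keyword) {
        return text.matches(literalRegex(keyword));
    }

    // Partial match: true if the keyword occurs anywhere in the text.
    static boolean containsKeyword(String text, String keyword) {
        return Pattern.compile(literalRegex(keyword)).matcher(text).find();
    }

    public static void main(String[] args) {
        String exact = "摘要";
        String withExtra = "摘要\t第1页"; // invented row text with extra cell content

        System.out.println("whole match, exact text:  " + matchesWhole(exact, "摘要"));
        System.out.println("whole match, extra text:  " + matchesWhole(withExtra, "摘要"));
        System.out.println("partial match, extra text: " + containsKeyword(withExtra, "摘要"));
    }
}
```

If the real row text carries anything besides the keyword, a whole-string match fails and the visitor never reaches the return VisitorAction.STOP line. Whether a partial match is the right fix depends on what keywordToRegex actually produces.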

Here’s a refined version of your visitRowStart method with some additional debugging output to help you trace the flow:

@Override
public int visitRowStart(Row row) throws Exception {
    String text = row.toString(SaveFormat.TEXT).trim().replace(" ", "");
    if (text.isBlank()) {
        return VisitorAction.CONTINUE;
    }

    // Debugging output to see the text being processed
    System.out.println("Processing Row: " + text);

    if (endOfCover(text)) {
        System.out.println("Stopping at Row: " + text); // Debugging output
        return VisitorAction.STOP; // This should stop the visitor
    }

    // Continue processing as normal
    processChineseText(text, zhData);
    processEnglishText(text, enData);

    String paraText = text.trim();
    if (isChineseApplyForThesis(text)) {
        if (lastRow != null) {
            zhData.put("标题", chCollectedText.toString().trim());
            zhData.put("书脊标题", spineTitle.toString().trim().replaceAll("\\r?\\n", ""));
        }
        chCollectedText.setLength(0);
        spineTitle.setLength(0);
    } else {
        chCollectedText.append(paraText).append(ControlChar.PARAGRAPH_BREAK);
        spineTitle.append(paraText);
    }

    if (isEnglishApplyForThesis(text)) {
        if (lastRow != null) {
            enData.put("标题", enCollectedText.toString().trim());
            enData.put("书脊标题", enCollectedText.toString().trim());
        }
        enCollectedText.setLength(0);
    } else {
        enCollectedText.append(paraText).append(ControlChar.PARAGRAPH_BREAK);
    }

    lastRow = row;
    return VisitorAction.CONTINUE;
}

Key Points:

  1. Debugging Output: I’ve added print statements to help you see what text is being processed and when the stop condition is triggered. This can help you verify that the endOfCover method is being called correctly.
  2. Return Value: Ensure that VisitorAction.STOP is returned when the condition is met, which should halt further processing of the document.

If you still encounter issues, double-check the logic in your endOfCover method to ensure it correctly identifies the stopping condition based on the text content.
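A possible reason for such a check to miss is whitespace that trim() and replace(" ", "") leave behind, for example a tab or line break in the middle of the text, or the full-width space U+3000 that is common in Chinese documents. The following is a minimal sketch in plain Java for inspecting and normalizing the text before it is passed to endOfCover. The class and method names are made up for illustration, and whether your rows actually contain such characters is an assumption to verify against your own document.

```java
// Helpers for debugging row text; plain Java, no Aspose types needed.
final class RowTextDebug {

    // Removes ASCII whitespace (space, tab, CR, LF, ...) and Unicode
    // separator characters such as U+3000 and U+00A0.
    static String stripAllWhitespace(String s) {
        return s.replaceAll("[\\s\\p{Z}]+", "");
    }

    // Replaces each whitespace or control character with a visible
    // <U+XXXX> tag so that hidden characters show up in log output.
    static String showHidden(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c) || Character.isSpaceChar(c) || Character.isISOControl(c)) {
                sb.append(String.format("<U+%04X>", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "中文\u3000摘要\r"; // invented sample with a full-width space

        // The cleanup used in the thread keeps the full-width space.
        String threadStyle = raw.trim().replace(" ", "");
        System.out.println(showHidden(threadStyle));

        // The broader cleanup removes it.
        System.out.println(showHidden(stripAllWhitespace(raw)));
    }
}
```

Printing showHidden(text) right before the endOfCover call makes it easy to see whether the text really equals the keyword or only looks like it.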

Ah, thank you, it was my own mistake. So my guess was right: returning VisitorAction.STOP is the way to do it.
