Not able to fetch Rich text content control if page break is added

saurabh.arora · April 28, 2022, 6:21pm

Hi Team,

I have two documents, one with a page break and one without. I am able to extract the rich text content control from the document without the page break. I am using the following code:

public static void main(String... args) throws Exception {
    com.aspose.words.License license = new com.aspose.words.License();
    license.setLicense("/home/saurabharora/aspose-licence.xml");
    Document document = new Document("/home/saurabharora/Downloads/Schedule 3 Without page break.docx");
    for (Object st : document.getChildNodes(NodeType.STRUCTURED_DOCUMENT_TAG, true)) {
        StructuredDocumentTag std = (StructuredDocumentTag) st;
        if (std.getSdtType() == SdtType.RICH_TEXT) {
            System.out.println(std.getTag());
            System.out.println(std.getTitle());
            System.out.println(std.getText());
        }
    }
}

…
I get the following output for the document without the page break:

BASIC__1001__25425__138__138
Schedule
For and on behalf ofProvider Legal Entity  (Recipient)aa Name: Provider Legal Entity Signatory: aa Title: Provider Legal Entity Signatory Title:  aa Date of Signature: [  ]  aa

..............

For the other document (with the page break), there is no output.

Attaching document for your reference.
Schedule 3.zip (36.3 KB)

Please help.

alexey.noskov · April 29, 2022, 5:23am

@saurabh.arora In the first case (the document without section breaks), you can use the Node.toString method to get the whole content of the structured document tag:

if ((std.getSdtType() == SdtType.RICH_TEXT)) {
    System.out.println(std.getTag());
    System.out.println(std.getTitle());
    TxtSaveOptions opt = new TxtSaveOptions();
    opt.setPreserveTableLayout(true);
    System.out.println(std.toString(opt));
}

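The second document is more complex because there are section breaks inside the structured document tag. Aspose.Words represents such SDTs with StructuredDocumentTagRangeStart and StructuredDocumentTagRangeEnd nodes instead of a single StructuredDocumentTag node, which is why a loop over NodeType.STRUCTURED_DOCUMENT_TAG nodes prints nothing for it.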

So in this case, you need to extract content between these two nodes. Please see our documentation to learn how to extract content between nodes.
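
The idea of collecting everything between a start marker and an end marker can be sketched independently of the library. The snippet below is only an illustration under stated assumptions: Kind, Node, extractBetween and sampleDocument are hypothetical stand-ins, not Aspose.Words classes, and the flat node list is an assumed simplification of a real document tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for document nodes; these are NOT Aspose.Words classes.
class RangeExtractSketch {

    enum Kind { RANGE_START, RANGE_END, SECTION_BREAK, TEXT }

    static final class Node {
        final Kind kind;
        final int id;       // pairs a RANGE_START with its RANGE_END
        final String text;  // only meaningful for TEXT nodes

        Node(Kind kind, int id, String text) {
            this.kind = kind;
            this.id = id;
            this.text = text;
        }
    }

    // Collects the text of every TEXT node strictly between the RANGE_START
    // with the given id and the RANGE_END with the same id.
    static List<String> extractBetween(List<Node> nodes, int id) {
        List<String> out = new ArrayList<>();
        boolean inside = false;
        for (Node n : nodes) {
            if (n.kind == Kind.RANGE_START && n.id == id) {
                inside = true;
            } else if (n.kind == Kind.RANGE_END && n.id == id) {
                break; // matching end marker reached
            } else if (inside && n.kind == Kind.TEXT) {
                out.add(n.text);
            }
        }
        return out;
    }

    // Assumed sample: a range that spans a section break, as in the second document.
    static List<Node> sampleDocument() {
        return Arrays.asList(
            new Node(Kind.TEXT, 0, "Intro"),
            new Node(Kind.RANGE_START, 7, null),
            new Node(Kind.TEXT, 0, "For and on behalf of"),
            new Node(Kind.SECTION_BREAK, 0, null),
            new Node(Kind.TEXT, 0, "Name:"),
            new Node(Kind.RANGE_END, 7, null),
            new Node(Kind.TEXT, 0, "Outro"));
    }

    public static void main(String[] args) {
        // Prints the text nodes found inside the range, one per line.
        for (String s : extractBetween(sampleDocument(), 7)) {
            System.out.println(s);
        }
    }
}
```

With Aspose.Words, the same kind of loop would start at the StructuredDocumentTagRangeStart node and stop at its matching StructuredDocumentTagRangeEnd node. The documentation mentioned above covers the actual API for that.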