Get All the Text Content Before a Bookmark or Node in Word DOCX Document | Java

Hi, Team.
I want to get all the text content before a bookmark using Aspose.Words.
I used the "extractContent" method but didn’t get the correct result.
Looking forward to your reply, thanks.
My Code:

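// `doc` and `signBookMark` are defined elsewhere (not shown).
DocumentBuilder builder = new DocumentBuilder(doc);
// Insert a marker run at the very start of the document.
builder.moveToDocumentStart();
Run startRun = new Run(doc, "start");
builder.insertNode(startRun);
// Insert a second marker run at the bookmark.
Run endRun = new Run(doc, "end");
builder.moveToBookmark(signBookMark.getName());
builder.insertNode(endRun);
// extractContent and generateDocument are helper methods (not shown).
ArrayList extractedNodesInclusive = extractContent(startRun, endRun, true);
Document dstDoc = generateDocument(doc, extractedNodesInclusive);
String result = dstDoc.getRange().getText();
log.info("result == " + result);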

Attached are my input Word document and the expected output.
input.docx (13.5 KB)
out.docx (11 KB)

@asukavon,

You can get a text representation of all the content that exists before a particular bookmark in a Word document by using the following Aspose.Words for Java code:

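Document doc = new Document("C:\\Temp\\input.doc");
DocumentBuilder builder = new DocumentBuilder(doc);

// Create a temporary bookmark "bm" at the start of the document.
builder.startBookmark("bm");
BookmarkEnd end = builder.endBookmark("bm");

// Move the end of "bm" to just before the start of the target bookmark.
Bookmark bookmark = doc.getRange().getBookmarks().get("sign_test");
bookmark.getBookmarkStart().getParentNode().insertBefore(end, bookmark.getBookmarkStart());

// "bm" is now intended to span everything before "sign_test"; print its text.
System.out.println(doc.getRange().getBookmarks().get(end.getName()).getText());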
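As a rough illustration of what "all the text before a bookmark" means, here is a minimal, library-free sketch. It does not use the Aspose.Words API: the `TextBeforeBookmark` class, its `Node` type, and the `textBefore` method are hypothetical names made up for this example, and the document is modeled as a flat list of nodes in reading order rather than a real node tree.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for a document (NOT the Aspose.Words API). */
class TextBeforeBookmark {

    /** Hypothetical node: either a piece of text or a named bookmark start. */
    static final class Node {
        final String text;          // non-null for text nodes
        final String bookmarkName;  // non-null for bookmark-start nodes

        private Node(String text, String bookmarkName) {
            this.text = text;
            this.bookmarkName = bookmarkName;
        }

        static Node text(String t) { return new Node(t, null); }

        static Node bookmarkStart(String name) { return new Node(null, name); }
    }

    /**
     * Concatenates the text of every node that precedes the named bookmark.
     * Returns null if the bookmark is not present.
     */
    static String textBefore(List<Node> nodes, String bookmarkName) {
        StringBuilder sb = new StringBuilder();
        for (Node n : nodes) {
            if (bookmarkName.equals(n.bookmarkName)) {
                return sb.toString(); // stop at the bookmark start
            }
            if (n.text != null) {
                sb.append(n.text);
            }
        }
        return null; // bookmark not found
    }

    public static void main(String[] args) {
        // Made-up sample content; only the bookmark name comes from the thread.
        List<Node> nodes = new ArrayList<>();
        nodes.add(Node.text("First paragraph.\n"));
        nodes.add(Node.text("Name age sex adress.\n"));
        nodes.add(Node.bookmarkStart("sign_test"));
        nodes.add(Node.text("Text after the bookmark.\n"));
        System.out.print(textBefore(nodes, "sign_test"));
    }
}
```

Running `main` prints the two lines that precede `sign_test` and nothing after it. A real document is a tree of sections, paragraphs, tables, and runs, so this only models the expected result, not how the library computes it.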

If you have further questions or need any help in the future, please let us know.

Thank you for your reply!
I followed your steps, but I didn’t get the results I expected.
Sorry, I didn’t make it clear.
Taking input.doc as an example, I want to get all the text before the [sign_test] bookmark, that is, the following:
I want to get all the text content before a bookmark by aspose.word.
I used the method"extractContent" but didn’t get the correct result.
Name age sex adress.
But with your code, I get only the following:
Name age sex adress.
Looking forward to your reply again!
input.docx (13.5 KB)

@asukavon,

The code shared in my previous post should return the expected text. To get this corrected in the Aspose.Words for Java API, we have logged the problem in our issue tracking system with ID WORDSJAVA-2651. We will look further into the details and keep you updated here on the status of the linked issue. We apologize for the inconvenience.

Appreciate it.
Waiting for good news!

@asukavon,

Please check the different options of the OutlineOptions class and use the following code to get the desired PDF output:

Document doc = new Document("C:\\Temp\\input.docx");
PdfSaveOptions pdfSaveOptions = new PdfSaveOptions();
// Show Word bookmarks in the PDF outline at level 9 (0 would hide them).
pdfSaveOptions.getOutlineOptions().setDefaultBookmarksOutlineLevel(9);
doc.save("C:\\temp\\awjava-21.8.pdf", pdfSaveOptions);

You can use the following simple Java code to determine a bookmark’s coordinates in a Word document:

Document doc = new Document("C:\\Temp\\input (4).doc");

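// Build the page layout and create an enumerator over the layout entities.
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);

Bookmark bm = doc.getRange().getBookmarks().get("sign_test");

// Point the enumerator at the layout entity of the bookmark start.
enumerator.setCurrent(collector.getEntity(bm.getBookmarkStart()));

// Print the entity's bounding rectangle, rounded to whole units by getBounds().
System.out.println(enumerator.getRectangle().getBounds());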
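One detail worth noting about the last line: assuming `getRectangle()` returns a `java.awt.geom.Rectangle2D.Float` (as it does in the Java API as far as I know), calling `getBounds()` on it rounds the rectangle outward to whole numbers. The small standard-library sketch below shows that rounding with made-up coordinates; the `BoundsRounding` class and its `describe` method are hypothetical names used only for this illustration.

```java
import java.awt.Rectangle;
import java.awt.geom.Rectangle2D;
import java.util.Locale;

/** Shows how Rectangle2D.Float.getBounds() rounds to an integer rectangle. */
class BoundsRounding {

    /** Formats a float rectangle without rounding it to whole units. */
    static String describe(Rectangle2D.Float r) {
        return String.format(Locale.ROOT,
                "x=%.2f, y=%.2f, width=%.2f, height=%.2f",
                r.x, r.y, r.width, r.height);
    }

    public static void main(String[] args) {
        // Made-up coordinates standing in for a layout rectangle.
        Rectangle2D.Float exact = new Rectangle2D.Float(72.5f, 100.25f, 10.25f, 12.5f);

        // getBounds() returns the smallest integer rectangle enclosing the original.
        Rectangle rounded = exact.getBounds();

        System.out.println(describe(exact));
        System.out.println(rounded);
    }
}
```

If you need the fractional position, read the fields of the rectangle directly instead of calling `getBounds()`.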

Regarding WORDSJAVA-2651, we will keep you posted here on further updates.

Thank you very much for your answer.
This problem has troubled me for a long time.
Your solution perfectly solved my problem!
I have just graduated from university, and my goal is to become an excellent developer like you.
I truly appreciate your timely help.

@asukavon,

It is great that you were able to find what you were looking for. If you have further questions or need any help in the future, please let us know by posting a new thread in the Aspose.Words forum.

The issue you reported earlier (filed as WORDSJAVA-2651) has been fixed in the Aspose.Words for Java 22.6 update.