Gptrnt
February 7, 2024, 9:10am
Hi,
I am extracting the content between the start and end of a bookmark. When the extracted content is converted to text, each line break becomes an unknown character instead of “\n”. As a result, the UI shows a question mark where the line break should be. Please help me figure out why.
Attaching the sample code WordTitleImport.zip (3.3 KB) and inputfile test (2).docx (19.0 KB)
Thank you
@Gptrnt In your case, you can simply use Bookmark.getText() to extract the content of the bookmark:
Document document = new Document("C:\\Temp\\in.docx");
int i = 1;
BookmarkCollection bookmarks = document.getRange().getBookmarks();
for (Bookmark bookmark : bookmarks)
{
    // Process the bookmarks named "title1", "title2", ... in order.
    if (bookmark.getName().equals("title" + i))
    {
        String title = bookmark.getText();
        i++;
        System.out.println(title);
    }
}
If you need to use extractContent, I would suggest putting the extracted content into a separate document and then converting it to text:
Document document = new Document("C:\\Temp\\in.docx");
int i = 1;
BookmarkCollection bookmarks = document.getRange().getBookmarks();
for (Bookmark bookmark : bookmarks)
{
    if (bookmark.getName().equals("title" + i))
    {
        i++;
        // Extract the nodes between the bookmark start and end (exclusive).
        ArrayList<Node> nodes = ExtractContentHelper.extractContent(bookmark.getBookmarkStart(), bookmark.getBookmarkEnd(), false);
        // Put the nodes into a separate document and convert that document to text.
        Document subDoc = ExtractContentHelper.generateDocument(document, nodes);
        String title = subDoc.toString(SaveFormat.TEXT).trim();
        System.out.println(title);
    }
}
// ExtractContentHelper.generateDocument: builds a separate document from the extracted nodes.
public static Document generateDocument(Document srcDoc, ArrayList<Node> nodes)
{
    // Clone source document to preserve source styles.
    Document dstDoc = (Document)srcDoc.deepClone(false);
    // Import each node from the list into the new document.
    // The destination is a clone of the source, so the destination styles are the source styles.
    NodeImporter importer = new NodeImporter(srcDoc, dstDoc, ImportFormatMode.USE_DESTINATION_STYLES);
    for (Node node : nodes)
    {
        if (node.getNodeType() == NodeType.SECTION)
        {
            // Import the section shallowly, then add an empty body and the headers/footers.
            Section srcSection = (Section)node;
            Section importedSection = (Section)importer.importNode(srcSection, false);
            importedSection.appendChild(importer.importNode(srcSection.getBody(), false));
            for (HeaderFooter hf : srcSection.getHeadersFooters())
                importedSection.getHeadersFooters().add(importer.importNode(hf, true));
            dstDoc.appendChild(importedSection);
        }
        else
        {
            // Any other node goes into the body of the last section.
            Node importNode = importer.importNode(node, true);
            dstDoc.getLastSection().getBody().appendChild(importNode);
        }
    }
    return dstDoc;
}
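As for the question mark in the UI: the text you get from the nodes most likely contains the vertical tab character (U+000B), which Word-style text uses for a manual line break and which many UIs cannot display. That is an assumption, since I have not confirmed it against the attached file. If you still end up with such text, a minimal sketch of normalizing it could look like this. The class and method names are hypothetical and it uses only plain Java, no Aspose.Words calls:

```java
public class LineBreakNormalizer {
    // Hypothetical helper, not part of Aspose.Words.
    // Assumption: manual line breaks arrive as the vertical tab (U+000B)
    // and paragraph breaks arrive as a carriage return, alone or followed by a line feed.
    public static String normalize(String text) {
        return text
            .replace("\r\n", "\n")    // Windows-style paragraph break
            .replace('\r', '\n')      // bare carriage return
            .replace('\u000b', '\n'); // manual line break (vertical tab)
    }

    public static void main(String[] args) {
        String raw = "First line\u000bSecond line\rNext paragraph";
        System.out.println(normalize(raw));
    }
}
```

You would call normalize on the extracted string before passing it to the UI. The order of the replacements matters: the two-character sequence is handled first so that it does not turn into two line feeds.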