Extracting content between bookmarks

Hello, I’m running a test on the ExtractContent example shown here.

I’m using PHP with JavaBridge and Aspose Words for Java. I’m using the attached document: RFPSampleAspose.docx (452.0 KB)

Notice it has one bookmark, TestBookmark1. The bookmark start and end enclose the contents of one table cell in the included table.

I want to extract the content contained by the bookmark, and convert it to HTML format.

$b=$doc->getRange()->getBookmarks()->get("TestBookmark1");
$asposeCommon=new Java("AsposeWordsCommon"); // I compiled this directly from the example
$contentArray=$asposeCommon->extractContent($b->getBookmarkStart(),$b->getBookmarkEnd(),true);
$genDoc=$asposeCommon->generateDocument($doc,$contentArray);            
echo "extractcontents of " . $b->getName() . ":" . $genDoc->toString($saveFormat->HTML);

However, the content that’s displayed seems to include the last two rows of the table, far away from the named bookmark.

Can you help me understand if I’m doing something wrong? Thank you.

@backprop It seems there is a mistake in the Java code example; the C# code example works as expected. I have manually translated the C# code to Java, and it works as expected too. Please try the attached class and let me know whether it works on your side:
ExtractContentHelper.zip (3.2 KB)

Thanks Alexey, it helped and the result is a lot closer now. However, if I use the attached document:

RFPSampleAspose.docx (452.0 KB)

There are two bookmarks, TestBookmark1 and TestBookmark2. Each is inside a table cell. If I extract content from them (using either inclusive or exclusive), I get the following result:

asposeExtract.png (13.7 KB)

In the code, I’m displaying the node type of the parent of both bookmarkStart and bookmarkEnd. The parent of each is a Paragraph, as the output text shows. So I don’t believe the content between bookmarkStart and bookmarkEnd should include any table cells. But as you can see in the screenshot, the extractor seems to include the preceding table cell, the following table cell, and even part of the following row.

$b=$doc->getRange()->getBookmarks()->get("TestBookmark1");
$asposeCommon=new Java("AsposeWordsCommon");
$contentArray=$asposeCommon->extractContent($b->getBookmarkStart(),$b->getBookmarkEnd(),false);
$genDoc=$asposeCommon->generateDocument($doc,$contentArray);            
echo "extractcontents of " . $b->getName() .  " whose begin/end parent is type " . java_values($b->getBookmarkStart()->getParentNode()->getNodeType()) . "/" . java_values($b->getBookmarkEnd()->getParentNode()->getNodeType()) .  ":" . $genDoc->toString($saveFormat->HTML);
$b=$doc->getRange()->getBookmarks()->get("TestBookmark2");
$asposeCommon=new Java("AsposeWordsCommon");
$contentArray=$asposeCommon->extractContent($b->getBookmarkStart(),$b->getBookmarkEnd(),false);
$genDoc=$asposeCommon->generateDocument($doc,$contentArray);            
echo "extractcontents of " . $b->getName() .  " whose begin/end parent is type " . java_values($b->getBookmarkStart()->getParentNode()->getNodeType()) . "/" . java_values($b->getBookmarkEnd()->getParentNode()->getNodeType()) .  ":" . $genDoc->toString($saveFormat->HTML);

Thank you/Слава Україні

@backprop Героям Слава. Actually, in your document both bookmarks start in the first cell of the row and end at the end of the row. You can see this by unzipping your source DOCX document and exploring document.xml, or by opening the document in the DocumentExplorer demo application.

By the way, on my side the extracted content contains nothing from the following row. See the attached output HTML files: docs.zip (7.2 KB)
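
If DocumentExplorer is not available, the document.xml check mentioned above can be scripted. Below is a minimal sketch using only the Java standard library, not Aspose code: it reads word/document.xml from the DOCX and reports, for each bookmark, the table row and cell that hold its w:bookmarkStart and its w:bookmarkEnd. The class name BookmarkLocator and its method names are made up for illustration, and the sketch only considers the nearest enclosing table.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class BookmarkLocator {
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // A DOCX file is a ZIP archive; return the bytes of word/document.xml.
    static byte[] readDocumentXml(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    return zip.readAllBytes(); // reads to the end of this entry
                }
            }
        }
        throw new IOException("word/document.xml not found");
    }

    static boolean isW(Node n, String localName) {
        return n.getNodeType() == Node.ELEMENT_NODE
                && W.equals(n.getNamespaceURI())
                && localName.equals(n.getLocalName());
    }

    // Zero-based position of n among its preceding siblings with the same name.
    static int indexAmong(Node n, String localName) {
        int index = 0;
        for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (isW(s, localName)) index++;
        }
        return index;
    }

    // Describe the nearest enclosing table cell (w:tc) or row (w:tr) of a node.
    static String locate(Node n) {
        for (Node a = n.getParentNode(); a != null; a = a.getParentNode()) {
            if (isW(a, "tc")) {
                return "row " + indexAmong(a.getParentNode(), "tr") + ", cell " + indexAmong(a, "tc");
            }
            if (isW(a, "tr")) {
                return "row " + indexAmong(a, "tr") + ", outside any cell";
            }
        }
        return "outside any table";
    }

    // Map each bookmark name to "<start location> -> <end location>".
    static Map<String, String> bookmarkSpans(byte[] documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(documentXml));

        // bookmarkEnd carries only w:id, so index the end locations by id first.
        Map<String, String> endById = new HashMap<>();
        NodeList ends = dom.getElementsByTagNameNS(W, "bookmarkEnd");
        for (int i = 0; i < ends.getLength(); i++) {
            Element end = (Element) ends.item(i);
            endById.put(end.getAttributeNS(W, "id"), locate(end));
        }

        Map<String, String> spans = new LinkedHashMap<>();
        NodeList starts = dom.getElementsByTagNameNS(W, "bookmarkStart");
        for (int i = 0; i < starts.getLength(); i++) {
            Element start = (Element) starts.item(i);
            String end = endById.getOrDefault(start.getAttributeNS(W, "id"), "no end found");
            spans.put(start.getAttributeNS(W, "name"), locate(start) + " -> " + end);
        }
        return spans;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            bookmarkSpans(readDocumentXml(in))
                    .forEach((name, span) -> System.out.println(name + ": " + span));
        }
    }
}
```

Run it with the path to the DOCX as the only argument (Java 9 or later, because of readAllBytes). A bookmark whose start and end report different cells, or an end reported as "outside any cell", spans more than the content of a single cell, so an extractor working between those two nodes will pick up the cells in between.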

Thanks for the analysis Alexey.

I think Word might show the visualization differently, but I do see it in document.xml. (I can’t get the visualizer to work, but it’s on my list :slight_smile:)

RFPSampleAspose.docx (452.1 KB)

When I’m in the Word UI and ask it to identify TestBookmark1, it shows the following:

asposeExtract.png (27.1 KB)

The “test” in the third column is not highlighted, but in document.xml it is definitely part of the bookmark. There doesn’t happen to be a Windows executable version of DocumentExplorer available, is there? I’m not properly set up to build and run it as it is. Thanks again. :ukraine:

@backprop Yes, MS Word seems to show the visualization differently. I have added one more bookmark to your document, which occupies only the cell content, and MS Word shows it like this:

As you can see, the cell break character is outside the bookmark’s range. RFPSampleAspose_modified.docx (452.1 KB)

DocumentExplorer is a .NET project and should be compiled in VS.NET.