Can't correctly extract bookmark content when the bookmark contains a table

stanlys2 · October 31, 2012, 10:16am

Hi,

I have a bookmark in the document that contains a table.

When I try to get the bookmark’s content, the table itself is not extracted, only the cells’ content. How can this be fixed?

Bookmark Bookmark = mDocument.Range.Bookmarks["START184634325815091826172F"];
BookmarkStart bookmarkStart = Bookmark.BookmarkStart;
BookmarkEnd bookmarkEnd = Bookmark.BookmarkEnd;
ArrayList extractedNodes = ExtractContent(bookmarkStart, bookmarkEnd, false);
Aspose.Words.Document srcBookmarkContent = GenerateDocument(mDocument, extractedNodes);
srcBookmarkContent.Save("output.doc");

Here is the ExtractContent method, which was fixed by Aspose after one of my earlier posts.

private ArrayList ExtractContent(Node startNode, Node endNode, bool isInclusive)
{
    // First check that the nodes passed to this method are valid for use.
    VerifyParameterNodes(startNode, endNode);

    // Create a list to store the extracted nodes.
    ArrayList nodes = new ArrayList();

    // Keep a record of the original nodes passed to this method so we can split marker nodes if needed.
    Node originalStartNode = startNode;
    Node originalEndNode = endNode;

    // Extract content based on block level nodes (paragraphs and tables). Traverse through parent nodes to find them.
    // We will split the content of first and last nodes depending if the marker nodes are inline
    while (startNode.NodeType != NodeType.Paragraph && startNode.NodeType != NodeType.Table)
        startNode = startNode.ParentNode;

    while (endNode.NodeType != NodeType.Paragraph && endNode.NodeType != NodeType.Table)
        endNode = endNode.ParentNode;

    bool isExtracting = true;
    bool isStartingNode = true;
    bool isEndingNode = false;

    // The current node we are extracting from the document.
    Node currNode = startNode;

    // Begin extracting content. Process all block level nodes and specifically split the first and last nodes when needed so paragraph formatting is retained.
    // Method is little more complex than a regular extractor as we need to factor in extracting using inline nodes, fields, bookmarks etc as to make it really useful.
    while (isExtracting)
    {
        // Clone the current node and its children to obtain a copy.
        CompositeNode cloneNode = (CompositeNode)currNode.Clone(true);
        isEndingNode = currNode.Equals(endNode);

        if (isStartingNode || isEndingNode)
        {
            // We need to process each marker separately so pass it off to a separate method instead.
            if (isStartingNode)
            {
                ProcessMarker(cloneNode, nodes, originalStartNode, isInclusive, isStartingNode, isEndingNode);
                isStartingNode = false;
            }

            // Conditional needs to be separate as the block level start and end markers maybe the same node.
            if (isEndingNode)
            {
                ProcessMarker(cloneNode, nodes, originalEndNode, isInclusive, isStartingNode, isEndingNode);
                isExtracting = false;
            }
        }
        else
            // Node is not a start or end marker, simply add the copy to the list.
            nodes.Add(cloneNode);

        // Move to the next node and extract it. If next node is null that means the rest of the content is found in a different section.
        if (currNode.NextSibling == null && isExtracting)
        {
            currNode = currNode.NextPreOrder(currNode.Document);
            while (currNode.NodeType != NodeType.Paragraph && currNode.NodeType != NodeType.Table)
                currNode = currNode.NextPreOrder(currNode.Document);
        }
        else
        {
            // Move to the next node in the body.
            currNode = currNode.NextSibling;
        }
    }

    // Return the nodes between the node markers.
    return nodes;
}

private Aspose.Words.Document GenerateDocument(Aspose.Words.Document srcDoc, ArrayList nodes)
{
    // Remove empty paragraphs from the end of document
    if (srcDoc.LastSection.Body.LastChild != null)
    {
        while (!((CompositeNode)srcDoc.LastSection.Body.LastChild).HasChildNodes)
        {
            srcDoc.LastSection.Body.LastParagraph.Remove();
            if (srcDoc.LastSection.Body.LastChild == null)
                break;
        }
    }

    // Create a blank document.
    Aspose.Words.Document dstDoc = new Aspose.Words.Document();
    // Remove the first paragraph from the empty document.
    dstDoc.FirstSection.Body.RemoveAllChildren();

    // Import each node from the list into the new document. Keep the original formatting of the node.
    NodeImporter importer = new NodeImporter(srcDoc, dstDoc, ImportFormatMode.KeepSourceFormatting);

    foreach (Node node in nodes)
    {
        Node importNode = importer.ImportNode(node, true);
        dstDoc.FirstSection.Body.AppendChild(importNode);
    }

    // Return the generated document.
    return dstDoc;
}

I am attaching the document.

Thanks,

Stanislav.

tahir.manzoor · November 1, 2012, 10:41am

Hi Stanislav,

Thanks for your inquiry. The code shared at the following forum link does not extract the table nodes if the BookmarkStart node is inside a table cell.

https://forum.aspose.com/t/53465

The BookmarkStart node of the shared document is inside the table’s first cell. Please use the following code snippet for your requirement. Hope this helps you. Please let us know if you have any more queries.

Document doc = new Document(MyDir + "plstemp.doc");
DocumentBuilder builder = new DocumentBuilder(doc);

Bookmark Bookmark = doc.Range.Bookmarks["START184634325815091826172F"];
BookmarkStart bookmarkStart = Bookmark.BookmarkStart;
BookmarkEnd bookmarkEnd = Bookmark.BookmarkEnd;

// Walk backwards from the bookmark start (pre-order) until a Table node
// is found or the Body node is reached.
Boolean blntable = false;
Node currentNode = bookmarkStart;
while (currentNode.NodeType != NodeType.Body)
{
    currentNode = currentNode.PreviousPreOrder(doc);
    if (currentNode.NodeType == NodeType.Table)
    {
        blntable = true;
        break;
    }
}

// If a table was found, insert a new bookmark at the node that precedes it
// and use that bookmark's start as the extraction start.
// Note that this modifies the document.
if (blntable == true)
{
    builder.MoveTo(currentNode.PreviousPreOrder(doc));
    bookmarkStart = builder.StartBookmark("MyBookmark");
    builder.Writeln("");
    builder.EndBookmark("MyBookmark");
}

ArrayList extractedNodes = ExtractContent(bookmarkStart, bookmarkEnd, false);
Aspose.Words.Document srcBookmarkContent = GenerateDocument(doc, extractedNodes);
srcBookmarkContent.Save(MyDir + "output.doc");
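
A possible refinement of the table check, offered only as a sketch: the backward PreviousPreOrder walk looks at every node that precedes the bookmark, so it may also stop at a table that merely comes earlier in the document. Checking the ancestors of the BookmarkStart node answers the narrower question of whether the bookmark really starts inside a table. The class and method names below (BookmarkTableCheck, FindEnclosingTable) are hypothetical and not part of Aspose.Words; the only library call assumed is Node.GetAncestor(NodeType).

```csharp
using Aspose.Words;
using Aspose.Words.Tables;

public static class BookmarkTableCheck
{
    // Hypothetical helper, not part of Aspose.Words or of the code in this thread.
    // Returns the nearest table that contains the bookmark start,
    // or null when the bookmark starts outside any table.
    public static Table FindEnclosingTable(Bookmark bookmark)
    {
        // GetAncestor only walks up the parent chain, so a table that
        // simply precedes the bookmark in the document is not reported.
        return (Table)bookmark.BookmarkStart.GetAncestor(NodeType.Table);
    }
}
```

With nested tables this returns the innermost table, so a caller that needs the outermost one would have to keep calling GetAncestor on the result.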

stanlys2 · November 5, 2012, 5:42am

Hi

Thank you for the answer. All the documents that I attach to my posts are created by our clients, and I cannot always validate what they create. My problem is that I need a general method that will deal with all possible scenarios. I know that this is almost impossible, so I open a new post every time a new problem is found.

Regarding the current problem, I need the Extract method not to change the document itself, because I save the document afterwards and I don’t want to add new bookmarks that could cause problems later. I need a fix for the Extract method that covers this case as well.

Thanks,

Stanislav.
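
One way to keep the original document untouched, sketched here under assumptions rather than as an official fix, is to run any workaround that inserts helper bookmarks against a deep copy of the document. The names SafeExtraction and RunOnCopy are hypothetical; the extraction step is passed in as a delegate so that the existing ExtractContent and GenerateDocument methods can be reused unchanged.

```csharp
using System;
using Aspose.Words;

public static class SafeExtraction
{
    // Hypothetical helper, not part of Aspose.Words.
    // Clones the document, looks up the bookmark in the clone and hands both
    // to the supplied extraction delegate, so the original is never modified.
    public static Document RunOnCopy(
        Document original,
        string bookmarkName,
        Func<Document, Bookmark, Document> extract)
    {
        // Deep copy: changes made to workingCopy do not affect original.
        Document workingCopy = original.Clone();

        // The Bookmarks indexer returns null when the name is not found.
        Bookmark bookmark = workingCopy.Range.Bookmarks[bookmarkName];
        if (bookmark == null)
            throw new ArgumentException("Bookmark not found: " + bookmarkName);

        return extract(workingCopy, bookmark);
    }
}
```

The delegate would contain the temporary-bookmark workaround followed by the ExtractContent and GenerateDocument calls, all operating on the copy. The cost is one extra in-memory copy of the document per extraction.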

tahir.manzoor · November 6, 2012, 7:01am

Hi Stanislav,

Thanks for your inquiry. It would be great if you could share complete details about this scenario. In this case, the BookmarkStart and BookmarkEnd nodes may or may not be inside a table. What are your requirements in this case? For example, in the attached document the BookmarkStart node is inside the second cell of the first row and the BookmarkEnd node is after the table node. Please share all scenarios, and we will share the code accordingly.

stanlys2 · November 19, 2012, 4:18am

Hi,

Thank you for the answer, and I am sorry for the late reply. We thought a lot about this issue and came to the conclusion that we should avoid this situation rather than try to resolve it, as it is indeed impossible to know what to do in each case. Thank you again for the help.

Thanks,

Stanislav.

tahir.manzoor · November 19, 2012, 10:25am

Hi Stanislav,

Thanks for your feedback.

Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.