Hi,
I have read the few posts in the public forums that discuss XPath use in the Aspose product family.
In most cases, the answers provide equivalent solutions through the language APIs (Java, .NET, …), e.g. in Java by using document.getChildNodes(int nodeType, boolean isDeep).
In my case, I have to locate specific paragraphs as quickly as possible in large Word documents (100 to 1000 pages), e.g. paragraphs whose outlineLevel equals 1.
So OK, I can fetch all nodes of type "Paragraph" and, for each one, test whether getParagraphFormat().outlineLevel + 1 == 1, but performance is very poor for large Word documents.
I do not know the exact XML schema of the Paragraph attributes, but is an XPath query like "//Paragraph/ParagraphFormat[OutlineLevel='1']" possible now or not?
Thanks.
Sebastien
Hi Sebastien,
Document doc = new Document(MyDir + "Table.Document.doc");
// This expression will extract all paragraph nodes which are descendants of any table node in the document.
// This will return any paragraphs which are in a table.
NodeList nodeList = doc.SelectNodes("//Table//Paragraph");
// This expression will select any paragraphs that are direct children of any body node in the document.
nodeList = doc.SelectNodes("//Body/Paragraph");
// Use SelectSingleNode to select the first result of the same expression as above.
Node node = doc.SelectSingleNode("//Body/Paragraph");
Hi Tahir,
"Expressions that use attribute names are not supported"
Is full support of XPath features planned for any future version of Aspose.Words?
In my opinion, this is a very important missing feature, particularly for large Word documents, where iterating over large result sets like //Paragraph or //Body/Paragraph is not the most efficient approach (as in the example from my first post).
Thanks.
Sebastien
Hi Sebastien,
prometil:
In my case, I have to locate specific paragraphs as quickly as possible in large Word documents (100 to 1000 pages), e.g. paragraphs whose outlineLevel equals 1.
prometil:
Is full support of XPath features planned for any future version of Aspose.Words?
Hi Tahir,
I cannot post or send you my input Word documents because they are covered by non-disclosure agreements. However, I have made a (very simple) example Word document that has a structure similar to my input documents, but without the data volume.
Anyway, when I run the XPath query //Paragraph, I get 56 nodes in the result set, but with //Paragraph/ParagraphFormat[OutlineLevel='1'] I get no results at all (I think I should normally get 2 nodes: the paragraphs containing runs whose text is Title 2 or Title 3).
So maybe my XPath query is not consistent with your Word XML schema; I based it only on the relations between the Aspose.Words domain objects in the Java API.
Thanks.
Sebastien
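For reference, here is a minimal sketch of how such a predicate behaves in standard XPath 1.0, using the JDK's own javax.xml.xpath engine rather than Aspose.Words. The XML below is hypothetical and invented only for illustration (it is not Aspose's internal model), and the class and method names are made up. It shows that //Paragraph/ParagraphFormat[OutlineLevel='1'] selects the ParagraphFormat elements, while //Paragraph[ParagraphFormat/OutlineLevel='1'] selects the paragraphs themselves. Given the limitation quoted earlier in the thread, neither form should be assumed to work with Aspose's selectNodes.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OutlinePredicateDemo {
    // Hypothetical XML for illustration only; NOT the Aspose.Words document model.
    static final String XML =
        "<Document><Body>"
        + "<Paragraph><ParagraphFormat><OutlineLevel>1</OutlineLevel></ParagraphFormat>"
        + "<Run>Title 2</Run></Paragraph>"
        + "<Paragraph><ParagraphFormat><OutlineLevel>10</OutlineLevel></ParagraphFormat>"
        + "<Run>Body text</Run></Paragraph>"
        + "<Paragraph><ParagraphFormat><OutlineLevel>1</OutlineLevel></ParagraphFormat>"
        + "<Run>Title 3</Run></Paragraph>"
        + "</Body></Document>";

    // Evaluates an XPath 1.0 expression against the sample XML with the JDK engine.
    static NodeList select(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Predicate on the last step: the result nodes are ParagraphFormat elements.
        NodeList formats = select("//Paragraph/ParagraphFormat[OutlineLevel='1']");
        // Predicate on Paragraph: the result nodes are the paragraphs themselves.
        NodeList paragraphs = select("//Paragraph[ParagraphFormat/OutlineLevel='1']");

        System.out.println(formats.item(0).getNodeName());    // ParagraphFormat
        System.out.println(paragraphs.item(0).getNodeName()); // Paragraph
        System.out.println(paragraphs.getLength());           // 2
    }
}
```

In an engine that supports predicates, the second form is the one that returns paragraph nodes directly.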
Hi Sebastien,
Document doc = new Document(MyDir + "test_aspose.doc");
// Time the XPath-based selection of all paragraphs.
long t = System.currentTimeMillis();
NodeList nodeList = doc.selectNodes("//Paragraph");
System.out.println(System.currentTimeMillis() - t);
// Time the equivalent DOM-based collection (deep search for paragraph nodes).
long tDom = System.currentTimeMillis();
NodeCollection paras = doc.getChildNodes(NodeType.PARAGRAPH, true);
System.out.println(System.currentTimeMillis() - tDom);
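A single millisecond reading per call can be skewed by JVM warm-up, and whichever call runs first tends to pay that cost. A minimal sketch of a repeatable measurement follows, assuming the comparison can be wrapped in a Runnable. The helper name medianNanos and the stand-in workload are hypothetical; in practice the lambda would call doc.selectNodes("//Paragraph") or doc.getChildNodes(NodeType.PARAGRAPH, true).

```java
import java.util.Arrays;

class TimingSketch {
    // Runs the task `warmups` times untimed, then `runs` times timed,
    // and returns the median elapsed time in nanoseconds.
    static long medianNanos(Runnable task, int warmups, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Stand-in workload (hypothetical); replace with the Aspose call to measure.
        Runnable work = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
        };
        System.out.println("median ns: " + medianNanos(work, 3, 11));
    }
}
```

Measuring both approaches with the same helper, in either order, gives a fairer comparison than one timed call each.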