Hi All,
I am new to Aspose. I went through the API documentation. Good documentation :). My requirement is that I want to write some values into a Word document (the values can contain HTML), so for writing HTML I used the DocumentBuilder.insertHtml API. I also want to read Word documents whose nodes can contain HTML. I extract node values by style; here is my code.
public static ArrayList paragraphsByStyleName(Document doc, String styleName) throws Exception
{
    // Collect every paragraph whose style is "Style1" or "Style2".
    // Note: the styleName parameter is not used by this version.
    ArrayList paragraphsWithStyle = new ArrayList();
    NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
    for (Object node : paragraphs)
    {
        Paragraph paragraph = (Paragraph)node;
        String name = paragraph.getParagraphFormat().getStyle().getName();
        if ("Style1".equals(name) || "Style2".equals(name))
            paragraphsWithStyle.add(paragraph);
    }
    return paragraphsWithStyle;
}

for (Object node : paragraphs)
{
    Paragraph paragraph = (Paragraph)node;
    System.out.println(paragraph.toTxt());
}
When I run this, the node’s text value doesn’t include the image I added :(… and I am unable to find any API that returns my HTML.
Can somebody help me?
Thanks & Regards,
- Amey
Hi
Thanks for your inquiry. I think you can try an approach similar to the one suggested here to achieve what you need:
http://www.aspose.com/community/forums/230130/extract-text-with-formatting/showthread.aspx#230130
Hope this helps.
Best regards.
Hi Alexey, thanks for your support, but I think that will give me the whole document as HTML, not an individual node’s value as HTML, right?
Hi
Thanks for your inquiry. As I mentioned in the above thread, there is no direct way to get HTML from a particular node of the document. To achieve this, you should copy the node into an empty document and then convert that document to HTML. As a result you will get the HTML of a document that contains only the node you are interested in.
Best regards.
Thanks again. So, as per my code above:
Paragraph paragraph = (Paragraph)node;
System.out.println(paragraph.toTxt());
These lines have to be replaced so that I call getChildNodes on each Paragraph and then append the results to a new document, right? So I will have to do this for each node, right?
Hi Amey,
That is partly correct. You don’t need to get the child nodes of each paragraph; you can just call ImportNode with the second parameter set to true so that it automatically imports all of the paragraph’s children as well. You can use the code in the link that Alexey posted above like this:
You should first create a temporary document to import the nodes into.
Document tempDoc = new Document();
Then take each node (and its children) while keeping its formatting by using this line of code:
tempDoc.getFirstSection().getBody().appendChild(tempDoc.importNode(node, true, ImportFormatMode.KEEP_SOURCE_FORMATTING));
This will import the node into the temporary document that you created. The true parameter means it copies the node’s children as well, and ImportFormatMode.KEEP_SOURCE_FORMATTING means that the node keeps its formatting from the original document.
You can then send your tempDoc to Alexey’s ConvertDocumentToHtml method which will return the HTML of that node and its children.
Please ask if you need any further help.
Thanks,
for line
Hi Amey,
The node is what you need to import into a new temp doc so you can save the output as HTML. Remember that paragraph is a node, so using the code included in your first post it would look like this:
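NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Object node : paragraphs)
{
    // Import each paragraph (with its children) into tempDoc, the temporary document created earlier.
    tempDoc.getFirstSection().getBody().appendChild(
        tempDoc.importNode((Node)node, true, ImportFormatMode.KEEP_SOURCE_FORMATTING));
}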
Thanks,
Thnx a lot…
Hi
Thank you for additional information. Unfortunately, there is no other way to get HTML of a particular node.
Also, I can see that your document contains FormFields. I suppose you just need to get the values of these FormFields. If so, please see the following link to learn how to work with FormFields:
Best regards.
Thanks, but I am interested in extracting
Sdsdsdsd
sd23235444444444
sdsd
Hey Alexey/aske, can you help me?
Hi Amey,
If you are looking for a way to save certain nodes with their formatting to a database, then there is a way to achieve this without converting them to HTML. Please take a look at the documentation here. It shows how you can save a document to a database and then retrieve it. In your situation, where you want to save just specific nodes, you can use the same method as above: import the specific nodes into a temporary document and then save that document to the database.
If you really require the nodes to be in pure HTML, you could think about just keeping the tags you need from the converted node and removing any ones you don’t such as “” and ”. This should allow you to work with individual nodes in HTML code.
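The tag-trimming idea above could be sketched in plain Java along these lines. This is only an illustration under assumptions: it supposes the converted output is a full HTML page whose unwanted wrapper is everything outside the body element, and the class and method names (HtmlFragmentSketch, bodyContent) are made up for this example. It uses a simple regular expression rather than a real HTML parser, and it does not use any Aspose.Words API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
final class HtmlFragmentSketch {

    // Matches the opening body tag (with any attributes), captures everything
    // up to the closing body tag. DOTALL lets "." span line breaks.
    private static final Pattern BODY = Pattern.compile(
            "<body[^>]*>(.*?)</body>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /**
     * Returns the markup between the body tags, trimmed.
     * If no body element is found, the input is returned trimmed and otherwise unchanged.
     */
    static String bodyContent(String html) {
        Matcher m = BODY.matcher(html);
        return m.find() ? m.group(1).trim() : html.trim();
    }

    public static void main(String[] args) {
        // Assumed shape of a converted temporary document.
        String converted =
                "<html><head><title></title></head>"
                + "<body><p>Hello <b>world</b></p></body></html>";
        System.out.println(bodyContent(converted));
    }
}
```

The converted output may contain further wrappers inside the body (for example an enclosing div), which this sketch leaves in place.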
Could you please clarify the problem you are having with setHtmlExportImagesFolder?
Thanks,
Hi
Thank you for the additional information. You can specify ImageFolderAlias so that your images will be reachable from the web. Please see the following link for more information:
Also, you should note that HTML and MS Word formats are very different and it is quite difficult and sometimes impossible to preserve all features of MS Word documents in HTML. Here you can find more information about Word document’s features which are supported/unsupported upon exporting to HTML:
Best regards.
1: I am not able to distinguish between HtmlExportImagesFolderAlias &
OK, I got some help from the HtmlExportImagesFolderAlias property, which avoids local paths. Now the issue with paragraphs remains; please help me with that.
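To make the effect of an images-folder alias concrete, here is a small plain-Java stand-in. It does not call Aspose.Words and is not how the library implements the property; it only mimics the visible result, namely that image src attributes point at a web-reachable prefix instead of the local export folder. The class name, method name, folder path, and URL are all hypothetical.

```java
// Hypothetical illustration, not part of Aspose.Words.
final class ImageAliasSketch {

    /** Rewrites src attributes that start with localFolder so they start with alias instead. */
    static String applyAlias(String html, String localFolder, String alias) {
        String from = "src=\"" + stripTrailingSlash(localFolder) + "/";
        String to = "src=\"" + stripTrailingSlash(alias) + "/";
        return html.replace(from, to);
    }

    // Normalizes "folder/" and "folder" to the same prefix.
    private static String stripTrailingSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        // Assumed local export folder and assumed public URL.
        String html = "<p><img src=\"C:/temp/images/pic.001.png\" /></p>";
        System.out.println(applyAlias(html, "C:/temp/images/", "http://example.com/images"));
    }
}
```

With the alias property set on the exporter, no such post-processing should be needed; the sketch is only meant to show what changes in the generated markup.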
Hi
Thank you for the additional information. It is great that you have already resolved the problem with the image URL.
Regarding paragraphs, since these three items in your document are three paragraphs, Aspose.Words outputs them as three paragraphs. Please see the attached screenshot.
Best regards.
Alexey, those are HTML paragraphs, i.e. some HTML text assigned to that node.
Hi
Thank you for the additional information. It makes no difference how you inserted the paragraphs (or other nodes) into the document; either way they are regular paragraphs.
When you insert HTML into the document, the HTML is parsed into the DOM (Document Object Model). Later, upon saving, the DOM is written out in the appropriate format, depending on the SaveFormat you specified.
Best regards.