[Java] read HTML from Doc using Aspose.Words

AndreyN · June 3, 2010, 4:42am

Hi

Thanks for your request. With DocumentVisitor, you can define and execute custom operations that require enumeration over the document tree. Please follow these links to learn more:
https://docs.aspose.com/words/net/how-to-extract-selected-content-between-nodes-in-a-document/
https://reference.aspose.com/words/net/aspose.words/documentvisitor/
So there is no way to achieve what you need using DocumentVisitor.
Best regards,

alexey.noskov · June 3, 2010, 5:00am

Hi

Thanks for your inquiry. You can see the documentation and check whether DocumentVisitor will fulfill your requirements.
https://reference.aspose.com/words/net/aspose.words/documentvisitor/
Best regards.

amey7p · June 6, 2010, 2:03am

Hi Alexey and Adam (live chat support), thanks for helping me. I finally figured out a solution to this issue: instead of saving images to the physical drive, I will store them in a database using ExportImageSavingEventHandler. Since there is no property that identifies an image while reading a DOC file with Aspose, the same image gets saved to the DB on every read, so the DB will now have multiple entries for the same image. Thanks a lot.

alexey.noskov · June 7, 2010, 2:00am

Hi

Thank you for the additional information. To avoid duplicates in your DB, you can consider creating your own mechanism to compare images. For example, you can assume that if the file size and dimensions of two images are the same, then the images are the same. Or you can use more complex logic.
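As a rough illustration of the size-and-dimensions heuristic, here is a minimal sketch in plain Java. The class names `ImageFingerprint` and `SeenImages` are hypothetical and not part of Aspose.Words; the sketch assumes you already know each image's byte length, width and height, and it keeps the "seen" set in memory as a stand-in for a database lookup.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value type holding the three properties used by the heuristic.
final class ImageFingerprint {
    final long byteLength;
    final int width;
    final int height;

    ImageFingerprint(long byteLength, int width, int height) {
        this.byteLength = byteLength;
        this.width = width;
        this.height = height;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ImageFingerprint)) {
            return false;
        }
        ImageFingerprint that = (ImageFingerprint) other;
        return byteLength == that.byteLength
                && width == that.width
                && height == that.height;
    }

    @Override
    public int hashCode() {
        return Objects.hash(byteLength, width, height);
    }
}

// Hypothetical in-memory stand-in for "have we stored this image already?".
class SeenImages {
    private final Set<ImageFingerprint> seen = new HashSet<>();

    // Returns true the first time a fingerprint is seen,
    // false when an image with the same size and dimensions was seen before.
    boolean markIfNew(long byteLength, int width, int height) {
        return seen.add(new ImageFingerprint(byteLength, width, height));
    }
}
```

You would only insert an image into the DB when `markIfNew` returns true. Keep in mind that two different images can share the same size and dimensions, so this heuristic can treat distinct images as duplicates.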
Best regards.

adam.skelton · June 7, 2010, 4:08am

Hi Amey,
Regarding what Alexey suggested, you should take a look at implementing a hash code for the image data; it would be the most reliable approach. See the link below for an example:
http://forums.sun.com/thread.jspa?threadID=5345358
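In case the link is unavailable, a minimal sketch of the idea follows. It is an assumption about how this could be done rather than the code from the linked thread: it computes a SHA-256 digest of the image bytes with the standard `java.security.MessageDigest` class and uses the hex digest as the key. The `ImageStore` class is hypothetical, and its in-memory map stands in for a DB table with a unique column on the hash.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Aspose.Words.
class ImageStore {
    // Maps the hex SHA-256 digest of the image bytes to the stored bytes.
    private final Map<String, byte[]> imagesByHash = new HashMap<>();

    // Returns the SHA-256 digest of the data as a lowercase hex string.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Stores the image only if no image with the same digest exists yet,
    // and returns the digest so the caller can reference the stored image.
    String save(byte[] imageBytes) {
        String key = sha256Hex(imageBytes);
        imagesByHash.putIfAbsent(key, imageBytes.clone());
        return key;
    }

    // Number of distinct images currently stored.
    int size() {
        return imagesByHash.size();
    }
}
```

Calling `save` with the bytes of each exported image then yields the same key for byte-identical images, so repeated reads of the same document do not add new entries.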
If you want us to help further, please attach your code and a sample document, and we will be glad to give you some advice.
Thanks,