I am having a hard time finding the information I need. After about 30 minutes of scouring the forums, I figured simply posting my question would be easier.
My problem is simple. I have a series of Word documents with simple text formatting. I need to pull the text out of each document, convert it to HTML, then, say, store it in a DB table. The only formatting I am concerned with is:
- Bold
- Italic
- Lists (I am told they are always bulleted, but who knows)
- Anchors
Opening the files is easy enough. Saving them as HTML is easy enough. However, at this point, there is so much garbage in the HTML that it is really a pain to decipher. For starters, everything is wrapped in span tags, the font differences are inline styles, and even the white space has funny tags around it.
Is there a way to use Aspose to produce even slightly cleaner HTML?
My fallback option is to simply save them as is to HTML, then with the use of some crafty regular expressions rewrite them in a somewhat better format. However, if I don’t have to do that, it would save a great deal of time.
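To make the fallback concrete, here is a rough sketch of what I have in mind. It is not Aspose code, and it swaps the regular expressions for Python's standard `html.parser`, since nested span tags are awkward to match with regexes. The names `WordHtmlCleaner` and `clean_word_html` are made up for illustration, and the assumption that bold and italic show up as `font-weight` and `font-style` inline styles on span tags comes from what I see in my own exported files, so it may not hold for every document.

```python
import re
from html import escape
from html.parser import HTMLParser

KEEP = {"b", "strong", "i", "em", "ul", "ol", "li", "p"}  # tags passed through, attributes dropped
SKIP = {"head", "style", "script"}                        # tags whose content is discarded


class WordHtmlCleaner(HTMLParser):
    """Hypothetical cleaner: keeps bold, italic, lists and anchors only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0
        # Closing markup owed by each open <span> / <a>, innermost last.
        self.closers = {"span": [], "a": []}

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        attrs = dict(attrs)
        if tag in KEEP:
            self.out.append(f"<{tag}>")
        elif tag == "span":
            # Assumption: formatting is carried by inline styles on the span.
            style = (attrs.get("style") or "").lower()
            opened, closing = "", ""
            if re.search(r"font-weight\s*:\s*(bold|[6-9]00)", style):
                opened, closing = opened + "<b>", "</b>" + closing
            if re.search(r"font-style\s*:\s*italic", style):
                opened, closing = opened + "<i>", "</i>" + closing
            self.out.append(opened)
            self.closers["span"].append(closing)
        elif tag == "a":
            kept = [f'{k}="{escape(attrs[k], quote=True)}"'
                    for k in ("href", "name") if attrs.get(k)]
            if kept:
                self.out.append("<a " + " ".join(kept) + ">")
                self.closers["a"].append("</a>")
            else:
                self.closers["a"].append("")

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth:
            return
        if tag in KEEP:
            self.out.append(f"</{tag}>")
        elif tag in self.closers and self.closers[tag]:
            self.out.append(self.closers[tag].pop())

    def handle_data(self, data):
        if not self.skip_depth:
            # Non-breaking spaces become ordinary spaces.
            self.out.append(escape(data.replace("\xa0", " "), quote=False))


def clean_word_html(html):
    """Return a stripped-down version of exported Word HTML."""
    parser = WordHtmlCleaner()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return re.sub(r"\s+", " ", "".join(parser.out)).strip()
```

The idea would be to run every exported file through `clean_word_html` and store the returned string in the DB table. Anything not on the keep list (fonts, classes, wrapper spans) is dropped, so this only suits documents as simple as the ones I described.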
Has anyone tried this before?