Hi:
First off, I want to thank you for all of your help over the past few months. As you can probably tell from my prior questions and issues, we are working on a project that uses your software to produce sophisticated documents built from a variety of input sources. The input can be as straightforward as simple text strings or as complex as full-blown objects created by other third-party vendors. Generally, your product does an excellent job of combining all of this input to produce some very good documents.
Recently, however, our users entered an HTML-based document into our system by way of a Telerik Rad Editor. I believe that the document may have been created in Microsoft Word first and then imported into the Telerik tool. The document consists of a paragraph of text followed by two tables. It renders perfectly in the Rad Editor.
But when we take the exact same HTML and attempt to reproduce it as part of a larger Word document (using Aspose.Words), the tables print incorrectly. We believe that the problem is caused by the structure of the HTML. (I don't have a copy with me now, but I will send one to you first thing tomorrow.)
We have been hoping to avoid altering the HTML with regular expressions or third-party tools such as HTMLAgilityPack. But for situations like the one I am describing here, it seems that we are left with no choice but to attempt to 'clean' our HTML before presenting it to your program for rendering. We do this with great reluctance because, in our past experience, this process is tedious, difficult and error-prone, and it frequently introduces new errors that are very hard to debug.
We are completely unfamiliar with the HTMLAgilityPack tool that you previously suggested we look into. When I send you the HTML shortly, you will see that our problem seems to stem from the fact that our 'style' tags are embedded in
, , tags etc. Since you seem to be far more aware of the capabilities of HTMLAgilityPack, could you suggest a resource that shows how we could use it and/or regular expressions to restructure our HTML so that your product can recognize and parse it properly?
As promised, I will send the HTML in a few hours. In the meantime, I will download the HTMLAgilityPack and see if it comes with any documentation.
I am due to move on to another project in 7 days, and it would be really helpful to my colleagues if we could at the very least come up with a 'plan of attack' for what seems to be an ongoing issue before I depart.
Once again, thank you for your help and support.