Free Support Forum - aspose.com

DocumentBuilder.InsertHtml - does it ignore classes and styles sometimes?

Hi:

In our application we need to use DocumentBuilder.InsertHtml to add HTML content that comes from a Telerik RadEditor. Generally this process seems to work quite well.

But recently we saw a situation where some tabular data from the RadEditor was not rendered correctly. I am attaching the html that caused the problem. When I use this html to generate a document in the IE Browser, the html renders the tables properly.

But when I try to render the tables using the above-mentioned method, the horizontal spacing of the tables is incorrect.

The last table is contained in a div. It seems as though the style attribute of the div is ignored completely.

Is there any chance that this method ignores certain settings for either style or class attributes? If so, what can we do to render our documents properly?

Hi

Thanks for your request. The problem occurs because, currently, Aspose.Words does not support inheriting styles from parent elements.

Currently, Aspose.Words expects each kind of formatting to be set directly on the element it applies to: font formatting on the inline text element, paragraph formatting on the paragraph-level elements, and so on.

Best regards.

Thank you for your quick response. I expected that something like this would be true.

Unfortunately we are using the Telerik Rad Editor which provides us with the html that we pass on to you. Because of this we have not, to this point, attempted any further editing of the raw html as it comes from the Rad Editor.

If I read your comments correctly, we will have no choice but to parse through the html and apply the required edits in order to cause the html to render properly in both environments. I have had to do something similar in a previous project and found the process extremely error-prone and difficult.

But if that is our only solution, I would want to ask your advice as to how we can best accomplish it.

Hi,

Thanks for your inquiry. Yes, I agree that parsing HTML by hand is a very difficult and error-prone approach.

I think the best way to process HTML is using HtmlAgilityPack. You can download HtmlAgilityPack here:

http://www.codeplex.com/htmlagilitypack
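
The thread never shows what such a cleanup pass might look like, so here is a minimal sketch of the idea. It is written in Python with the standard-library html.parser rather than HtmlAgilityPack, and it copies inheritable inline style declarations from each parent element down onto its descendants so that no element relies on inheritance. The names StyleInliner, inline_inherited_styles and the INHERITED property list are illustrative assumptions, the input is assumed to be well formed, and whether Aspose.Words then honors every copied property is not confirmed here.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: a small, illustrative subset of CSS properties that inherit.
INHERITED = {"color", "font-family", "font-size", "font-style",
             "font-weight", "text-align"}
# Elements that never have a closing tag.
VOID = {"br", "hr", "img", "input", "meta", "link", "col"}


def parse_style(text):
    """Turn 'a: b; c: d' into {'a': 'b', 'c': 'd'}."""
    result = {}
    for decl in (text or "").split(";"):
        name, sep, value = decl.partition(":")
        if sep and name.strip() and value.strip():
            result[name.strip().lower()] = value.strip()
    return result


class StyleInliner(HTMLParser):
    """Re-emits the HTML, adding inherited declarations to every element."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.stack = [{}]  # inheritable declarations of each open element

    def _emit_start(self, tag, attrs, closing=">"):
        own = parse_style(dict(attrs).get("style"))
        merged = {**self.stack[-1], **own}  # the element's own values win
        other = [(k, v) for k, v in attrs if k != "style"]
        if merged:
            style = "; ".join(f"{k}: {v}" for k, v in merged.items())
            other.append(("style", style))
        rendered = "".join(' {}="{}"'.format(k, escape(v or "", quote=True))
                           for k, v in other)
        self.out.append(f"<{tag}{rendered}{closing}")
        # Only inheritable properties are passed on to children.
        return {k: v for k, v in merged.items() if k in INHERITED}

    def handle_starttag(self, tag, attrs):
        inherited = self._emit_start(tag, attrs)
        if tag not in VOID:
            self.stack.append(inherited)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, closing=" />")

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.out.append(f"</{tag}>")
            if len(self.stack) > 1:
                self.stack.pop()

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def inline_inherited_styles(html):
    parser = StyleInliner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling inline_inherited_styles on the editor output before passing it to DocumentBuilder.InsertHtml would be the intended use. A real parser is used instead of regular expressions because the inherited values depend on element nesting, which a regular expression cannot track reliably.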

Hope this could help you.

Best regards.

Hi:

First off I want to thank you for all of your help over the past few months. As you can probably tell from my prior questions and issues we are working on a project that uses your software to produce sophisticated documents that are built from a variety of different input sources. The input can be as straightforward as simple text strings or as complex as full blown objects created by other third party vendors. Generally your product does an excellent job of combining all of this input to produce some very good documents.

Recently, however, our users entered a document (html-based) into our system by way of a Telerik Rad Editor. I believe that the document may have been created in Microsoft Word first and then imported into the Telerik tool. The document consists of a paragraph of text followed by two tables. It renders perfectly in the Rad Editor.

But when we acquire the exact same html and attempt to reproduce it as part of a larger Word document (using Aspose.Words) the tables print incorrectly. We believe that the problem is created by the structure of the HTML. (I don't have a copy with me now but I will send one to you first thing tomorrow).

We have been hoping to avoid altering the html using regular expressions or other third-party tools such as HTMLAgilityPack. But for situations like the one I am describing here, it would seem that we are left with no choice but to attempt to 'clean' our html before presenting it to your program for rendering. We do this with great reluctance because it has been our experience that this process can be tedious, error-prone and difficult, and that it frequently introduces new errors that are very hard to debug.

We are completely unfamiliar with the HTMLAgilityPack tool that you had previously suggested that we look into. When I send you the html shortly you will see that our problem seems to stem from the fact that our 'style' attributes are embedded in several different tags. Since you seem to be far more aware of the capabilities of HTMLAgilityPack, could you possibly suggest a resource where we can see how to use it and/or regular expressions to restructure our html so that your product can recognize it and parse through it properly?

As promised I will send the html in a few hours. In the meantime I will download the HTMLAgilityPack and see if it comes with any documentation.

I am to move on to another project in 7 days and it would be really helpful to my colleagues if we could at very least come up with a 'plan of attack' to deal with what seems to be an ongoing issue before I depart.

Once again thank you for your help and support.

Hi

Thanks for your request. The HTMLAgilityPack download package includes CHM documentation, so you can use that.

Best regards.

For the time being we will not have htmlAgilitypack available so we have limited our 'cleaning' of html to regular expressions for now.

I am in the process of reviewing each one of our documents by hand. While doing this I have noticed that, although much of the style information is contained in parent tags and will, I guess, be ignored by your product, much of it is also set on the elements that your product does read. This is sufficient that most of our documents should be handled acceptably by your product.

I will continue this review and, as best I can, revise the html to meet your needs.

I am attaching some html that almost produces a completely acceptable document. The only issue is that the tables within the document should be centered on the report. They are not.

The html seems well formed and contains two DIV tags that enclose the paragraphs.

Unfortunately the tables are not centered. What are we doing wrong?

Thanks for your help.

Hi

Thanks for your inquiry. This occurs because currently, Aspose.Words does not support inheriting styles from parent elements. I already linked your request to the appropriate issue and you will be notified as soon as it is resolved.
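
Until that issue is resolved, one possible stopgap in the spirit of the regular-expression cleaning mentioned earlier in the thread is to put the alignment on each table itself instead of on the enclosing DIV. The sketch below is in Python, and the function name center_tables is hypothetical. It assumes that every table in the fragment is meant to be centered and that an align="center" attribute placed directly on the table is honored by the importer, which this thread does not confirm.

```python
import re

# Matches an opening <table ...> tag and captures its attribute text.
TABLE_TAG = re.compile(r"(<table\b)([^>]*)>", re.IGNORECASE)
# Matches an existing align attribute (but not valign or text-align).
HAS_ALIGN = re.compile(r"(?<![\w-])align\s*=", re.IGNORECASE)


def center_tables(html):
    """Add align="center" to every <table> tag that has no align attribute."""

    def fix(match):
        opening, attrs = match.group(1), match.group(2)
        if HAS_ALIGN.search(attrs):
            return match.group(0)  # leave explicitly aligned tables alone
        return f'{opening}{attrs} align="center">'

    return TABLE_TAG.sub(fix, html)
```

Because this only rewrites the opening table tag, it does not need to understand nesting, which is why a regular expression is tolerable here even though it would not be for general style inheritance.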

Best regards.

The issues you have found earlier (filed as WORDSNET-2021) have been fixed in this .NET update and this Java update.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.