Custom ParagraphFormat.Styles

Hi there
I currently use Aspose.Words to generate a Word document from HTML source code with the DocumentBuilder.InsertHtml method. Now I’d like to convert something like <p class="MyStyle">Blahblah</p> into ParagraphFormat.Style = Doc.Styles["MyStyle"]. I tried to create a document visitor that traverses all paragraph nodes after inserting, but the problem is that the “MyStyle” information seems to be lost. Also, the InsertHtml method doesn’t seem to try to choose the correct style; it always assigns the standard style. Is there any other way to achieve what I need?

Kind regards
Samuel

Hello
Thanks for your request. During InsertHtml, all formatting is taken from the HTML snippet. If you insert HTML with no formatting specified, default formatting is used for the inserted content.
I cannot reproduce the problem on my side using the following code:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
string html = @"<html>
        <head>
        <style type='text/css'>
            .myStyle { color:Red;}
        </style>
        </head>
        <body>
            <p class='myStyle'>This is paragraph text.</p>
        </body>
     </html>";
builder.InsertHtml(html);
doc.Save("C:\\Temp\\out.doc");

When you open the produced document, you will have a paragraph with “myStyle” applied.
Best regards,

Hello AndreyN
Thanks for your reply. The point is that the style should not be defined within the HTML code but in a separate Word template. So I load a Word template with an existing style named “MyStyle”, and I want Aspose to choose this style based on the class attribute of the paragraph tag. It’s no problem if I have to code something; just tell me where to start. My observation was that DocumentBuilder always chooses the standard style, no matter what class attribute a p tag in the input HTML has, probably because there was no corresponding CSS information. As mentioned, I don’t want to use any CSS information within the HTML code, since I need to decouple the formatting in the HTML editor from the formatting of the resulting Word document.

Hello
Thank you for the additional information. I think in this case you can try inserting your HTML with CSS into an empty document and then inserting this document into your template with predefined styles. You should create a style with the same name in your destination document and insert the HTML document with the ImportFormatMode.UseDestinationStyles option. For example, see the following code:

// Open source HTML.
Document src = new Document("Test.html");
// Open destination document. (The document contains a predefined style named "myStyle".)
Document dst = new Document(@"Test001\dst.doc");
// Append source document to the destination with ImportFormatMode.UseDestinationStyles option.
dst.AppendDocument(src, ImportFormatMode.UseDestinationStyles);
// Save output.
dst.Save("out.doc");

Best regards,

Hi AndreyN
Thanks again for the quick response. This looks like a workaround, but it is probably not the most elegant way to deal with the issue. Let me explain our current situation in a little more detail.
We have an HTML-based editor (implemented with an embedded browser), which of course uses CSS for formatting the text during editing. However, that CSS information is not stored in the article database; the articles are pure XHTML. Then there are people responsible for the documents who create a pure Word template. Now, the application I’m currently writing composes a final Word document from one or more articles (pure XHTML), knowing that, for example, text within a <p class="Silver"> element shall be converted into a paragraph using a style named “Silver”, like Document.Styles["Silver"], if such a style exists in the Word template.

I could use your approach by ripping apart an incoming pure XHTML document and inserting some dummy CSS information, so that the DocumentBuilder realizes there is actually some CSS information available and creates those document styles. Then I would append the “intermediate document” built by the DocumentBuilder to the final document using ImportFormatMode.UseDestinationStyles.

This would probably work, but it looks to me like a pretty nasty hack! :-p

Do you have another idea?

Best regards
Samuel Lörtscher

Hi Samuel,
Thanks for your inquiry.
I’m afraid these techniques are the only workarounds for the time being. I can also suggest a small variation on your proposed solution that doesn’t involve editing the input, which might be more suitable for you.
Using Andrey’s suggestion and your proposed solution, but instead of editing the incoming XHTML to include style tags, you could parse the data and, for each style name found, create a new style with this name in the temporary document. Then you insert the HTML into the document as described and append/insert it into your real document using destination styles. This should result in the content being formatted with the correct styles.
Hopefully this technique is okay for you for the time being. We apologise for any inconvenience.
Thanks

Hi aske012
This is still not going to work, since the Word template is not empty and we use bookmarks within the Word template to define where to put the article texts. So we cannot simply append the intermediate document to the final one (the Word template). I will probably have to go this way:

  1. Parse the input text for any occurrence of <p class="[^"]+"[^>]*>, maintain a HashSet of all distinct class names, and generate some fake CSS information for them in the HTML header.

  2. Replace any class="bla" with class="_bla".

  3. Insert the HTML with DocumentBuilder.InsertHtml.

  4. Perform a tree traversal on the final document and replace any ParagraphFormat.Style whose name starts with an underscore with the non-underscored version from the Word template.

  5. Remove any Document.Styles whose names start with an underscore.

But this is a true MacGyver kind of solution :-p
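
For reference, steps 1 and 2 above need no Aspose.Words calls and can be sketched as plain string processing. This is only a minimal sketch under stated assumptions: HtmlClassPreprocessor and its method names are hypothetical, each <p> tag is assumed to carry a single double-quoted class name, and whether InsertHtml really creates a named style from an empty CSS rule is an assumption that should be verified against the Aspose.Words version in use.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for steps 1 and 2; not part of Aspose.Words.
public static class HtmlClassPreprocessor
{
    // Matches an opening <p> tag up to and including a double-quoted class value.
    // Group 1: everything before the value, group 2: the value, group 3: closing quote.
    private static readonly Regex ClassInP = new Regex(
        "(<p\\b[^>]*?\\bclass\\s*=\\s*\")([^\"]+)(\")",
        RegexOptions.IgnoreCase);

    // Step 1: collect the distinct class names in order of first appearance.
    public static List<string> CollectClassNames(string html)
    {
        var seen = new HashSet<string>();
        var names = new List<string>();
        foreach (Match m in ClassInP.Matches(html))
        {
            string name = m.Groups[2].Value.Trim();
            if (seen.Add(name))
                names.Add(name);
        }
        return names;
    }

    // Step 1 (continued): one empty placeholder rule per underscore-prefixed class.
    // Assumption: a rule like this is enough for the importer to create a style.
    public static string BuildPlaceholderCss(IEnumerable<string> names)
    {
        var css = new StringBuilder();
        foreach (string name in names)
            css.Append("p._").Append(name).Append(" { }\n");
        return css.ToString();
    }

    // Step 2: rewrite class="bla" to class="_bla" on <p> tags only.
    public static string PrefixClassNames(string html)
    {
        return ClassInP.Replace(html, m =>
            m.Groups[1].Value + "_" + m.Groups[2].Value.Trim() + m.Groups[3].Value);
    }
}
```

Calling CollectClassNames on the article XHTML, placing the BuildPlaceholderCss output inside a style element in the HTML head, and passing the PrefixClassNames result to InsertHtml would cover steps 1 to 3. Steps 4 and 5 still depend on the Aspose.Words style API.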

Hi
Thank you for the additional information. Actually, Adam and Andrey suggested that you use code like this to insert a document into your master document:
https://docs.aspose.com/words/java/insert-and-append-documents/
So, you should create a new document from your HTML, then use the InsertDocument method to insert it into the master document at the desired location.
Unfortunately, your algorithm will not work as described because you cannot delete styles from the document, so the styles with underscores will remain.
Best regards,

The issues you have found earlier (filed as WORDSNET-1432) have been fixed in this .NET update and this Java update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.