HTML to RTF problem

Hello
If you run my sample that converts HTML to RTF, you'll see the result is not displayed correctly in either Word 2021 or Windows 10 WordPad:
WindowsApplication396.zip (6.2 MB)

WordPad does not render background colors at all.
Word 2021 does not honor the table width and spacing.
Any workaround? 🙂

@australian.dev.nerds Please note that Aspose.Words is designed to work with MS Word documents. The object models of HTML documents and MS Word documents are quite different, so it is not always possible to provide 100% fidelity when converting one format to another. When importing HTML documents, Aspose.Words in most cases mimics MS Word's behavior. If you convert your HTML to RTF using MS Word, you will get almost the same result as Aspose.Words produces:
out.zip (11.4 KB)


Hello and thanks, when I open html in Word 2021 and save as RTF I see:

OK. Anyway, when loading HTML or MHTML to save as RTF, which RTF save options should I use if I need to import the generated RTF directly into Aspose.Email as the BodyRTF of a MapiMessage?

I'm not sure whether Outlook uses a specific kind of RTF for the MAPI message body.
Thanks.

@australian.dev.nerds The content in your document is formatted with DIVs. There is no direct analog of DIV elements in MS Word documents; DIVs are usually converted to paragraphs in the Aspose.Words DOM.
You can set HtmlLoadOptions.BlockImportMode to preserve DIVs, as MS Word does:

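using Aspose.Words;
using Aspose.Words.Loading;

// Preserve block-level (DIV) properties on import instead of merging them into paragraphs.
HtmlLoadOptions opt = new HtmlLoadOptions();
opt.BlockImportMode = BlockImportMode.Preserve;
Document doc = new Document(@"C:\Temp\in.htm", opt);
doc.Save(@"C:\Temp\out.rtf");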

out.zip (2.3 KB)

Regarding using RTF as a mail body, it is better to ask the Aspose.Email team.


Thanks. When converting EML or MHTML (both are the same) to PDF, can we have the source attachments added to the PDF as attachments?

Also, when using Document.Save, can we check whether the save was successful?
Something is mentioned here:
https://reference.aspose.com/words/net/aspose.words.saving/saveoutputparameters/

I'm not sure whether it can be used to check the save status.
Some SDKs return a boolean for save/load operations.
If so, how do I perform this check?

  • Kindly note that I use Document.Save to save in all formats supported by Aspose.Words, so the check should work for all save types.

When saving as HTML with HtmlSaveOptions, CSS styles are saved to disk with this text inside:

/*******************************************/
/* Styles for E:\SubContact27.htm */
/*******************************************/

How can I disable all such comments in all output file types? They are unnecessary and increase the file size. Thanks 🙂

@australian.dev.nerds

You can add an embedded OLE object to the document using DocumentBuilder.InsertOleObject. Please see our documentation to learn how to work with embedded OLE objects:
https://docs.aspose.com/words/net/working-with-ole-objects/

For example, see the following simple code:

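using Aspose.Words;
using Aspose.Words.Saving;

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
// Embed the file as an OLE object: not linked, shown as an icon, predefined icon image (null).
builder.InsertOleObject(@"C:\Temp\in.docx", false, true, null);
// Save embedded OLE objects as attachments of the PDF.
PdfSaveOptions opt = new PdfSaveOptions();
opt.EmbedAttachments = true;
doc.Save(@"C:\Temp\out.pdf", opt);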

If there are problems saving or processing a document, Aspose.Words throws an exception, so a Save call that returns without throwing has completed successfully.
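Since Aspose.Words reports failures through exceptions rather than a return value, a caller who prefers a boolean result can wrap Save in a try/catch. A minimal sketch; the SaveHelper class and its TrySave method are hypothetical names, not part of the Aspose.Words API. Because it only relies on Save throwing on failure, the same wrapper applies to every output format:

```csharp
using System;
using Aspose.Words;

public static class SaveHelper
{
    // Hypothetical helper, not part of Aspose.Words: turns the exception-based
    // error reporting of Document.Save into a bool result.
    public static bool TrySave(Document doc, string fileName, out Exception error)
    {
        try
        {
            // Document.Save picks the output format from the file extension,
            // so this works for DOCX, RTF, PDF, HTML, and other formats alike.
            doc.Save(fileName);
            error = null;
            return true;
        }
        catch (Exception ex)
        {
            // Any failure (for example, an I/O error) ends up here.
            error = ex;
            return false;
        }
    }
}
```

Catching Exception broadly mirrors the "return a boolean" style from the question; in production code you may prefer to catch only the exception types you expect and let the rest propagate.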

Unfortunately, there is no option to disable this comment. You can remove it by post-processing the generated CSS file.
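The post-processing can be as simple as stripping the comment banner from the start of the generated stylesheet. A minimal sketch in plain .NET, assuming the banner always sits at the very top of the CSS file (as in the example above); the CssPostProcessor class and its method names are hypothetical. Only leading /* ... */ comments are removed, so the style rules themselves are left untouched:

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class CssPostProcessor
{
    // Matches one or more /* ... */ comments (plus surrounding whitespace)
    // at the very start of the text. Singleline lets '.' span line breaks,
    // and the lazy '.*?' stops at the first closing "*/".
    private static readonly Regex LeadingComments =
        new Regex(@"\A\s*(?:/\*.*?\*/\s*)+", RegexOptions.Singleline);

    // Returns the CSS text without its leading comment banner.
    public static string StripLeadingComments(string css) =>
        LeadingComments.Replace(css, string.Empty);

    // Rewrites a CSS file on disk without its leading comment banner.
    public static void StripBannerFromFile(string cssPath)
    {
        string css = File.ReadAllText(cssPath);
        File.WriteAllText(cssPath, StripLeadingComments(css));
    }
}
```

Note that File.WriteAllText writes UTF-8 without a byte order mark, so if the original stylesheet had one, it is dropped.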


Thanks. The source EML/MHTML can have any kind of file attached. What about the target PDF document? Can it also have any kind of file attached?

Which other formats support such attachments? DOC, DOCX, etc.?

Is there no save option to automatically add the source attachments to the target file?

Sure, but what about SaveOutputParameters? I didn't find enough information in the docs, or a sample showing how to check it.

Would you consider adding a save option to disable it? Please consult your developers; to me it seems unnecessary, and it increases the output size when processing large amounts of data.

@australian.dev.nerds

  1. Embedded OLE objects are supported in DOC, DOCX, RTF, XML (Word 2003 and Word 2007 XML), ODT and PDF formats.

  2. SaveOutputParameters is returned to the caller after a document is saved and contains additional information generated or calculated during the save operation. The caller can use or ignore this object. Currently it contains only the Content-Type of the saved document, so it does not indicate whether the save succeeded; a failed save throws an exception instead.

  3. We will consider adding such an option. I have logged the feature request as WORDSNET-25621.
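As item 2 notes, the object returned by Save currently only reports the Content-Type of the output. A short sketch that collects it for a few formats; the ContentTypeDemo class, its method name, and the "out.*" file names are placeholders, not part of the original answer:

```csharp
using System.Collections.Generic;
using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;

public static class ContentTypeDemo
{
    // Saves the same document in a few formats and collects the Content-Type
    // reported by the SaveOutputParameters object that Document.Save returns.
    public static Dictionary<SaveFormat, string> GetContentTypes(Document doc, string outDir)
    {
        var result = new Dictionary<SaveFormat, string>();
        SaveFormat[] formats = { SaveFormat.Docx, SaveFormat.Rtf, SaveFormat.Pdf };
        foreach (SaveFormat format in formats)
        {
            // Placeholder file names: "out.docx", "out.rtf", "out.pdf".
            string path = Path.Combine(outDir, "out." + format.ToString().ToLowerInvariant());
            SaveOutputParameters parameters = doc.Save(path, format);
            result[format] = parameters.ContentType;
        }
        return result;
    }
}
```

A typical use is setting the Content-Type header when serving the saved file over HTTP; it is not a success flag.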

Hello,
First, when loading HTML or MHTML and saving as DOCX or RTF, the HR tag is lost or not rendered:
<hr>

Second, regarding my first post in this topic, please check my source HTML and result files:
docs.zip (91.5 KB)

Saving as PDF and other formats still looks good, but saving to DOCX and RTF does not:
The problem with DOCX is that the background color / table background color does not extend to the fixed width, as it does when saving to other formats!

The problem with RTF (which I think is not your fault; I'm just looking for a workaround) is that the background color is not rendered. I think RTF supports text highlight color.
When the background color is not rendered, white text becomes invisible in the target file.

Earlier you advised using BlockImportMode.Preserve. To convert my HTML to documents that look the same, and since you mentioned there is no equivalent of DIV elements in Word documents, I can change the DIVs to something else that converts to Word perfectly. What do you recommend replacing my DIVs with?
Thanks.

@australian.dev.nerds

I cannot reproduce the problem on my side. I used the following simple HTML as an input document, and the horizontal rule is properly preserved in the output documents:

<html>
<body>
    <p>Some paragraph</p>
    <hr />
    <p>Some other paragraph</p>
</body>
</html>

Code for testing:

using Aspose.Words;

Document doc = new Document(@"C:\Temp\in.html");
// The output format is determined by the file extension.
doc.Save(@"C:\Temp\out.docx");
doc.Save(@"C:\Temp\out.rtf");
doc.Save(@"C:\Temp\out.pdf");

Input and output documents: test_docs.zip (28.0 KB)

The background is rendered properly if you open the RTF document in MS Word or OpenOffice, but simple viewers such as WordPad do not show it.

You can use regular paragraphs instead of DIV tags. The HTML P tag corresponds to a Paragraph node in the Aspose.Words DOM.
Alternatively, you can use a centered table with cell paddings and fixed width.
But note that, in general, it is impossible to fully preserve the original HTML formatting when converting HTML to Word formats, due to the differences between their document object models and between the rules used by browsers and MS Word.


Hi, I mean this in WordPad:

I will make a sample project for the other cases 🙂

Sorry, those results were not shown correctly in WordPad; Word 2021 seems to render them correctly.

The DOCX version still seems to need a minor touch, though: if you compare these two pictures, the first and last rows in the DOCX are taller than in the RTF:
images.zip (165.0 KB)

Not critical for me, but in case you think it's worth checking:
WinApp2.zip (16.8 KB)

@australian.dev.nerds

You can use HTML like the following:

<html>
	<body>
		<div>
			<p>
				<span>Test </span><span style="background-color:#ffff00">test</span><span> test</span>
			</p>
		</div>
	</body>
</html>

The highlight color in this case will be preserved in RTF and will be displayed properly in both MS Word and WordPad.
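To try this without an input file, the HTML above can be loaded from a string through a MemoryStream and saved straight to RTF, then opened in WordPad for comparison. A sketch, assuming HtmlLoadOptions with its default settings is sufficient for this snippet; the HtmlToRtf class and Convert method are illustrative names:

```csharp
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Loading;

public static class HtmlToRtf
{
    // Loads an HTML fragment from memory and returns the document as RTF text.
    public static string Convert(string html)
    {
        using var input = new MemoryStream(Encoding.UTF8.GetBytes(html));
        // HtmlLoadOptions tells Aspose.Words the stream contains HTML.
        var doc = new Document(input, new HtmlLoadOptions());

        using var output = new MemoryStream();
        doc.Save(output, SaveFormat.Rtf);
        // RTF is normally written as 7-bit text with non-ASCII characters escaped,
        // so ASCII decoding is assumed to be enough here.
        return Encoding.ASCII.GetString(output.ToArray());
    }
}
```

Writing the returned string to a .rtf file lets you check the highlighted span in both MS Word and WordPad.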

I am afraid WordPad cannot be used as a reference viewer due to its limited functionality. As you know, MS Word documents are flow documents, and their appearance depends on the viewer's layout engine and on how fully it supports the document format specification. The same document might look different when opened in MS Word, OpenOffice, and WordPad.