Convert RTF Word Document to XHTML (or HTML5) using C# .NET Code

I have been asked whether Aspose can convert an RTF document to XHTML. Admittedly, I know very little about XHTML. Using Aspose.Words, I see that I can convert RTF to HTML, but not to XHTML specifically. Is this possible?

Alternatively, I have discovered that I can convert RTF to PDF using Aspose.Words and then convert the PDF to XHTML using Aspose.PDF. Is this the best way?

@aultmike,

You can convert RTF to XHTML with the Aspose.Words for .NET API alone: set HtmlSaveOptions.HtmlVersion to HtmlVersion.Xhtml and set the HtmlSaveOptions.ExportXhtmlTransitional property to true. Please check the following C# code:

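using Aspose.Words;
using Aspose.Words.Saving;

// Load the source RTF document.
Document doc = new Document("C:\\temp\\input.rtf");

HtmlSaveOptions options = new HtmlSaveOptions()
{
    // Produce XHTML-compliant markup.
    HtmlVersion = HtmlVersion.Xhtml,
    // Write the XHTML 1.0 Transitional DOCTYPE declaration at the top of the file.
    ExportXhtmlTransitional = true,
    // Indent the output so it is easier to read.
    PrettyFormat = true
};

// Save the document using the options above.
doc.Save("C:\\Temp\\21.3.html", options);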

Great! Thanks for the reply! Is this possible in the version I am using, 15.2.0?

@aultmike,

I am afraid this code will not work with version 15.2.0 of Aspose.Words. According to the release notes, the HtmlVersion enumeration was added in version 16.11.0 of Aspose.Words for .NET.

Please also note that we do not provide support for older released versions of Aspose.Words, nor do we provide fixes or patches for old versions of Aspose APIs. All fixes and new features are added only to new versions of our APIs. So, we suggest that you upgrade to the latest (21.3) version of Aspose.Words for .NET.
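
To confirm which version a project actually loads before relying on HtmlVersion, something like the following sketch may help. It uses only standard .NET reflection and assumes that the Aspose.Words assembly version mirrors the release number (for example 15.2.0.0 or 21.3.0.0); the class name AsposeVersionCheck is made up for illustration.

```csharp
using System;
using Aspose.Words;

class AsposeVersionCheck
{
    static void Main()
    {
        // Assembly version of the Aspose.Words build loaded at runtime.
        // Assumption: it mirrors the release number, e.g. 21.3.0.0.
        Version loaded = typeof(Document).Assembly.GetName().Version;

        // HtmlVersion was added in 16.11.0, per the release notes mentioned above.
        Version required = new Version(16, 11, 0);

        Console.WriteLine("Aspose.Words assembly version: " + loaded);
        Console.WriteLine(loaded >= required
            ? "HtmlSaveOptions.HtmlVersion should be available."
            : "Upgrade needed before HtmlVersion.Xhtml can be used.");
    }
}
```

Note that code referencing HtmlVersion will not compile against 15.2.0 at all, so this check is mainly useful for diagnosing which assembly a deployed application has picked up.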

Greetings. Could you review the attached file? I would like to convert it to XHTML, but the resulting output doesn't look like the original document. Is there anything that can be done?

FYI, this document started as a PDF. I converted it to a DOCX using Aspose.PDF.
202102111034020970.zip (4.3 MB)

@aultmike,

Aspose.Words tries to mimic the behavior of MS Word. However, there is no guarantee that the HTML files generated by Aspose.Words or MS Word will look exactly the same in web browsers as the input Word documents. This is because of file format differences between Word and HTML (and some other limitations). I have attached the (X)HTML files generated by MS Word 2019 and Aspose.Words 21.4 here for your reference.

So, this is expected behavior. Alternatively, you could convert this document to the 'HTML Fixed' format (HtmlFixed), which renders the Word document as HTML using absolutely positioned elements and should produce the expected output for you. But please note that HtmlFixed is not an XHTML format:

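using Aspose.Words;
using Aspose.Words.Saving;

// Load the DOCX that was produced from the original PDF.
Document doc = new Document("C:\\temp\\202102111034020970\\202102111034020970.docx");

HtmlFixedSaveOptions options = new HtmlFixedSaveOptions()
{
    // Embed images in the HTML file instead of writing separate image files.
    ExportEmbeddedImages = true,
    // Embed the CSS in the HTML file instead of writing a separate stylesheet.
    ExportEmbeddedCss = true,
    // Indent the output so it is easier to read.
    PrettyFormat = true
};

// Save as fixed-layout HTML (absolutely positioned elements).
doc.Save("C:\\Temp\\202102111034020970\\21.4.html", options);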

Great! Thanks for the reply! That DID work better! I’m attaching an example with an image. Instead of being behind the text, it’s below it. Is there any way to remedy this when using the XHTML setting?
with image.zip (5.7 MB)

@aultmike,

MS Word 2019 also produces similar output when saving this DOCX to HTML, and Aspose.Words tries to mimic the behavior of MS Word.

So, this seems to be a limitation of HTML-based formats.

You may use Aspose.Words’ HtmlFixed format to work around this problem as well.

Thanks! I appreciate your time and attention.