I am wondering if it's possible to display HTML data when using Aspose.Words to generate a PDF.
We get data from an external source that contains HTML tags. In our code, we escape the HTML tags so that only the data is displayed. If we did not escape them, the generated PDF would show the data together with the literal HTML tags.
Example: If I have indented lists in the external source, can I display them as indented lists in my output?
Summary
Aspose.Words can read raw HTML, convert it to native Word objects, and then render those objects correctly when the document is saved as PDF. The key is to provide the HTML without HTML‑entity escaping so that the built‑in HTML importer can recognise tags such as <ul>, <ol>, and <li> and turn them into proper Word lists.
Why it happens
When the HTML is escaped (e.g., &lt;ul&gt;…&lt;/ul&gt;), Aspose.Words treats the content as plain text, so the tags appear literally in the output PDF. The library’s HTML importer parses standard HTML elements and maps them to Word structures (lists, tables, headings, etc.). If the importer receives genuine, unescaped HTML, the resulting Word document contains native list objects, which are rendered with the correct indentation in the PDF.
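To make the difference between escaped and raw markup concrete, here is a minimal sketch that uses only the .NET base class library (no Aspose.Words calls). The class name HtmlEscapingDemo and the sample fragment are assumptions made for illustration; they do not come from the original code.

```csharp
using System;
using System.Net;

public static class HtmlEscapingDemo
{
    // Returns the entity-escaped form of a raw HTML fragment.
    public static string Escape(string rawHtml) => WebUtility.HtmlEncode(rawHtml);

    // Returns the raw markup for an entity-escaped fragment.
    public static string Unescape(string escapedHtml) => WebUtility.HtmlDecode(escapedHtml);

    public static void Main()
    {
        // Hypothetical fragment standing in for the external content.
        string raw = "<ul><li>First item</li></ul>";

        // The escaped form contains entities instead of tags, so an HTML
        // importer would see it as ordinary text.
        string escaped = Escape(raw);
        Console.WriteLine(escaped); // &lt;ul&gt;&lt;li&gt;First item&lt;/li&gt;&lt;/ul&gt;

        // Decoding restores the markup that an HTML importer can parse.
        Console.WriteLine(Unescape(escaped) == raw); // True
    }
}
```

If your pipeline currently escapes the tags before building the document, the decoded string (or the original, never-escaped string) is the one to hand to the HTML import step.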
What to do next
Supply unescaped HTML – If the source string is HTML‑entity encoded, decode it first (e.g., System.Net.WebUtility.HtmlDecode in .NET) so that the markup is raw HTML.
Import the HTML into the document – Choose one of the following approaches:
a. Load the HTML into a temporary Document and insert it
using Aspose.Words;
using System.IO;
using System.Text;
// Example HTML fragment (replace with your actual content)
string html = "<ul><li>First item</li><li>Second item</li></ul>";
// Load the HTML into a separate document
var loadOptions = new LoadOptions { LoadFormat = LoadFormat.Html };
var htmlDoc = new Document(new MemoryStream(Encoding.UTF8.GetBytes(html)), loadOptions);
// Main document that will become the PDF
var mainDoc = new Document();
var builder = new DocumentBuilder(mainDoc);
// Insert the HTML document at the current cursor position
builder.InsertDocument(htmlDoc, ImportFormatMode.KeepSourceFormatting);
// Save as PDF – the list will be rendered with proper indentation
mainDoc.Save("Result.pdf");
b. Insert the HTML directly with InsertHtml
using Aspose.Words;
var builder = new DocumentBuilder(new Document());
builder.InsertHtml("<ul><li>First item</li><li>Second item</li></ul>");
builder.Document.Save("Result.pdf");
Tip: If InsertHtml produces unexpected list formatting, try the first method (loading into a separate document) and compare the results.
Validate the HTML – Ensure the fragment is well‑formed (all tags closed, UTF‑8 encoding). Aspose.Words maps structural tags to Word objects; JavaScript is not executed, and complex CSS may not be fully honoured.
Verify the PDF – Open the generated Result.pdf and confirm that the lists appear with the expected indentation. If further styling is required, you can adjust the Word list using the List and ListFormat APIs after insertion.
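Besides opening the PDF, the list structure can also be inspected in code before saving. The following is a sketch only, assuming a recent Aspose.Words for .NET release and C# top-level statements; the nested fragment is a hypothetical stand-in for the indented lists mentioned in the question, and the exact levels reported should be confirmed in your environment.

```csharp
using System;
using Aspose.Words;

// Hypothetical nested fragment standing in for the external content.
string html = "<ul><li>Parent<ul><li>Child</li></ul></li></ul>";

var doc = new Document();
var builder = new DocumentBuilder(doc);
builder.InsertHtml(html);

// Report each paragraph's list status and nesting level (0 = top level).
foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
{
    string text = para.GetText().Trim();
    if (para.IsListItem)
        Console.WriteLine($"level {para.ListFormat.ListLevelNumber}: {text}");
    else
        Console.WriteLine($"not a list item: {text}");
}
```

If the nested item is reported as a list item at a deeper level than its parent, the importer has recognised the nesting and the PDF should show it indented. If every paragraph is reported as "not a list item", the input was most likely still escaped.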
Next steps if issues persist:
Double‑check that the HTML string is truly unescaped.
Try the alternative import method (a vs b).
If the problem remains, provide the exact HTML snippet and the language/SDK version you are using so that a more targeted code example can be prepared.
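One way to double-check that a string is truly unescaped is sketched below. It relies only on System.Net.WebUtility; the class HtmlUnescapeCheck, its method names, and the maxPasses limit are hypothetical helpers introduced for this illustration. It also covers the case where the source was encoded twice (for example &amp;lt;ul&amp;gt;), which a single decode would not fully undo.

```csharp
using System;
using System.Net;

public static class HtmlUnescapeCheck
{
    // True when the text still contains singly or doubly escaped "<".
    public static bool LooksEscaped(string text) =>
        text != null && (text.Contains("&lt;") || text.Contains("&amp;lt;"));

    // Decodes repeatedly until the text stops changing, so double-encoded
    // input is also reduced to raw markup. maxPasses is an arbitrary
    // safety limit chosen for this sketch.
    public static string DecodeUntilStable(string text, int maxPasses = 5)
    {
        string current = text ?? string.Empty;
        for (int i = 0; i < maxPasses; i++)
        {
            string decoded = WebUtility.HtmlDecode(current);
            if (decoded == current) break; // nothing left to decode
            current = decoded;
        }
        return current;
    }

    public static void Main()
    {
        // Hypothetical double-encoded input.
        string input = "&amp;lt;ul&amp;gt;&amp;lt;li&amp;gt;Item&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;";
        Console.WriteLine(LooksEscaped(input));
        Console.WriteLine(DecodeUntilStable(input));
    }
}
```

Note that decoding until stable also unescapes entities the author may have intended to appear literally (for example a visible "&lt;" in body text), so treat this as a diagnostic aid rather than a step to apply unconditionally.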
I could not fully verify this explanation against the retrieved sources. Please double-check the behaviour in your environment and confirm the exact SDK/version.
SDK version unclear — refer to the official release notes before applying this guidance.