Bug Report: HTML Definition Lists (DL/DT/DD) Incorrectly Mapped to L/LI in Tagged PDF
| Product | Aspose.HTML for .NET |
|---|---|
| Versions | Aspose.HTML 26.1.0, Aspose.PDF 26.2.0 |
| Platform | .NET 10, Windows 11 |
| Severity | PDF/UA-1 WARNING: every page containing <dl> elements |
| PDF/UA Clause | 7.1:2.4.1 (ISO 14289-1) |
| Warning Message | “Possibly inappropriate use of a ‘LI’ structure element” |
| Related Ticket | HTMLNET-6957 (Path object not tagged — separate issue) |
Summary
When converting HTML to tagged PDF using Aspose.Html.Converters.Converter.ConvertHTML() with IsTaggedPdf = true, HTML definition lists (<dl>, <dt>, <dd>) are mapped to generic list structure elements (L, LI, Lbl, LBody) in the PDF tag tree.
This produces a structure that PDF/UA-1 validators flag as “Possibly inappropriate use of a ‘LI’ structure element” on every page that contains definition lists. The source HTML contains zero <ul>, <ol>, or <li> elements — all 787 L/LI/Lbl/LBody structure elements in the output PDF originate exclusively from <dl>/<dt>/<dd> elements.
Reproduction Steps
Minimal HTML (test_dl_bug.html)
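    <!doctype html>
    <html>
    <head>
      <title>DL Mapping Bug Reproduction</title>
    </head>
    <body>
      <h1>Equipment Details</h1>
      <dl>
        <dt>ID</dt>
        <dd>FUG001</dd>
        <dt>Name</dt>
        <dd>FUG001ABC12</dd>
        <dt>Type</dt>
        <dd>Fugitive 3D</dd>
        <dt>Status</dt>
        <dd>OP</dd>
        <dt>Location</dt>
        <dd>Building A, Room 102</dd>
        <dt>Coordinates</dt>
        <dd>45.123, -93.456</dd>
      </dl>
    </body>
    </html>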
C# Code
    // Convert the HTML file to a tagged PDF
    using var htmlDoc = new Aspose.Html.HTMLDocument("test_dl_bug.html");
    var options = new Aspose.Html.Saving.PdfSaveOptions();
    options.IsTaggedPdf = true;
    Aspose.Html.Converters.Converter.ConvertHTML(htmlDoc, options, "output.pdf");

    // Validate with Aspose.PDF
    using var pdfDoc = new Aspose.Pdf.Document("output.pdf");
    bool valid = pdfDoc.Validate("validation.xml", Aspose.Pdf.PdfFormat.PDF_UA_1);
    // validation.xml contains: "Possibly inappropriate use of a 'LI' structure element"
Expected vs Actual Mapping
| HTML Element | Actual PDF Structure Type | Expected PDF Structure Type |
|---|---|---|
| <dl> | L (List) | Role-mapped DL type, or Table |
| <dt> | LI → Lbl | Role-mapped DT type, or TH |
| <dd> | LI → LBody | Role-mapped DD type, or TD |
The L/LI mapping is semantically incorrect for definition lists. A PDF L (List) with LI children represents an ordered or unordered list with Lbl (bullet/number) and LBody (content) children. Definition list terms (<dt>) are not “labels” in the bullet/number sense, and definitions (<dd>) have a different relationship to their terms than list body content has to a bullet.
Actual PDF Tag Tree (from Debug Dump)
Walking the tag tree of the generated PDF with pdfDoc.TaggedContent.RootElement reveals this structure for each <dl>:
Key observations:
- Each <dt> and <dd> becomes a separate LI, losing the term–definition pairing
- The Lbl child of each LI is empty (no bullet or number exists in definition lists)
- The validator flags this because standard list items should have meaningful Lbl content (bullet markers, numbers) and the structure doesn’t match the L/LI content model
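The walk over the tag tree can be sketched roughly as follows. This is a minimal sketch, not the code that produced the original dump: TagTreeDump is a hypothetical helper name, and the sketch assumes that Element.ChildElements can be enumerated and that StructureElement.StructureType prints a readable tag name in Aspose.PDF 26.2.0.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.LogicalStructure;

// Hypothetical helper (not part of Aspose.PDF): prints the structure tree
// of a tagged PDF, one structure element per line, indented by depth.
static class TagTreeDump
{
    public static void Dump(Element element, int depth = 0)
    {
        // Only structure elements carry a structure type (L, LI, Lbl, LBody, ...).
        if (element is StructureElement se)
        {
            Console.WriteLine($"{new string(' ', depth * 2)}{se.StructureType}");
        }

        // Assumption: ChildElements is enumerable on every Element.
        foreach (Element child in element.ChildElements)
        {
            Dump(child, depth + 1);
        }
    }

    public static void Main()
    {
        using var pdfDoc = new Document("output.pdf");
        Dump(pdfDoc.TaggedContent.RootElement);
    }
}
```

Run against output.pdf from the reproduction steps, the indentation makes it easy to see whether each <dt> and <dd> ended up under its own LI.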
Validation Output
Warning on 19 of 23 pages:
Severity="Warning" Clause="7.1" Code="7.1:2.4.1" — "Possibly inappropriate use of a 'LI' structure element"
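To tally the warnings in validation.xml, something like the sketch below may help. It assumes that each finding is serialized as an XML element carrying a Code attribute, as the line above suggests. The element names in the log are not known here, so the sketch matches any element; LiWarningCounter is a hypothetical name.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: counts findings with the LI warning code in a
// validation log, regardless of which element carries the attribute.
static class LiWarningCounter
{
    public const string LiWarningCode = "7.1:2.4.1";

    public static int Count(XDocument log) =>
        log.Descendants()
           .Count(e => e.Attribute("Code")?.Value == LiWarningCode);
}
```

Usage would be along the lines of LiWarningCounter.Count(XDocument.Load("validation.xml")).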
Test Results from Real Document
| Document | bret_v5.html (government environmental report) |
|---|---|
| Pages | 23 |
| <dl> elements in HTML | 87 |
| <ul>/<ol>/<li> elements in HTML | 0 (zero) |
| L/LI/Lbl/LBody elements in PDF tag tree | 787 (all from <dl>/<dt>/<dd>) |
| Pages with LI warning | 19 of 23 |
The document contains only <dl class="kv"> definition lists for key-value data (equipment IDs, names, types, locations). It has absolutely no unordered or ordered lists. Every LI warning in the validation output traces back to the DL→LI mapping.
Why the Mapping Is Incorrect
Semantic Mismatch
The PDF specification (ISO 32000-1, Table 336) defines L/LI for sequential lists (bulleted, numbered, or simple item lists). The Lbl child is intended for the list marker (bullet, number) and LBody for the item content. Definition lists have a fundamentally different structure: a term paired with its definition, not a marker paired with body text.
PDF 2.0 Standard Types
PDF 2.0 (ISO 32000-2:2020) does not add dedicated DL, DT, or DD structure types; it distinguishes definition lists from ordinary lists through the ListNumbering attribute of L, which gains the value Description. PDF/UA-1 is based on PDF 1.7, which has no such marker, so the correct approach for PDF 1.7 documents is to use role mapping to map custom structure types (e.g., /DL, /DT, /DD) to their closest standard base types, or to use a semantically appropriate alternative like Table/TR/TH/TD for key-value definition lists.
How Other Converters Handle This
| Converter | DL Mapping Approach |
|---|---|
| Adobe Acrobat (HTML export) | Role-mapped custom types with Table fallback |
| Chromium (IronPDF/Puppeteer) | Uses role mapping: DL→L, DT→LI, DD→LI with proper Lbl/LBody split |
| LibreOffice (PDF export) | Custom role-mapped DL/DT/DD types |
Impact
- Every document with <dl> elements generates LI warnings in PDF/UA validation
- No workaround from consumer code: the structure type mapping happens internally during HTML→PDF conversion; there is no API to control it
- Post-processing is not viable: StructureType on Aspose.Pdf.LogicalStructure.StructureElement is read-only; existing structure elements cannot be remapped after conversion
- ADA/Section 508 audits: while technically warnings (not errors), automated accessibility audit tools flag these and they require manual justification in compliance reports
- Misleading structure tree: assistive technologies (screen readers) announce definition list content as “list, 8 items” instead of “definition list” with term/definition pairs, reducing comprehension for users relying on semantic structure
Suggested Fix
We see two viable approaches, in order of preference:
Option A: Role-Mapped Custom Types (Preferred)
Add a role map entry in the PDF’s StructTreeRoot that maps custom /DL, /DT, /DD structure types to appropriate standard types. This is the approach recommended by PDF accessibility experts for PDF 1.7 documents:
    % In the StructTreeRoot's RoleMap:
    /DL /Table   % or /L if Table is not appropriate
    /DT /TH      % term = header
    /DD /TD      % definition = data cell
Then use these custom types instead of L/LI when converting <dl>/<dt>/<dd> elements.
Option B: Direct Table Mapping
Map <dl> directly to Table, each <dt>/<dd> pair to a TR with TH (term) and TD (definition). This is semantically appropriate for key-value definition lists and avoids the LI warning entirely.
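The pairing that Option B relies on can be illustrated with a small, library-independent sketch. DlToTable and ToRows are hypothetical names, not Aspose APIs, and the input is simplified to a flat sequence of (tag, text) pairs. The sketch assumes each <dt> opens a new row (TR) whose term becomes the TH, that every following <dd> becomes a TD in that row, and that a <dd> with no preceding <dt> is skipped.

```csharp
using System.Collections.Generic;

// Hypothetical helper: groups the flat <dt>/<dd> children of one <dl>
// into rows of (term, definitions), i.e. one TR with a TH and its TDs.
static class DlToTable
{
    public static List<(string Term, List<string> Definitions)> ToRows(
        IEnumerable<(string Tag, string Text)> children)
    {
        var rows = new List<(string Term, List<string> Definitions)>();
        foreach (var (tag, text) in children)
        {
            if (tag == "dt")
            {
                // A term starts a new row.
                rows.Add((text, new List<string>()));
            }
            else if (tag == "dd" && rows.Count > 0)
            {
                // A definition attaches to the most recent term.
                rows[^1].Definitions.Add(text);
            }
        }
        return rows;
    }
}
```

For the reproduction file, this grouping would yield six rows, one per term, which is the term–definition pairing that the current L/LI output loses.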
Environment
| Aspose.HTML | 26.1.0 (NuGet) |
|---|---|
| Aspose.PDF | 26.2.0 (NuGet) |
| Runtime | .NET 10.0, Windows 11 x64 |
| Conversion API | Aspose.Html.Converters.Converter.ConvertHTML() with PdfSaveOptions.IsTaggedPdf = true |
| Validation API | Aspose.Pdf.Document.Validate(path, PdfFormat.PDF_UA_1) |
Request
Please update Aspose.HTML’s tagged PDF renderer to properly handle <dl>, <dt>, and <dd> elements using either role-mapped custom structure types or direct table mapping, so that the resulting PDF tag tree accurately represents definition list semantics and does not trigger PDF/UA-1 “inappropriate LI” warnings.
Windsor Solutions — February 2026