Bug Report: Path Objects Not Tagged as Artifacts in Tagged PDF Output
| Product | Aspose.HTML for .NET + Aspose.PDF for .NET |
|---|---|
| Versions | Aspose.HTML 26.1.0, Aspose.PDF 26.2.0 |
| Platform | .NET 10, Windows |
| Severity | PDF/UA-1 FAILURE — every page, every document |
| PDF/UA Clause | 7.1:1.1 (ISO 14289-1 clause 7.1, which references ISO 32000-1:2008, 14.8) |
| Error Message | "Path object not tagged" |
Summary
When converting HTML to tagged PDF using Aspose.Html.Converters.Converter.ConvertHTML()
with IsTaggedPdf = true, decorative vector path operators generated from CSS borders
and background fills are not wrapped in BMC("Artifact")/EMC marked-content sequences.
This causes a "Path object not tagged" PDF/UA-1 validation error on every page of every converted document.
These paths are purely decorative (CSS border on headings, table cell borders, background-color fills)
and should be marked as Artifacts per PDF/UA-1, which requires that all content in a tagged PDF
is either part of the structure tree or explicitly marked as an artifact.
Reproduction Steps
Minimal HTML (test_path_bug.html)
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<style>
h2 {
background-color: #D3D3D3;
border: 1px solid #000;
padding: 4px 6px;
font-size: 11pt;
}
table {
width: 100%;
border-collapse: collapse;
}
th, td {
border: 1px solid #000;
padding: 4px 5px;
}
th { background-color: #D9D9D9; }
</style>
<title>Path Bug Reproduction</title>
</head>
<body>
<h1>Test Document</h1>
<h2>Section with border and background</h2>
<table>
<thead>
<tr><th scope="col">Column A</th><th scope="col">Column B</th></tr>
</thead>
<tbody>
<tr><td>Value 1</td><td>Value 2</td></tr>
</tbody>
</table>
</body>
</html>
C# Code
using var htmlDoc = new Aspose.Html.HTMLDocument("test_path_bug.html");
var options = new Aspose.Html.Saving.PdfSaveOptions();
options.IsTaggedPdf = true; // Enable tagged/accessible PDF output
Aspose.Html.Converters.Converter.ConvertHTML(htmlDoc, options, "output.pdf");
// Validate
using var pdfDoc = new Aspose.Pdf.Document("output.pdf");
bool valid = pdfDoc.Validate("validation.xml", Aspose.Pdf.PdfFormat.PDF_UA_1);
// valid == false, validation.xml contains "Path object not tagged" on every page
Expected vs Actual
| Expected | Decorative path operators (CSS borders, background fills) are wrapped in BMC("Artifact") ... EMC marked-content sequences, so they are excluded from the structure tree and pass PDF/UA-1 validation. |
|---|---|
| Actual | Path operators (re, m, l, S, f, etc.) appear in the content stream outside any marked-content sequence. The PDF/UA validator reports "Path object not tagged" for every page. |
Validation Output
Severity="Error" Clause="7.1" Code="7.1:1.1(14.8)" — "Path object not tagged"
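To compare documents, it can help to count how often this message appears in the validation log. The following is a minimal sketch, not part of the Aspose API: it treats the log as plain text and counts occurrences of the message string, because the log's XML schema is not assumed here. The helper name CountOccurrences is hypothetical, and "validation.xml" is the log path used in the reproduction code.

```csharp
using System;
using System.IO;

// Hypothetical helper: counts non-overlapping occurrences of `message` in `text`.
int CountOccurrences(string text, string message)
{
    if (string.IsNullOrEmpty(message)) return 0; // avoid looping forever on an empty needle
    int count = 0, index = 0;
    while ((index = text.IndexOf(message, index, StringComparison.Ordinal)) >= 0)
    {
        count++;
        index += message.Length;
    }
    return count;
}

// Assumption: the log was written by Document.Validate to "validation.xml".
string logPath = "validation.xml";
if (File.Exists(logPath))
{
    int errors = CountOccurrences(File.ReadAllText(logPath), "Path object not tagged");
    Console.WriteLine($"\"Path object not tagged\" entries: {errors}");
}
```

A plain-text match keeps the sketch independent of the log format. If the same message is repeated within a single log entry, the count will be higher than the number of entries.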
Test Results from Real Documents
| Document | Pages | "Path not tagged" Errors |
|---|---|---|
| bret_v2.html (definition lists + 1 table) | 12 | 12 (one per page) |
| dannie_v2.html (large table with nested sub-tables) | 2 | 2 (one per page) |
| Minimal repro (heading + table) | 1 | 1 |
The error occurs regardless of document complexity. Any HTML with CSS borders or background colors will trigger it on every page.
Content Stream Analysis
Inspecting the PDF content stream shows the problem clearly. The path operators that draw CSS borders
and background fills sit outside any BDC/BMC/EMC marked-content
block:
% === Tagged text content (correct) ===
/P <</MCID 0>> BDC % Begin marked content (paragraph)
BT
/F1 11 Tf
(Section with border) Tj
ET
EMC % End marked content
% === Decorative border (BUG: no BMC/EMC wrapper) ===
0.502 0.502 0.502 rg % Set gray fill color
56.7 725.3 680.0 18.5 re % Draw rectangle (heading background)
f % Fill
0 0 0 RG % Set black stroke
56.7 725.3 680.0 18.5 re % Draw rectangle (heading border)
S % Stroke
% ^^^ These operators are NOT inside any marked content sequence
What the output should look like:
% === Decorative border (correct: wrapped as Artifact) ===
/Artifact BMC % Begin artifact marked content
0.502 0.502 0.502 rg
56.7 725.3 680.0 18.5 re
f
0 0 0 RG
56.7 725.3 680.0 18.5 re
S
EMC % End artifact
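The difference between the two streams above can be checked mechanically. The following is a minimal sketch under stated assumptions, not how Aspose or any PDF/UA validator works internally. It tokenizes a decoded content stream naively (whitespace splitting, unnested string literals only, no inline images) and reports path-painting operators found outside every BMC/BDC ... EMC sequence. The function name FindUntaggedPaints and the sample strings are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Path-painting operators defined in ISO 32000-1 (stroke, fill, and combined variants).
var paintOps = new HashSet<string> { "S", "s", "f", "F", "f*", "B", "B*", "b", "b*" };

// Hypothetical helper: returns painting operators that occur outside any marked-content sequence.
List<string> FindUntaggedPaints(string contentStream)
{
    var untagged = new List<string>();
    int depth = 0; // current BMC/BDC nesting level
    foreach (var rawLine in contentStream.Split('\n'))
    {
        // Naive cleanup: blank out (unnested) string literals, then drop "%" comments.
        var line = Regex.Replace(rawLine, @"\([^)]*\)", "()");
        int comment = line.IndexOf('%');
        if (comment >= 0) line = line[..comment];

        foreach (var token in line.Split(new[] { ' ', '\t', '\r' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (token is "BMC" or "BDC") depth++;
            else if (token == "EMC") depth = Math.Max(0, depth - 1);
            else if (depth == 0 && paintOps.Contains(token)) untagged.Add(token);
        }
    }
    return untagged;
}

// Sample input modeled on the streams shown in this report.
var buggy = "/P <</MCID 0>> BDC\nBT /F1 11 Tf (Section with border) Tj ET\nEMC\n" +
            "0.502 0.502 0.502 rg\n56.7 725.3 680.0 18.5 re\nf\n" +
            "0 0 0 RG\n56.7 725.3 680.0 18.5 re\nS\n";
var fixedStream = "/Artifact BMC\n0 0 0 RG\n56.7 725.3 680.0 18.5 re\nS\nEMC\n";

Console.WriteLine(string.Join(",", FindUntaggedPaints(buggy)));  // f,S
Console.WriteLine(FindUntaggedPaints(fixedStream).Count);        // 0
```

The check only tracks nesting depth, so it cannot tell whether an enclosing sequence is an Artifact or a structure element. It only flags painting operators that sit outside marked content entirely, which is the condition reported here.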
CSS Properties That Trigger This Bug
| CSS Property | PDF Path Operation | Tagged? |
|---|---|---|
| border: 1px solid #000 | re ... S (rectangle + stroke) | No |
| background-color: #D3D3D3 | re ... f (rectangle + fill) | No |
| border-collapse: collapse on table | m ... l ... S (line segments) | No |
| border-bottom: 1px solid | m ... l ... S (line + stroke) | No |
| hr element | m ... l ... S | No |
Every CSS property that generates a vector drawing operation in the PDF content stream is affected.
Impact
- 100% failure rate: every HTML document with any CSS borders or backgrounds fails PDF/UA validation
- Cannot be worked around from code: the path operators are generated internally by Aspose.HTML's rendering engine; there is no API to control artifact tagging during conversion
- Post-processing is extremely fragile: manually inserting BMC("Artifact")/EMC operators after conversion requires matching Aspose's internal operator ordering, which can change between versions
- ADA/Section 508 compliance: this is the single remaining blocker for full PDF/UA-1 compliance in our conversion pipeline (all other issues have been resolved via post-processing or HTML author fixes)
Workaround Attempted (Not Viable)
We considered post-processing the PDF content stream to wrap orphaned path operators in
BMC("Artifact")/EMC blocks. This approach is not viable because:
- It requires parsing Aspose's internal operator ordering to distinguish "orphaned" paths from paths that are already inside a marked-content sequence
- The operator indices and groupings can change with any Aspose version update
- There is no reliable way to distinguish decorative paths (should be artifacts) from meaningful paths (e.g., SVG content that should be in the structure tree)
- Inserting operators at wrong positions can corrupt the content stream
Environment
| Aspose.HTML | 26.1.0 (NuGet) |
|---|---|
| Aspose.PDF | 26.2.0 (NuGet) |
| Runtime | .NET 10.0, Windows 10/11 x64 |
| Conversion API | Aspose.Html.Converters.Converter.ConvertHTML() with PdfSaveOptions.IsTaggedPdf = true |
| Validation API | Aspose.Pdf.Document.Validate(path, PdfFormat.PDF_UA_1) |
Request
Please update Aspose.HTML's tagged PDF renderer to wrap all decorative path operators (those generated
from CSS borders, background colors, and fills) in BMC("Artifact") ... EMC marked-content
sequences. This is required by PDF/UA-1 (ISO 14289-1) clause 7.1, which references ISO 32000-1:2008, 14.8:
all content in a conforming tagged PDF must be either part of the structure tree or marked as an artifact.
Windsor Solutions — February 2026