Customizing TOC entry styles in HTML for conversion to DOCX

Hello everyone,

I’m working on a Java project using Aspose.Words 24.8, where I receive an HTML file containing various headings (h1, h2, h3) and a section to generate a dynamic Table of Contents (TOC) when converting to DOCX. Here is a minimal example to reproduce:

<html>
<head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
    <meta content="text/css" http-equiv="Content-Style-Type"/>
    <title></title>
    <style type="text/css">
        @page Section_1 { size:595.3pt 841.9pt; margin:70.85pt 85.05pt; -aw-footer-distance:35.4pt; -aw-header-distance:35.4pt }div.Section_1 { margin:70.85pt 85.05pt; page:Section_1 }
    </style>
</head>
<body style="line-height:108%; font-family:Aptos; font-size:11pt">
  <div class="Section_1">
    <div style="-aw-sdt-tag:''">
      <p style="margin-top:12pt; margin-bottom:0pt; page-break-inside:avoid; page-break-after:avoid; line-height:108%; font-size:16pt">
        <span style="font-family:'Aptos Display'; color:#0f4761">Contents</span>
      </p>
      <p style="margin-top:0pt; margin-bottom:5pt">
        <span style="-aw-field-start:true"></span>
        <span style="-aw-field-code:' TOC \\o &quot;1-3&quot; \\h \\z \\u '"></span>
        <span style="-aw-field-separator:true"></span>
      </p>
      <p style="margin-top:0pt; margin-bottom:8pt">
        <span style="-aw-field-end:true"></span>
        <span style="-aw-import:ignore">&#xa0;</span>
      </p>
    </div>
    <p style="margin-top:0pt; margin-bottom:8pt">
      <span style="-aw-import:ignore">&#xa0;</span>
    </p>
    <h1 style="margin-top:18pt; margin-bottom:4pt; page-break-inside:avoid; page-break-after:avoid; line-height:108%; font-size:20pt">
      <a name="_Toc175311268">
        <span style="font-family:'Aptos Display'; font-weight:normal; color:#0f4761">Chapter 1</span>
      </a>
    </h1>
  </div>
</body>
</html>

The Java code imports the HTML, updates fields, and saves the document as DOCX:

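import com.aspose.words.Document;

// Import the HTML file as a Word document model.
Document document = new Document("new-document.html");

// Rebuild the TOC field result, then refresh the page layout.
document.updateFields();
document.updatePageLayout();

document.save("new-document.docx");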

This works fine: everything is dynamic, with updatable page numbers. However, I now need to customize the styles of the TOC entries. Unfortunately, I haven’t found a way to do this directly through the HTML. Note that the TOC entries are generated after field updates, so I can’t apply inline styles. I need to set global styles instead.

I’m aware that I can customize TOC styles in Java code like this:

document.getStyles().getByStyleIdentifier(StyleIdentifier.TOC_1).getFont().setBold(true);

But my conversion class needs to handle HTML files from different sources, each with specific style requirements. So I need to find a way to define these TOC styles (TOC_1, TOC_2, TOC_3) directly in the HTML. Is it possible to add custom styles in the HTML head section with specific class names so that Aspose.Words will recognize and apply them correctly?

I’ve tried experimenting with class names like TOC1, _TOC_1, and _Toc1, but nothing seems to work. The generated TOC entries always use the default formatting.

Thank you in advance for any guidance or advice.


@vitorcd I am afraid there is no way to define TOC styles in HTML. Please note that Aspose.Words is designed to work with MS Word documents. The object models of HTML documents and MS Word documents are quite different, so it is not always possible to provide 100% fidelity when converting from one format to the other. In most cases, Aspose.Words mimics MS Word behavior when working with HTML.
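
Until something like this is supported, one possible workaround is to carry the per-file requirements in the HTML yourself and apply them in Java after import. The sketch below is only an illustration under stated assumptions: the `x-tocN-bold` meta tag convention and the `TocStyleHints` class are hypothetical names, not Aspose.Words features, and it is assumed that such extra meta tags do not otherwise affect the import. The parser uses only the Java standard library; the Aspose.Words part is shown as a comment and reuses the `getByStyleIdentifier` call from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Reads hypothetical per-file TOC style hints from an HTML string. */
final class TocStyleHints {

    // Hypothetical convention (not recognized by Aspose.Words itself):
    //   <meta name="x-toc1-bold" content="true"/>
    private static final Pattern HINT = Pattern.compile(
            "<meta\\s+name=\"x-toc([1-3])-bold\"\\s+content=\"(true|false)\"\\s*/?>",
            Pattern.CASE_INSENSITIVE);

    private TocStyleHints() {
    }

    /** Returns a map from TOC level (1-3) to the requested bold flag. */
    static Map<Integer, Boolean> parse(String html) {
        Map<Integer, Boolean> hints = new LinkedHashMap<>();
        Matcher m = HINT.matcher(html);
        while (m.find()) {
            int level = Integer.parseInt(m.group(1));
            hints.put(level, Boolean.parseBoolean(m.group(2)));
        }
        return hints;
    }

    // Applying the hints needs Aspose.Words on the classpath, for example:
    //
    //   int[] ids = { StyleIdentifier.TOC_1, StyleIdentifier.TOC_2, StyleIdentifier.TOC_3 };
    //   hints.forEach((level, bold) ->
    //       document.getStyles().getByStyleIdentifier(ids[level - 1]).getFont().setBold(bold));
}
```

The conversion class would read the HTML text once, call `TocStyleHints.parse(...)`, load the `Document` as before, and apply the hints before saving. The same pattern could be extended to font size or color, at the cost of defining and documenting more hypothetical tag names for each HTML source.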

For other styles, I see that it is possible to associate a CSS class with a native Aspose.Words document style by including a rule with the -aw-style-name property, such as:

-aw-style-name: hyperlink;
-aw-style-name: heading1;
-aw-style-name: heading2;

Could you support customization of the TOC styles in a future release by accepting additional values for this property, such as toc1, toc2, etc.?

I understand that conversion from HTML is not the main feature of Aspose.Words, but in my organization we use it exclusively for conversion to and from HTML and PDF, and it is good enough. There are only a few rough spots that prevent us from abandoning the other conversion tools we still use to cover what is lacking in Aspose.Words.

@vitorcd
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-27319

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Thank you, Alexey! I hope you will be able to address this issue in the future.
