Issue with List Formatting When Converting HTML to DOCX Using Aspose.Words

Hello,

I am trying to convert an HTML file to a DOCX file using Aspose.Words. The original HTML file contains lists that are formatted using CSS element selectors. The structure of the HTML is as follows:

<ul>
  <li style="list-style-type: none;">
    <ul>
      <li>Information on the risk of herpetic infection in patients treated by Imlygic</li>
      <li>Information on the risk of disseminated herpetic infection in immunocompromised individuals treated by Imlygic</li>
      <li>Recommendation regarding accidental exposure of Imlygic to HCPs</li>
      <li>To always wear protective gown/laboratory coat, safety glasses and gloves while preparing or administering Imlygic</li>
      <li>To avoid contact with skin, eyes, mucous membranes and ungloved direct contact with injected lesions or body fluids of treated patients</li>
      <li>Instruction on first aid after accidental exposure</li>
      <li>Immunocompromised and pregnant healthcare professionals should not prepare and administer Imlygic</li>
      <li>Recommendation regarding the accidental transmission of Imlygic from patient to close contacts or HCPs</li>
      <li>Instructions on how to behave after administration/accidental transmission, how often the dressing has to be changed, and who should not change the dressing</li>
      <li>Instructions to minimize the risk of exposure of blood and body fluids to close contacts for the duration of Imlygic treatment and for 30 days after the last administration, including avoiding:
        <ul>
          <li>Sexual intercourse without a latex condom</li>
          <li>Kissing if either party has an open mouth sore</li>
          <li>Common usage of cutlery, crockery, and drinking vessels</li>
          <li>Common usage of injection needles, razorblades, and toothbrushes</li>
        </ul>
      </li>
      <li>Adequate waste disposal and decontamination, following the recommendations for disposal of biohazardous waste</li>
      <li>Information on Imlygic use in pregnancy</li>
      <li>Instructions on handling possible adverse events, including providing batch numbers when reporting adverse drug reactions</li>
    </ul>
  </li>
</ul>

The CSS applied to this content is:

ul, ol {
    margin: 0;
    padding: 0;
    list-style-position: outside;
}

/* Apply custom list styling */
ul, ol {
    padding-left: 0.5cm; /* Bullet point distance from left margin */
}

/* Style for list items */
li {
    padding-left: 0.5cm; /* Text distance from bullet point */
    margin-left: 0.5cm; /* Ensure consistent indentation for wrapped lines */
}

When rendering this HTML in a browser, the formatting appears correct. Additionally, when loading this HTML file in Aspose.Words and exporting it as HTML, the rendering remains consistent. However, when converting the HTML file to DOCX, the formatting of lists is altered. The exported HTML from Aspose.Words is as follows:

<ul type="disc" style="margin:0pt; padding-left:0pt">
    <li style="margin-left:28.31pt; padding-left:14.19pt; font-family:serif; -aw-font-family:'Symbol'; -aw-font-weight:normal; -aw-number-format:''">
        <span style="font-family:'Times New Roman'; font-weight:bold">Guide for healthcare professionals</span>
        <span style="font-family:'Times New Roman'"> shall contain the following key elements:</span>
        <ul type="circle" style="margin-right:0pt; margin-left:0pt; padding-left:0pt">
            <li style="margin-left:28.4pt; padding-left:14.15pt; -aw-font-family:'Courier New'; -aw-font-weight:normal; -aw-number-format:'o'">
                <span style="font-family:'Times New Roman'">Information on the risk of herpetic infection in patients treated by Imlygic</span>
            </li>
            <li style="margin-left:28.4pt; padding-left:14.15pt; -aw-font-family:'Courier New'; -aw-font-weight:normal; -aw-number-format:'o'">
                <span style="font-family:'Times New Roman'">Information on the risk of disseminated herpetic infection in immunocompromised individuals treated by Imlygic</span>
            </li>
            <li style="margin-left:28.4pt; padding-left:14.15pt; -aw-font-family:'Courier New'; -aw-font-weight:normal; -aw-number-format:'o'">
                <span style="font-family:'Times New Roman'">Recommendation regarding accidental exposure of Imlygic to HCPs</span>
            </li>
            <li style="margin-left:28.4pt; padding-left:14.15pt; -aw-font-family:'Courier New'; -aw-font-weight:normal; -aw-number-format:'o'">
                <span style="font-family:'Times New Roman'">To always wear protective gown/laboratory coat, safety glasses, and gloves while preparing or administering Imlygic</span>
            </li>
        </ul>
    </li>
</ul>

[Screenshot: aspose_words_html.jpg]

The formatting in the DOCX file does not correctly maintain the list indentation and bullet styles as defined in the original HTML and CSS. Could you provide guidance on how to ensure that the original list formatting is preserved when converting HTML to DOCX using Aspose.Words?

[Screenshot: aspose_words_docx.jpg]

Thank you for your assistance.
original.jpg (328 KB)
aspose_words_html.jpg (352 KB)
aspose_words_docx.jpg (280 KB)

@fdt Please note that Aspose.Words is designed to work with MS Word documents. The object models of HTML documents and MS Word documents are quite different, so it is not always possible to achieve 100% fidelity when converting from one model to the other. In most cases, Aspose.Words mimics MS Word's behavior when working with HTML.

In addition, it looks like the provided HTML does not match the screenshots. I put the HTML and CSS into one file and converted it to DOCX using both Aspose.Words and MS Word. The output produced by Aspose.Words is closer to what is displayed in the browser:

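// Load the combined HTML/CSS file; the load format is detected automatically.
Document doc = new Document(@"C:\Temp\in.html");
// Save as DOCX; the save format is inferred from the file extension.
doc.Save(@"C:\Temp\out.docx");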

Input: in.zip (1.1 KB)
MS Word result: ms.docx (14.4 KB)
Aspose.Words result: out.docx (9.3 KB)
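
To see where the DOCX result diverges from the CSS, it may help to print the indents that Aspose.Words assigns to each list paragraph once the HTML is loaded. The following is only a diagnostic sketch, not a fix. It assumes the same C:\Temp\in.html input as above, and the class name ListIndentReport is made up for illustration. The 0.5cm value comes from the CSS in the question and is converted to points so it can be compared with the values reported for each paragraph.

```csharp
using System;
using Aspose.Words;

// Hypothetical helper class for illustration; not part of Aspose.Words.
class ListIndentReport
{
    static void Main()
    {
        // Assumed input path, same as in the conversion snippet above.
        Document doc = new Document(@"C:\Temp\in.html");

        // The CSS uses 0.5cm (5 mm) for padding and margin; express it in points.
        double cssIndent = ConvertUtil.MillimeterToPoint(5);
        Console.WriteLine($"CSS 0.5cm = {cssIndent:F2}pt");

        // Walk all paragraphs in the document and report only the list items.
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            if (!para.IsListItem)
                continue;

            ParagraphFormat format = para.ParagraphFormat;
            Console.WriteLine(
                $"level={para.ListFormat.ListLevelNumber} " +
                $"left={format.LeftIndent:F2}pt " +
                $"firstLine={format.FirstLineIndent:F2}pt " +
                $"text={para.GetText().Trim()}");
        }
    }
}
```

If the reported indents differ from what the CSS asks for, they can be adjusted on the loaded document (for example through ParagraphFormat.LeftIndent) before calling doc.Save. Whether that is acceptable depends on how closely the DOCX has to match the browser rendering.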

Thank you for your response. I’ll investigate the issue.
