Content controls inside bulleted/numbered lists disappear when converting DOCX to HTML

Hello,

I am converting a DOCX to HTML and the content controls that are inside bulleted/numbered lists do not appear in the resulting HTML.

As you can see in the HTML file attached, the title and tag of the content controls outside the lists appear in the HTML as styles:
-aw-sdt-tag: aaaParagraph1; -aw-sdt-title: Paragraph1;
-aw-sdt-tag: aaaParagraph3; -aw-sdt-title: Paragraph3;

The conversion is done with the following code:

// Load the DOCX and save it as HTML; verbatim strings (@"...") keep the backslashes literal.
var convertedDocument = new Document(@"C:…\Document1.docx");
convertedDocument.Save(@"C:…\TranslatedDocument1.html");

Please note that the file “ContentControlsConversionTest_CONVERTED.HTML.docx” should be renamed to “ContentControlsConversionTest_CONVERTED.HTML”; I had to add .docx at the end because otherwise I was not allowed to upload the file.

It seems there is some kind of issue in Aspose with this conversion. Please let me know if this is the case or whether I should do something else to obtain those content controls in the resulting HTML.

Thank you for your time.

Hi John,

Thanks for your inquiry. Could you please also zip and attach your expected HTML here for our reference? We will examine its structure to understand how you want the final output to be generated.

Best regards,

Hello,

I am attaching the expected output file here.

I guess that if the whole bullet point is selected, the -aw-sdt-tag and -aw-sdt-title should be in the style of the li. If there are multiple li elements inside the content control, each of them would have the same -aw-sdt-tag and -aw-sdt-title (this is how it currently works for normal paragraphs).

As a side note, if a content control covers only part of the bullet point, it works: a span is created with -aw-sdt-tag and -aw-sdt-title in its style.

Best Regards

Hi John,

Thanks for your inquiry. We have logged this problem in our issue tracking system as WORDSNET-13629. Our product team will look further into the details and we will keep you updated on the status of the correction. We apologize for the inconvenience.

Best regards,

Hi John,

Regarding WORDSNET-13629, our product team has completed the work on your issue and has concluded that the behavior you’re observing is actually not a bug. So, we will close this issue as ‘Not a Bug’.

Starting from version 15.3.0 of Aspose.Words, content controls are enclosed in div or span tags in HTML, and I am afraid we won’t be able to change this behavior. It was implemented as per WORDSNET-11443.

Best regards,

Hi Awais,

Sorry for the delay.
Then from your answer I understand that the style “-aw-sdt-tag: aaaParagraph1; -aw-sdt-title: Paragraph1;” should appear in a div or span surrounding the li that is contained in the content control. However, in my tests, the “-aw-sdt-tag” and “-aw-sdt-title” styles do not appear anywhere in the HTML. I get the same HTML as if there wasn’t any content control in the docx.
Can you please verify this and tell me if I am doing something wrong?
I am attaching a very basic test with a document and the translated html, you can see that the “-aw-sdt-tag” and “-aw-sdt-title” styles do not appear there.
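
For anyone who wants to check an output file without reading it by hand, here is a minimal sketch in plain Java. It does not use Aspose.Words at all; it only scans the HTML text for the two custom properties. The class name SdtStyleScanner and the method findSdtProperties are hypothetical names chosen for this illustration, and the regular expression assumes the property values contain no quotes or semicolons.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words: lists the -aw-sdt-* properties in an HTML string.
public class SdtStyleScanner {
    // Matches "-aw-sdt-tag: value" or "-aw-sdt-title:'value'" (quotes around the value are optional).
    private static final Pattern SDT_PROPERTY =
            Pattern.compile("-aw-sdt-(tag|title)\\s*:\\s*'?([^;'\"]*)'?");

    /** Returns entries such as "tag=ContentControlTag", in the order they occur in the HTML. */
    public static List<String> findSdtProperties(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = SDT_PROPERTY.matcher(html);
        while (m.find()) {
            found.add(m.group(1) + "=" + m.group(2).trim());
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample markup in the style the converter emits for a content control.
        String html = "<div style=\"-aw-sdt-tag:'ContentControlTag'; -aw-sdt-title:'ContentControlTitle'\">"
                + "<ul><li>Text inside Content Control</li></ul></div>";
        // Prints: [tag=ContentControlTag, title=ContentControlTitle]
        System.out.println(findSdtProperties(html));
    }
}
```

An empty list for a document that does contain content controls would mean the properties were not written to the HTML at all.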

Thank you.

Hi John,

Thanks for your inquiry. You generated the HTML using an old version, i.e. Aspose.Words for .NET 15.3.0. Here is the HTML produced on my end using Aspose.Words for .NET 16.6.0:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta http-equiv="Content-Style-Type" content="text/css" />
    <meta name="generator" content="Aspose.Words for .NET 16.6.0.0" />
    <title>
    </title>
</head>
<body>
    <div>
        <div style="-aw-sdt-tag:'ContentControlTag'; -aw-sdt-title:'ContentControlTitle'">
            <ul type="disc" style="margin:0pt; padding-left:0pt">
                <li style="margin-left:28.06pt; margin-bottom:10pt; line-height:115%; padding-left:7.94pt; font-family:serif; font-size:11pt; -aw-font-family:'Symbol'; -aw-font-weight:normal; -aw-number-format:''">
                    <span style="font-family:Calibri">Text inside Content Control</span>
                </li>
            </ul>
        </div>
    </div>
</body>
</html>

Best regards,

Hi Awais,

Thanks for your reply. I updated to v16.7 and it is working as you wrote.
However, I think there is an issue when the whole line of a bulleted/numbered list is selected: when converting to HTML, a div is created (as shown in your code) surrounding the whole ul element instead of the li element. Every li then carries "margin-bottom:10pt;", which I guess is an attempt to add the space after the list. The problem is that this happens even when the list has several items, so the browser shows extra vertical space between each of them, while in the original document those spaces did not exist.
When only part of the text is selected, these extra spaces do not appear, because the content control is converted to a span inside the li.
I think a good way around this would be to always convert content controls to a span placed around the text they contain (though I understand this would require more changes relative to the DOCX structure):

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta http-equiv="Content-Style-Type" content="text/css" />
    <meta name="generator" content="Aspose.Words for .NET 16.6.0.0" />
    <title>
    </title>
</head>
<body>
    <div>
        <ul type="disc" style="margin:0pt; padding-left:0pt">
            <li style="margin-left:28.06pt; margin-bottom:10pt; line-height:115%; padding-left:7.94pt; font-family:serif; font-size:11pt; -aw-font-family:'Symbol'; -aw-font-weight:normal; -aw-number-format:''">
                <span style="font-family:Calibri"><span style="-aw-sdt-tag:'ContentControlTag'; -aw-sdt-title:'ContentControlTitle'">Text inside Content Control</span></span>
            </li>
        </ul>
    </div>
</body>
</html>

Could you check if this would be possible? We need the HTML to display documents exactly as they appear in the DOCX.

I’m attaching 2 documents with their respective HTML output, one of them has the whole line selected when creating the content control, the other one only some text. In the HTML you can notice the new line space between point 1 and point 2 in the file “Out selecting whole lines.html”.

Thank you.

Hi John,

Thanks for your inquiry. We have logged this problem in our issue tracking system as WORDSNET-14039. Our product team will look further into the details and we will keep you updated on the status of this issue. We apologize for the inconvenience.

Best regards,

The issues you found earlier (filed as WORDSNET-14039) have been fixed in the Aspose.Words for .NET 16.10.0 and Aspose.Words for Java 16.10.0 updates.
