HTML to Word to PDF: image in table overflows

Hello.

In our product, users can import a Word document, edit it as HTML in a custom TinyMCE instance, and then export it to a Word or PDF file.

After converting a Word file with custom formatting and alignment on a numbered list, one of our users added a table after a paragraph that has a negative text-indent in its style, and put some images in the table cells.
When exported to Word, the image's left edge falls outside the table cell, because the image is snapped to the negative text indent while the table is aligned with the paragraph indent. When saved as PDF, the part of the image that falls outside the table is not displayed.

I’m not sure whether hiding the overflowing part of the image in the PDF is intended, but I think that, whether or not there is a text-indent in the HTML, the image should not be placed outside the table in Word.

Thank you

The HTML of the numbered paragraphs:

<div style="widows: 0; orphans: 0; font-family: 'Liberation Serif'; font-size: 12pt; line-height: 1.15;">
<div>
<ol style="margin: 0pt; padding-left: 0pt;" type="1">
    <li
        style="margin-left: 36pt; text-indent: -18pt; font-family: Arial; font-weight: bold; list-style-position: inside;">
        <!-- ... -->
    </li>
    <li
        style="margin-left: 36pt; text-indent: -18pt; font-family: Arial; font-weight: bold; list-style-position: inside;">
        <span style="font-weight: normal;">Sed non mi metus.</span>
        <ol class="awlist2" style="margin-right: 0pt; margin-left: 0pt; padding-left: 0pt;" type="1">
            <li style="text-indent: -17.85pt; -aw-list-padding-sml: 3.33pt;"><span
                    style="width: 3.33pt; font: 7pt 'Times New Roman'; display: inline-block; -aw-import: ignore;"
                    class="mceNonEditable">&nbsp; </span><span style="font-weight: normal;">Integer finibus
                    tempor nulla. Cras ex velit, sollicitudin sit amet molestie sed, iaculis in justo. Duis
                    pellentesque dolor vehicula massa pulvinar, id sagittis ex euismod. Maecenas
                    malesuada</span>
                <table style="width: 200px;" border="1">
                    <colgroup>
                        <col>
                        <col>
                    </colgroup>
                    <tbody>
                        <tr>
                            <td><img src="blob:http://localhost/068b7e95-2e58-43a2-befd-939380f04e01" alt="" width="186"
                                    height="55"></td>
                            <td><img src="blob:http://localhost/b3bc7424-a3d2-4cd6-8a9d-b18b7f3113ab" alt="" width="186"
                                    height="55"></td>
                        </tr>
                    </tbody>
                </table>
                <span style="font-weight: normal;">&nbsp;</span><br><span
                    style="font-weight: normal; -aw-import: ignore;" class="mceNonEditable">&nbsp;</span>
            </li>
            <li style="text-indent: -17.85pt; -aw-list-padding-sml: 3.33pt;"><!-- ... --></li>
        </ol>
    </li>
    <li
        style="margin-left: 36pt; text-indent: -18pt; font-family: Arial; font-weight: bold; list-style-position: inside;">
        <!-- ... -->
    </li>
    <!-- ... -->
</ol>
</div>
</div>

HTML view in tinyMCE:

DOCX Version:

PDF Version:

Original Word file:
leo_test_lorem_ipsum.docx (7.1 KB)

Exported Word file with images:
- Preview(1).docx (19.6 KB)

@concord_tech First of all, please note that Aspose.Words is designed to work with MS Word documents. The HTML and MS Word document object models are quite different, and it is not always possible to provide 100% fidelity when converting from one format to the other. In most cases, Aspose.Words mimics MS Word behavior when working with HTML documents.

I have tested the conversion of your HTML snippet to DOCX and PDF using the following simple code and the latest version (24.1) of Aspose.Words for Java:

import com.aspose.words.Document;

// Load the HTML snippet and save it in both output formats.
Document doc = new Document("C:\\Temp\\in.html");
doc.save("C:\\Temp\\out.docx");
doc.save("C:\\Temp\\out.pdf");

The result looks the same in DOCX and PDF:
out.docx (17.4 KB)
out.pdf (40.6 KB)

However, I can see that the images in the table are truncated on the left in the output DOCX and PDF.

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-26549

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Hey @alexey.noskov

This is precisely the issue. Thanks for your answer.

I’m looking forward to the fix.

Have a good one

Hey @alexey.noskov
I see the issue ticket is marked as postponed. Could you tell me what this means in your workflow? Is there any chance it will be addressed in the near future?

As always, thanks for your support.
Have a good one

@concord_tech We have completed analyzing the issue. The li element has text-indent: -17.85pt, which is read as ParagraphFormat.FirstLineIndent. This property is propagated to each table cell inside the list item, and this causes the problem.
The Postponed status means that the issue is not yet scheduled for development and is not likely to be fixed in one of the nearest releases. Please accept our apologies for the inconvenience.
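
Until a fix is scheduled, one possible workaround follows from the analysis above: since the negative text-indent of the li element is inherited by the table cells, the HTML could be pre-processed before export so that every td and th carries an explicit text-indent: 0. The sketch below uses only the Java standard library. The class and method names (TextIndentReset, resetIndentInCells) are hypothetical, the regex handling is deliberately simple (it does not cope with ">" inside attribute values), and whether Aspose.Words honors the reset on table cells is an assumption that has not been verified in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words: adds "text-indent: 0;" to
// every <td>/<th> opening tag so an inherited negative indent is overridden.
public class TextIndentReset {
    // Opening <td ...> or <th ...> tag; group 2 holds the raw attribute text.
    private static final Pattern CELL_TAG =
            Pattern.compile("<(td|th)\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    // A quoted style attribute (single or double quotes), not data-style etc.
    private static final Pattern STYLE_ATTR =
            Pattern.compile("(?<![\\w-])style\\s*=\\s*([\"'])(.*?)\\1",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public static String resetIndentInCells(String html) {
        Matcher cell = CELL_TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (cell.find()) {
            String attrs = cell.group(2);
            Matcher style = STYLE_ATTR.matcher(attrs);
            String newAttrs;
            if (style.find()) {
                // Append to the existing declarations; a later declaration
                // in the same style attribute wins over an earlier one.
                String quote = style.group(1);
                String css = style.group(2).trim();
                if (!css.isEmpty() && !css.endsWith(";")) {
                    css += ";";
                }
                String merged = "style=" + quote + css
                        + (css.isEmpty() ? "" : " ") + "text-indent: 0;" + quote;
                newAttrs = attrs.substring(0, style.start()) + merged
                        + attrs.substring(style.end());
            } else {
                // No style attribute yet: add one in front of the others.
                newAttrs = " style=\"text-indent: 0;\"" + attrs;
            }
            cell.appendReplacement(out,
                    Matcher.quoteReplacement("<" + cell.group(1) + newAttrs + ">"));
        }
        cell.appendTail(out);
        return out.toString();
    }
}
```

The HTML string would be passed through resetIndentInCells before it is handed to the converter. If the reset is not honored at import time, another avenue to explore would be setting ParagraphFormat.FirstLineIndent to 0 on the paragraphs inside table cells after the document is loaded, since that is the property named in the analysis.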