Insert Tab (Arrow) Character with Hex Code during HTML to DOCX Conversion using C# or Java

Hi, I am trying to convert HTML to DOCX. The conversion works until the HTML contains a hex character reference such as:

&#x2003;

When I open the document in Word, it shows a different character: a little circle after Q: (see image.png).
The expected character is a little arrow after Q: (see image.png).
Do I need to specify an encoding when saving my DOCX document? I already tried replacing the hex code with &emsp; and it didn't work.

Thanks in advance

@maik1,

Please note that the &#x2003; hex character reference (also known as &emsp;) represents the Unicode ‘Em Space’ character (U+2003). It is a whitespace character, so it is not visible when the HTML is viewed in a web browser such as Google Chrome. For example, please try opening the following HTML in Google Chrome:

<html>
<head>
    <title></title>
</head>
<body>
    Hello &#x2003; World
</body>
</html>

When you convert such HTML using MS Word’s Save As command or programmatically with Aspose.Words, the output DOCX document shows a little circle, as seen in your first image. This seems to be expected behavior.

However, if you would like to see a ‘Rightwards Arrow’ character, then please use &#x2192; in your HTML like this:

<html>
<head>
    <title></title>
</head>
<body>
    Hello &#x2192; World
</body>
</html>
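
To make the difference between the two references concrete, here is a minimal Java sketch that looks up the Unicode name of each code point with the standard Character.getName method. The class and method names (CharRefNames, describe) are hypothetical and exist only for this illustration; the sketch does not use Aspose.Words.

```java
// Sketch only: CharRefNames and describe are hypothetical names for illustration.
class CharRefNames {
    // Parses a hexadecimal character reference such as "&#x2003;"
    // (the trailing ';' is optional) and returns the code point
    // together with its Unicode name.
    static String describe(String ref) {
        String hex = ref.trim();
        if (hex.startsWith("&#x") || hex.startsWith("&#X")) {
            hex = hex.substring(3);
        }
        if (hex.endsWith(";")) {
            hex = hex.substring(0, hex.length() - 1);
        }
        int codePoint = Integer.parseInt(hex, 16);
        return String.format("U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe("&#x2003;")); // U+2003 EM SPACE
        System.out.println(describe("&#x2192;")); // U+2192 RIGHTWARDS ARROW
    }
}
```

The output shows that &#x2003; is a kind of space, not an arrow and not a tab (a tab is U+0009).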

If I am missing something, please elaborate on your inquiry. Also, please ZIP and upload your expected Word document showing the desired arrow character here for testing. We will then investigate the scenario on our end and provide you with more information.

Thanks for your answer.

Actually, the arrow represents a tab, but when converting from HTML to DOCX, Aspose.Words interpreted it as a space or something similar. Also, the arrow is not normally visible; it only appears when showing all formatting characters in Word. Any other ideas?

@maik1,

Please try converting the following HTML to DOCX by using Aspose.Words:

<html>
<head>
    <title>
    </title>
</head>
<body>
    <div>
        <p>
            <span>Left</span>
            <span style="width:29.64pt; display:inline-block">&#xa0;</span>
            <span>Right</span>
        </p>
    </div>
</body>
</html> 

You will find a Tab character in the output DOCX document (see screenshot.png).

Hi, the code didn't work. Is there a way to produce a hanging indent when converting from HTML to DOCX? The reason for wanting the tab is to trigger the hanging indent.

Thanks, regards.

How is the width on this span calculated? Is it an arbitrary number, or is it something standard?

<span style="width:29.64pt; display:inline-block">

@maik1,

Please see the following HTML:

<html>
<head>
    <title>
    </title>
</head>
<body>
    <div>
        <p>
            <span>Left</span>
            <span style="width:144pt; display:inline-block">&#xa0;</span>
            <span>Right</span>
        </p>

        <p style="text-indent:-36pt;">
            <span>Paragraph with hanging indent</span>
        </p>
    </div>
</body>
</html>  

Please note that 72 points are equal to 1 inch.

The 144-point value in the first paragraph means that, when you view this in a web browser, there will be a 2-inch gap between the words ‘Left’ and ‘Right’. However, this width won’t have any effect in the Word document.
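
The point-to-inch arithmetic can be checked with a small Java sketch. The names PointUnits and toInches are hypothetical and used only for illustration; the only fact relied on is that 72 points equal 1 inch.

```java
// Sketch only: PointUnits and toInches are hypothetical names for illustration.
class PointUnits {
    // 72 points make up 1 inch.
    static final double POINTS_PER_INCH = 72.0;

    // Converts a length in points to inches.
    static double toInches(double points) {
        return points / POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        System.out.println(toInches(144)); // 2.0
        System.out.println(toInches(36));  // 0.5
    }
}
```

By the same arithmetic, the 29.64pt width used in the earlier example is roughly 0.41 inch.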

The text-indent:-36pt; in the second paragraph will be transformed into a 0.5-inch hanging indent when you convert it to a Word document. Hope this helps.
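
In plain CSS, a hanging indent is commonly written as a positive left margin paired with an equal negative text-indent, so that the first line starts at the page margin and the remaining lines are indented. The Java sketch below builds such a style string. HangingIndentCss, style, and paragraph are hypothetical names, and how Aspose.Words maps the margin-left part to a Word left indent is an assumption here that should be checked against the converted document.

```java
import java.math.BigDecimal;

// Sketch only: HangingIndentCss and its methods are hypothetical names for illustration.
class HangingIndentCss {
    // Builds a CSS declaration list for a hanging indent of the given size in points:
    // the whole paragraph is shifted right, and the first line is pulled back by the same amount.
    static String style(double points) {
        if (points <= 0) {
            throw new IllegalArgumentException("points must be positive");
        }
        // Render 36.0 as "36" and 29.64 as "29.64".
        String pt = BigDecimal.valueOf(points).stripTrailingZeros().toPlainString();
        return "margin-left:" + pt + "pt; text-indent:-" + pt + "pt;";
    }

    // Wraps text in a paragraph carrying the hanging-indent style.
    // The text is inserted as-is, so it must already be valid HTML.
    static String paragraph(String text, double points) {
        return "<p style=\"" + style(points) + "\">" + text + "</p>";
    }

    public static void main(String[] args) {
        System.out.println(paragraph("Paragraph with hanging indent", 36));
    }
}
```

With 36 points, the generated paragraph carries margin-left:36pt; text-indent:-36pt;, which corresponds to the 0.5-inch indent discussed above.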