Insert HTML in Word Document using C# .NET | Transform Font Size Styles from Pixel to Points

Hello,

We are using the Aspose.Words component to insert HTML into our Word template. The issue is that when the HTML specifies font-size in pixels, the generated document shows a visibly smaller font size.

My understanding is that a 12px font-size in HTML is not equal to 12pt in a Word document (it shows as around 8pt instead). What is the best way to fix that without changing the HTML font sizes?

@srinudhulipalla,

Please ZIP and attach the following resources here for testing:

  • Your simplified source Word document
  • The HTML string that you want to insert into the above template Word document
  • Aspose.Words for .NET 21.4 generated output DOCX file showing the undesired behavior
  • Your expected DOCX file showing the desired output. You can create this document manually by using MS Word.

As soon as you get these pieces of information ready, we will start further investigation into your scenario and provide you more information.

Here are the supporting files to replicate the problem from your side.

Attached files:
Program.zip (1.0 KB)
Expected.zip (37.1 KB)

I have also created the Expected.docx file manually. As you can see, Output.docx is created with 10.5pt Arial, but I expect 14pt Arial.

How do we fix this in Aspose.Words without altering our HTML code?

@srinudhulipalla,

MS Word 2019 also produces similar output when saving the attached HTML file in DOCX format, and Aspose.Words 21.4 tries to mimic the behavior of MS Word.

So, this is the expected behavior of Aspose.Words.

Hi,

I am not sure whether you noticed the issue. The actual issue is related to font-size, as I have shown in the attached image below. I hope this clarifies the issue:

image.png (39.3 KB)

Is there any way to fix it in Aspose.Words?

@srinudhulipalla,

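Please check whether the following workaround is acceptable for you:

using System;
using System.IO;
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;

string htmlString = File.ReadAllText("C:\\Temp\\expected\\input_html.html");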

Document oDoc = new Document("C:\\Temp\\expected\\input.docx");

oDoc.NodeChangingCallback = new HandleNodeChanging_FontSizePointsToPixels();

FindReplaceOptions options = new FindReplaceOptions();
options.Direction = FindReplaceDirection.Forward;
options.MatchCase = false;
options.FindWholeWordsOnly = true;
options.ReplacingCallback = new WordDocReplaceHandler();

oDoc.Range.Replace("Hello", htmlString, options);

oDoc.Save("C:\\temp\\expected\\21.4.docx");

public class HandleNodeChanging_FontSizePointsToPixels : INodeChangingCallback
{
    void INodeChangingCallback.NodeInserted(NodeChangingArgs args)
    {
        // set back font size of every Run node from points to pixels
        if (args.Node.NodeType == NodeType.Run)
        {
            Aspose.Words.Font font = ((Run)args.Node).Font;
            font.Size = ConvertUtil.PointToPixel(font.Size);
        }
    }

    void INodeChangingCallback.NodeInserting(NodeChangingArgs args)
    {
        // Do Nothing
    }

    void INodeChangingCallback.NodeRemoved(NodeChangingArgs args)
    {
        // Do Nothing
    }

    void INodeChangingCallback.NodeRemoving(NodeChangingArgs args)
    {
        // Do Nothing
    }
}

internal class WordDocReplaceHandler : IReplacingCallback
{
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        try
        {
            Regex regHTML = new Regex(@"<\s*([^ >]+)[^>]*>.*?<\s*/\s*\1\s*>");
            bool isHTML = regHTML.IsMatch(e.Replacement);

            DocumentBuilder builder = new DocumentBuilder((Document)e.MatchNode.Document);

            if (e.MatchNode.GetText().ToLower().Contains("mergefield"))
            {
                if (isHTML)
                {
                    builder.MoveToMergeField(e.Match.Value);
                    builder.InsertHtml(e.Replacement, true);
                }

                return ReplaceAction.Skip;
            }
            else
            {
                if (isHTML)
                {
                    builder.MoveTo(e.MatchNode);
                    builder.InsertHtml(e.Replacement);
                    e.Replacement = string.Empty;
                }

                return ReplaceAction.Replace;
            }
        }
        catch (Exception)
        {
            return ReplaceAction.Replace;
        }
    }
}

Thank you, that workaround seems to work. However, I have another sample HTML in which the word “Joo” has 11px in the HTML and the generated output has 11.5pt. Is there any alternative way to fix this?

input_html.zip (4.5 KB)

@srinudhulipalla,

We have logged your requirement in our issue tracking system. Your ticket number is WORDSNET-22220. We will further look into the details of this requirement and will keep you updated on the status of the linked issue.

@srinudhulipalla,

Regarding WORDSNET-22220, we have completed the analysis of this issue and concluded to close this issue with “not a bug” status. The analysis reveals the following:

Like MS Word, Aspose.Words stores font size in half points. When font size is loaded from HTML, it is converted from pixels to points and is rounded up to the nearest half point. For example: 14px => 14 / 96 * 72 = 10.5pt; 11px => 11 / 96 * 72 = 8.25pt, rounded to 8.5pt. This rounding is also performed when the font size is modified by setting the “Font.Size” value. That’s why 11px is converted and rounded up to 8.5pt on loading and then converted and rounded up to 11.5pt by the workaround code. This behavior is by design and is the same as what MS Word does.

Unfortunately, the only correct way to get the same font size as in HTML is to change units of “font-size” values in the HTML document from “px” to “pt” in order to get rid of all conversions and roundings. For example, by using a regex replace after HTML is loaded to a string in memory.
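The regex replace suggested above could look roughly like the following. This is only a minimal sketch: the helper name PxToPt and the pattern are illustrative assumptions, not part of Aspose.Words, and the pattern only covers plain "font-size: Npx" declarations (it does not handle the "font" shorthand or em/% units).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: matches "font-size: <number>px" and captures everything
// up to the number, so that only the unit is rewritten.
var pxFontSize = new Regex(
    @"(font-size\s*:\s*\d*\.?\d+)\s*px\b",
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

// Rewrites the unit from px to pt and leaves the numeric value unchanged.
string PxToPt(string html) => pxFontSize.Replace(html, "${1}pt");

string sample = "<span style=\"font-size: 14px; font-family: Arial\">Hello</span>";
Console.WriteLine(PxToPt(sample));
// prints: <span style="font-size: 14pt; font-family: Arial">Hello</span>
```

The rewritten string would then be passed to DocumentBuilder.InsertHtml as in the workaround above, with no NodeChangingCallback attached, so no px-to-pt conversion or second rounding takes place.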

Thank you for the details; what you are saying makes sense. I need help with two things:

First, as you mention, 11px => 11 / 96 * 72 = 8.25, rounded to 8.5pt. But when I use 11px in my HTML, it becomes 11.5pt in the output document. How does this conversion happen?

Second, I would like to continue using the NodeChangingCallback on the document. However, if my HTML is already in points, as in the attached example, the code given below still converts it: if I have 11pt in my HTML, the output shows 14.5pt. How can I avoid this?

void INodeChangingCallback.NodeInserted(NodeChangingArgs args)
{
    // set back font size of every Run node from points to pixels
    if (args.Node.NodeType == NodeType.Run)
    {
        Aspose.Words.Font font = ((Run)args.Node).Font;
        font.Size = ConvertUtil.PointToPixel(font.Size);
    }
}

output.zip (19.2 KB)

Thanks.

@srinudhulipalla,

We have logged these details in our issue tracking system and will keep you posted here on further updates.

Sure, thank you. I will wait…

@srinudhulipalla,

When HTML is loaded to the document model:
11px => 11 / 96 * 72 = 8.25pt, rounded up to 8.5pt
After that, when the INodeChangingCallback.NodeInserted processes the text, ConvertUtil.PointToPixel does this:
8.5pt => 8.5 / 72 * 96 = 11.333px, which is then treated as 11.333pt and is rounded up to 11.5pt in the setter of Font.Size.

The only way is to somehow pass the information about the units of “font-size” values from HTML to INodeChangingCallback.NodeInserted and conditionally disable the conversion. When HTML is loaded by Aspose.Words, all font sizes are converted to points, so that information cannot be retrieved from the document model.
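The arithmetic above can be replayed in a few lines. This is a sketch under stated assumptions: the three local functions are stand-ins written for illustration (they are not Aspose.Words code), and the rounding rule is assumed to be "nearest half point, with ties rounding up". That rule is not confirmed beyond the examples in this thread, but it reproduces every value reported here, including the 11pt => 14.5pt case mentioned earlier.

```csharp
using System;

// Assumed rounding rule: snap to the nearest half point, ties away from zero.
double RoundToHalfPoint(double pt) =>
    Math.Round(pt * 2, MidpointRounding.AwayFromZero) / 2;

// Stand-ins for the unit conversions at 96 pixels and 72 points per inch.
double PixelToPoint(double px) => px * 72.0 / 96.0;
double PointToPixel(double pt) => pt * 96.0 / 72.0;

// 11px in HTML: 8.25pt on load, stored as 8.5pt.
double loaded = RoundToHalfPoint(PixelToPoint(11));

// Workaround callback: 8.5pt => 11.333, assigned as points, stored as 11.5pt.
double afterCallback = RoundToHalfPoint(PointToPixel(loaded));

// 11pt in HTML is loaded unchanged, but the callback still converts it:
// 11pt => 14.667, stored as 14.5pt.
double ptSource = RoundToHalfPoint(PointToPixel(11));

Console.WriteLine($"{loaded} {afterCallback} {ptSource}");
```

The values only come back to the original number when the pixel size happens to map to a multiple of half a point (for example 14px => 10.5pt => 14pt), which is why the first sample looked correct and the 11px sample did not.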

Thanks for the additional details on my first query.

On the second one, I tried to disable the conversion conditionally, but I cannot see the source unit type (px/pt) anywhere to base the condition on. Is there any workaround to disable the conversion based on the unit type?

@srinudhulipalla,

I am afraid the whole problem we are trying to solve (treating “px” in HTML as “pt”) looks like a hack, and there is no simple and logical solution. If you like the approach that uses INodeChangingCallback.NodeInserted, we would recommend parsing the source HTML using regular expressions, extracting the “font-size” values and checking whether they are in “px” or “pt”. This information can then be passed to INodeChangingCallback.NodeInserted in order to disable the ConvertUtil.PointToPixel conversion when font sizes are specified in “pt”.

If we were writing this code, however, we wouldn’t use INodeChangingCallback.NodeInserted. Instead, we would pre-process source HTML and replace “px” in “font-size” declarations with “pt” using regular expressions. Then we would let Aspose.Words load modified HTML normally.
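For the first approach described above (checking the units before deciding whether the callback should convert), the unit check might be sketched as follows. The names FontSizeUnits and UsesOnlyPixels are hypothetical helpers, and the pattern only recognizes plain "font-size" declarations in px or pt.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Captures the unit (px or pt) of each plain font-size declaration.
var fontSizeDecl = new Regex(
    @"font-size\s*:\s*\d*\.?\d+\s*(px|pt)\b",
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

// Returns the distinct units found in the HTML, lower-cased.
string[] FontSizeUnits(string html) =>
    fontSizeDecl.Matches(html)
        .Cast<Match>()
        .Select(m => m.Groups[1].Value.ToLowerInvariant())
        .Distinct()
        .ToArray();

// True only when at least one font-size is declared and all of them use px.
bool UsesOnlyPixels(string html)
{
    string[] units = FontSizeUnits(html);
    return units.Length > 0 && units.All(u => u == "px");
}

string html = "<p style=\"font-size:11pt\">Joo</p>";
// The result could be stored in a field of the callback class (for example
// via its constructor) and checked before calling ConvertUtil.PointToPixel.
Console.WriteLine(UsesOnlyPixels(html)); // prints: False
```

A single flag like this cannot handle HTML that mixes px and pt, because the callback sees only runs whose sizes are already in points. That limitation is one more reason to prefer rewriting the units in the HTML before loading it.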

OK, thank you. I will consider changing font-size to pt using a regular expression instead of using INodeChangingCallback.NodeInserted.

@srinudhulipalla,

In case you have further inquiries or need any help in the future, please let us know by posting a new thread in the Aspose.Words forum.