Free Support Forum - aspose.com

DOCX to HTML conversion issue with single quote and double quote character using C#

We use the Aspose.Words component to convert our Word file to an HTML file. However, the single and double quote characters in the source Word file are changed to different special characters in the target HTML file, and these converted characters show up in the browser as question marks. How can I resolve this?
Our conversion code is as follows:

public IActionResult GetFirstFile()
{
    SetLicence();
    var dir = @"C:\Aspose";
    var Source = new Aspose.Words.Document(System.IO.Path.Combine(dir, "Res.docx"));
    MemoryStream stream = new MemoryStream();
    HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.Html)
    {
        ExportTextInputFormFieldAsText = true,
        ExportImagesAsBase64 = true,
        CssStyleSheetType = CssStyleSheetType.Embedded,
        ExportFontsAsBase64 = true,
        ExportPageMargins = true,
        ExportShapesAsSvg = true,
        HtmlVersion = HtmlVersion.Html5,
        UseHighQualityRendering = true,
        ExportTocPageNumbers = true,
        ExportRelativeFontSize = true,
        ExportDocumentProperties = true,
    };
    options.ExportHeadersFootersMode = ExportHeadersFootersMode.PerSection;
    Source.Save(stream, options);
    var html = Encoding.ASCII.GetString(stream.ToArray());

    ViewBag.HTML = html;

    return View();
}

I have also attached the document file, please check: Res.docx (14.6 KB)

@vineeth.pv MS Word automatically replaces straight double and single quotes with typographic opening and closing quote characters. You can replace them with regular double and single quote characters before converting to HTML. For example, see the following code:

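Document doc = new Document(@"C:\Temp\Res.docx");
FindReplaceOptions opt = new FindReplaceOptions();
opt.ReplacingCallback = new QuoteReplacingCallback();
// The replacement string is empty because the callback sets the actual replacement per match.
doc.Range.Replace(new Regex("[“”‘’]"), "", opt);
doc.Save(@"C:\Temp\out.html", new HtmlSaveOptions { PrettyFormat = true });

// Maps each typographic quote matched by the regex to its plain ASCII counterpart.
private class QuoteReplacingCallback : IReplacingCallback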
{
    public ReplaceAction Replacing(ReplacingArgs args)
    {
        switch (args.Match.Value)
        {
            case "“":
            case "”":
                args.Replacement = "\"";
                break;
            case "‘":
            case "’":
                args.Replacement = "'";
                break;
            default:
                return ReplaceAction.Skip;
        }

        return ReplaceAction.Replace;
    }
}
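
Separately from the replacement approach above, the code in the question decodes the saved bytes with Encoding.ASCII, which cannot represent typographic quotes and turns every non-ASCII byte into '?'. Assuming the HTML is written as UTF-8 (understood to be the HtmlSaveOptions default, but worth confirming for your version), decoding with Encoding.UTF8 may keep the original characters without any replacement. The sketch below uses only the .NET base class library; the DecodeDemo class and DecodeBoth method are hypothetical names for illustration.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Encodes the text as UTF-8, then decodes the same bytes two ways
    // to show where the question marks can come from.
    public static (string Ascii, string Utf8) DecodeBoth(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        string ascii = Encoding.ASCII.GetString(bytes); // bytes above 0x7F become '?'
        string utf8 = Encoding.UTF8.GetString(bytes);   // round-trips the original text
        return (ascii, utf8);
    }

    public static void Main()
    {
        string original = "\u201CHi\u201D"; // "Hi" wrapped in curly double quotes
        var (ascii, utf8) = DecodeBoth(original);
        Console.WriteLine(ascii.IndexOf('?') >= 0); // ASCII decoding lost the quotes
        Console.WriteLine(utf8 == original);        // UTF-8 decoding preserved them
    }
}
```

In the question's method this would correspond to replacing Encoding.ASCII.GetString(stream.ToArray()) with Encoding.UTF8.GetString(stream.ToArray()), under the same UTF-8 assumption.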

@alexey.noskov This is working fine for single and double quotes, but the DOCX to HTML conversion has the same issue with the hyphen (–).

@vineeth.pv The same applies to the hyphen: MS Word automatically replaces a hyphen between words with a dash, so you can replace it using the approach suggested above. Just modify the regular expression like this:

doc.Range.Replace(new Regex("[“”‘’–]"), "", opt);

and add the following case:

case "–":
    args.Replacement = "-";
    break;
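
Putting the two replies together, a combined version might look like the sketch below. It uses only the Aspose.Words calls already shown in this thread; the QuoteNormalizer class, the ToAscii helper, and the Convert method are hypothetical names, and the namespaces assume a reasonably recent Aspose.Words for .NET release.

```csharp
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;
using Aspose.Words.Saving;

public static class QuoteNormalizer
{
    // Curly double quotes, curly single quotes, and the en dash.
    private static readonly Regex Typographic =
        new Regex("[\u201C\u201D\u2018\u2019\u2013]");

    // Hypothetical helper: returns the ASCII counterpart of a matched
    // character, or null when the character is not handled.
    public static string ToAscii(string match)
    {
        switch (match)
        {
            case "\u201C":
            case "\u201D":
                return "\"";
            case "\u2018":
            case "\u2019":
                return "'";
            case "\u2013":
                return "-";
            default:
                return null;
        }
    }

    // Loads the document, normalizes the characters, and saves it as HTML.
    public static void Convert(string inputPath, string outputPath)
    {
        Document doc = new Document(inputPath);
        FindReplaceOptions opt = new FindReplaceOptions();
        opt.ReplacingCallback = new Callback();
        // The empty replacement is overridden per match by the callback.
        doc.Range.Replace(Typographic, "", opt);
        doc.Save(outputPath, new HtmlSaveOptions { PrettyFormat = true });
    }

    private class Callback : IReplacingCallback
    {
        public ReplaceAction Replacing(ReplacingArgs args)
        {
            string replacement = ToAscii(args.Match.Value);
            if (replacement == null)
                return ReplaceAction.Skip;

            args.Replacement = replacement;
            return ReplaceAction.Replace;
        }
    }
}
```

Keeping the character mapping in a separate method makes it easy to add further characters later and to check the mapping without loading a document.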