Invalid surrogate pair

Hi, I’ve come across an issue when converting Word documents to XHTML. The conversion throws the following exception:

System.ArgumentException: The surrogate pair is invalid. Missing a low surrogate character.
   at System.Xml.XmlTextEncoder.WriteRawWithSurrogateChecking(String text)
   at System.Xml.XmlTextWriter.WriteRaw(String data)
   at  ?. ?   ( ? )
   at  ?.??   ( ? )
   at  ?.??   ( ? )
   at  ?.??   ( ? )
   at  ?.??   ( ? )
   at  ?.??   ( ? )
   at  ?.??   ( ? )
   at  ?.??   ( ? )
   at ??.??   ( ? )
   at  ?.??   ( ? )
   at  ?.??   ( ? )
   at    .( ?  ,  ?  )
   at    .      ( ?  )
   at  ? . ?    ( ?  )
   at Aspose.Words.Document.( ?  )
   at Aspose.Words.Document.(Stream , String , SaveOptions )

Here’s a minimal reproduction document:
Invalid surrogate pair.docx (100.9 KB)

Let me know if you require more details.
Thanks in advance.

@njlgad Unfortunately, I cannot reproduce the problem on my side using the latest 22.1 version of Aspose.Words for .NET. Which version of Aspose.Words do you use? Please try the latest version and, if the problem is still there, provide simple code that will allow us to reproduce it.

Hi @alexey.noskov,

We are using the latest version of Aspose.Words (22.1).

The following test fails:

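using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;
using NUnit.Framework; // assumption: [Test] is the NUnit attribute

[Test]
public void Test()
{
    // Licences.Aspose_Words is a resource from our own project.
    var license = new License();
    license.SetLicense(new MemoryStream(Licences.Aspose_Words));

    var asposeDocument = new Document("C:\\Temp\\Invalid surrogate pair.docx");

    var options = new HtmlFixedSaveOptions { SaveFormat = SaveFormat.HtmlFixed, };

    using (var htmlBytes = new MemoryStream())
    {
        // This call throws the ArgumentException shown above.
        asposeDocument.Save(htmlBytes, options);
    }
}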

I’ve also tried converting the document with the online converter (Convert Files Online - Word, PDF, HTML, JPG And Many More). The conversion did not fail there, but the HTML rendering is incorrect: the emoji is missing.

I’ve also reattached the document, this time with the fonts it uses embedded (in case they are part of the problem):
Invalid surrogate pair.docx (123.2 KB)

@njlgad Thank you for the additional information. I have managed to reproduce the problem. It has been logged as WORDSNET-23379 so that it can be fixed. We will keep you informed and let you know once it is resolved.
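
For anyone reading the exception message above: "Missing a low surrogate character" means that a string passed to XmlTextWriter.WriteRaw contained a UTF-16 high surrogate (U+D800 to U+DBFF) that was not immediately followed by a low surrogate (U+DC00 to U+DFFF). Emoji such as the one in the sample document lie outside the Basic Multilingual Plane, so in a .NET string each one is stored as such a pair. The snippet below is only a minimal sketch of how a string can be checked for this condition. The class and method names (SurrogateCheck, FindUnpairedSurrogate) are hypothetical and are not part of Aspose.Words or .NET; the only framework calls used are char.IsHighSurrogate and char.IsLowSurrogate. It does not say anything about where inside Aspose.Words the pair gets broken.

```csharp
using System;

public static class SurrogateCheck
{
    // Hypothetical helper (not part of .NET or Aspose.Words).
    // Returns the index of the first unpaired surrogate code unit in text,
    // or -1 if every surrogate in the string is part of a valid pair.
    public static int FindUnpairedSurrogate(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c))
            {
                // A high surrogate must be followed immediately by a low surrogate.
                if (i + 1 >= text.Length || !char.IsLowSurrogate(text[i + 1]))
                    return i;
                i++; // skip the low half of the valid pair
            }
            else if (char.IsLowSurrogate(c))
            {
                // A low surrogate with no preceding high surrogate is also invalid.
                return i;
            }
        }
        return -1;
    }
}
```

With this sketch, a complete pair such as "\uD83D\uDE00" yields -1, while "Hi \uD83D" (a high surrogate at the end of the string) yields 3, which is the situation the exception message describes.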