Hi,
I found an issue when using Aspose.Words to export a .docx containing Private Use Area (PUA) Unicode characters.
When I use MS Word to read an .rtf that contains PUA Unicode chars and then save it as a .docx, it saves the PUA chars as text.
When I use Aspose to read the .rtf and then export it as a .docx, it saves the PUA chars as symbols.
The behavior is the same when Aspose reads a .docx containing PUA chars created by MS Word and then saves it as a .docx. In the original .docx created by MS Word, the PUA chars are text. But in the .docx created by Aspose, the PUA chars are saved as symbols.
Here is the code sample using .rtf:
public static void TestTextWithPUA()
{
    // inputFolder is the root folder that holds my test data.
    string testFolder = Path.Combine(inputFolder, "TestPUA");

    // Load the .rtf that contains the PUA chars.
    Document document = new Document(Path.Combine(testFolder, "TestPUA.rtf"));

    // Save it back as .rtf and as .docx; the .docx output shows the issue.
    document.Save(Path.Combine(testFolder, "TestPUA.[SavedWithAspose].rtf"), SaveFormat.Rtf);
    document.Save(Path.Combine(testFolder, "TestPUA.[SavedWithAspose].docx"), SaveFormat.Docx);
}
I attached the sample file “TestPUA.rtf” and the outputs from both Aspose and MS Word. I also attached a screenshot showing the issue in the .docx inner XML.
Looking at the inner .docx XML, we can see the following difference:
Saved with MS Word:
<w:t></w:t>
Saved with Aspose:
<w:sym w:font="Unicode BMP Fallback SIL" w:char="F735" />
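In case it helps to reproduce the comparison without unzipping the files by hand, here is a rough sketch of how the difference can be checked from code. It does not use any Aspose API, only the standard .NET zip and LINQ to XML classes. The class and method names (PuaCheck, CountSymElements, CountPuaTextChars, LoadDocumentXml) are just names I made up for this example, and it only looks at word/document.xml and at the BMP Private Use Area range U+E000..U+F8FF.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for this post; not part of Aspose.Words.
public static class PuaCheck
{
    // WordprocessingML main namespace used by <w:t> and <w:sym>.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Counts <w:sym> elements in an already parsed document.xml.
    public static int CountSymElements(XDocument documentXml)
    {
        return documentXml.Descendants(W + "sym").Count();
    }

    // Counts PUA chars (U+E000..U+F8FF) that are stored as plain text in <w:t>.
    public static int CountPuaTextChars(XDocument documentXml)
    {
        return documentXml.Descendants(W + "t")
            .SelectMany(t => t.Value)
            .Count(c => c >= '\uE000' && c <= '\uF8FF');
    }

    // Reads word/document.xml out of a .docx package.
    public static XDocument LoadDocumentXml(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new InvalidDataException("word/document.xml not found in " + docxPath);
            }

            using (Stream stream = entry.Open())
            {
                return XDocument.Load(stream);
            }
        }
    }
}
```

Calling PuaCheck.LoadDocumentXml on "TestPUA.[SavedWithAspose].docx" and on the MS Word output, and then comparing the two counts for each file, should show the same difference as the XML fragments above. On .NET Framework, ZipFile needs a reference to the System.IO.Compression.FileSystem assembly.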
So what I need to know is: How do I force Aspose to save PUA unicode chars as text, instead of symbols, like MS Word does, when exporting to .docx?
Thank you
TestPUA.zip (634.8 KB)
Screenshot.png (58.4 KB)