Free Support Forum - aspose.com

Aspose.Words generates question marks "?" where they shouldn't be. Any ideas?

Is there a list of characters or HTML tags that Aspose.Words can't handle?
It doesn't seem to be just one special character; there are many characters that Aspose.Words can't handle.

Note: the input is a lot of rich text from users' InfoPath forms, which I can't control. We extract it from the InfoPath .xml file. As a test, I saved the raw text as "file.html" and then used Word 2010 > Insert > Object > Text from File. Word 2010 renders that file correctly, with no "?" question marks. The problem appears only when I use Aspose.Words to create the Word file from the HTML stream.

  • Using the DLLs only
  • Aspose.Words for .NET 11.10.0 DLLs
  • net3.5_ClientProfile
  • C# .NET
  • Input is rich-text XHTML that users copy and paste into InfoPath.

Below is the resulting Word file. None of the question marks "?" are in the original input.

<span style=“font-size:11.0pt;mso-bidi-font-size:12.0pt;
font-family:“Arial Black”,“sans-serif”;mso-fareast-font-family:“Arial Black”;
mso-bidi-font-family:“Arial Black””>Testing rich text feature:
<span style=“font-size:11.0pt;mso-bidi-font-size:12.0pt;font-family:“Arial Black”,“sans-serif”;
mso-fareast-font-family:“Arial Black”;mso-bidi-font-family:“Arial Black””> <span style=“font-family:“Calibri”,“sans-serif”;mso-fareast-font-family:Calibri”><o:p></o:p>

Users use ?word? and copy and paste the formatted text.

  • One
  • Two
  • Three

?

Blah blah blal

Let?s try a picture here:

?


// source is the object.
...
..
.
s.Append("

Brief Description
").Append(Source.BriefDescriptionXHtml).Append("

");
...etc
..
.
---------------------------
Aspose.Words.License license = new License();
license.SetLicense("Aspose.Total.lic");

Stream stream = new MemoryStream(System.Text.Encoding.ASCII.GetBytes(html.Html));
Aspose.Words.Document doc = new Aspose.Words.Document(stream);
// Debugging only: saves to the client's drive for the developer. Don't keep this in the production version.
doc.Save(targetLibraryName + proposal.Filename.Replace(".xml", ".doc"));

-----------------------------------------------------------------------------
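
One thing worth checking in the code above is the `System.Text.Encoding.ASCII.GetBytes(...)` call. In .NET, the ASCII encoding replaces every character it cannot represent with "?", so curly quotes and curly apostrophes pasted from Word are already lost before the stream reaches the `Document` constructor. The sketch below shows only that encoding step and does not call Aspose.Words. The class name `EncodingCheck`, the helper `RoundTrip`, and the sample string are made up for illustration, and whether this explains every "?" in the output is an assumption.

```csharp
using System;
using System.Text;

public static class EncodingCheck
{
    // Encodes the text to bytes and decodes it again with the same encoding,
    // which is roughly what a consumer of the resulting stream would see.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // Hypothetical sample with curly quotes (U+201C, U+201D)
        // and a curly apostrophe (U+2019), as in the pasted rich text.
        string html = "Users use \u201Cword\u201D and Let\u2019s try";

        // ASCII cannot represent these characters and substitutes '?'.
        Console.WriteLine(RoundTrip(html, Encoding.ASCII));

        // UTF-8 preserves them, so the round trip returns the original string.
        Console.WriteLine(RoundTrip(html, Encoding.UTF8) == html);
    }
}
```

If this is the cause, building the stream with `Encoding.UTF8.GetBytes(...)` instead would keep the characters intact. Aspose.Words would then still need to read the bytes as UTF-8, for example through an encoding declaration in the HTML itself or through the `Encoding` property of `LoadOptions`. Whether that property is available in 11.10.0 is an assumption to verify against that version's API reference.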


debug watch:
Source.BriefDescriptionXHtml
"
\r\n
\r\n \r\n Testing rich text feature:\r\n \r\n
\r\n
\r\n Users use “word” and copy and paste the formatted text.\r\n
\r\n
    \r\n
  • \r\n \r\n One\r\n \r\n
  • \r\n
  • \r\n \r\n Two\r\n \r\n
  • \r\n
  • \r\n \r\n Three\r\n \r\n
  • \r\n
\r\n
\r\n
\r\n
\r\n Blah blah blal\r\n
\r\n
\r\n Let’s try a picture here: \r\n
\r\n
\r\n \r\n
\r\n
\r\n
"
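
To find out which characters in a value like the one in the watch above fall outside the ASCII range, a small diagnostic along these lines can list them with their positions and code points. This is a sketch only. The class name `NonAsciiReport`, the method `Find`, and the sample string are hypothetical, and the non-breaking space in the sample is an assumption about what pasted Word content may contain.

```csharp
using System;
using System.Collections.Generic;

public static class NonAsciiReport
{
    // Returns one entry per character whose code point is above 127,
    // giving its index in the string and its Unicode code point in hex.
    public static List<string> Find(string text)
    {
        List<string> found = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c > 127)
            {
                found.Add(string.Format("index {0}: U+{1:X4}", i, (int)c));
            }
        }
        return found;
    }

    public static void Main()
    {
        // Hypothetical sample: a curly apostrophe (U+2019) and,
        // as an assumption, a trailing non-breaking space (U+00A0).
        string sample = "Let\u2019s try a picture here:\u00A0";

        foreach (string entry in Find(sample))
        {
            Console.WriteLine(entry);
        }
    }
}
```

Running `Find` over `Source.BriefDescriptionXHtml` would show whether the positions of the "?" characters in the Word output line up with non-ASCII characters in the input.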


Hi Shawn,


Thanks for your inquiry. Please note that Aspose.Words tries to mimic the behavior of MS Word. When HTML is processed, some HTML features might be lost. You can find the list of limitations on HTML import and export here:
http://www.aspose.com/docs/display/wordsnet/Load+in+the+HTML+%28.HTML%2C+.XHTML%2C+.MHTML%29+Format
http://www.aspose.com/docs/display/wordsnet/Save+in+the+HTML+%28.HTML%2C+.XHTML%2C+.MHTML%29+Format

I used Source.BriefDescriptionXHtml as input for both Aspose.Words and MS Word.
For MS Word 2010 > Insert > Object > Text from File, please see MSWord2010.docx. MS Word does not import Source.BriefDescriptionXHtml correctly.

With Aspose.Words, the output is AsposeOut.docx.

Could you please attach your complete input HTML here for testing? I will investigate the issue on my side and provide more information.