We’re converting some HTML to WordML (docx). The HTML contains bullets (or numbering) on which a color has been set. The color of each bullet’s content is converted, but the bullets themselves appear black in the resulting document. The attached files show the issue.
In addition to the conversion issue above, the apostrophe character does not convert correctly when text is copied from MS Word or Outlook into the HTML to be converted. The resulting output in Word after conversion is: ‘
The reason is that when you type an apostrophe in Word, it is replaced with a typographic quote, so when you copy the text you get a ‘ (U+2018, decimal 8216) character instead of the expected ' (U+0027, decimal 39) character.
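For anyone who wants to check which characters actually end up in the pasted text, here is a minimal sketch in plain .NET (no Aspose.Words involved). The class and method names are hypothetical, chosen only for illustration; the helper prints the code point of each UTF-16 code unit in a string.

```csharp
using System;
using System.Linq;

public static class QuoteCodePoints
{
    // Hypothetical helper: formats every UTF-16 code unit of the input as U+XXXX.
    public static string Describe(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }
}
```

Calling QuoteCodePoints.Describe("'") gives "U+0027" for the plain apostrophe, while QuoteCodePoints.Describe("\u2018\u2019") gives "U+2018 U+2019" for the typographic quotes that Word inserts.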
Thanks for your request. I managed to reproduce the problem with list items formatting on my side. Your request has been linked to the appropriate issue. You will be notified as soon as the problem is resolved.
Regarding the second problem, could you please attach sample HTML here for testing? I will check the issue and provide you with more information.
Thank you for the additional information. I cannot reproduce the problem on my side. I use the latest version of Aspose.Words for testing. You can download it from here:
I downloaded the latest version, and I still have the same issue. It’s obviously an encoding issue. Do you set any properties on your Aspose.Words conversion instance that can influence encoding?
Ahh. I think that I have not been clear enough about the problem, so let me expand a little more. I do not get the HTML by saving a document as HTML (either from Word or via Aspose). The HTML I get is from copying the text from Word and pasting it into an HTML document. Here are the steps to reproduce:
1. Type the following into a Word document: ‘copying form’ “Word”
2. Open a text editor and enter:
3. Copy the text from step 1 from Word into the text document between the and tags.
4. Save the HTML and convert the resulting document to Word 2007 format using Aspose.Words.
Note that in step 1 you should make sure that Word inserts the appropriate (problematic) opening and closing “apostrophes” and “quotes”. Also note that this isn’t literally what we’re doing, but it is the simplest way to replicate the problem we have. We actually have an HTML editor, into which the text from step 1 is pasted. Then, during a publishing process in our application, the HTML is parsed into a Word 2007 document along with lots of other info. To get the HTML into the Word document, we use Aspose.Words to first convert it to WordML, and then process the resulting ML into the docx. It is during the conversion from HTML to WordML that the issue shows up.
I hope that explains the problem much better. Apologies for not being more clear about it upfront.
Thank you for the additional information. I suppose I can skip the first three steps because you have already attached the HTML document. So on my side I open your HTML using Aspose.Words and save it as a DOCX document (see the attachment in my previous post).
Could you please also attach your output DOCX document here?
It’s weird that you can’t reproduce it. I’m trying to think of differences between the way your test app is implemented and how I use Aspose on this side that could influence this. For one, I don’t have the HTML in a file, only in a string. I create a MemoryStream and a StreamWriter and then write the HTML into the memory stream. Then for the conversion, I create an Aspose document from the MemoryStream, and save it to another MemoryStream as SaveFormat.Docx. Here is the code:
using (MemoryStream htmlStream = new MemoryStream())
using (StreamWriter writer = new StreamWriter(htmlStream))
{
    writer.Write(html);
    writer.Flush();
    doc = new Aspose.Words.Document(htmlStream, String.Empty, Aspose.Words.LoadFormat.Html, String.Empty);

    using (MemoryStream mlStream = new MemoryStream())
    using (StreamReader sr = new StreamReader(mlStream))
I have already provided the output docx in the initial attachment. I’ll include it again (result.docx). But I have another issue with converting HTML to WordML.
When I have a table in the HTML that is aligned to the center or right by way of a paragraph, the alignment is lost in the resulting docx. The attachment shows the issue.
Thanks for your inquiry. You are right, the problem occurs because of an encoding issue. However, the problem is not in Aspose.Words; it occurs when the string is read and written. Please try specifying the Default encoding as shown in the following code:
// Read the HTML using Encoding.Default (the system ANSI code page on .NET Framework).
string html = File.ReadAllText(@"Test001\Source.html", Encoding.Default);
using (MemoryStream htmlStream = new MemoryStream())
// Write with the same encoding that was used for reading.
using (StreamWriter writer = new StreamWriter(htmlStream, Encoding.Default))
{
    writer.Write(html);
    writer.Flush();
    Document doc = new Document(htmlStream, String.Empty, LoadFormat.Html, String.Empty);
    doc.Save(@"Test001\out.docx");
}
Hope this helps.
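To illustrate why the encodings used for writing and reading need to match, here is a minimal sketch that uses only the standard .NET Encoding classes. It does not reproduce what Aspose.Words does internally; the class and method names are hypothetical, and ISO-8859-1 is used purely as an example of a mismatched single-byte encoding.

```csharp
using System;
using System.Text;

public static class EncodingMismatchDemo
{
    // Hypothetical helper: encodes text to bytes with one encoding,
    // then decodes those same bytes with another.
    public static string RoundTrip(string text, Encoding writeEncoding, Encoding readEncoding)
    {
        byte[] bytes = writeEncoding.GetBytes(text);
        return readEncoding.GetString(bytes);
    }
}
```

With matching encodings, EncodingMismatchDemo.RoundTrip("\u2018", Encoding.UTF8, Encoding.UTF8) returns the original quote. With a mismatch such as Encoding.UTF8 for writing and Encoding.GetEncoding("ISO-8859-1") for reading, the single ‘ character comes back as three unrelated characters, because its three UTF-8 bytes are each decoded as a separate character.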
Regarding the second issue (with table alignment), the problem occurs because Aspose.Words does not currently support inheriting styles from parent elements. I have linked your request to the appropriate issue. You will be notified as soon as it is resolved.