Bullets and numbered list color not being converted from HTML to WordML

Hi,

We’re converting some HTML to WordML (docx). The HTML contains bullets (or numbering) on which a color has been set. The color of each bullet’s contents is converted, but the bullets themselves are shown in black in the resulting document. The attached files show the issue.

We use Aspose.Words v6.6.0.0.

Regards,

Charl Marais

Hi,

In addition to the conversion issue above, the apostrophe character does not convert correctly when it is copied from MS Word or Outlook into the HTML to be converted. The resulting output in Word after conversion is: ‘

The reason is that when you type an apostrophe in Word and copy the text, you do not get the expected ' character (U+0027, decimal 39); you get a ‘ character (U+2018, decimal 8216) instead.
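
For reference, the code points involved can be checked with a few lines of C#. This is only an illustrative sketch: the class and method names (`QuoteCodePoints`, `Describe`) are made up for this example and are not part of Aspose.Words or of our application.

```csharp
using System;

public static class QuoteCodePoints
{
    // Formats a character as "U+XXXX (decimal N)".
    public static string Describe(char c)
    {
        return string.Format("U+{0:X4} (decimal {1})", (int)c, (int)c);
    }

    public static void Main()
    {
        Console.WriteLine(Describe('\''));     // straight apostrophe typed on the keyboard
        Console.WriteLine(Describe('\u2018')); // left single quotation mark inserted by Word
        Console.WriteLine(Describe('\u2019')); // right single quotation mark inserted by Word
        Console.WriteLine(Describe('\u201C')); // left double quotation mark inserted by Word
        Console.WriteLine(Describe('\u201D')); // right double quotation mark inserted by Word
    }
}
```

The typographic quotes all lie outside the ASCII range, so their byte representation depends on the encoding used to write the HTML.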

Regards

Charl Marais

Hi Charl,

Thanks for your request. I managed to reproduce the problem with list items formatting on my side. Your request has been linked to the appropriate issue. You will be notified as soon as the problem is resolved.

Regarding the second problem, could you please attach a sample HTML file here for testing? I will check the issue and provide you with more information.

Best regards.

Hi,

The apostrophes and quotes problem is shown in the attached file.

Regards,

Charl Marais

Hi

Thank you for the additional information. I cannot reproduce the problem on my side. I used the latest version of Aspose.Words for testing. You can download it from here:

http://www.aspose.com/community/files/51/.net-components/aspose.words-for-.net/category1188.aspx

I also attached my output documents.

Best regards.

Alexey,

I downloaded the latest version, and I still have the same issue. It’s obviously an encoding issue. Do you set any properties on your Aspose.Words conversion instance that can influence encoding?

Thanks,

Charl

Hi Charl,

Thanks for your inquiry. No, I do not specify any settings. I just open the source document and save it as HTML.

Please attach your output HTML here.

Best regards.

Ahh. I think that I have not been clear enough about the problem, so let me expand a little more. I do not get the HTML by saving a document as HTML (either from Word or via Aspose). The HTML I get is from copying the text from Word and pasting it into an HTML document. Here are the steps to reproduce:

  1. Type the following into a word document: ‘copying form’ “Word”

  2. Open a text editor and enter :

  3. Copy the text from step 1 out of Word into the text document, between the opening and closing tags.

  4. Save the HTML and convert the resulting document to Word 2007 format using Aspose.Words.

Note that in step 1 you should make sure that Word inserts the appropriate (problematic) start and end “apostrophes” and “quotes”. Also note that this isn’t literally what we’re doing, but it is the simplest way of replicating the problem we have. We actually have an HTML editor, into which the text from step 1 is pasted. Then, during a publishing process in our application, the HTML is parsed into a Word 2007 document along with lots of other info. To get the HTML into the Word document, we use Aspose.Words to first convert it to WordML, and then process the resulting ML into the docx. It is during the conversion from HTML to WordML that the issue shows up.

I hope that explains the problem much better. Apologies for not being more clear about it upfront.

Regards,

Charl Marais

Hi

Thank you for the additional information. I suppose I can skip the first three steps because you have already attached the HTML document. So on my side I open your HTML using Aspose.Words and save it as a DOCX document (see the attachment in my previous post).

Could you please also attach your output DOCX document here?

Best regards.

Hi Alexey,

It’s weird that you can’t reproduce it. I’m trying to think of differences between the way your test app is implemented and how I use Aspose on this side that could influence this. For one, I don’t have the HTML in a file, only in a string. I create a MemoryStream and a StreamWriter and then write the HTML into the memory stream. Then, for the conversion, I create an Aspose document from the MemoryStream and save it to another MemoryStream as SaveFormat.Docx. Here is the code:

    using (MemoryStream htmlStream = new MemoryStream())
    using (StreamWriter writer = new StreamWriter(htmlStream))
    {
        writer.Write(html);
        writer.Flush();
        doc = new Aspose.Words.Document(htmlStream, String.Empty, Aspose.Words.LoadFormat.Html, String.Empty);

        using (MemoryStream mlStream = new MemoryStream())
        using (StreamReader sr = new StreamReader(mlStream))
        {
            doc.Save(mlStream, Aspose.Words.SaveFormat.Docx);
            mlStream.Position = 0;

            pValue = ExtractHtmlConvertedWordML(mlStream);
        }
    }

I have already provided the output docx in the initial attachment. I’ll include it again (result.docx). But I have another issue with converting HTML to WordML.

When I have a table in the HTML that is aligned to the center or right by way of a paragraph, the alignment is gone in the resulting docx. The attachment shows the issue.

Charl Marais

Hi

Thanks for your inquiry. You are right, the problem occurs because of an encoding issue. But the problem is not in Aspose.Words; it occurs when the string is read and written. Please try to specify the Default encoding as shown in the following code:

    string html = File.ReadAllText(@"Test001\Source.html", Encoding.Default);

    using (MemoryStream htmlStream = new MemoryStream())
    using (StreamWriter writer = new StreamWriter(htmlStream, Encoding.Default))
    {
        writer.Write(html);
        writer.Flush();
        Document doc = new Document(htmlStream, String.Empty, LoadFormat.Html, String.Empty);
        doc.Save(@"Test001\out.docx");
    }

Hope this helps.
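
To show why the encoding matters, here is a minimal sketch that does not use Aspose.Words at all. It writes a string to a MemoryStream with one encoding and decodes the bytes with another. The names `EncodingMismatchDemo` and `WriteThenRead` are hypothetical, and ISO-8859-1 is used only as a stand-in for a single-byte default code page; the encoding that actually applies on your machine may differ. A StreamWriter created without an explicit encoding writes UTF-8 without a byte order mark, which is what `new UTF8Encoding(false)` mirrors here.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingMismatchDemo
{
    // Writes the text with one encoding, then decodes the raw bytes with another.
    public static string WriteThenRead(string text, Encoding writeEncoding, Encoding readEncoding)
    {
        using (MemoryStream stream = new MemoryStream())
        using (StreamWriter writer = new StreamWriter(stream, writeEncoding))
        {
            writer.Write(text);
            writer.Flush();
            return readEncoding.GetString(stream.ToArray());
        }
    }

    public static void Main()
    {
        string text = "\u2018copying\u2019";             // text with Word-style quotes
        Encoding utf8NoBom = new UTF8Encoding(false);    // same as the StreamWriter default
        Encoding singleByte = Encoding.GetEncoding("iso-8859-1"); // assumed stand-in code page

        // Mismatch: each quote was written as three UTF-8 bytes but is read as three characters.
        string garbled = WriteThenRead(text, utf8NoBom, singleByte);
        Console.WriteLine(garbled.Length);               // longer than the original text

        // Match: writer and reader agree, so the text survives unchanged.
        string intact = WriteThenRead(text, utf8NoBom, utf8NoBom);
        Console.WriteLine(intact == text);
    }
}
```

The point is only that the writer and the reader have to agree on the encoding. Note also that `Encoding.Default` returns the system ANSI code page on .NET Framework but UTF-8 on .NET Core and later, so the behavior of the code above depends on the runtime.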

Regarding the second issue (with table alignment), the problem occurs because Aspose.Words currently does not support inheriting styles from parent elements. I have linked your request to the appropriate issue. You will be notified as soon as it is resolved.

Best regards.

Alexey, you’re the man. Thanks. Works like a charm.

Charl Marais

The issues you have found earlier (filed as WORDSNET-2021) have been fixed in this .NET update and this Java update.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.

The issues you have found earlier (filed as WORDSNET-1633) have been fixed in this .NET update and this Java update.