HTML table output differs between English and Chinese content

Hello Team,

We are using the InsertHtml() method to produce the document for printing.

With English and Italian content, the output renders correctly.

However, when the same HTML string contains Chinese content, the table layout no longer matches the English and Italian output.

The output for all of the above languages is attached for your reference.

Regards,

Hi

Thanks for your request. Could you also attach your HTML here for testing? I will check the issue and provide you with more information.

Best regards,

Hello,

Here is the text input attached.

You might have to translate the content into other languages.

Regards,

Hi

Thank you for additional information. I managed to reproduce the problem. You will be notified as soon as it is resolved.

Best regards,

Hello Dwarika!

Thank you for your patience.

I have addressed this issue. It is a known problem. See this thread:

I have re-linked the thread to the corresponding issue. You’ll be notified when it’s fixed. As a workaround, you can import documents instead of inserting them. It is really not a good idea to insert a whole document: DocumentBuilder.InsertHtml is intended to insert document fragments, not entire documents. Please let us know if you have any other questions.

Regards,

Hello,

Thank you for your reply & suggestion.

Your workaround does not meet our requirements, since it is not possible for us to import the document.

As you said this is a known issue, when can we expect a fix?

Is it possible for you to fix this issue in the upcoming release of Aspose?

Waiting for your reply.

Thanks & Regards,

Dwarika

Hi Dwarika,

Thanks for your inquiry. Unfortunately, it is difficult to promise anything regarding this issue at the current stage. So, for now, the only way to resolve the problem on your side is to use the suggested workaround.

Best regards.

Hi Alexey,

Thanks for your service & support.

I just want to add that this problem is not resolved in Aspose.Words 7.0.0.0.

I am using the direct method document.SaveToPdf() to convert the Word document into a PDF document.

Is it possible to fix this problem in a future version of Aspose?

Thanks & Regards,

Dwarika

Hi Dwarika,

Thanks for your request. The issue is still unresolved. Unfortunately, I cannot promise you that it will be resolved before the next hotfix. So I think you should use the suggested workaround until the issue is resolved.

Best regards.

Hello Alexey,

Thanks very much for the continuous support and providing quick solutions to the issues we are facing.

Is it possible for you to give us a tentative date by which we can expect a fix for this issue?

We are in the process of a major release to our client, and this is one of the major concerns.

In this case, the workaround you provided does not fit into our current architecture.

Surprisingly, this happens only with Chinese text. With other languages such as German, Arabic, Polish, and Finnish, the output is not disturbed. Is this expected?

Kindly look into this.

best regards,

Dwarika

Hi Dwarika,

Thanks for your inquiry. Unfortunately, I cannot provide any additional information regarding this issue at the moment. You will be notified as soon as there is something new.

Best regards.

Hi Alexey,

Thanks for your continuous support.

Regarding the above issue with table formatting in Chinese, I want to provide some more information.

If I generate the report in MS Word 2007 DOCX format, the table comes out in the proper format.

Please refer to the attached document in DOCX format.

I think this will help you to solve the actual problem.

I have some doubts regarding this:

1> Why does the table come out properly in Word 2007 DOCX format but not in other Word formats (e.g. Word 2003 format)?

2> Is there any additional parameter that we are missing when generating the report in Word 2003 format?

Waiting for your reply…!!!

Thanks & Regards,

Dwarika

Hi Dwarika,

Thank you for the additional information. DOCX and DOC are completely different formats, and different writers are used to export documents in each of them. That is why the formatting might differ.

By the way, have you tried the workaround I suggested in the following thread?

Best regards.

The approach in the link you suggested cannot be implemented in our project because:

We dynamically create a blank Word page and then use the InsertHtml method to insert the HTML-formatted text.

We are not creating any intermediate HTML document.

Our project is about to go “LIVE”. At this point we cannot change the implementation as you suggested.

Please suggest some other way, so that we don't have to change the current logical flow for inserting HTML.

Hi Dwarika,

Thanks for your request and the additional information. But I am a bit confused. If you cannot change your HTML and do not want to change the code, how would you like to work around the issue? The only workaround I see at the moment is the code I suggested.

Best regards.

Hi Alexey,

Thank you for your suggestion & service.

We implemented the workaround that you suggested in your previous post. Unfortunately, it does not work as required.

I am using Aspose.Words version 7.0.0.0.

The sample code is as follows.

// Requires: using System; using System.IO; using System.Text; using Aspose.Words;

private void button17_Click(object sender, EventArgs e)
{
    string path = Path.GetDirectoryName(System.Windows.Forms.Application.ExecutablePath);

    // Initialize the Aspose licenses.
    Aspose.Words.License lic = new Aspose.Words.License();
    lic.SetLicense(path + @"\Aspose.Total.lic");
    Aspose.Pdf.License lic1 = new Aspose.Pdf.License();
    lic1.SetLicense(path + @"\Aspose.Total.lic");

    Document doc = new Document();
    DocumentBuilder documentBuilder = new DocumentBuilder(doc);

    // Set the page size and the default font.
    PageSetup pageSetup = documentBuilder.CurrentSection.PageSetup;
    pageSetup.PaperSize = PaperSize.A4;
    documentBuilder.Font.Name = "Arial Unicode MS";

    string str = File.ReadAllText(@"C:\Documents and Settings\amit.saraf\Desktop\chinese(PRB)\Copy of CHZ.txt");

    // Encode the HTML as UTF-8 bytes so that the Chinese characters survive the round trip.
    byte[] htmlBytes = Encoding.UTF8.GetBytes(str);

    // Create the HTML stream.
    MemoryStream htmlStream = new MemoryStream(htmlBytes);

    // Create a temporary document from the stream.
    Document tmp = new Document(htmlStream, "", LoadFormat.Html, "");

    // Insert the temporary document into the main document.
    InsertDocument(documentBuilder.CurrentParagraph, tmp);

    doc.Save("chinese.doc");
    StartWord("chinese.doc");
}

public void InsertDocument(Node insertAfterNode, Document srcDoc)
{
    // Make sure that the node is either a paragraph or a table.
    if (!insertAfterNode.NodeType.Equals(NodeType.Paragraph) &&
        !insertAfterNode.NodeType.Equals(NodeType.Table))
        throw new ArgumentException("The destination node should be either a paragraph or table.");

    // We will be inserting into the parent of the destination paragraph.
    CompositeNode dstStory = insertAfterNode.ParentNode;

    // This object translates styles and lists during the import.
    NodeImporter importer = new NodeImporter(srcDoc, insertAfterNode.Document, ImportFormatMode.KeepSourceFormatting);

    // Loop through all sections in the source document.
    foreach (Section srcSection in srcDoc.Sections)
    {
        // Loop through all block-level nodes (paragraphs and tables) in the body of the section.
        foreach (Node srcNode in srcSection.Body)
        {
            // Skip the node if it is the last empty paragraph in a section.
            if (srcNode.NodeType.Equals(NodeType.Paragraph))
            {
                Paragraph para = (Paragraph)srcNode;
                if (para.IsEndOfSection && !para.HasChildNodes)
                    continue;
            }

            // Create a clone of the node, suitable for insertion into the destination document.
            Node newNode = importer.ImportNode(srcNode, true);

            // Insert the new node after the reference node.
            dstStory.InsertAfter(newNode, insertAfterNode);
            insertAfterNode = newNode;
        }
    }
}

Please refer to the attached documents:

1> The sample text document containing the input text in Chinese.

2> The output of the sample code in Word format.

I have some doubts regarding this:

1> Is this issue caused by the “Arial Unicode MS” font?

2> Is there any other workaround that we can try? We need a solution to this problem ASAP.

3> Does this problem occur for all Asian languages, such as Chinese, Korean, etc.?

4> Can we expect this problem to be resolved in the coming Aspose update?

If not, can you please give us a rough timeline for when we can expect a solution from Aspose?

Waiting for your Reply…!!!

Regards,

Dwarika

Hi Dwarika,

Thank you for the additional information. I found that the problem occurs because the AllowAutoFit option is set. So, as a workaround, you can try using code like the following:

// Get the HTML string.
string html = File.ReadAllText(@"Test001\CHZ.html");

// Create the document and DocumentBuilder.
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Insert the HTML into the document.
builder.InsertHtml(html);

// Reset the AllowAutoFit option for each row in the document.
NodeCollection rows = doc.GetChildNodes(NodeType.Row, true);
foreach (Row row in rows)
    row.RowFormat.AllowAutoFit = false;

// Save the output document.
doc.Save(@"Test001\out.doc");

Hope this helps.

Best regards.

Hello Alexey,

We tried the workaround you provided in your earlier post.

Below is the modified code as you suggested. Please take a look and let us know whether it is implemented correctly.

string path = Path.GetDirectoryName(System.Windows.Forms.Application.ExecutablePath);

// Initialize the Aspose licenses. Aspose.Words version 9.0.0.0
Aspose.Words.License lic = new Aspose.Words.License();
lic.SetLicense(path + @"\Aspose.Total.lic");
Aspose.Pdf.License lic1 = new Aspose.Pdf.License();
lic1.SetLicense(path + @"\Aspose.Total.lic");

Document doc = new Document();
DocumentBuilder documentBuilder = new DocumentBuilder(doc);

// Set the page size and the default font.
PageSetup pageSetup = documentBuilder.CurrentSection.PageSetup;
pageSetup.PaperSize = PaperSize.A4;
documentBuilder.Font.Name = "Arial Unicode MS";

// We are using an HTML file to render the Chinese text. The HTML file is attached in zip format.
string filePath = @"D:\Projects\CreateMultiCandidateMOO\CreateMultiCandidateMOO\bin\Debug\Chinese Content.html";
string strHTML = File.ReadAllText(filePath);

// Insert the HTML into the document.
documentBuilder.InsertHtml(strHTML);

// Reset the AllowAutoFit option for each row in the document.
NodeCollection rows = doc.GetChildNodes(NodeType.Row, true);
foreach (Aspose.Words.Tables.Row row in rows)
    row.RowFormat.AllowAutoFit = false;

// Save the document.
doc.Save(@"Chinese Content Aspose 9.0.doc");

The following files are attached to this post:

  1. Chinese Content.txt - original Chinese content in HTML

  2. Chinese Content Aspose 9.0.doc - output after modifying the code as given above

  3. Bullets in Table.PNG - missing bullets in the DOC output

Kindly save “Chinese Content.txt” with an HTML extension and open it in a browser. I am unable to attach it with an HTML extension because the upload does not allow it.

Now take a look at “Chinese Content.zip”, which is the original HTML file I used to generate the Word document.

Our observations are:

  1. The workaround you suggested still does not solve the problem. Open “Chinese Content Aspose 9.0.doc” and take a look. The table still extends outside the margins set on the page.

  2. After modifying the code, the bullets that are present in the “Chinese Content.html” file go missing. Take a look at the table in “Chinese Content Aspose 9.0.doc”, second column, fourth row.

  3. “Bullets in Table.PNG” shows how the bullets appear in “Chinese Content.html”.

Do let us know whether we need to modify the code further.

Regards,

Dwarika

Hi Dwarika,

Thank you for the additional information.

  1. The workaround works for the HTML you attached to your first post. It does not work for the newly attached HTML because that file is formatted differently. You may need to adjust your HTML in order to resolve the problem.

  2. The bullets do not appear because the UL tags are missing from your HTML.
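
To illustrate the second point, here is a minimal sketch of list markup inside a table cell, with the LI items wrapped in UL tags, passed through InsertHtml. The HTML fragment, the class name, and the output file name are assumptions made up for this example and are not taken from your attachment, so treat it as a starting point for checking your own markup rather than a confirmed fix:

```csharp
using Aspose.Words;

// Hypothetical test harness; the class name is illustrative only.
class BulletsInTableSketch
{
    static void Main()
    {
        // Hypothetical fragment: the <li> items are wrapped in <ul>,
        // so the importer has an actual list element to work with.
        string html =
            "<table border=\"1\"><tr><td>" +
            "<ul><li>First point</li><li>Second point</li></ul>" +
            "</td></tr></table>";

        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);

        // Insert the fragment at the builder's current position.
        builder.InsertHtml(html);

        // Hypothetical output file name.
        doc.Save("bullets-check.doc");
    }
}
```

If the bullets show up in this output but not in yours, comparing this fragment with the list markup in your HTML file should show which tags are absent.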

Best regards,

Hi Alexey,

The post dated 05-27-2010 has been updated with comments (highlighted in yellow).

Take a look and let us know.

Regards,

Dwarika