Free Support Forum - aspose.com

Problem in colspan while converting HTML to Doc File

Hello,

There is a problem with colspan when converting HTML to a DOC file.

If a cell has colspan="4", the converted document contains the cell's content repeated 4 times; likewise, with colspan="2" the content is repeated twice.

I searched the Aspose forum for a solution and found the code below, which is meant to solve the issue:

NodeCollection cells = doc.GetChildNodes(NodeType.Cell, true);
foreach (Cell cell in cells)
{
    // Check whether the cell is merged with the previous one.
    if (cell.CellFormat.HorizontalMerge == CellMerge.Previous ||
        cell.CellFormat.VerticalMerge == CellMerge.Previous)
    {
        // Remove content from the cell.
        cell.RemoveAllChildren();
    }
}

But I have one problem with this: I get the HTML from a database and store it in a string variable, so in that situation I cannot use the above code.

I am using Aspose 6.6. Is this solved in a newer version?

Please help!

Hi

Thanks for your request. Could you please provide us your HTML string and output document produced on your side? We will check the issue and provide you more information.

Best regards,

Hello,

I have attached the HTML code file.

In my .NET code I used the following to get the HTML value:

Document doc = new Document(filePath);

string htmlText = doc.GetText();

If you look at htmlText, the same description appears 4 times in the place where I set colspan="4". If you convert this to a Word file, only one copy is visible and the other 3 are invisible; if you copy and paste the content somewhere else, you can find the hidden values.

The main problem is with colspan: if you set colspan="3", you get three copies of the content, and if you print the document, you can see the words overlapping.

How can I solve this issue? I am using Aspose 6.6.

Thanks.

Hi

Thank you for the additional information. Actually, your code does not get an HTML string; it just extracts plain text from the document. If you need to extract an HTML string, you should use code like the following:

// Required namespaces:
// using System.IO;
// using System.Text;
// using Aspose.Words;

public string ConvertDocumentToHtml(Document doc)
{
    string html = string.Empty;

    // Save the document to a MemoryStream in HTML format.
    using (MemoryStream htmlStream = new MemoryStream())
    {
        doc.Save(htmlStream, SaveFormat.Html);

        // Decode the saved bytes into an HTML string.
        html = Encoding.UTF8.GetString(htmlStream.GetBuffer(), 0, (int)htmlStream.Length);
    }

    // There could be a BOM at the beginning of the string.
    // Remove everything before the first '<'; the length check
    // avoids an exception if the string is empty.
    while (html.Length > 0 && html[0] != '<')
        html = html.Substring(1);

    return html;
}

This code converts your document to HTML and returns the HTML string.
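
As a side note, the BOM handling in the method above could also be left to the framework. The following is only a sketch under that assumption, using plain .NET with no Aspose calls; the class name HtmlStreamReader and the method ReadAllText are hypothetical names chosen for illustration. StreamReader skips a UTF-8 byte order mark when it decodes the stream, so no manual trimming should be needed:

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of Aspose or of the original reply.
static class HtmlStreamReader
{
    // Decodes the whole stream as UTF-8 text. StreamReader detects and
    // skips a byte order mark at the start of the stream, if present.
    public static string ReadAllText(MemoryStream stream)
    {
        // Rewind, because the stream position is at the end after saving.
        stream.Position = 0;

        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Inside ConvertDocumentToHtml you would call it right after doc.Save(htmlStream, SaveFormat.Html). Note that disposing the reader also closes the stream, which is harmless here because the stream is not used afterwards.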

Also, it is not quite clear to me why you cannot use the workaround you have found. Just run the code right after loading the document.

Best regards,

Hello,

Thanks for your reply,

but this is not the reply I was expecting. I am not using an HTML document.
I get the HTML from the database; I stored it in a document and attached that here for privacy reasons.

I found the workaround, but it only seems to work when the HTML code is stored in a document. In my case I get the HTML code from a database, so I want to eliminate the extra columns generated by Aspose, because Aspose does not support colspan.

Thanks.


Hi

Thank you for the additional information. But it is still not clear to me why you cannot use the workaround. What difference does it make whether you get the HTML from a file or from a database? For example, say you get your HTML from a database as a string. You then insert this HTML into a document and extract plain text from the document. In this case your code will look like this:

// Get the HTML string. In your case you get it from the database;
// in my case I get it from a file.
string html = File.ReadAllText(@"Test001\HtmlCode.html");

// Create a document and insert the HTML into it.
Document doc = new Document();

// DocumentBuilder will help us to insert the HTML.
DocumentBuilder builder = new DocumentBuilder(doc);
builder.InsertHtml(html);

// Here we use the workaround to remove content from the merged cells.
NodeCollection cells = doc.GetChildNodes(NodeType.Cell, true);
foreach (Cell cell in cells)
{
    // Check whether the cell is merged with the previous one.
    if (cell.CellFormat.HorizontalMerge == CellMerge.Previous ||
        cell.CellFormat.VerticalMerge == CellMerge.Previous)
    {
        // Remove content from the cell.
        cell.RemoveAllChildren();
    }
}

// Now we extract plain text from the document.
string plainText = doc.ToTxt();

// Print the extracted text.
Console.WriteLine(plainText);

Hope this helps.

Best regards,