I am converting HTML to XLSX. Some cells contain line breaks (<br> tags). These cells are rendered in a strange way. I tested Aspose.Cells 16.12.0 with some HTML samples (see below) using the following code.
    using System;
    using System.Globalization;
    using System.Threading;
    using Aspose.Cells;

    class Program
    {
        static void Main(string[] args)
        {
            Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
            Console.Out.WriteLine("My Aspose Console");
            // Load the HTML sample and save it as XLSX under a unique file name.
            HTMLLoadOptions opts = new HTMLLoadOptions(LoadFormat.Html);
            Workbook wb = new Workbook("HtmlInput.html", opts);
            wb.Worksheets[0].AutoFitColumns();
            wb.Save(string.Format("out.{0}.xlsx", Guid.NewGuid()), SaveFormat.Xlsx);
            Console.Out.WriteLine("Done.");
            Console.In.ReadLine();
        }
    }

Sample HTML #1: space between br and slash; table not wrapped in div.
<table><tr><td>One<br />Two<br />Three<br />Four<br />Five</td></tr></table>

Sample HTML #2: space between br and slash; table wrapped in div.
<div><table><tr><td>One<br />Two<br />Three<br />Four<br />Five</td></tr></table></div>

Sample HTML #3: no space between br and slash; table not wrapped in div.
<table><tr><td>One<br/>Two<br/>Three<br/>Four<br/>Five</td></tr></table>

Sample HTML #4: no space between br and slash; table wrapped in div.
<div><table><tr><td>One<br/>Two<br/>Three<br/>Four<br/>Five</td></tr></table></div>

Results:
- Sample #1: Cell value truncated at first line break; i.e. A1 = "One".
- Sample #2: Cell value truncated at first line break; i.e. A1 = "One".
- Sample #3: Cell value is one contiguous string without line breaks; i.e. A1 = "OneTwoThreeFourFive".
- Sample #4: Cell value is split up across multiple rows; i.e. A1 = "One", A2 = "Two", A3 = "Three", A4 = "Four", A5 = "Five".
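For anyone who wants to check these results without opening the generated XLSX files, here is a rough sketch that loads the four samples from memory and prints column A of the first worksheet. The class name InspectSamples and the in-memory loading are my own choices for illustration (my actual test read HtmlInput.html from disk), and I am not stating what it prints on any particular version.

```csharp
using System;
using System.IO;
using System.Text;
using Aspose.Cells;

// Hypothetical helper, not part of the original test program.
class InspectSamples
{
    static void Main()
    {
        // The four sample fragments from this post, in order.
        string[] samples =
        {
            "<table><tr><td>One<br />Two<br />Three<br />Four<br />Five</td></tr></table>",
            "<div><table><tr><td>One<br />Two<br />Three<br />Four<br />Five</td></tr></table></div>",
            "<table><tr><td>One<br/>Two<br/>Three<br/>Four<br/>Five</td></tr></table>",
            "<div><table><tr><td>One<br/>Two<br/>Three<br/>Four<br/>Five</td></tr></table></div>"
        };

        for (int i = 0; i < samples.Length; i++)
        {
            // Load each fragment from memory so no input files are needed.
            using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(samples[i])))
            {
                Workbook wb = new Workbook(ms, new HTMLLoadOptions(LoadFormat.Html));
                Cells cells = wb.Worksheets[0].Cells;

                Console.WriteLine("Sample #{0}: last data row index = {1}", i + 1, cells.MaxDataRow);
                for (int row = 0; row <= cells.MaxDataRow; row++)
                {
                    // Escape line feeds so embedded breaks are visible in the console.
                    string value = cells[row, 0].StringValue.Replace("\n", "\\n");
                    Console.WriteLine("  A{0} = \"{1}\"", row + 1, value);
                }
            }
        }
    }
}
```

Running this against 8.9.0, 16.11.0 and 16.12.0 side by side should make the differences between versions easy to compare.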
To add to the confusion, the behavior totally changes when I add a rowspan (>1) to the TD tag. Then I get a merged cell (spanning the number of rows specified in the rowspan attribute) that does contain the complete value, including line breaks. This was clearly the result of a bugfix implemented in 16.12.0. The bugfix is a real improvement for cells with a rowspan, so please leave it like that. My current concern is with cells without a rowspan (or with rowspan = 1); their behavior should be more in line with the aforementioned bugfix.
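The rowspan case can be checked the same way. The sketch below uses a made-up fragment with rowspan="3" (not one of my four samples) and reports whether A1 is merged, how many rows the merged range covers, and whether the value still contains line feeds. The class name RowspanCheck and the HTML are assumptions for illustration only.

```csharp
using System;
using System.IO;
using System.Text;
using Aspose.Cells;

// Hypothetical check for the rowspan behavior described above.
class RowspanCheck
{
    static void Main()
    {
        // Illustrative fragment: first cell spans three rows and contains <br /> breaks.
        string html =
            "<table>" +
            "<tr><td rowspan=\"3\">One<br />Two<br />Three</td><td>x</td></tr>" +
            "<tr><td>y</td></tr>" +
            "<tr><td>z</td></tr>" +
            "</table>";

        using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        {
            Workbook wb = new Workbook(ms, new HTMLLoadOptions(LoadFormat.Html));
            Cell a1 = wb.Worksheets[0].Cells["A1"];

            Console.WriteLine("A1 merged: {0}", a1.IsMerged);
            if (a1.IsMerged)
            {
                // Number of rows covered by the merged range that contains A1.
                Console.WriteLine("Merged rows: {0}", a1.GetMergedRange().RowCount);
            }

            // Show whether the line breaks survived inside the cell value.
            Console.WriteLine("Contains line feed: {0}", a1.StringValue.Contains("\n"));
            Console.WriteLine("A1 = \"{0}\"", a1.StringValue.Replace("\n", "\\n"));
        }
    }
}
```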
Returning to the 4 samples above, in my opinion they should all behave the same. Personally, I’d vote for having the entire cell content in A1, including line breaks (in line with how Aspose currently handles rowspanned cells). But if you prefer to split across rows (like my sample #4 did; and also how Excel itself renders HTML), then that’s fine with me too.
Please note that the behavior of non-rowspanned cells is not something introduced in 16.12.0; I noticed the same behavior in 16.11.0. Version 8.9.0 had better behavior; at present I am stuck there out of fear of breaking existing reports in our application.