Reading Rich Text Formatting from Excel using Aspose Cells - two issues

I am trying to read in an Excel cell, including formatting that is character specific within the cell, using Aspose Cells, and am not having success.

My preferred method, and the method I anticipated using, was to simply access the .HtmlString property of the cell. The formatting I need is reasonably basic - bold, italics, underlining, tabs, colors, carriage returns, but I do need them on a character by character basis (i.e. rich-text, not just whole-cell formats), and the HtmlString property does not seem to do this type of formatting at all. My company recently purchased the tool, assuming based on the presence of this property that basic characteristics such as these would be covered, and it is very disappointing and poses a large issue / bug for us that they are not.

In the absence of this functionality, I also tried to do the tedious parsing process myself, and found that even the API for character by character manipulation does not seem to work for reading in cells. I can read the text in fine, but when I try to look at cell.Character(0,1).Font.IsItalic and similar properties, I get false even though the first character in the cell in Excel is in Italic font. Wondering whether I'm doing something wrong or I've hit a bug?

I'm running this code (after registering license, opening workbook, worksheet, etc.):

Cell c = worksheetIn.Cells[j + 1, i];

string a = c.Value.ToString();

bool isItalic = c.Characters(0, 1).Font.IsItalic;

String a ends up with the correct text value of the cell I'm trying to read.

But isItalic ends up false, even though the following is the text as it appears in the cell in the Excel 2003 document:

Put in a formatted description that I can test the import of.

Likewise, Bolding and other formatting are not detected.

Am I missing something, or if not, then is it possible for you to fix these issues?

Many Thanks!

Hi,

Thanks for providing us details.

Well, it's not a bug rather a limitation for Cell.Characters method when you retrieve the formatting attributes for individual chars, it will only have the format settings for the part of the richtext. We will soon check if we can change the internal logic to cover the whole text formattings (evaluate individual char's formatting) in a cell while retreiving the formatting for every individual cell for your need.

Sorry for any inconvenience.

Hi,

Please try the attached fix (4.6.0.2). Kindly refer to the following sample code and create your own accordinly for your need. The procedure described below will explain how you can get the formattings/style of an individual character in the cell value.

Sample code:

..............

internal static bool IsItalic(int startIndex, int length, ArrayList chars)
{
for (int i = 0; i < chars.Count; i++)
{
Characters chs = (Characters)chars[i];
if (chs.StartIndex >= startIndex && chs.StartIndex < startIndex + length)
{
if (!chs.Font.IsItalic)
{
return false;
}
}
}
return true;
}

static void Main(string[] args)
{
Workbook workbook = new Workbook();
//// this.TestCalculate();


workbook.Open(@"F:\FileTemp\book2.xls");

Aspose.Cells.Cells cells = workbook.Worksheets[0].Cells;

ArrayList chars = cells["B2"].GetCharacters();
Console.WriteLine(IsItalic(0, 1, chars));
}


Thank you.

I have tried testing with the new dll and using the IsItalic function as in your code sample, but it still does not seem to produce the expected results. The GetCharacters() function does not seem to work correctly - when I use the code below:

Cell c = worksheetIn.Cells[j + 1, i];

ArrayList mychar = c.GetCharacters();

int myCharCount = mychar.Count;

string myText = c.Value.ToString();

variable myText contains the full, correct text (without formatting), but myCharCount is only 4. The full text in the Excel box is:

Sample with a word in BOLD and a word in ITALICS

So it seems that the full text is not being returned by c.GetCharacters().

Thanks for your help.

Hi,

I think you may try to change/replace the IsItalic function as given below for your need, it will return valid/fine results when calling the function now:

internal static bool IsItalic(int startIndex, int length, ArrayList chars)
{
for (int i = 0; i < chars.Count; i++)
{
Characters chs = (Characters)chars[i];
if (chs.StartIndex >= startIndex)
{
if (chs.StartIndex < startIndex + length)
{
if (!chs.Font.IsItalic)
{
return false;
}
}
}
else if (chs.StartIndex + chs.Length > startIndex)
{
if (!chs.Font.IsItalic)
{
return false;
}
}

}
return true;
}

Similarly, you can have IsBold function like this:

internal static bool IsBold(int startIndex, int length, ArrayList chars)
{
for (int i = 0; i < chars.Count; i++)
{
Characters chs = (Characters)chars[i];
if (chs.StartIndex >= startIndex)
{
if (chs.StartIndex < startIndex + length)
{
if (!chs.Font.IsBold)
{
return false;
}
}
}
else if (chs.StartIndex + chs.Length > startIndex)
{
if (!chs.Font.IsBold)
{
return false;
}
}

}
return true;
}

If you still find any issue inquiring Italic, Bold characters in the cell, please post your sample codes, template file etc. here, we will check it soon.

Thank you.