Japanese Word split into Individual Japanese characters when an Excel Chart is converted to SVG

When an Excel Chart containing Japanese words is being converted to SVG using Aspose, some Japanese words are being split into individual characters.



One pattern that seems to trigger the issue is Japanese text that contains parentheses “()” outside of axis categories.



Code Snippet

import java.io.FileInputStream;
import java.util.Properties;

import com.aspose.cells.Chart;
import com.aspose.cells.ImageOrPrintOptions;
import com.aspose.cells.SaveFormat;
import com.aspose.cells.Workbook;
import com.aspose.cells.Worksheet;

public class AsposeSvgConverter {

    public static void main(String[] args) throws Exception {
        Properties props = System.getProperties();
        // Store fonts that you want to use, if any, at the following location: ${java.io.tmpdir}/aspose/fonts/
        props.setProperty("Aspose.Cells.FontDirExc", System.getProperty("java.io.tmpdir") + "/aspose/fonts");

        // Store excel files that you want to process at the following location: ${java.io.tmpdir}/aspose/sample.xlsx
        Workbook book = new Workbook(new FileInputStream(System.getProperty("java.io.tmpdir") + "/aspose/sample.xlsx"));

        ImageOrPrintOptions imgOptions = new ImageOrPrintOptions();
        imgOptions.setSaveFormat(SaveFormat.SVG);

        for (Object obj : book.getWorksheets()) {
            Worksheet sheet = (Worksheet) obj;
            // Assuming only one chart is present on the chart sheet
            Chart chart = sheet.getCharts().get(0);
            chart.toImage("sample.svg", imgOptions);
        }
    }
}

Hi,


Thanks for the sample code and details.

Could you try our latest version/fix, Aspose.Cells for Java v17.4.0, and see if it makes any difference?
If you still find the issue, kindly provide your template Excel file, the output SVG image and the font file(s) used in the workbook, and we will check it soon. Also, provide some screenshot(s) that highlight the problematic areas. This will help us evaluate your issue precisely and figure it out sooner.

Thank you.

We are using version 17.3.0. When we tried version 17.4.0, we observed the following:

Words are still split into individual characters.

Font selection has now changed; earlier it was picking up MS P Gothic and now it uses MS Gothic (so this might be a different issue?).



We investigated further and have the following observations. The words get split into individual characters because of the presence of parentheses. There are two variations of each parenthesis: one from the ASCII set (code point 40 for the opening parenthesis) and a double-byte (full-width) version used with Japanese fonts (code point 65288 for the opening parenthesis),

i.e.,

Character “(” Code point Single byte: 40

Character “（” Code point Double byte: 65288



Character “)” Code point Single byte: 41

Character “）” Code point Double byte: 65289



When we mix the single-byte characters with Japanese characters, Aspose seems to split the word into individual characters. However, if we use the double-byte version instead, the word is retained as a single word. On an English keyboard with default settings, typing a parenthesis produces the single-byte version.



public static void main(String[] args) {
    // Double-byte (full-width) parentheses, U+FF08 and U+FF09.
    // Note that there is no space before “（”. However, depending on the font/application,
    // double-byte characters can be rendered with what looks like a leading space.
    System.out.println(Character.codePointAt("（", 0));
    System.out.println(Character.codePointAt("）", 0));

    // Single-byte (ASCII) parentheses, U+0028 and U+0029.
    System.out.println(Character.codePointAt("(", 0));
    System.out.println(Character.codePointAt(")", 0));
}

Output

65288

65289

40

41
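
Based on the observation above, one possible workaround is to replace ASCII parentheses with their double-byte equivalents in Japanese label text before the chart is rendered. The following is only a minimal sketch of that idea: the class ParenthesisNormalizer and its methods are hypothetical names (not part of Aspose.Cells), and it assumes that "Japanese text" can be detected by looking for Hiragana, Katakana or Han characters. Whether this actually prevents the splitting has not been confirmed by Aspose.

```java
// Hypothetical helper, not part of Aspose.Cells.
final class ParenthesisNormalizer {

    /** Returns true if the text contains at least one Hiragana, Katakana or Han character. */
    static boolean containsJapanese(String text) {
        return text.codePoints().anyMatch(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            return script == Character.UnicodeScript.HIRAGANA
                    || script == Character.UnicodeScript.KATAKANA
                    || script == Character.UnicodeScript.HAN;
        });
    }

    /**
     * Replaces ASCII "(" and ")" with the full-width forms U+FF08 and U+FF09,
     * but only when the text contains Japanese characters. Other text is returned unchanged.
     */
    static String toFullWidthParentheses(String text) {
        if (text == null || !containsJapanese(text)) {
            return text;
        }
        return text.replace('(', '\uFF08').replace(')', '\uFF09');
    }

    public static void main(String[] args) {
        // A Japanese label with ASCII parentheses, written with Unicode escapes
        // so the source compiles regardless of file encoding.
        String label = "\u58F2\u4E0A(\u5186)";
        String fixed = toFullWidthParentheses(label);
        System.out.println(fixed.codePointAt(2)); // 65288
        System.out.println(fixed.codePointAt(4)); // 65289
    }
}
```

The helper would be applied to the label strings in the workbook before calling chart.toImage. Text without Japanese characters is left alone so that purely English labels keep their ASCII parentheses.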

Hi,


I evaluated your issue but could not reproduce it. The font used is “MS PGOTHIC”. I am using the following sample code with your template file, and the chart and the output SVG file look identical.
Sample code:

Workbook book = new Workbook("F:\\Files\\japanese word1\\test1.xlsx");
ImageOrPrintOptions imgOptions = new ImageOrPrintOptions();
imgOptions.setSaveFormat(SaveFormat.SVG);
for (Object obj : book.getWorksheets()) {
    Worksheet sheet = (Worksheet) obj;
    // Assuming only one chart is present on the chart sheet
    com.aspose.cells.Chart chart = sheet.getCharts().get(0);
    chart.toImage("F:\\Files\\japanese word1\\out1sample1.svg", imgOptions);
}


See the screenshot comparing original chart (in MS Excel) Vs output SVG for your reference:
http://prntscr.com/f0zuep

Note: I am using our latest version/fix: Aspose.Cells for Java v17.4.2

Thank you.

Hi Amjad,

We tested using 17.4.2 but have the same observations. When you look at the SVG in IE, it looks as if the text is not split, but if you select the two different words and do an Inspect Element on each, you will see a difference in the way the two words are treated. One remains as a single piece of text and the other has been split into individual characters.
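
The same check can be done without a browser. The sketch below parses an SVG string with the JDK's built-in XML parser and counts the text elements that hold exactly one character. It assumes that the splitting shows up as separate SVG text elements, one per character; the actual markup Aspose emits may differ, and the class name SvgTextInspector is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Aspose.Cells.
final class SvgTextInspector {

    /** Counts text elements whose trimmed content is exactly one Unicode code point. */
    static int countSingleCharacterTextElements(String svg) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(svg.getBytes(StandardCharsets.UTF_8)));
            // "*" matches any namespace, so the SVG namespace does not have to be spelled out.
            NodeList texts = doc.getElementsByTagNameNS("*", "text");
            int count = 0;
            for (int i = 0; i < texts.getLength(); i++) {
                String content = texts.item(i).getTextContent().trim();
                if (content.codePointCount(0, content.length()) == 1) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse SVG", e);
        }
    }

    public static void main(String[] args) {
        // One unsplit two-character word followed by three single-character elements.
        String svg = "<svg xmlns=\"http://www.w3.org/2000/svg\">"
                + "<text>\u58F2\u4E0A</text>"
                + "<text>\u58F2</text><text>\u4E0A</text><text>(</text>"
                + "</svg>";
        System.out.println(countSingleCharacterTextElements(svg)); // 3
    }
}
```

A high count for a chart with only a few labels would suggest that words were split into individual characters.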

Hi,


Thanks for providing us further details.

I understand your point now. I was able to reproduce the issue using your sample code with your template file: a Japanese word is split into individual Japanese characters when an Excel chart is converted to SVG. To confirm, I opened the SVG file in a browser (e.g. Google Chrome), right-clicked on part of a data label and inspected the element. I found the issue as you described; see the screenshot for your reference:
http://prntscr.com/f1cxcy
Sample code:

Workbook book = new Workbook("F:\\Files\\japanese word1\\test1.xlsx");
ImageOrPrintOptions imgOptions = new ImageOrPrintOptions();
imgOptions.setSaveFormat(SaveFormat.SVG);
for (Object obj : book.getWorksheets()) {
    Worksheet sheet = (Worksheet) obj;
    // Assuming only one chart is present on the chart sheet
    com.aspose.cells.Chart chart = sheet.getCharts().get(0);
    chart.toImage("F:\\Files\\japanese word1\\out1sample1.svg", imgOptions);
}

I have logged a ticket with the id “CELLSJAVA-42267” for your issue. A developer from our product team will evaluate your issue soon.

Thank you.

Hi,


We got a response from the product team for the logged ticket “CELLSJAVA-42267”. This is the default behavior when rendering to the SVG file format. In some cases we must split a word in an East Asian language into individual characters so that each character can be positioned better.

Thank you.

The behavior doesn’t seem consistent. Our observation is that it does the splitting when it encounters a single byte parenthesis and not for double byte. Any reason for this?

"We did more investigations and we have the following observations. The words get split up into individual characters because of the presence of parenthesis. There are two variations of parenthesis; one from the ASCII set with code point 40 for opening parenthesis and another a double byte version of parenthesis for Japanese fonts with code point of 65288
i.e.,
Character “(” Code point Single byte: 40
Character “（” Code point Double byte: 65288

Character “)” Code point Single byte: 41
Character “）” Code point Double byte: 65289

When we mix the single byte characters with Japanese characters, Aspose seems to split the word into individual characters. However, if we use the double byte version instead, it retains it as a single word. Using an English keyboard with the default settings, when we type parenthesis it picks the single byte version by default. "

Hi Amjad, can we check with the product team whether there is a way to override this behavior, so that the SVG always contains exactly what is in the Excel file?

Hi,


Thanks for sharing your findings/comments.

I have logged it against your issue “CELLSJAVA-42267” in our database for the product team’s investigation. Please allow us a little time so that the product team can analyze it thoroughly.

Once we have an update on it to share with you or we require more info from your side, we will let you know here.

Thank you.

Hi,


Our product team has evaluated your issue further. In older versions, we did not split Japanese text/words into individual characters. We then received reports from users where the Japanese string was long (i.e., greater than 50 characters) and was rendered across multiple lines. In those cases we found that the characters could not be right-aligned unless the text was split into individual characters. Now, we only split a Japanese string into individual characters where the text is in some shape (e.g. a TextBox). In charts, we don’t split Japanese text into individual characters, as the text is short.

We think the SVG output is the same whether or not the text is split. The only difference is that the split SVG contains more records than the unsplit SVG.

Thank you.