Feature request for TxtLoadOptions

spygordon · November 17, 2011, 4:26am

Hi,

We are about to start using Aspose Cells also for reading csv files.

We know that we can expect files from different countries and thus will need to allow different types of number formats.

What we would like is

- A new method in TxtLoadOptions: setLocale(Locale) that could be used instead of setCountry and setRegion. Normally info about country and region is not available to us but a locale has been chosen.

- We would sometimes need to be able to provide exact info about which decimal separator and thousand separator to use when loading the file. So, TxtLoadOptions.setDecimalSeparator and TxtLoadOptions.setThousandSeparator is also requested.

Best Regards

Claes

shakeel.faiz · November 17, 2011, 4:34am

Hi,

Thanks for your posting and requesting new features.

We will look into them and implement it and update you asap.

This issue has been logged as CELLSJAVA-40037.

amjad.sahi · November 18, 2011, 5:06am

Hi,

We have supported LoadOptions.setLocale(Locale) for your requirement. Please try the new fix v7.0.3.1 (attached). But for the “DecimalSeparator” and “ThousandSeparator”, we think it should be determined by the given locale, just like MS Excel does when loading a CSV file. If you really need to specify one Locale and different separators of another locale when loading CSV files, we will look into this requirement later on to check whether we can support it because it will conflict with other logic of parsing numeric values.

Thank you.

spygordon · November 18, 2011, 3:20pm

Hi,

Thanks for adding this method so quickly!

I understand your take on the request for decimal- and thousand/grouping-separator, however I feel that I should explain why this would be very useful for us and possibly for others:

In our platform we normally use the Locale of the user uploading a file for number formating etc.

But it is very common that members of our team receive files to upload from our customers that were produced not only with other csv delimiters and encoding, but also with different numeric formating. So for us it would repeatedly save time if we for each individual file could explain how numbers are formated, regardless of the Locale for the user that submits the file.

Thanks for taking this under consideration

Claes

shakeel.faiz · November 18, 2011, 11:16pm

Hi,

Thanks for your comment/feedback. I have logged it in our database. We will update you asap.

shakeel.faiz · November 20, 2011, 10:06pm

Hi,

Can it fit your requirement if we let you set the specified Format object when reading CSV file? The API may like:

C#

TxtLoadOptions.setValueParser(java.text.Format[][] formats, java.lang.String[][] customs)

For the parameter ”formats”, formats[0] is all possible Format objects will be used to try to parse values of the first column in CSV template file, formats[1] is all possible Format objects will be used for the second column, ...etc. The last one(formats[formats.length-1]) will be used for corresponding and all following columns. “customs” is the custom string used for formatting the cell(Style.setCustom()) corresponding with “formats”.

To parse number with different separators, code like following:

C#

DecimalFormatSymbols s = new DecimalFormatSymbols();
s.setDecimalSeparator(',');
TxtLoadOptions.setValueParser(new java.text.Format[][]{{ new DecimalFormat("##.###", s),},}, new java.lang.String[][]{{“##.###", },});

If this API can fit your requirement, we will try to provide it for you in next fix.

spygordon · November 21, 2011, 1:15am

Hi,

We will receive CSVs with hundreds of columns and rarely know before hand which columns will contain what data. But if I understand your suggestion correctly the code pasted below would provide the suggested format for all columns in the CSV? If so, this is indeed interesting because stating once what the number format is expected to be for the entire CSV is what we are looking for. But if it will cause issues with the columns that contain text values it will not be good for us because the files will most likely always contain both types of data. Also worth pointing out is that the columns with numeric values also contain a couple of text values in the top rows - these values contain info that tells us what type of data to expect from the column (not the format however) - will that cause problems with this solution?

DecimalFormatSymbols s = new DecimalFormatSymbols();

s.setDecimalSeparator(’,’);

TxtLoadOptions.setValueParser(new java.text.Format[][]{{ new DecimalFormat("##.###", s)}}, new java.lang.String[][]{{“##.###"}});

Thank you for looking into this!

Claes

amjad.sahi · November 21, 2011, 2:13am

Hi,

In fact the given java.text.Format objects are only the preferred parsers for parsing values in CSV file. We will try to use those user specified parsers to parse value at first, if one parser is matched, then we will use the parsed value and corresponding formatting string for that cell. If all those parsers failed, we will continue to use existing logic of parsing values, just like what will be done by current version when reading a CSV file. For a common string value, it will be kept as string value for corresponding cell, without failure or value lost.

Thanks for your understanding!

spygordon · November 21, 2011, 2:43am

Thanks, then this sounds like a great solution!

Best Regards

Claes

shakeel.faiz · November 23, 2011, 4:21am

Hi,

Please try the new fix. Aspose.Cells for Java v7.0.3.2

We have added some new APIs for user custom parsers (a bit different from what we have discussed but more flexible):

1. ICustomParser

This interface allows users to add their custom value parser for parsing string values to other proper cell value object. Its methods:

1. java.lang.Object parseObject(java.lang.String value)

Parses given string to proper value object.

2. java.lang.String getFormat()

Gets the formatting pattern for last invocation of parseObject(java.lang.String).

2. TxtLoadOptions

public void setPreferredParsers(com.aspose.cells.ICustomParser[] parsers)

For the parameter “parsers”, just as we have discussed, parsers[0] is the parser will be used for the first column in text template file, parsers[1] is the parser will be used for the second column, ...etc. The last one(parsers[parsers.length-1]) will be used for all other columns start from parsers.length-1.

With those new APIs, it is simple for you to implement your own value parser. Here is the example:

class MyParser0 implements ICustomParser

{

private java.text.Format[] formats;

private String[] patterns;

private java.text.ParsePosition parsePos = new java.text.ParsePosition(0);

private int index;

public MyParser(java.text.Format[] formats, String[] patterns)

{

this.formats = formats;

this.patterns = patterns;

}

public Object parseObject(String value)

{

index = -1;

final int pos = value.length();

for(int i=0; i<formats.length; i++)

{

parsePos.setIndex(0);

Object v = formats[i].parseObject(value, parsePos);

if(parsePos.getIndex() == pos)

{

index = i;

return v;

}

return null; //return null to let Cells component try other built-in parsers to parse this value. To keep given value as string for the cell, here can return the value itself

}

public String getFormat()

{

if(index > -1 && patterns != null && index < patterns.length)

{

return patterns[index];

}

return null;

}

spygordon · November 23, 2011, 5:40am

Thank you very much, this looks really cool!

Best Regards

Claes

aspose.notifier · December 20, 2011, 5:06am

The issues you have found earlier (filed as CELLSJAVA-40037;CELLSJAVA-40037) have been fixed in this update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.