Manage more files - get filter criteria

m.cambiolo · June 18, 2011, 4:26am

Hi,

I'm looking for a smart way to manage 15 files. Every file has same structure and more than 30000 rows, less than 65000 .

Every input file has a column key. In output I need a file every distinct value of key.

Can you help me?

I'm trying to use Autofilter, but how can I get get criteria value of Autofilter. Can be the correct way?

There are others solutions?

INPUT

FF1_clone1.csv

key	Campo1	Campo 2
22595	22595_1_campo1.1	22595_1_campo2.1
22595	22595_1_campo1.2	22595_1_campo2.2
23222	23222_1_campo1.3	23222_1_campo2.3

FF1_clone2.csv

key	Campo1	Campo 2
22595	22595_2_campo1.1	22595_2_campo2.1
23222	23222_2_campo1.2	23222_2_campo2.2
23222	23222_2_campo1.3	23222_2_campo2.3

FF1_clone3.csv

key	Campo1	Campo 2
22595	22595_3_campo1.1	22595_3_campo2.1
23222	23222_3_campo1.2	23222_3_campo2.2
23227	23227_3_campo1.3	23227_3_campo2.3

OUTPUT

FF1_22595.xls

key	Campo1	Campo 2
22595	22595_1_campo1.1	22595_1_campo2.1
22595	22595_1_campo1.2	22595_1_campo2.2
22595	22595_2_campo1.1	22595_2_campo2.1
22595	22595_3_campo1.1	22595_3_campo2.1

FF1_23222.xls

key	Campo1	Campo 2
23222	23222_1_campo1.3	23222_1_campo2.3
23222	23222_2_campo1.2	23222_2_campo2.2
23222	23222_2_campo1.3	23222_2_campo2.3
23222	23222_3_campo1.2	23222_3_campo2.2

FF1_23227.xls

key	Campo1	Campo 2
23227	23227_3_campo1.3	23227_3_campo2.3

amjad.sahi · June 18, 2011, 12:21pm

Hi,

Please check the document for your reference:
http://www.aspose.com/documentation/.net-components/aspose.cells-for-.net/data-filtering-and-validation.html

Thank you.

m.cambiolo · June 19, 2011, 4:17am

Hi,

sorry but I cannot find information that I need in this article.

I’d like to do something like:

//Creating AutoFilter by giving the cells range of the heading row

worksheet.AutoFilter.Range = “A1:B1”;

//Filtering columns with specified values

worksheet.AutoFilter.Filter(1, “Bananas”);

But I haven’t the string Banana. I have to set filter for every criteria of Autofilter dinamically.

How can I retry this list of criteria in the code?

amjad.sahi · June 19, 2011, 1:17pm

Hi,

Well, “Bananas” is just an example, you may replace it with your desired string. I am still not sure about your requirement, could you create a sample example in MS Excel (manually) and post the file here with explanation. This sample would show how you can implement your task in MS Excel and what steps you have made to do the task, we will check it soon.

Also, kindly check the attributes and methods of AutoFilter class, may be you can find your desired options for your need. Also, try our latest versions e.g 6.0.0:
http://www.aspose.com/documentation/.net-components/aspose.cells-for-.net/aspose.cells.autofiltermembers.html
(the class contains almost all the possible options that MS Excel allows).

Thank you.

m.cambiolo · June 19, 2011, 3:14pm

Ok,

I try to explain you my requirement.

I have some files CSV or Excel, (File1.xls, File2.xls), these files have same structure. One column is the key, and I have to manage File1.xls e File2.xls and finally obtain a file every key value, with the values (File_1111.xls, File_2222.xls and File_3333.xls). I don’ know key values without open and read the file.

This is my requirement.

I’m thinking to use Autofilter on first file (File1.xls), and get the list of filter criteria (1111, 2222, 3333). How can I get this list of value? Maybe FilterColumnCollection?

amjad.sahi · June 19, 2011, 4:57pm

Hi,

Well, I think there is no such feature in MS Excel to perform your desired task completely. You have to use your own code to detect the key values dynamically and then you may use Autofilter features of MS Excel to filter the customized list of data based on your key values. You may save different files based on your filtered data list. Also, you have to use your own code to gather different (similar) data lists into a file accordingly.

I have written a sample code for your reference here if it can help you a bit.

Sample code:
Workbook workbook = new Workbook(“e:\test2\File1.xls”);

string [] keyvalues = {“1111”,“2222”,“3333”};
Worksheet worksheet = workbook.Worksheets[0];

AutoFilter autofilter = worksheet.AutoFilter;
autofilter.Range = “A1:C” + worksheet.Cells.MaxDataRow +1;
for (int i = 0; i < keyvalues.Length; i++)
{

worksheet.AutoFilter.Filter(0, keyvalues[i]);
workbook.Save(“e:\test2\out_File” + i.ToString() + “.xls”);

}

Thank you.

m.cambiolo · June 20, 2011, 2:17am

Thanks,

so there isn’t an automatic way to obtain this list of value

string [] keyvalues = {“1111”,“2222”,“3333”};

from an Excel file.

Right?

amjad.sahi · June 21, 2011, 1:39am

Hi,

I think you may use your own code to get the distinct values in a range/column. See the following sample code, I have used HashTable to add some distinct values according.

Sample code:

Workbook wb = new Workbook(@“e:\test2\File1.xls”);
Hashtable hashTable = new Hashtable();
Cells cells = wb.Worksheets[0].Cells;
for (int i = 1; i < cells.Rows.Count; i++)
{
Row row = cells.Rows.GetRowByIndex(i);
Aspose.Cells.Cell cell = row.GetCellOrNull(0);
if (cell != null && cell.Type != CellValueType.IsNull)
{
string cellValue = cell.StringValue;
if (hashTable[cellValue] == null)
{
hashTable.Add(cellValue, cellValue);
}
}
}

Worksheet worksheet = wb.Worksheets[0];

AutoFilter autofilter = worksheet.AutoFilter;
autofilter.Range = “A1:C” + worksheet.Cells.MaxDataRow + 1;
foreach (string key in hashTable.Keys)
{
worksheet.AutoFilter.Filter(0, key);
wb.Save(“e:\test2\myoutput_File” + key.ToString() + “.xls”);
}

Thank you.

m.cambiolo · June 21, 2011, 10:17am

Ok, thanks.

I wrote yesterday this code, and it works fine.

public Boolean buildExcelFile(String[] inputFiles, String nomeTipologia, Int32 colonnaCodiceUog) {

Boolean bRet = true;

foreach (String inputFile in inputFiles)

{

//Instantiate Text File’s LoadOptions

String estensionInputFile = inputFile.Split(’.’)[inputFile.Split(’.’).Length - 1].ToLower();

TxtLoadOptions txtLoadOptions = null;

switch (estensionInputFile)

{

case “txt”:

case “csv”:

txtLoadOptions = new TxtLoadOptions();

txtLoadOptions.Separator = Convert.ToChar(";");

txtLoadOptions.Encoding = System.Text.Encoding.UTF8;

break;

case “xls”:

txtLoadOptions = new TxtLoadOptions(LoadFormat.Excel97To2003);

break;

case “xlsx”:

txtLoadOptions = new TxtLoadOptions(LoadFormat.Xlsx);

break;

default:

bRet = false;

break;

}

Workbook workbookSource = new Workbook(inputFile, txtLoadOptions);

Worksheet wsSource = workbookSource.Worksheets[0];

ArrayList codiciUog = new ArrayList();

Int32 idx = 0;

log.Info(“Inizio ciclo per recupero uog da file - righe:” + wsSource.Cells.MaxRow.ToString());

while (idx <= wsSource.Cells.MaxRow)

{

try

{

Cell uog = wsSource.Cells.GetCell(idx, colonnaCodiceUog);

if (uog != null)

{

if (!codiciUog.Contains(uog.StringValue))

{

codiciUog.Add(uog.StringValue);

}

catch (Exception ex) {

bRet = false;

}

idx++;

}

foreach (String codiceUog in codiciUog)

{

Worksheet wsDestination = null;

Workbook workbookDestination = null;

String[] files = Directory.GetFiles(".", nomeTipologia + "" + codiceUog + “.xls”);

Int32 intCounter = 1;

if (files.Length == 1)

{

workbookDestination = new Workbook(files[0]);

wsDestination = workbookDestination.Worksheets[0];

intCounter = wsDestination.Cells.MaxRow + 1;

}

else

{

workbookDestination = new Workbook();

wsDestination = workbookDestination.Worksheets[0];

}

for (int iRow = 0; iRow <= wsSource.Cells.MaxDataRow; iRow++)

{

try

{

Cell uog = wsSource.Cells.GetCell(iRow, colonnaCodiceUog);

if (uog != null)

{

if (uog.StringValue.Equals(codiceUog))

{

wsDestination.Cells.CopyRow(wsSource.Cells, iRow, intCounter);

intCounter = intCounter + 1;

}

else

{

bRet = false;

}

catch (Exception ex)

{

bRet = false;

}

try

{

workbookDestination.Save(nomeTipologia + “_” + codiceUog + “.xls”);

}

catch (Exception e) {

bRet = false;

}

return bRet;

}

I didn’t use Autofilter beacause I don’w want rows with differente key in output files.

I’m worried about performance.

Because for each input file I open and save the output file on file system.

Thanks a lot.

Maurizio

shakeel.faiz · June 23, 2011, 7:13am

Hi,

Please try saving your workbook in memory stream rather than saving in on disk. This will increase the performance.

Please see the subsection inside the documentation topic: Saving Files

2. Saving File to a Stream

m.cambiolo · June 27, 2011, 8:14am

Thanks a lot.

Last question.

If I use the object Workbook in place of memory stream can I have performance improvement?

Or memory stream is the best way? or is it the same, make no difference?

Something like

Workbook workbookDestination = null;

workbookDestination = (Workbook)fileToExport[nomeTipologia + "" + codiceUog];

if(workbookDestination == null){

workbookDestination = new Workbook();

}

…

//elaboration

…

fileToExport[nomeTipologia + "" + codiceUog] = workbookDestination;

Bye

Maurizio

shakeel.faiz · June 29, 2011, 2:03pm

Hi,

Since you are saving a workbook object inside dictionary (hash) object, so you do not even need to recreate it. It will definitely be having more performance then saving the workbook in memory stream and then recreating the workbook from memory stream.

If you are saving lot of workbook objects in a dictionary (hash) object, then you will have a memory concern, because it is potentially going to consume lot of memory.

I read your code above, if you are opening the input file once and then after processing it save on disk and then continue with next and don’t touch the previous one again, then you do not need even a memory stream, your code is well-optimized.

Memory stream is good for only intermediate conversions. Suppose, you want to save your workbook in some file say ABC and then you have to load file ABC file again, then it is better for you to save it in memory then saving in file (harddisk).