Achieving Fixed-Width Text-to-Column Splitting and Advanced Import Settings in Aspose.Cells

Kavii · October 28, 2024, 4:01pm

Is it possible to achieve fixed-width text-to-column splitting and advanced text import settings using Aspose.Cells? I’m asking because the TxtLoadOptions class in Aspose.Cells only provides options for splitting by delimiter, not fixed-width.

amjad.sahi · October 28, 2024, 6:19pm

@Kavii,

Aspose.Cells concentrates on delimiter-based splitting for text-to-column functionality, but fixed-width column splitting is not directly supported in the TxtLoadOptions class. I think you can accomplish the task by defining the fixed-width structures and then manually parsing each line before writing it to individual cells in a worksheet for your requirements. See the sample code for your reference.

//Define fixed-width column specifications (example widths: 5, 10, 8)
int[] columnWidths = {5, 10, 8}; //Adjust based on actual column widths

Workbook workbook = new Workbook();
Worksheet worksheet = workbook.getWorksheets().get(0);

//Load the text file and read line by line
String filePath = "d:\\files\\sampletextfile.txt";
try (BufferedReader br = new BufferedReader(new FileReader(filePath)))
{
     String line;
     int rowIndex = 0;
     while ((line = br.readLine()) != null)
     {
         List<String> parsedColumns = parseFixedWidthLine(line, columnWidths);

         //Write parsed columns to cells in the current row
         for (int colIndex = 0; colIndex < parsedColumns.size(); colIndex++)
         {
              worksheet.getCells().get(rowIndex, colIndex).setValue(parsedColumns.get(colIndex).trim());
         }
         rowIndex++;
     }
}
//If required
worksheet.autoFitColumns();

//Save the workbook
workbook.save("d:\\files\\out1.xlsx");

......

//Function to parse a line based on fixed-width specifications
private static List<String> parseFixedWidthLine(String line, int[] columnWidths) {
        List<String> columns = new ArrayList<>();
        int startPos = 0;
        for (int width : columnWidths) {
            int endPos = Math.min(startPos + width, line.length());
            columns.add(line.substring(startPos, endPos));
            startPos += width;
      }
        return columns;
}

Hope, this helps a bit.

Kavii · October 29, 2024, 6:59am

Thank you for the response. Is it possible to achieve advanced text import settings using Aspose.Cells, specifically to set decimal and thousand separators for recognizing numeric data?

Kavii · October 29, 2024, 7:01am

Delimiter-based splitting currently only allows a single separator at a time, effectively functioning as a single checkbox. How can I enable multiple delimiters simultaneously, such as tab, semicolon, comma, and space?

Kavii · October 29, 2024, 7:01am

Is it possible to retrieve the split values after using a delimiter to separate them?

johnson.shi · October 29, 2024, 7:42am

@Kavii

For decimal and thousand separators, it is determined by the locale of the environment or specified for TxtLoadOptions.CultureInfo. So generally you may specify the proper locale according to the separators used in your text content to make them be parsed correctly. If you have more special settings or requirement for parsing numeric values, you may also use your custom parser(TxtLoadOptions.PreferredParsers) to load them.

We are afraid currently it is not supported to use multiple separators when importing text contents. Multiple separators will increase the complexity of importing and affect the performance of it, so we may investigate it later but we are not sure whether we can support it. We think the better way should be that you replace all other separators to specific one with simple logic and then importing the updated content with Aspose.Cells.

We are not sure which value you are talking about. Generally after importing the content, split values will be put into cells one by one. You may retrieve them easily by Cells.Item or Cells.GetEnumerator(). If it is not what you expect, please describe your requirement in more details so we can help you further.

Kavii · October 30, 2024, 3:42am

Suppose cell A1 contains the value “a&b&&c&d&&d&e”. I’m using the following code to split this value by the delimiter ‘&’:

TxtLoadOptions txt_load_options;
txt_load_options.SetTreatConsecutiveDelimitersAsOne(false);
txt_load_options.SetSeparator(‘&’);
worksheet.GetCells().TextToColumns(0, 0, 1, txt_load_options);

After this, the text is split into separate cells. However, I don’t know exactly how many cells the values occupy. If I try using worksheet.GetCells().GetEnumerator(), it retrieves all the cells in the worksheet, not just the split cells. This doesn’t help when the worksheet already has other values, as I only want to get the split cells. How can I fetch only the cells resulting from this split operation?

Kavii · October 30, 2024, 3:47am

After using the text-to-columns feature to split data, feature specify column data formats like General, Text, Date, Skip, or set a destination cell? Is this achievable with Aspose.Cells?

John.He · October 30, 2024, 6:33am

@Kavii
We will support obtaining column count after calling Cells.textToColumns method. The style will be determined based on the data, with numbers being set to the number style and dates being set to the date style.

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): CELLSJAVA-46154

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

johnson.shi · October 30, 2024, 7:47am

@Kavii

And for the data formats of the split data, we will improve textToColumns too to let TxtLoadOptions.getLoadStyleStrategy() determine how to handle the formats.

We have opened the following new ticket(s) in our internal issue tracking system for this requirement.

Issue ID(s): CELLSJAVA-46155

amjad.sahi · October 31, 2024, 8:01am

@Kavii,

We are pleased to inform that your issues (Ticket IDs: “CELLSJAVA-46154” and “CELLSJAVA-46155”) have been resolved. The fixes/enhancements will be included in the upcoming release (Aspose.Cells v24.11) that we plan to release in the first half of November 2024. You will be notified when the next version is released.

PS. Please note, in the new version (Aspose.Cells v24.11) Cells.TextToColumns() will return the total columns count of the split data.

aspose.notifier · November 13, 2024, 9:12am

The issues you have found earlier (filed as CELLSJAVA-46155,CELLSJAVA-46154) have been fixed in Aspose.Cells for Java 24.11.