Detecting file format from stream with Aspose.Cells

Hi,

We are trying to outline a strategy for detecting which Aspose component can handle a given stream (starting from the thread "Detect file types from stream").

The code we use for detecting if Aspose.Cells 20.2.0 can handle a stream is:

    var canHandle = false;
    try
    {
        input.Position = 0L;

        var fileFormatInfo = Aspose.Cells.FileFormatUtil.DetectFileFormat(input);
        if (fileFormatInfo.LoadFormat != Aspose.Cells.LoadFormat.Unknown)
        {
            // MS Excel formats
            canHandle = true;
        }
        else if (fileFormatInfo.FileFormatType != Aspose.Cells.FileFormatType.XML) // exclude XML-based formats like VDX
        {
            // Fallback: try to load the stream as CSV.
            input.Position = 0L; // rewind in case detection advanced the stream
            var loadOptions = new Aspose.Cells.LoadOptions(Aspose.Cells.LoadFormat.CSV);
            using (new Aspose.Cells.Workbook(input, loadOptions))
            {
                canHandle = true;
            }
        }
    }
    catch (Exception)
    {
        // Any failure is treated as "Aspose.Cells cannot handle this stream".
    }

We have already started to see problems with this generic approach:

  1. We needed to add an exclusion for formats that Aspose.Diagram should handle: VDX is detected as XML. This one was easy.
  2. The real problem we ran into is with the code that tries to detect CSV by loading the stream as CSV (How to Detect the File Format - #4 by babar.raza says that this cannot be accomplished through utility functions). This code works for CSV, but unfortunately it also succeeds for many file formats that Aspose.Cells cannot handle: PNG, DWG (CAD format) and TXT.
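
One way to narrow the CSV fallback described in point 2 might be to sniff the first bytes of the stream before attempting the CSV load. The following is only a sketch using plain .NET. `TextSniffer`, `LooksLikeDelimitedText`, the 4 KB sample size and the delimiter set are assumptions made for illustration, not part of Aspose.Cells. It also assumes a seekable stream with UTF-8 or single-byte content (UTF-16 text would be rejected because of its NUL bytes) and does not account for delimiters inside quoted fields.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper, not an Aspose.Cells API.
public static class TextSniffer
{
    public static bool LooksLikeDelimitedText(Stream input, int sampleSize = 4096)
    {
        long start = input.Position;
        var buffer = new byte[sampleSize];
        int read = input.Read(buffer, 0, buffer.Length);
        input.Position = start; // leave the stream where we found it

        if (read == 0) return false;

        // Binary formats such as PNG or DWG contain control bytes early on.
        for (int i = 0; i < read; i++)
        {
            byte b = buffer[i];
            if (b < 0x20 && b != (byte)'\t' && b != (byte)'\r' && b != (byte)'\n')
                return false;
        }

        string text = Encoding.UTF8.GetString(buffer, 0, read);
        string[] lines = text.Split('\n').Select(l => l.TrimEnd('\r')).ToArray();

        // If the sample filled the buffer, the last line may be cut off; drop it.
        if (read == sampleSize && lines.Length > 1)
            lines = lines.Take(lines.Length - 1).ToArray();

        lines = lines.Where(l => l.Length > 0).Take(10).ToArray();
        if (lines.Length == 0) return false;

        // Accept only if some delimiter occurs the same non-zero number of times per line.
        foreach (char delimiter in new[] { ',', ';', '\t' })
        {
            int expected = lines[0].Count(c => c == delimiter);
            if (expected > 0 && lines.All(l => l.Count(c => c == delimiter) == expected))
                return true;
        }
        return false;
    }
}
```

Such a check would sit in the `else if` branch, before `new Workbook(input, loadOptions)`. It filters out binary files and free-form text, but a TXT file that happens to have a consistent number of commas per line would still pass, so it reduces false positives rather than eliminating them.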

We just want to make sure that we have a piece of code that handles only formats that Aspose.Cells really knows how to handle:

  • Microsoft Excel: XLS, XLSX, XLSB, XLT, XLTX, XLTM, XLSM, XML
  • OpenOffice: ODS
  • Text: CSV, TSV
  • Web: HTML, MHTML
  • Numbers: Apple’s iWork office suite Numbers app documents

In particular, we want to make sure that Aspose.Cells is the component that handles the MS Excel formats and CSV, while all other file formats that are clearly outside the Cells wheelhouse are disregarded.

Here are the files that are detected as CSV: Aspose.Cells.DetectedAsCsv.zip (496.8 KB).

Is there a better approach we could pursue here?

Best regards,
Alin

@gwert,

Thanks for the sample code segment, sample documents and details.

I am afraid there is no better way to cope with it. Detecting text formats (e.g. CSV, TXT) from streams is always challenging, as there is no structure or specification for such file types. If you could base the detection of such formats (e.g. CSV, TXT, etc.) on the file path rather than on the stream contents, it would be easier to handle in your code.
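
The file-path suggestion could be sketched as follows, assuming the original file name is available alongside the stream. `CellsTextFormats` and `HasDelimitedTextExtension` are hypothetical names, and the extension list (.csv, .tsv) is an assumption taken from the text formats listed in the question; none of this is an Aspose.Cells API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not an Aspose.Cells API.
public static class CellsTextFormats
{
    // Assumed list of text extensions that Aspose.Cells should be allowed to load.
    private static readonly HashSet<string> TextExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".csv", ".tsv" };

    public static bool HasDelimitedTextExtension(string path)
    {
        if (string.IsNullOrEmpty(path)) return false;
        return TextExtensions.Contains(Path.GetExtension(path));
    }
}
```

With this in place, the CSV load would only be attempted when `HasDelimitedTextExtension(fileName)` returns true, so PNG, DWG and TXT inputs never reach the CSV fallback.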