Hi,
We are trying to work out a strategy for detecting which Aspose component can handle a given stream (following on from "Detect file types from stream").
The code we use to detect whether Aspose.Words 20.3.0 can handle a stream is:
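    var canHandle = false;
    try
    {
        // Rewind so that detection starts at the beginning of the stream.
        input.Position = 0L;
        var fileFormatInfo = Aspose.Words.FileFormatUtil.DetectFileFormat(input);
        // Accept anything Aspose.Words recognizes, except PDF (left to Aspose.PDF).
        if (fileFormatInfo.LoadFormat != Aspose.Words.LoadFormat.Unknown
            && fileFormatInfo.LoadFormat != Aspose.Words.LoadFormat.Pdf)
        {
            canHandle = true;
        }
    }
    catch (Exception)
    {
        // Any detection failure means Aspose.Words cannot handle the stream.
    }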
We have already started to see problems with this generic approach:
- We had to add the exclusion for the PDF format, because Aspose.Words detects PDF and we want Aspose.PDF to be the component that handles it.
- The real problem is that many more file formats (including diagrams and even some images) are detected with a LoadFormat of Aspose.Words.LoadFormat.Text.
We just want to make sure that we have a piece of code that handles only formats that Aspose.Words really knows how to handle:
- Microsoft Word: DOC, DOCX, RTF, DOT, DOTX, DOTM, DOCM, FlatOPC, FlatOpcMacroEnabled, FlatOpcTemplate, FlatOpcTemplateMacroEnabled
- OpenOffice: ODT, OTT
- WordprocessingML: WordML
- Web: HTML, MHTML
- Text: TXT
- MOBI
In particular, we want to make sure that Aspose.Words is the component that handles the MS Word formats and TXT, while other text-like file formats are disregarded.
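To make the intent concrete, here is a minimal sketch of the allow-list we have in mind, built from the formats listed above. The names WordsStreamDetector, IsSupported and CanHandle are hypothetical names of our own, and we are assuming that every LoadFormat member used here (LoadFormat.Mobi in particular) exists in the installed Aspose.Words version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Aspose.Words;

// Hypothetical helper, not part of Aspose.Words.
public static class WordsStreamDetector
{
    // Allow-list of the formats we want Aspose.Words to own.
    // Assumption: all of these enum members exist in the installed version.
    private static readonly HashSet<LoadFormat> Supported = new HashSet<LoadFormat>
    {
        // Microsoft Word
        LoadFormat.Doc, LoadFormat.Docx, LoadFormat.Rtf, LoadFormat.Dot,
        LoadFormat.Dotx, LoadFormat.Dotm, LoadFormat.Docm,
        LoadFormat.FlatOpc, LoadFormat.FlatOpcMacroEnabled,
        LoadFormat.FlatOpcTemplate, LoadFormat.FlatOpcTemplateMacroEnabled,
        // OpenOffice
        LoadFormat.Odt, LoadFormat.Ott,
        // WordprocessingML
        LoadFormat.WordML,
        // Web
        LoadFormat.Html, LoadFormat.Mhtml,
        // Text (still subject to the false positives described above)
        LoadFormat.Text,
        // MOBI
        LoadFormat.Mobi
    };

    // True only for formats on the allow-list; PDF and Unknown are not on it.
    public static bool IsSupported(LoadFormat format) => Supported.Contains(format);

    // Detects the format of the stream and checks it against the allow-list.
    public static bool CanHandle(Stream input)
    {
        try
        {
            input.Position = 0L;
            FileFormatInfo info = FileFormatUtil.DetectFileFormat(input);
            return IsSupported(info.LoadFormat);
        }
        catch (Exception)
        {
            // Treat any detection failure as "cannot handle".
            return false;
        }
    }
}
```

This removes the need for the explicit PDF exclusion, but it does not solve the main problem: the diagram and image files that are reported as LoadFormat.Text would still pass the check.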
Here are the files that are detected as Aspose.Words.LoadFormat.Text: Aspose.Words.DetectedAsText.zip (88.2 KB)
Is there a better approach we could pursue here?
Best regards,
Alin