How to distinguish MHT files for Aspose.Words and Aspose.Cells

Hi,
Both Aspose.Words and Aspose.Cells support the MHT/MHTML file type.
Is there any way for Aspose to detect whether an MHT/MHTML file is a Word or an Excel document?
For example, I have two MHT files, one Word and one Excel (both created with Microsoft Office).
Both Aspose.Words and Aspose.Cells can open them, but the results are not the same.
Documents.zip (13.5 KB)

@long.to

Thanks for your inquiry. Unfortunately, there is no API to detect whether an MHTML file was generated by MS Word or MS Excel. Could you please share some details about your use case and why you need this information? We will then guide you accordingly. Thanks for your cooperation.

Hi,
My use case is converting MHT files to other file types, but I cannot distinguish between the two kinds of MHT files, so I don't know whether to use Aspose.Words or Aspose.Cells for a given file.

Thanks.

@long.to

Thanks for sharing the details. Unfortunately, Aspose.Words and Aspose.Cells do not support the requested feature at the moment. However, we have logged this feature request as WORDSNET-18014 for Aspose.Words and CELLSNET-46544 for Aspose.Cells in our issue tracking system.

You will be notified via this forum thread once this feature is available. We apologize for your inconvenience.

@long.to,

As a temporary workaround, please try the following sample code to detect whether an MHTML file is a workbook (Excel document):

// Requires: using System.IO; using Aspose.Cells;
if (FileFormatUtil.DetectFileFormat(dir + "excel.mht").FileFormatType == FileFormatType.MHtml)
{
    using (StreamReader reader = new StreamReader(dir + "excel.mht"))
    {
        string line1 = reader.ReadLine(); // MIME-Version: 1.0
        string line2 = reader.ReadLine(); // e.g. X-Document-Type: Workbook
        if (!string.IsNullOrEmpty(line2))
        {
            // MIME headers have the form "Name: value", so split on ':'
            string[] strArr = line2.Split(':');
            if (strArr.Length == 2
                && strArr[0].Trim() == "X-Document-Type"
                && strArr[1].Trim() == "Workbook")
            {
                // Excel MHT
            }
        }
    }
}

We will also look into supporting the requested feature (“CELLSNET-46544”) in FileFormatUtil.DetectFileFormat in the Aspose.Cells APIs.

Hi,

Thank you very much for your reply.
I understand your sample code, but the problem is that we cannot be sure the X-Document-Type header is on line 2.
Some MHT files have other information before the MIME-Version line,
something like this:

From: <Created by FME>
Subject: MhtmlFormatter output report
Date: 2013-05-13 10:31:38
MIME-Version: 1.0

My current solution is to scan the whole file for the “X-Document-Type” line, but that takes too much processing time.
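A possible middle ground, sketched here under two assumptions: the top-level MIME headers end at the first empty line (standard RFC 822/2822 message syntax), and an Excel-saved MHT carries `X-Document-Type: Workbook` somewhere in that header block, as the sample code above expects. Reading only up to that blank line avoids scanning the whole file, and it no longer matters whether extra headers such as `From:` or `Subject:` come first. `MhtHeaderSniffer` and `IsExcelWorkbook` are hypothetical names, not Aspose APIs, and header folding (continuation lines starting with whitespace) is not handled.

```csharp
using System;
using System.IO;

// Hypothetical helper (not an Aspose API): inspects only the top-level
// MIME header block of an MHT file.
public static class MhtHeaderSniffer
{
    // Returns true if an "X-Document-Type: Workbook" header appears before the
    // first empty line, i.e. before the end of the top-level headers.
    // Assumes Excel writes the header there; folded headers are not handled.
    public static bool IsExcelWorkbook(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
                break; // end of the header block: stop without reading the body

            int colon = line.IndexOf(':');
            if (colon <= 0)
                continue; // not a "Name: value" header line

            // MIME header names are case-insensitive.
            string name = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim();
            if (name.Equals("X-Document-Type", StringComparison.OrdinalIgnoreCase)
                && value.Equals("Workbook", StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }
        }
        return false;
    }

    // Convenience overload that opens the file and reads only its headers.
    public static bool IsExcelWorkbook(string path)
    {
        using (var reader = new StreamReader(path))
        {
            return IsExcelWorkbook(reader);
        }
    }
}
```

Usage would be along the lines of `bool isExcel = MhtHeaderSniffer.IsExcelWorkbook(dir + "excel.mht");`. Files without the header fall through to `false`, so a second check (for example ProgId) may still be needed.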

Thanks

@long.to,

Please bear with us; as mentioned, we will provide a fixed version with enhanced APIs.

Once we have any new information or update, we will share it with you.

@long.to,

If you scan the whole file, you can also check the “ProgId” meta node. For Excel it looks like <meta name=3DProgId content=3DExcel.Sheet>, where =3D is the quoted-printable encoding of =.
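To avoid loading the entire file when looking for ProgId, one option is to stream it line by line and stop at the first ProgId meta tag. This is a sketch, not an Aspose API: `MhtProgIdSniffer` and `FindProgId` are hypothetical names, and it assumes the tag looks like the example above (optionally quoted-printable encoded). Word's HTML output typically uses `Word.Document` as its ProgId. A tag split across lines by a quoted-printable soft line break (a line ending in `=`) will not be matched, and if no tag exists the whole file is still read, though never held in memory at once.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose API): finds the ProgId meta tag in an
// MHT file without reading past the first match.
public static class MhtProgIdSniffer
{
    // Matches <meta name=ProgId content=Excel.Sheet>, with or without
    // quoted-printable encoding (=3D instead of =) and optional quotes.
    private static readonly Regex ProgIdMeta = new Regex(
        @"name=(?:3D)?""?ProgId""?\s+content=(?:3D)?""?([A-Za-z]+\.[A-Za-z]+)",
        RegexOptions.IgnoreCase);

    // Returns e.g. "Excel.Sheet" or "Word.Document", or null if no tag is found.
    public static string FindProgId(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Match m = ProgIdMeta.Match(line);
            if (m.Success)
                return m.Groups[1].Value; // stop at the first ProgId
        }
        return null;
    }

    // Convenience overload that streams the file from disk.
    public static string FindProgId(string path)
    {
        using (var reader = new StreamReader(path))
        {
            return FindProgId(reader);
        }
    }
}
```

A caller could then route on the result, for example treating `"Excel.Sheet"` as an Aspose.Cells file and anything else as an Aspose.Words file, which mirrors the approach described in the thread.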

We will add a returned property “ProgId” to HTMLLoadOptions. You can then detect files with the temporary solution above and, after loading a file with Aspose.Cells, check ProgId: if it is not Excel.Sheet, open the file with Aspose.Words instead.

@long.to,

We have now fixed your issue (logged earlier as “CELLSNET-46544”). We will provide the fixed version soon, after performing QA and incorporating other enhancements and fixes.

@long.to

We have closed the issue (WORDSNET-18014) with a “Won’t Fix” resolution, as it is out of scope for Aspose.Words. You can use the following code example to get the desired output:

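// Requires: using System.IO; using System.Text.RegularExpressions;
// Reads the whole MHT file and looks for the literal generator string "Microsoft Word 15".
// Files saved by other Word versions may contain a different version number.
string text = File.ReadAllText(MyDir + "word.mht");
var match = Regex.Match(text, @"Microsoft Word 15");
var res = match.Success ? match.Groups[0].Value : ""; // non-empty: the string was found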
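The pattern above matches only the literal text `Microsoft Word 15`, and `File.ReadAllText` loads the whole file into memory. A version-agnostic variant could stream the file and stop at the first generator string. This sketch assumes Office writes a generator string of the form `Microsoft Word <number>` or `Microsoft Excel <number>` (the code above relies on the Word form); `MhtGeneratorSniffer` and `FindGenerator` are hypothetical names. The same text could in principle also appear in document content, so treat a match as a hint rather than proof.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose API): looks for an Office generator
// string such as "Microsoft Word 15" or "Microsoft Excel 15".
public static class MhtGeneratorSniffer
{
    // Group 1 captures the product name, group 2 the version number.
    private static readonly Regex Generator =
        new Regex(@"Microsoft (Word|Excel) (\d+)");

    // Returns "Word" or "Excel" for the first generator string found, or null.
    public static string FindGenerator(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Match m = Generator.Match(line);
            if (m.Success)
                return m.Groups[1].Value; // stop reading at the first match
        }
        return null;
    }
}
```

For a file on disk, wrap the call as `using (var reader = new StreamReader(MyDir + "word.mht")) { string kind = MhtGeneratorSniffer.FindGenerator(reader); }`.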

The issues you found earlier (filed as CELLSNET-46544) have been fixed in Aspose.Cells for .NET v19.1. This message was posted using BugNotificationTool from the Downloads module by Amjad_Sahi.