Cannot extract text from an .mht file using Aspose.Words

Hello,


My company has purchased a license for Aspose.Words, and we are trying to decide whether to renew it. We are trying to use it on a file in the .mht format.

Please see the attached file (sample.ppt). This file is in the .mht format. NOTE: I renamed it to sample.ppt because your website won’t allow me to upload a file with the .mht extension.

I am trying to extract the text from this file using Aspose.Words. Here is my Java code:

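// Tell Aspose.Words to treat the input as MHTML regardless of its extension.
com.aspose.words.LoadOptions lopt = new com.aspose.words.LoadOptions();
lopt.setLoadFormat(com.aspose.words.LoadFormat.MHTML);

// Set the license, then load the document with the options above.
String dpath = "C:\\aspose\\Aspose.Words.lic";
com.aspose.words.License dlic = new com.aspose.words.License();
dlic.setLicense(dpath);
String docin = "c:\\sample.ppt";
com.aspose.words.Document doc = new com.aspose.words.Document(docin, lopt);
System.out.println(doc.toTxt());
// Prints the default object identity (class@hash), not the document text.
System.out.println(doc.toString());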


The output of this program is:

null null null
com.aspose.words.Document@4bc222e

Can you please provide code to extract the text from the .mht file?

Hi Ferdinand,

Thanks for your inquiry. Please note that Aspose.Words mimics the behavior of MS Word. If you load your MHTML file into MS Word, you will see no content, and the same is true if you open it in a browser. Please make sure that you are using the correct MHTML file.
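
One quick way to check whether a file is really MHTML before loading it is to look at its header block. The sketch below uses only the Java standard library and assumes that a typical MHTML file begins with MIME headers declaring `MIME-Version` and a `multipart/related` content type. The class and method names (`MhtmlCheck`, `looksLikeMhtml`) are hypothetical and are not part of Aspose.Words. It is a rough heuristic, not a full MIME parser: it does not handle a byte-order mark, unusual header folding, or single-part files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper, not part of Aspose.Words.
class MhtmlCheck {

    // Returns true if the header block (the lines before the first blank line)
    // declares both MIME-Version and a multipart/related Content-Type.
    // Assumption: this is what a typical saved MHTML file looks like.
    public static boolean looksLikeMhtml(String content) {
        boolean hasMimeVersion = false;
        boolean hasMultipartRelated = false;
        for (String line : content.split("\r?\n")) {
            if (line.trim().isEmpty()) {
                break; // a blank line ends the MIME header block
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("mime-version:")) {
                hasMimeVersion = true;
            }
            if (lower.startsWith("content-type:") && lower.contains("multipart/related")) {
                hasMultipartRelated = true;
            }
        }
        return hasMimeVersion && hasMultipartRelated;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java MhtmlCheck <path-to-file>");
            return;
        }
        // ISO-8859-1 maps every byte to a character, so decoding never fails.
        String content = new String(
                Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        System.out.println(looksLikeMhtml(content)
                ? "Header block looks like MHTML."
                : "Header block does not look like MHTML.");
    }
}
```

If the check fails for your file, the file is probably not a valid MHTML document, which would be consistent with MS Word and the browser showing no content.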