How to convert a Word document to HTML?

I am wondering if there is a way to convert a Word document to HTML and then walk through the HTML nodes. I noticed that I have to call doc.Save() to do the conversion. Does that mean the file is written to the hard drive? Thanks, and do let me know.


This Topic is created by imran.rafique using the Email to Topic plugin.

Hi

Thanks for your request. The code you are looking for is quite simple:

//Open Document
Document doc = new Document("in.doc");
//Save HTML document
doc.Save("out.html");

Hope this helps.

Also, see the following link to learn how to save a document:

Best regards.

I am trying to call this via ColdFusion and am having some difficulty. I can do the above, but when I try to take it from html and convert it back to doc or rtf, I get errors.

The code

(this bit works)

<cfset doc.init("C:\Temp\Resume-Jamie2.rtf")>
<cfset doc.save("C:\Temp\ASPoseResumeConvertTest.html")>

(this bit does not)

<cfset doc2 = createObject("java","com.aspose.words.Document") />
<cfset doc2.init("C:\Temp\test.html") />
<cfset doc2.save("C:\Temp\ASPResumeConverted.rtf") />

The error I get is:
Object Instantiation Exception.
An exception occurred when instantiating a Java object. The class must not be an interface or an abstract class. Error: ''.

<cfset doc2.init("C:\Temp\test.html") />
<cfset doc2.save("C:\Temp\ASPResumeConverted.rtf") />

The init is what’s throwing the error.

Any ideas?

Hi

Thanks for your inquiry. I managed to reproduce this problem on my side. You will be notified as soon as it is resolved.

It seems the problem occurs because ColdFusion does not like how Aspose.Words for Java is obfuscated.

Best regards.

Hi,

You can try using the following code to open an HTML document in ColdFusion:

<html>
      <body>
           <cfobject type = "Java" action="create" class="com.aspose.words.LoadFormat" name="asposeLoadFormat">
           <cfset doc = CreateObject("java", "com.aspose.words.Document").init("C:\Temp\test.html", asposeLoadFormat.HTML, 'null') >
           <cfset doc.save("C:\Temp\out.doc")>
      </body>
</html>

Hope this helps.

Best regards.

Hi Alexey,
Just gave that a try and it still gave me the same error:

Object Instantiation Exception.
An exception occurred when instantiating a Java object. The class must not be an interface or an abstract class. Error: ''.

Hi

I used the latest version of Aspose.Words for Java (3.3.0). You can download it from here:

Please try using this version and let me know if it still does not work for you.

Best regards.

Still getting the same error. I downloaded 3.3, cleared out any cached classes, restarted CF, and still getting the same result.

Hi

That is odd, but thank you for the additional information. We will investigate the problem further and provide you with more information.

Best regards.

Hi again Alexey,
Just wanted to check back in and see how this issue was progressing. My manager’s looking for a demonstration soon so we can show the proof-of-concept, and this is the last step for me :)
Thanks

Hi

Thanks for your inquiry. Unfortunately, the issue is still unresolved. However, we have released a new version of Aspose.Words for Java, so you can try it. You can download it from here:

Best regards.

Gave that a try, but still getting the same issue.

I switched back to the .NET DLL and it is (sort of) working with that. It opens the original RTF document as an HTML file, I make a change, and then I try to save it back. However, the .doc (and .rtf) files are blank. They are 10 KB and 7 KB files, and when I view them in Notepad there is binary data in them, but nothing is displayed in MS Word.

My save is very simple:

<cfset doc2.save("C:\Temp\out.rtf")>

Any suggestions?

Thanks for all the help with this.

OK, sorry for the multiple posts. I played around a little with it on my own and found a new issue.

If I set up the class instantiation like this:

Then it works. According to the docs, the namespace should be class="Aspose.Words.Document", but when I do that I get:

500
ROOT CAUSE:
java.lang.ClassFormatError: Illegal method name "?" in class Aspose/Words/Document

I guess I should use the com.aspose.words namespace to prevent the above issue?
I didn’t know if this was the reason that the save didn’t work, but apparently it’s not, as the class won’t instantiate using that namespace.

OK, I’m feeling a little dumb at this point.

I had not removed the Java class from my classes directory in ColdFusion, so when I referenced com.aspose.words.Document, it was still using the Java object. I was not calling “init”, hence the blank document. I guess it does create the file though, so it’s definitely an issue with how the jar converts an HTML document to another format.
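
For anyone hitting the same kind of mix-up, one way to check which copy of a class the JVM actually picked up is to ask the class for its code source. The following is only a minimal sketch in plain Java, not anything from Aspose or ColdFusion: the ClassOrigin class and its locate method are hypothetical names made up for illustration, and the result is only meaningful when the check runs in the same JVM and class loader as the code being debugged.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Aspose.Words or ColdFusion):
// reports where the JVM found a class with the given fully qualified name.
final class ClassOrigin {

    // Returns the jar or directory URL the class was loaded from,
    // "bootstrap" for JDK core classes that have no code source,
    // "not found" if the class is not visible to this class loader,
    // or "load error: <type>" if the class file exists but cannot be linked.
    static String locate(String className) {
        try {
            // 'false' loads the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "bootstrap";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found";
        } catch (LinkageError e) {
            // Covers errors such as ClassFormatError or NoClassDefFoundError.
            return "load error: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // The default class name is taken from this thread; pass another name as an argument.
        String name = args.length > 0 ? args[0] : "com.aspose.words.Document";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the printed location points at a stale jar or classes directory, that copy is the one being used, regardless of which library was meant to be referenced.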

Once I switched to the DLL and restarted ColdFusion, I changed the class to class="Aspose.Words.Document" and now I get the 500 error that I mentioned previously:
Illegal method name "?" in class Aspose/Words/Document
So still an issue, but perhaps something you guys can assist me with?

Thanks,

Hi

Thanks for your request. Please try using the following code:

<html>
<head><title>Save or Convert a Document</title></head>
<body>
<b>This example shows how to convert a document to various formats using Aspose.Words</b>
<cfset assemblyPath="C:\Program Files\Aspose\Aspose.Words\Bin\net2.0\Aspose.Words.dll">
<cfset doc=CreateObject(".NET", "Aspose.Words.Document", assemblyPath).Init("C:\Temp\in.doc")>
<cfset saveFormat=CreateObject(".NET", "Aspose.Words.SaveFormat", assemblyPath)>
<cfset doc.save("C:\Temp\out.doc", saveFormat.Doc)>
<cfset doc.save("C:\Temp\out.docx", saveFormat.Docx)>
<cfset doc.save("C:\Temp\out.rtf", saveFormat.Rtf)>
<cfset doc.save("C:\Temp\out.html", saveFormat.Html)>
<cfset doc.save("C:\Temp\out.odt", saveFormat.Odt)>
<cfset doc.save("C:\Temp\out.txt", saveFormat.Text)>
<cfset doc.save("C:\Temp\out.xml", saveFormat.WordML)>
<cfset doc.save("C:\Temp\out.mhtml", saveFormat.Mhtml)>
<cfset doc.save("C:\Temp\out.epub", saveFormat.Epub)>
<cfset doc.save("C:\Temp\out.pdf", saveFormat.Pdf)>
</body>
</html>

Best regards.

Same error:

ROOT CAUSE:

java.lang.ClassFormatError: Illegal method name "?" in class Aspose/Words/Document

This occurs on the “createObject” step. It doesn’t reach the “init” method call.

Hi

Thank you for the additional information. Most likely this is the same problem with obfuscation of the library. I suppose all commercial libraries are also obfuscated, so maybe the problem should actually be fixed in ColdFusion, because both the .NET and Java versions work without any issues in other environments. The problem occurs only in ColdFusion.

Best regards.

That may be it. It just seems strange that the obfuscation is causing these errors when other commercial libraries I’ve used seem to work fine in ColdFusion (and ColdFusion is built on top of Java, so that’s usually the fallback to get something working in ColdFusion). Also, the initial conversion to HTML using the Aspose.Words Java library works, just not from HTML back to doc/rtf/etc., so it can’t be completely related to the obfuscation.

If you guys do get it working in ColdFusion, please let me know as I’m still looking to demo this to my manager (as it is halfway there with the conversion of rtf to HTML).

Thanks

Hi

Thank you for the additional information. I am sure the problem occurs because of obfuscation. I tried with an unobfuscated version of Aspose.Words and everything works fine with it.

Anyway, I think ColdFusion should work with obfuscated libraries, since these libraries work without any issues in other environments.

Best regards.