Convert Equation.DSMT4 in Word Document to OfficeMath Nodes using C# or Java | Custom Alternative Content for OLE Objects

We are looking at creating software which would handle a variety of word processor document formats, convert them to a form of XML and then pass on the XML to another library for further processing.

Aspose.Words seems to be a fairly strong candidate for the conversion process. However, there is one small step for which I cannot work out a solution from the documentation.

We would like to support math equations in the documents. Ideally this would include MathType equations, which are normally OLE objects inside documents. The library we will be passing the XML version of the document only can handle pure XML, so we would need to convert the MathType equations to a form of XML (MathType can produce MathML, so that feels like the obvious choice).

How should we include the MathML we retrieve from the MathType objects in documents when using Aspose.Words?

One thing I looked for was a callback we could register for handling OLE objects when Aspose.Words is saving (e.g. if we were to save to HTML). I cannot find such a callback. The graphics which Aspose.Words inserts for the OLE objects would not be acceptable for our use.

Another thought is to save using FlatOPC. Based on the name, I initially thought the custom XML in FlatOPC would be where I could put the MathML, but it appears Microsoft has a different definition of custom XML. Is there a feature of the FlatOPC format, supported by Aspose.Words, where we can store the MathML and have it associated with the OLE object?

I guess we could write a MathType translator to convert MathType objects to OMML and insert that, but that will probably take a fair bit of work. If there is a simpler option, we would obviously prefer it.

Is there any other option I have not considered which would work for what we need to achieve?

Hi,

Thanks for your interest in Aspose.Words.

First of all, yes, Aspose.Words fully supports converting a Word document to XML-based formats, e.g. DOCX, FlatOPC, WordML and XAML (Extensible Application Markup Language) formats. Please see the documentation here.

Secondly, the math equations in a Microsoft Word document are represented by OfficeMath objects in Aspose.Words’ DOM (Document Object Model). When you convert the document to other file formats, e.g. DOC, DOCX, RTF, PDF, HTML etc., these OMML math formulas are preserved. When it comes to exporting these OfficeMath nodes to HTML format, Aspose.Words simply renders them as images. This is because Aspose.Words’ HTML engine tries to mimic the way Microsoft Word works. For you, this means that if you export your Microsoft Word document with OMML to HTML format using Aspose.Words, the output will appear almost exactly as if it had been produced by Microsoft Word.

mwhapples:
The library we will be passing the XML version of the document only can handle pure XML

Could you please create such an XML containing OMML equations using Microsoft Word that your third party library can easily load? I will investigate the issue on my side and provide you more information.

Best regards,

Thank you for the reply.

Office Math does work as we would wish. However, the specification for the software product says it must also handle Microsoft Word documents with equations produced by Design Science MathType. I have been told that is what most mathematicians use when doing a significant amount of mathematics in MS Word, so support for MathType equations has been stated as an explicit requirement.

Please refer to the attached documents. The doc file is a sample document created in Word with a MathType equation. The XML file is what I get when converting to FlatOPC format using Aspose.Words (I changed the extension to XML because fopc files are not permitted as attachments).

As the fopc sample shows, while the document is in XML format, the structure of the equation is not XML-based because the equation is an OLE object. As things currently stand, our XML library cannot process the OLE object; ideally it would receive the equation in an XML form (MathML or maybe OMML).
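To illustrate the point about the OLE object, here is a minimal Java sketch that scans WordprocessingML for embedded MathType equations using only the JDK's XML APIs, not Aspose.Words. It assumes the usual markup, in which each `<w:object>` holds an `<o:OLEObject>` element with a `ProgID` attribute and an `r:id` pointing at the binary part. The class name, method name and sample fragment are made up for illustration; a real FlatOPC file wraps this markup in `pkg:package` and `pkg:part` elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MathTypeObjectFinder {
    static final String O_NS = "urn:schemas-microsoft-com:office:office";
    static final String R_NS =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Simplified, hand-written fragment (an assumption, not real Aspose.Words output).
    public static final String SAMPLE =
        "<w:body xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
      + " xmlns:o=\"" + O_NS + "\" xmlns:r=\"" + R_NS + "\">"
      + "<w:p><w:r><w:object>"
      + "<o:OLEObject Type=\"Embed\" ProgID=\"Equation.DSMT4\" r:id=\"rId5\"/>"
      + "</w:object></w:r></w:p>"
      + "<w:p><w:r><w:object>"
      + "<o:OLEObject Type=\"Embed\" ProgID=\"Excel.Sheet.12\" r:id=\"rId6\"/>"
      + "</w:object></w:r></w:p>"
      + "</w:body>";

    /** Returns the relationship id of every OLE object whose ProgID is Equation.DSMT4. */
    public static List<String> findMathTypeRelIds(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            List<String> relIds = new ArrayList<>();
            NodeList oleObjects = doc.getElementsByTagNameNS(O_NS, "OLEObject");
            for (int i = 0; i < oleObjects.getLength(); i++) {
                Element ole = (Element) oleObjects.item(i);
                if ("Equation.DSMT4".equals(ole.getAttribute("ProgID"))) {
                    relIds.add(ole.getAttributeNS(R_NS, "id"));
                }
            }
            return relIds;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findMathTypeRelIds(SAMPLE));
    }
}
```

The relationship ids returned here are what you would follow to reach the binary OLE data for each equation; the equation content itself is not visible in the XML.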

My question is whether Aspose.Words would somehow allow associating the OLE object with MathML, which we could get for the OLE object by calling the MathType API. I don’t know whether this may be restricted by the Word formats.

Should this not be possible, I have started asking whether our XML library could be modified to do on-the-fly substitution for the <w:object> element, replacing it with MathML from MathType before proceeding with processing the XML.
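The on-the-fly substitution idea could be sketched roughly as follows, again with only the JDK's XML APIs. The map from relationship id to MathML string is a hypothetical stand-in for whatever the MathType API returns; obtaining that MathML is not shown. The class and method names are illustrative. Note that the result is no longer valid WordprocessingML, so this only makes sense as a preprocessing step for a downstream library that understands MathML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MathTypeSubstitution {
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String O_NS = "urn:schemas-microsoft-com:office:office";
    static final String R_NS =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Simplified, hand-written fragment (an assumption, not real Aspose.Words output).
    public static final String SAMPLE =
        "<w:body xmlns:w=\"" + W_NS + "\" xmlns:o=\"" + O_NS + "\" xmlns:r=\"" + R_NS + "\">"
      + "<w:p><w:r><w:object>"
      + "<o:OLEObject Type=\"Embed\" ProgID=\"Equation.DSMT4\" r:id=\"rId5\"/>"
      + "</w:object></w:r></w:p></w:body>";

    /** Replaces each MathType w:object with the MathML registered for its r:id. */
    public static String substitute(String xml, Map<String, String> mathMlByRelId) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // The NodeList is live, so take a snapshot before modifying the tree.
            NodeList live = doc.getElementsByTagNameNS(W_NS, "object");
            List<Element> objects = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                objects.add((Element) live.item(i));
            }

            for (Element object : objects) {
                NodeList oles = object.getElementsByTagNameNS(O_NS, "OLEObject");
                if (oles.getLength() == 0) continue;
                Element ole = (Element) oles.item(0);
                if (!"Equation.DSMT4".equals(ole.getAttribute("ProgID"))) continue;

                String mathMl = mathMlByRelId.get(ole.getAttributeNS(R_NS, "id"));
                if (mathMl == null) continue; // no MathML available: leave the object alone

                // Parse the MathML and copy it into the main document's tree.
                Document mathDoc = builder.parse(new InputSource(new StringReader(mathMl)));
                Node math = doc.importNode(mathDoc.getDocumentElement(), true);
                object.getParentNode().replaceChild(math, object);
            }

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Substitution failed", e);
        }
    }

    public static void main(String[] args) {
        String mathMl = "<math xmlns=\"http://www.w3.org/1998/Math/MathML\"><mi>x</mi></math>";
        System.out.println(substitute(SAMPLE, Map.of("rId5", mathMl)));
    }
}
```

Objects with no MathML in the map are left untouched, so other OLE objects (spreadsheets, charts) pass through unchanged.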

Hi,

Thanks for the additional information. The bottom line is that Aspose.Words currently doesn’t provide public methods or properties to create or modify an OfficeMath object. Nor does it provide any methods to convert MathType equations (OLE object ProgID: ‘Equation.DSMT4’) to OfficeMath nodes. I have logged a new feature request in our issue tracking system as WORDSNET-7591. Your request has been linked to this feature and you will be notified as soon as it is supported. Sorry for the inconvenience.

Best regards,

Thank you for the reply. At least I now know it is worth continuing to try to get our XML processing library to handle MathType objects if we use Aspose.Words.

Hi,

Our development team will look further into the details of this request and we will keep you updated on its status. We will be sure to inform you as soon as this issue is resolved.

Best regards,

A post was split to a new topic: Get Object data from the Mathtype equation such as MTEF data or Base-64 code