Use Cases for Loading and Saving OOXML documents in Aspose.Words

Once again, I'm inviting you to help shape the Aspose.Words API.

Brief Description of the Problem

.DOCX files are macro-free and .DOCM files are macro-enabled, and Microsoft Word 2007 is quite serious about enforcing the "file extension vs actual file type" relationship.

If you rename a .DOCM file that contains a VBA project to .DOCX and open it in MS Word, it fails, saying the document is invalid. It gives no other explanation; it just says the XML is invalid or something like that.

With the existing Aspose.Words API it will be very easy for users to generate .DOCX files that contain VBA projects, for example when converting .DOC into .DOCX. Such files will fail to open in MS Word 2007, which will say the document is invalid.

This will bring a storm of support requests. We need to do better than that and make it clear to the users what they are doing.
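To check whether a given OOXML file is in this mismatched state, one can look inside the package for a VBA project part. The sketch below does not use Aspose.Words at all; it assumes the VBA project is stored under the conventional part name word/vbaProject.bin, and the helper name PackageHasVbaProject is made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Returns true if the OOXML package (a ZIP archive) contains a VBA project part.
// Assumption: the part uses the conventional name word/vbaProject.bin.
static bool PackageHasVbaProject(Stream ooxmlPackage)
{
    // leaveOpen: true so the caller keeps ownership of the stream.
    using var zip = new ZipArchive(ooxmlPackage, ZipArchiveMode.Read, leaveOpen: true);
    return zip.Entries.Any(entry =>
        entry.FullName.Equals("word/vbaProject.bin", StringComparison.OrdinalIgnoreCase));
}
```

A .DOCX file for which this returns true is a likely candidate for the "document is invalid" failure described above.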

Analysis of the Existing Open/Save API

A user can open a file (load it into an Aspose.Words.Document object in memory) from a stream or from a file using the Document constructors. The user can specify the format of the file, or Aspose.Words will detect the format automatically.

// Use Case 1: Open a file; the file format is detected automatically.
// The file name and extension do not matter.
//
// All of the options below do essentially the same thing.
Document doc = new Document("MyFile.doc");
Document doc = new Document("MyFile.doc", LoadFormat.Auto, "");
Document doc = new Document(myStream);
Document doc = new Document(myStream, "", LoadFormat.Auto, "");

// Use Case 2: Open a file; the user specifies the file format explicitly.
// Aspose.Words tries to open the document using the specified load format.
// If the format is wrong, it throws an exception.
//
// Why would one use this option? Only if the user is sure of the file format.
// This approach avoids automatic format detection, which incurs a small performance hit.
Document doc = new Document("MyFile.doc", LoadFormat.Rtf, "");

// Use Case 3: Save; the file format is chosen automatically.
// Aspose.Words chooses the save format based on the extension of the file name you provide.
//
// This is the easiest and most common method when saving to a file on disk.
doc.Save("MyFile.doc"); // Saves in Microsoft Word Binary DOC format.
doc.Save("MyFile.rtf"); // Saves in RTF format.

// Use Case 4: Save; the user specifies the file format.
//
// This method is most useful when saving into a stream. Otherwise, how would you specify what format to save in?
doc.Save(myStream, SaveFormat.Rtf);
doc.Save("MyFile.doc", SaveFormat.Doc);

Problem Highlighted

// I think this code will be quite common. Convert a DOC file to OOXML.
Document doc = new Document("MyFile.doc");
doc.Save("MyFile.docx");

If MyFile.doc has a VBA project in it, it will be faithfully preserved by Aspose.Words and saved to MyFile.docx. Now, if you open MyFile.docx in MS Word 2007 you will get:

The Office Open XML file MyFile.docx cannot be opened because there are problems with the contents. Details... No error detail available.

I think a more useful error message from MS Word 2007 would have been a good thing here, but we have to move on.

What does the user do now? Ah, oh... let's report the problem to the Aspose.Words forum, because that 3rd party Aspose component must be doing something wrong.

Clearly, we must do something in Aspose.Words to prevent this situation. I have some ideas, but this time I want to invite everybody to brainstorm, starting with a blank canvas.

If a VBA project is detected in the original .doc file, Aspose.Words should either:

1) throw an error when trying to save to DOCX, this time with an explicit message,

or

2) silently remove the VBA portion, as it would not work in DOCX anyway.

Neither of these techniques should require any API change.
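To make the two options concrete, here is a rough sketch of the check that would run just before saving. It is not Aspose.Words code: the function name, the hasMacros parameter and the stripSilently switch are all hypothetical, and the switch exists only so both behaviours can be shown side by side. The library itself would hard-code one of them.

```csharp
using System;
using System.IO;

// Hypothetical pre-save check, not part of the Aspose.Words API.
// hasMacros stands in for whatever the library detects while loading;
// stripSilently selects option 2, otherwise option 1 applies.
// Returns true when the VBA project should be dropped before saving.
static bool ShouldStripVbaBeforeSave(string fileName, bool hasMacros, bool stripSilently)
{
    bool targetIsDocx = string.Equals(
        Path.GetExtension(fileName), ".docx", StringComparison.OrdinalIgnoreCase);

    if (!hasMacros || !targetIsDocx)
        return false; // Nothing to reconcile: no macros, or the target is not .docx.

    if (stripSilently)
        return true; // Option 2: the save succeeds and the macros are removed.

    // Option 1: fail loudly with a message that names the actual cause.
    throw new InvalidOperationException(
        $"'{fileName}' is a macro-free format but the document contains a VBA project. " +
        "Save as .docm to keep the macros.");
}
```

Either way, the caller's code stays the same; only the outcome of saving a macro-bearing document to a .docx name changes.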

It’s hard to choose between 1 and 2 :)



1 will be “in your face” and easy to see straight away. But if this scenario is not caught and learned during development, it will occur in a production system and require the urgent attention of a developer.



2 will be silent, and there will possibly be a big gap between the time/place where the VBA project was filtered out and the time/place that fact was discovered, making it hard to track down the culprit.



In both cases, the user will have to go back to his project and add the following code (if he wants VBA preserved):



// Macro-enabled documents must go to .docm; macro-free ones can use .docx.
if (doc.HasMacros)
    doc.Save("MyFile.docm");
else
    doc.Save("MyFile.docx");
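If this branch ends up in many projects, it could be kept in one place with a small helper. This is only a sketch: the helper name OoxmlFileNameFor is made up, and it takes the macro flag as a plain bool so that it does not depend on any particular Aspose.Words member. The rule it encodes is that macro-enabled documents get .docm and everything else gets .docx.

```csharp
using System;
using System.IO;

// Hypothetical helper: swaps the extension of fileName for the OOXML
// extension that matches the document's content.
static string OoxmlFileNameFor(string fileName, bool hasMacros) =>
    Path.ChangeExtension(fileName, hasMacros ? ".docm" : ".docx");
```

The save call would then read something like doc.Save(OoxmlFileNameFor("MyFile.doc", doc.HasMacros)), assuming the HasMacros property discussed above.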



How many of you will come across this problem and have to write this code? It would be ideal if you never had to come across this and write this code, but at the moment I don’t see a way.



Therefore, we need to choose between 1 and 2. How do we choose?



Will most users want their VBA preserved by default or deleted? If we can answer this, we can choose. The ones who want VBA preserved will appreciate 1. The ones who don’t mind it deleted will want 2.

Of the use cases you described, I would prefer the ones where the format is explicitly indicated (cases 2 and 4). They seem the cleanest and most obvious.