WordML support in Aspose.Words for Java

Vik_A · June 13, 2008, 10:17am

Hi there,
Some of the engineers on my team are beginning to evaluate the Aspose.Words for Java library. A couple of questions came up:

When will Aspose.Words for Java support WordML? I have seen some mention of Q2 2008, which implies we would see a release in the next couple of weeks. Do you have a new target date or a slightly better estimate?
Manipulating a large WordML document using a DOM parser like JDOM is very resource intensive. This is a common problem when parsing large XML documents in general. Is Aspose.Words more efficient than XML parsing with a DOM parser?
Thx
Vik

Konstantin · June 13, 2008, 11:33am

Hi Vik,
Thank you for evaluating the Aspose.Words for Java library.

At the moment we are working hard to implement a completely new porting technology. With this technology we will be able to port all Aspose.Words for .NET functionality (including WordML) to the next Java release, and all subsequent Aspose.Words for Java releases will be synchronized with the .NET ones. Our target for this completely new project is the end of June.
We don’t use DOM or JDOM to parse WordML documents. Our XML parsing technology is closer to SAX or, even more, to StAX. However, we have to maintain an in-memory document model (in our proprietary format), as DOM does, so that we can manipulate document nodes.
Best Regards,
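For readers unfamiliar with the distinction drawn above, here is a minimal sketch of pull-style (StAX) parsing using the standard `javax.xml.stream` API, which requires a newer JDK than Java 1.4. It is not Aspose's parser; the class name `StaxElementCounter`, the sample XML, and the placeholder namespace URI are assumptions made for illustration, with element names modeled on WordML's `w:tbl` and `w:p`.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams through XML once and counts start tags.
final class StaxElementCounter {

    /** Counts start tags by local name without building a tree in memory. */
    static Map<String, Integer> countElements(String xml) {
        Map<String, Integer> counts = new TreeMap<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                // The caller pulls events one at a time; only the current
                // event is held, unlike DOM, which keeps the whole document.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        counts.merge(reader.getLocalName(), 1, Integer::sum);
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Placeholder namespace URI, used only for this illustration.
        String xml = "<w:body xmlns:w='urn:example:placeholder'>"
                + "<w:tbl/><w:p>one</w:p><w:p>two</w:p></w:body>";
        System.out.println(countElements(xml));
    }
}
```

A streaming pass like this keeps memory roughly constant while reading. As noted in the reply, a library that lets you manipulate nodes still has to hold its own document model in memory afterwards, so streaming only removes the cost of an intermediate XML tree.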

Vik_A · June 13, 2008, 4:19pm

I am impressed with your quick and detailed response. Thank you.
As you may infer, efficient processing of large documents is very important for us. My team has come up with a document and some representative operations. I was wondering if it would be possible to obtain some metrics around this document. I do realize that I am asking for a lot in terms of a pre-sales support question.
Here are the details of the document:
The document is attached. It has the following items:
a) 150 tables
b) 150 paragraphs with text
c) 150 lists with 3 list items each
Here are representative operations for the purposes of benchmarking performance:

Find every table and write it to standard out
Find every paragraph and write it to standard out
Find every list item (not just a list, but a list item) and write it to standard out
We would love to know the following metrics:
i) What were the peak and average memory numbers?
ii) What were the peak and average CPU utilization numbers?
iii) How much time did the entire operation take?

Of course ii) and iii) will be hardware dependent, so they would have to be qualified based on the hardware. My baseline software environment is Sun Java 1.4 on Windows.
When processing large documents, i) is of utmost importance so that the application does not run out of memory. ii) comes next and iii) is further down.
Now, if you were to compare the performance of your Java library to simple XML processing using a DOM parser, you would have some very interesting benchmarks that would interest a lot of people.
It would definitely interest me.
Good luck on your end of June release.
Thx
Vik
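As a rough illustration of the DOM baseline proposed above, the sketch below parses a document with the JDK's built-in DOM parser, counts table and paragraph elements, and records elapsed time and an approximate heap delta. It assumes a JDK newer than the Java 1.4 baseline mentioned in the post; the class name `DomBaseline` and the sample XML are hypothetical, and the heap figure is a snapshot after parsing, not a true peak (a profiler or periodic sampling would be needed for peak and average numbers).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical harness for a DOM-parser baseline measurement.
final class DomBaseline {

    /** Numbers collected from one measured run. */
    static final class Result {
        final int tables;
        final int paragraphs;
        final long elapsedMillis;
        final long heapDeltaBytes; // approximate; may be negative if GC runs

        Result(int tables, int paragraphs, long elapsedMillis, long heapDeltaBytes) {
            this.tables = tables;
            this.paragraphs = paragraphs;
            this.elapsedMillis = elapsedMillis;
            this.heapDeltaBytes = heapDeltaBytes;
        }
    }

    static Result measure(byte[] xml) {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // only a hint to the JVM, so the baseline is approximate
        long heapBefore = rt.totalMemory() - rt.freeMemory();
        long start = System.nanoTime();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml));

            // "*" matches any namespace; local names modeled on WordML.
            int tables = doc.getElementsByTagNameNS("*", "tbl").getLength();
            int paragraphs = doc.getElementsByTagNameNS("*", "p").getLength();

            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            // Measured while the DOM tree is still reachable.
            long heapAfter = rt.totalMemory() - rt.freeMemory();
            return new Result(tables, paragraphs, elapsedMillis, heapAfter - heapBefore);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = ("<w:body xmlns:w='urn:example:placeholder'>"
                + "<w:tbl/><w:p>one</w:p><w:p>two</w:p></w:body>")
                .getBytes(StandardCharsets.UTF_8);
        Result r = measure(xml);
        System.out.println("tables=" + r.tables + " paragraphs=" + r.paragraphs
                + " ms=" + r.elapsedMillis + " heapDelta=" + r.heapDeltaBytes);
    }
}
```

Timing and heap figures from a single run are noisy, so repeating the measurement several times after a warm-up run gives more trustworthy numbers.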

alexey.noskov · June 14, 2008, 11:20am

Hi
Thank you for your interest in Aspose.Words. I tested your document using Aspose.Words for .NET and got the following results.

It takes around 5 seconds to load your document.
It takes around 3 seconds to save the document.
It takes 60 MB of RAM. Usually Aspose.Words needs about 10 times more memory to build the model than the size of the source document, but this depends on your document.

My environment is:
Celeron CPU 2.80 GHz
1.5 GB of RAM
Aspose components are designed to be simultaneously used by 100s and 1000s of users. Unfortunately, we don’t have any particular performance testing results. But you are free to evaluate the performance of the product to your satisfaction.
Best regards.
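The "about 10 times the source size" rule of thumb quoted above can be turned into a quick pre-flight check before loading a document. This is only a sketch of that arithmetic: the class name `ModelMemoryEstimate`, the fixed factor, and the 6 MB input in `main` are assumptions for illustration, and the reply itself notes that the real ratio depends on the document.

```java
// Hypothetical helper applying the ~10x rule of thumb from the reply above.
final class ModelMemoryEstimate {

    /** Rule-of-thumb multiplier; the real ratio varies by document. */
    static final long RULE_OF_THUMB_FACTOR = 10L;

    /** Estimated bytes needed for the in-memory model of a source file. */
    static long estimateModelBytes(long sourceBytes) {
        if (sourceBytes < 0) {
            throw new IllegalArgumentException("sourceBytes must be non-negative");
        }
        return Math.multiplyExact(sourceBytes, RULE_OF_THUMB_FACTOR);
    }

    /** True if the estimate fits within the given heap budget. */
    static boolean likelyFits(long sourceBytes, long availableHeapBytes) {
        return estimateModelBytes(sourceBytes) <= availableHeapBytes;
    }

    public static void main(String[] args) {
        long sourceBytes = 6L * 1024 * 1024; // hypothetical 6 MB source file
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Estimated model size: "
                + estimateModelBytes(sourceBytes) / (1024 * 1024) + " MB");
        System.out.println("Likely fits in max heap: " + likelyFits(sourceBytes, maxHeap));
    }
}
```

Treat the result as a coarse planning figure; measuring with the actual document remains the only reliable check.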