Free Support Forum - aspose.com

Several word files errors

I have tried to extract text from a lot of different Word files, and here are the errors I ran into, along with the Word files that caused them. (D'oh, you can only upload one file per post, so I will upload the rest in comments.)

Unexpected table state at the end of the cell. For free technical support, please post this error and the file in the Aspose.Words Forums http://www.aspose.com/forums/ShowForum.aspx?ForumID=75.
Filename = ADDY_RulesGuidelines_05-06.doc (attached to article)

Expected to get tabs sprm only once for a paragraph. For free technical support, please post this error and the file in the Aspose.Words Forums http://www.aspose.com/forums/ShowForum.aspx?ForumID=75.
Filename = GRIEVANCE_TRACKING_URC.DOC (attached to comment 1)

End of body is out of sync. For free technical support, please post this error and the file in the Aspose.Words Forums http://www.aspose.com/forums/ShowForum.aspx?ForumID=75.
Filename = ics218.doc (attached to comment 2)

End of body is out of sync. For free technical support, please post this error and the file in the Aspose.Words Forums http://www.aspose.com/forums/ShowForum.aspx?ForumID=75.
Filename = ics221.doc (attached to comment 3)

Cannot find this stream in the storage.
Filename = IEEEInst.doc (attached to comment 4)

Thanks, Richard. Posts like yours are very valuable to us because they help make our product better.

I will look at these issues over the weekend and get back to you.

I have added these issues to our defect list. We will deal with them in the next 2-3 weeks.

Fixed: Unexpected table state at the end of the cell. The document has a character #7, which is normally an end of cell or end of row character, but in this case it just ends a normal paragraph. So the document is malformed. Fixed in Aspose.Words 3.5.1, which will be out in a few days.
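For anyone curious what tolerating a stray character #7 could look like, here is a minimal sketch. It is not the Aspose.Words code. The function name `classify_paragraph` and the `in_table` flag (standing in for whatever the paragraph properties say about table membership) are assumptions made up for illustration.

```python
CELL_MARK = "\x07"  # character #7, normally an end of cell or end of row mark


def classify_paragraph(text, in_table):
    """Return (kind, clean_text) for one paragraph of raw document text.

    `in_table` is a hypothetical flag taken from the paragraph's properties.
    A trailing character #7 only counts as a cell boundary when the paragraph
    really is inside a table; otherwise it is dropped and the text is treated
    as a normal paragraph instead of raising an error.
    """
    if text.endswith(CELL_MARK):
        clean = text[:-1]
        if in_table:
            return "cell_end", clean
        return "paragraph", clean  # stray mark in a malformed document
    return "paragraph", text


print(classify_paragraph("Total\x07", in_table=True))
print(classify_paragraph("Just a sentence\x07", in_table=False))
```

The point of the design is that the reader trusts the paragraph properties over the character itself, so a malformed file degrades to plain text rather than stopping the import.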

Fixed: Expected to get tabs sprm only once for a paragraph. This is also in 3.5.1.

Fixed: End of body is out of sync.

The document has a section break inside a cell. This is a malformed document, but MS Word reads it resiliently. I made Aspose.Words read it as well.

Re: Cannot find this stream in the storage.

DOC files are structured storage files. This file is completely corrupted from the structured storage validity point of view. There is a "directory" of streams inside the file (similar to the FAT in MS-DOS), and that directory is damaged. The "1Table" stream that Aspose.Words looks for is physically present in the file, but it cannot be found by traversing the directory tree, hence the error.
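To illustrate how a stream can be present in the file yet unreachable, here is a simplified model of a structured storage directory. It is only a sketch: real directory entries are 128-byte binary records, and the `DirEntry` class, the `orphaned_names` helper, and the sample entries below are made up for illustration rather than read from IEEEInst.doc. Each entry links to a left sibling, a right sibling, and a child, and a reader finds streams by following those links from the root.

```python
from dataclasses import dataclass

NOSTREAM = 0xFFFFFFFF  # value meaning "no sibling / no child" in a directory entry


@dataclass
class DirEntry:
    name: str
    left: int = NOSTREAM
    right: int = NOSTREAM
    child: int = NOSTREAM


def reachable_names(entries, root=0):
    """Names found by walking the sibling/child links starting at the root."""
    found, seen = set(), set()
    stack = [entries[root].child]
    while stack:
        idx = stack.pop()
        if idx == NOSTREAM or idx >= len(entries) or idx in seen:
            continue  # missing link, out-of-range link, or a cycle
        seen.add(idx)
        entry = entries[idx]
        found.add(entry.name)
        stack.extend((entry.left, entry.right, entry.child))
    return found


def orphaned_names(entries, root=0):
    """Entries that exist in the directory array but cannot be reached by traversal."""
    all_names = {e.name for i, e in enumerate(entries) if i != root}
    return all_names - reachable_names(entries, root)


# Hypothetical damaged directory: the sibling link to "1Table" has been lost.
damaged = [
    DirEntry("Root Entry", child=1),
    DirEntry("WordDocument"),  # should have right=2
    DirEntry("1Table"),
]
print(orphaned_names(damaged))
```

A resilient reader could fall back to a linear scan of the directory entries like `orphaned_names` does, but whether that is safe for every damaged file is a separate question.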

To resolve this issue, I would need to make Aspose.Words read corrupted structured storage files resiliently. Sorry, but we are working on many other features at the moment, and I'm not going to work on this one.