Error in Extraction of Word Documents


#1

Hi,

I am trying to use Aspose.Word to get the content of some Word files and ran into the following errors.

Error1.doc
“PersonalParticularsPersonalParticulars” incorrectly appears after “Education Background”. Is it because of the particular formatting used?

Error2.doc
This is the same document with part of the contents changed; it throws this error:

Error Source : Aspose.Word
Error Description : Cannot find node ‘?.?’ at position -1. Please report this file to word@aspose.com.
Error Stack Trace : at ?.?.?(Type ?, Int32 ?)
at ?.?.get_?()
at ?.?.get_?()
at ?.?.?()
at ?.?.?()
at ?.?.?(Stream ?)
at Aspose.Word.Document.?(Stream ?)
at Aspose.Word.Document.?(Stream ?, String ?)
at Aspose.Word.Document..ctor(String fileName)

I have attached both files. Could you please have a look?

Thanks a lot.


#2

Second error file


#3

Hi,

I get that exception for both documents. It is caused by the “Personal Particulars” table being placed inside a textbox. This should be avoided for the time being, because Aspose.Word can sometimes have problems with textboxes and floating objects in general.

As for the incorrect extraction: there was some strange formatting right after the “Education Background” table. I’ve deleted it and the template now seems to work fine; please see the attachment.


#4

This raises a point that concerns my previous problem as well. If Aspose.Word doesn’t support certain features, whether it be floating objects or certain HTML formatting tags, shouldn’t Aspose.Word simply ignore the unsupported objects instead of raising exceptions?

That way, developers like myself would not have to wrap critical methods like “builder.Save” or “builder.InsertHTML” in try/catch blocks.


#5

Aspose.Word is a component used by many developers and there are always at least two opinions about every feature. The way unsupported features are handled at the moment suits many customers who don't want to silently get some of their document content ignored, but obviously does not suit others. We will probably introduce some sort of flag that lets you select what Aspose.Word should do when it encounters an unsupported feature.
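
To illustrate the idea only (a Python sketch, not a planned Aspose.Word API; `UnsupportedAction`, `UnsupportedFeatureError`, `SUPPORTED`, and `import_elements` are all hypothetical names), such a flag might behave roughly like this:

```python
from enum import Enum


class UnsupportedAction(Enum):
    THROW = "throw"    # fail loudly, as happens today
    IGNORE = "ignore"  # skip the element and continue


class UnsupportedFeatureError(Exception):
    """Raised when an unsupported element is met in THROW mode."""


# Hypothetical set of element names the importer understands.
SUPPORTED = {"paragraph", "table", "run"}


def import_elements(elements, on_unsupported=UnsupportedAction.THROW):
    """Return the supported elements, handling the rest as the flag says."""
    imported = []
    for name in elements:
        if name in SUPPORTED:
            imported.append(name)
        elif on_unsupported is UnsupportedAction.THROW:
            raise UnsupportedFeatureError(f"unsupported element: {name}")
        # IGNORE mode: drop the element silently and keep going.
    return imported
```

With the flag set to `UnsupportedAction.IGNORE`, a call such as `import_elements(["paragraph", "textbox", "table"], UnsupportedAction.IGNORE)` keeps only the supported elements; with the default, the same call raises on `"textbox"`.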

Thank you for understanding.


#6

Has this feature been added yet?


#7

Sorry, we decided to settle on consistently throwing exceptions for the time being. We will add this feature later.


#8

In my opinion, an exception should not be thrown for unhandled features such as HTML styles; it is generally not considered good practice to throw an exception for an error that can be handled. The current conditions are:

  • You have a known error
  • You know the machine's current state
  • There exists a way to "handle" the error

These conditions dictate that the programmer can handle the error instead of throwing up his hands and letting the user deal with it. Not to mention that exceptions can be very costly, because the machine has to create new objects and traverse the stack. Instead of throwing an exception, "InsertHTML" could provide a return value and proceed, ignoring any unsupported elements.
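
A rough sketch of what I mean, written in Python purely for illustration. `insert_html`, `InsertResult`, and `SUPPORTED_TAGS` are names I made up, not Aspose.Word API, and the "HTML" is simplified to a list of tag names:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical set of tags the importer can handle.
SUPPORTED_TAGS = {"p", "b", "i"}


@dataclass
class InsertResult:
    """Return value describing what was inserted and what was skipped."""
    inserted: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)

    @property
    def complete(self) -> bool:
        # True only when nothing had to be ignored.
        return not self.skipped


def insert_html(tags: List[str]) -> InsertResult:
    """Insert supported tags and record unsupported ones instead of raising."""
    result = InsertResult()
    for tag in tags:
        if tag in SUPPORTED_TAGS:
            result.inserted.append(tag)
        else:
            result.skipped.append(tag)
    return result
```

The caller can then check `result.complete` or inspect `result.skipped` when it cares about lost content, and otherwise just carry on without a try/catch block.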