Embedded document name

I have been evaluating the Aspose.Cells, Aspose.Slides and Aspose.Words APIs with a view to extracting embedded documents from MS Word, Excel and PowerPoint documents. Although extracting embedded files from these documents is quite straightforward and impressive, I'm struggling to preserve the original file names of the embedded documents.

I'm aware that this information is sometimes not saved when the document is embedded, but it is imperative that I get as high a success rate as I can. Is there a workaround I can use to extract this information for the OleObject / OleObjectFrame?

Thanks.

Hi,

I am a representative of Aspose.Cells. I think you can use the OleObject.SourceFullName property to get the source file name for your requirement.

Thank you.


Thanks Amjad for the reply. I'm already using this property for Aspose.Cells and Aspose.Words but hardly ever get anything back. For Aspose.Slides, there is no such property since we're using OleObjectFrame. There we're using OleObjectFrame.AlternativeText to extract the file name, but again it's hardly ever populated.

As I mentioned earlier, it is imperative that we get the file names in most cases, and now that we know there is no reliable way of getting them through the properties exposed by the libraries, is there a workaround? Perhaps reading directly from the file stream?

I know it can be done, hence the persistent questions.

Thanks again.

Hi,

"I’m already using this property for Aspose.Cells and Aspose.Words but hardly ever get anything back"

Could you post your template Excel file and the sample code you are using here, so that we can check the issue in the Aspose.Cells component?

Regarding Aspose.Slides, a member of our Aspose.Slides team will help you soon.

Thank you.

Attached is the sample code and sample files for which I have not been able to extract the embedded document names.

Hi there,

Thanks for attaching your documents here for testing.

When an object is embedded inside a document (for MS Word at least), the file name is actually not stored. The SourceFullName property only applies when a document is linked, and it contains the path to the document it is linked to. In Aspose.Words, the OleFormat.IsLink property is actually just testing whether the SourceFullName property is empty or not, as it appears that the presence of a path is what defines whether an object is linked. Embedded objects, on the other hand, store no file name, so for all of the embedded objects you have encountered this field has been empty.
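To illustrate the rule above, here is a minimal Python sketch. It does not call any Aspose API: `original_name` is a hypothetical helper that takes the string you have already read from the source-path property and treats an empty value as "embedded, no name stored".

```python
import ntpath  # Windows path rules, usable on any platform


def original_name(source_full_name):
    """Return the file name for a linked object, or None for an embedded one.

    Assumption: an empty (or missing) source path means the object is
    embedded rather than linked, as described in the reply above.
    """
    if not source_full_name:
        return None
    # ntpath.basename accepts both "\\" and "/" as separators.
    return ntpath.basename(source_full_name) or None
```

For a linked object the path yields the file name directly; for an embedded object the helper returns None, and the name has to come from somewhere else.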

You will also notice that the embedded objects in your documents display the short file name below the icon. This is not stored as text, but is actually an image in EMF format which MS Word saves with the object itself (and which can also be accessed and extracted using the ImageData property of the Shape class).

I believe the above applies to the other Aspose products as well, although they will need to confirm that. My suggestion for extracting the original file name of an embedded object is to simply store it in the Alternative Text or ScreenTip of the object in MS Word (it can then easily be read from the object in code using the appropriate property). You may even want to consider running OCR on the image and extracting the file name from there.
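As a rough sketch of how such a fallback could be wired up in code, the snippet below tries several already-extracted strings in order and reports which one supplied the name. The function name `pick_name` and the field labels are hypothetical, and no Aspose call is involved; you would pass in whatever values your own extraction code reads from the object.

```python
def pick_name(source_full_name="", alternative_text="", screen_tip=""):
    """Return (field_label, value) for the first non-empty field.

    The order is an assumption: the link path first, then the
    Alternative Text, then the ScreenTip.
    """
    fields = (
        ("source_full_name", source_full_name),
        ("alternative_text", alternative_text),
        ("screen_tip", screen_tip),
    )
    for label, value in fields:
        text = (value or "").strip()
        if text:
            return label, text
    # Nothing was stored in any of the fields.
    return None, None
```

Returning the label alongside the value makes it easy to log how often each field is actually populated across a batch of documents.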

If you have any further queries, please feel free to ask.

Thanks,

Hi,

Thanks for the sample project and files.

I have logged an issue for the Aspose.Cells component in our issue tracking system with the ID CELLSNET-19631. We will look into it and get back to you soon.

Thank you.

That was a great tip to use the WMF or icon file.

I've managed to extract the name of the file (if present) from the byte stream provided by OleObject.ImageData for Excel and ImageData.ImageBytes for MS Word. No saving of the icon file or OCR required.

One thing I need help with from the Aspose.Slides team is how to get access to a similar image data stream when dealing with PPT files.
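The exact method used is not shown in this thread, but one way such a byte scan could work is sketched below in Python. It assumes the caption sits in the icon metafile either as single-byte ASCII or as UTF-16LE text, and that it looks like a file name with a short extension. The function name, the character set and the length limits are all assumptions for illustration, and the input is simply the raw bytes you have already obtained from the image data property.

```python
import re

# Assumed shape of a file name: safe characters, a dot, a 1-5 character extension.
_ASCII_NAME = re.compile(rb"[A-Za-z0-9 _\-()~.]{1,200}\.[A-Za-z0-9]{1,5}")
# Same shape, but with each character followed by a zero byte (UTF-16LE).
_WIDE_NAME = re.compile(
    rb"(?:[A-Za-z0-9 _\-()~.]\x00){1,200}\.\x00(?:[A-Za-z0-9]\x00){1,5}"
)


def find_filename_candidates(data):
    """Return file-name-like strings found in raw image bytes.

    ASCII matches come first, then UTF-16LE matches. The results are
    only candidates and still need to be checked by the caller.
    """
    found = []
    for match in _ASCII_NAME.finditer(data):
        found.append(match.group().decode("ascii").strip())
    for match in _WIDE_NAME.finditer(data):
        found.append(match.group().decode("utf-16-le").strip())
    return [name for name in found if name]
```

Because the patterns are deliberately loose, other short strings in the metafile may also match, so it is worth filtering the candidates against the extensions you expect.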

Thanks again.

Hi there,

Ah, that's a great result! I thoroughly inspected the OleObject's bytes, but didn't even think to take a look at the bytes of the icon as well. It seems that when the embedded object is set to display as an icon, it stores the icon image along with a caption, which by default is the file name of the embedded document. It seems there is no such luck when the object is embedded as the actual content, though. In any case I'm glad you found what you were looking for. I have logged a new feature request to be able to properly extract the caption (file name) from embedded objects with an OleIcon in Aspose.Words. We will keep you informed of any developments.

Thanks,

Hi Baroon,

If OleObject.IsLink is false, SourceFullName is not valid.
You will also notice that the embedded objects in your documents display the short file name below the icon. This is not stored as text, but is actually an image in EMF format which is saved by MS Excel.
If you want the source full name to be valid, please make sure that “Link to file” is checked (Tools -> Insert -> Create from file tab).

Thanks,