Extracting from Excel/xls files

I’m having trouble finding sample code to do the following. The code I did find in this forum did not work: some methods and properties it referred to were not there, so I wonder if it was written for a different version of the library.

1. Save embedded attachments (OLE objects) as separate files
2. Create tiff (or png) image of a worksheet
3. Save (semi-formatted) text of a worksheet

My project is in Java, so I’d rather use the Java version, but from looking at the documentation, the Java libraries are lagging behind, so I may have to use the .NET versions. Please advise…
Q1. What is the timeframe before the Java libraries catch up?
Q2. Are the .NET versions more active? Will they always be more up to date and have the latest bug fixes?

1. Save embedded attachments (OLE objects) as separate files

This feature is already supported. Which version are you using? Have you tried our latest one?


2. Create tiff (or png) image of a worksheet

We are working on converting a worksheet to an image. We have hit a technical obstacle and are investigating it. Once it is solved, we can give you a specific time frame for this feature. Hopefully we can deliver it in about 3 months.


3. Save (semi-formatted) text of a worksheet

Can you elaborate on this feature?

Q1. What is the timeframe before the Java libraries catch up?
Q2. Are the .NET versions more active? Will they always be more up to date and have the latest bug fixes?

Both the .NET and Java versions are active. We develop features based on users' requirements, and the requirements differ somewhat between the .NET and Java versions, so there are some feature differences between them.

The .NET version has the following features that are not included in the Java version:

*Chart2Image conversion

*Sheet2Image conversion

*Direct Xls2PDF conversion (without Aspose.PDF)

The Java version has the following features that are not included in the .NET version:

*MHTML importing

*More ODS support

Hopefully by March 2010 the Java version will have the same feature set as the .NET version, or a larger one.

Hi

1) Please see the following document for how to extract OLE objects from worksheets:
http://www.aspose.com/documentation/java-components/aspose.cells-for-java/managing-ole-objects.html

Please try our latest version, v2.1.1 ( http://www.aspose.com/community/files/72/java-components/aspose.cells-for-java/entry208557.aspx ), as Laurence suggested. If you still find any issue, kindly send us your template file and sample code, and we will check it soon.

Thank you.

Hi Laurence, and thanks for the response.

For #3, I want to extract the text of a workbook for text-indexing the document. It doesn’t have to be very pretty, but it should be somewhat readable. Headers and footers are needed (there may be important text in there), but they don’t have to repeat on each page.

I’m not sure if Excel allows textboxes the way Word does, but if so, I need to extract those too. Any hidden or filtered cells or worksheets should also be extracted.

Speed and completeness of text are most important.

I suspect something like this has been done often, so I thought I would ask for sample code before trying to decipher the documentation.

For #2, if there is something I can help with or give ideas for, please let me know!

Hi,

For 3):

We are not entirely clear about your need. However, you may save the workbook as a .txt (semicolon-separated), CSV, or tab-delimited file if that fits your requirements.

e.g.

// Option 1: save as semicolon-separated text
Workbook wb = new Workbook();
wb.Open(@"f:\test\MyFile.xls");
wb.Save(@"f:\test\out1.txt", ';');

// Option 2 (separate snippet): save as CSV
Workbook wb = new Workbook();
wb.Open(@"f:\test\MyFile.xls");
wb.Save(@"f:\test\out1.csv", FileFormatType.CSV);
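For readers working in Java, the kind of output these calls aim for can be sketched without the library. The following is only an illustration of semicolon-separated text built from an in-memory grid of strings. The class `DelimitedText`, its methods, and the quoting rule (wrap a field in double quotes when it contains the separator, a quote, or a line break) are assumptions made for this example, not the Aspose.Cells API or its exact output format.

```java
import java.util.StringJoiner;

public class DelimitedText {

    /** Joins each row with the separator and ends every row with a newline. */
    public static String toDelimited(String[][] rows, char separator) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            StringJoiner line = new StringJoiner(String.valueOf(separator));
            for (String cell : row) {
                line.add(escape(cell, separator));
            }
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    /** Quotes a field only when it would otherwise be ambiguous (assumed CSV-style rule). */
    static String escape(String value, char separator) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.indexOf(separator) >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0;
        if (!needsQuotes) {
            return value;
        }
        // Double any embedded quotes, then wrap the whole field in quotes.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        String[][] sheet = {
            {"Name", "Total"},
            {"a;b", "3"}
        };
        System.out.print(toDelimited(sheet, ';'));
    }
}
```

For text indexing, a flat format like this keeps each row on one line, which is usually readable enough, but it carries no headers, footers, or shape text.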

If it does not fit your requirements, please create your sample input and output files in MS Excel manually, post the files here, we will check it soon.

Thank you.

Hi,

Thank you for considering Aspose.

For #3, we are not entirely clear about your requirement. However, we have created a sample program that shows how to save the text in a Workbook (see the attached file). Hopefully it will help you.

Thank You & Best Regards,

Thanks for the sample code, Nausherwan. I had written additional code as well, and once it works ok, I can post it so others can use it as well.

Two quick questions:
1. Why do you get the shapes in reverse order (highest index first)?

2. Is there a way to get the shapes at the same time as the cells, so they will be in the same order as the cells? For example, if a shape starts at cell C7, when I extract the text for C7, I want to extract the text for the shape at that time and put it after the text for C7, so the shapes will be roughly in the same order as someone would read the spreadsheet.

This is not absolutely necessary, I can find some other way to merge this data, but if there is an easy way to do it, it would save me some time.

Hi,

Thank you for considering Aspose.

amp834:

  1. Why do you get the shapes in reverse order (highest index first)?

There is no special purpose behind the reversed order when getting shapes. In fact, you can loop over the shapes in any order and get any shape you like by index.

amp834:

  2. Is there a way to get the shapes at the same time as the cells, so they will be in the same order as the cells? For example, if a shape starts at cell C7, when I extract the text for C7, I want to extract the text for the shape at that time and put it after the text for C7, so the shapes will be roughly in the same order as someone would read the spreadsheet.

To get shapes according to their position, I am afraid you have to gather the position info of the shapes (such as each shape's top-left cell) before exporting the cells. Then, when exporting the text of cells and shapes, you need to check the gathered position info yourself to find any shape that should be exported with the current cell.
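The two-pass approach described here can be sketched in plain Java. This is a minimal illustration under stated assumptions, not library code: `ShapeTextMerger` and `ShapeInfo` are hypothetical names, the sheet is modeled as a `String[][]`, and each shape is reduced to the zero-based row and column of its top-left cell plus its text. With the real library, those values would first have to be read from each shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShapeTextMerger {

    /** Hypothetical stand-in for a shape: top-left anchor cell (zero-based) and its text. */
    public static final class ShapeInfo {
        final int row;
        final int column;
        final String text;

        public ShapeInfo(int row, int column, String text) {
            this.row = row;
            this.column = column;
            this.text = text;
        }
    }

    /** Packs (row, column) into one sortable key; natural order is row-major. */
    static long key(int row, int column) {
        return ((long) row << 32) | (column & 0xFFFFFFFFL);
    }

    /** Pass 1: gather shape text by anchor cell, in whatever order the shapes arrive. */
    static TreeMap<Long, List<String>> indexByAnchor(List<ShapeInfo> shapes) {
        TreeMap<Long, List<String>> byCell = new TreeMap<>();
        for (ShapeInfo s : shapes) {
            byCell.computeIfAbsent(key(s.row, s.column), k -> new ArrayList<>()).add(s.text);
        }
        return byCell;
    }

    /** Pass 2: walk cells row by row, emitting each shape right after its anchor cell. */
    public static List<String> merge(String[][] cells, List<ShapeInfo> shapes) {
        TreeMap<Long, List<String>> byCell = indexByAnchor(shapes);
        List<String> out = new ArrayList<>();
        for (int r = 0; r < cells.length; r++) {
            for (int c = 0; c < cells[r].length; c++) {
                String value = cells[r][c];
                if (value != null && !value.isEmpty()) {
                    out.add(value);
                }
                List<String> anchored = byCell.remove(key(r, c));
                if (anchored != null) {
                    out.addAll(anchored);
                }
            }
        }
        // Shapes anchored outside the populated range are appended at the end, row-major.
        for (Map.Entry<Long, List<String>> rest : byCell.entrySet()) {
            out.addAll(rest.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] cells = {{"A1", "B1"}, {"A2", ""}};
        List<ShapeInfo> shapes = new ArrayList<>();
        shapes.add(new ShapeInfo(1, 0, "note on A2"));
        shapes.add(new ShapeInfo(0, 1, "box on B1"));
        for (String line : merge(cells, shapes)) {
            System.out.println(line);
        }
    }
}
```

Because the shapes are indexed by position first, the order in which they are read (forward or reverse by index) does not affect the output, and the lookup per cell stays cheap even on large sheets.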

Thank You & Best Regards,