Hello support,
We have been evaluating the Aspose.Note tool and noticed that embedded Excel tables are converted to HTML as images, not as tables with text. Why is that the case?
Our use case is to extract all the text and values from OneNote files for analysis. But if Excel tables are rendered as images, we can’t extract text from them.
Here is an example:
TablePageOneNoteFiles.zip (505.2 KB)
@eboraks2017,
Can you please share the code you are using to read information from the OneNote file? It seems you are converting the OneNote files to HTML and then trying to extract data from the HTML.
Kashif,
We are using the Aspose.Note Java API to convert OneNote to HTML. If you look at the attached zip, it contains an HTML file and a OneNote file. The HTML has an Excel table. This table is an image, not an HTML table, meaning we can’t extract the text from it.
Eliran
@eboraks2017,
We are looking into whether this is possible and will update you here if we can implement such a feature.
@eboraks2017,
An investigation has been opened at our end as NOTENET-2662 to look at your specific scenario. We will update you here as soon as we have additional information.
@eboraks2017
Our investigation has revealed the following:
- The embedded Excel file is just a binary file. When an Excel file is dropped into the OneNote desktop application, an attached file node is created that holds the binary content of the Excel file.
- If you want to keep the preview images but remove the embedding, you can do that as they are independent objects.
- Aspose.Note does not convert embedded Excel files. To extract data from the embedded Excel file, you should use the Aspose.Cells API.
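As a rough illustration of the points above, here is a minimal sketch that reads the attached file nodes with Aspose.Note and passes any Excel attachment to Aspose.Cells to print its cell text. It assumes both the Aspose.Note and Aspose.Cells Java libraries are on the classpath. The file name `input.one`, the class name `EmbeddedExcelText`, and the `isExcelFile` helper are hypothetical and used only for illustration; please adapt them to your project.

```java
import com.aspose.cells.Cells;
import com.aspose.cells.Workbook;
import com.aspose.cells.Worksheet;
import com.aspose.note.AttachedFile;
import com.aspose.note.Document;

import java.io.ByteArrayInputStream;
import java.util.List;
import java.util.Locale;

class EmbeddedExcelText {

    // Hypothetical helper: treats an attachment as an Excel workbook
    // based on its file extension only.
    static boolean isExcelFile(String fileName) {
        if (fileName == null) {
            return false;
        }
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".xlsx") || lower.endsWith(".xls");
    }

    public static void main(String[] args) throws Exception {
        // "input.one" is a placeholder path, not a file from this thread.
        String path = args.length > 0 ? args[0] : "input.one";
        Document doc = new Document(path);

        // Embedded files are stored as AttachedFile nodes in the document.
        List<AttachedFile> attachments = doc.getChildNodes(AttachedFile.class);
        for (AttachedFile file : attachments) {
            if (!isExcelFile(file.getFileName())) {
                continue;
            }
            // Load the raw bytes of the attachment into Aspose.Cells.
            Workbook workbook = new Workbook(new ByteArrayInputStream(file.getBytes()));
            for (int s = 0; s < workbook.getWorksheets().getCount(); s++) {
                Worksheet sheet = workbook.getWorksheets().get(s);
                Cells cells = sheet.getCells();
                System.out.println(file.getFileName() + " / " + sheet.getName());
                // Print each row as tab-separated cell text.
                for (int r = 0; r <= cells.getMaxDataRow(); r++) {
                    StringBuilder line = new StringBuilder();
                    for (int c = 0; c <= cells.getMaxDataColumn(); c++) {
                        if (c > 0) {
                            line.append('\t');
                        }
                        line.append(cells.get(r, c).getStringValue());
                    }
                    System.out.println(line);
                }
            }
        }
    }
}
```

Filtering by extension is a simplification; attachments with other names or formats would need their own handling.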