Using latest Aspose.Cells v20.11.0
Create a new xls workbook
Insert - Object - Create From File
Tick Display as Icon (don’t click link to file - this is an embedded PDF)
Change Icon - Edit Caption to sample.pdf (rather than full file path)
Save workbook
Now in Aspose, examine worksheet.OleObjects
var ole = worksheet.OleObjects[0];
Console.WriteLine(ole.Label); // Incorrect: prints "oleObject1.bin". No property exposes the caption entered for the OLE object in Excel.
ProgID is “Acrobat Document” (correct) and ObjectData bytes are also the original PDF bytes.
has_pdf_and_word.zip (154.1 KB)
@sfackrell2,
Could you share your sample Excel file? We will check it soon.
PS. Please zip the file prior to attaching.
Added attachment to the original post. Thanks in advance
This happens for all embedded OLE objects and you get the following pattern:
Embedded OLE PDF => Label shows as “oleObject1.bin”, “oleObject2.bin” etc (ProgID=Acrobat Document)
Embedded OLE VSD => Label shows as “oleObject1.bin”, “oleObject2.bin” etc (ProgID=Visio)
Embedded OLE PPT => Label shows as “Microsoft_PowerPoint_Presentation1.pptx” (ProgID=Presentation)
Embedded OLE Word => Label shows as “Microsoft_Word_97_-_2003_Document1.doc” (ProgID=Document)
All of these objects have a caption with the actual file name in it which you can see if you look at the worksheet. That file name is getting lost.
@sfackrell2,
After an initial test with your template file and the following sample code, I was able to reproduce the issue you mentioned: the OLE object label is not retrieved correctly from the Excel sheet. The label for the embedded PDF object is returned as “oleObject1.bin”, and the labels for the other file formats are not correct either.
Sample code:
using System;
using Aspose.Cells;
using Aspose.Cells.Drawing;

Workbook workbook = new Workbook("e:\\test2\\has_pdf_and_word.xlsx");
Worksheet worksheet = workbook.Worksheets[0];
// Print the identifying properties of the first two embedded OLE objects.
for (int idxOle = 0; idxOle < 2; idxOle++)
{
    OleObject objOle = worksheet.OleObjects[idxOle];
    Console.WriteLine(objOle.ObjectData.Length); // size of the embedded data in bytes (ToString() would only print "System.Byte[]")
    Console.WriteLine(objOle.ProgID);
    Console.WriteLine(objOle.ObjectSourceFullName);
    Console.WriteLine(objOle.Label);
    Console.WriteLine(objOle.Text);
}
I have logged a ticket with an id “CELLSNET-47752” for your issue. We will look into it soon.
Once we have an update on it, we will let you know.
@sfackrell2,
Please unzip the template file; you will find that the embedded PDF object is saved as oleObject1.bin. Moreover, FileFormatType.Pdf will be returned from objOle.FileFormatType in the next fix.
Hi @ahsaniqbalsidiqui, re.
“Please unzip the template file”
I’m not sure what you mean?
@sfackrell2,
Please use a zip tool (e.g. WinRAR) to open or extract your XLSX file, then find the embedded object (.bin file) in its folder. See the attached screenshot for reference.
sc_shot1.png (89.5 KB)
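The manual unzip step described above can also be done in code. The following is a minimal sketch, not part of Aspose.Cells: it assumes the usual XLSX package layout in which embedded objects are stored under xl/embeddings/, and the names XlsxEmbeddings and ListEmbeddedParts are hypothetical, chosen only for this illustration. It uses only System.IO.Compression.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// XlsxEmbeddings is a hypothetical helper name used only for this sketch.
public static class XlsxEmbeddings
{
    // Returns the names of all package parts stored under xl/embeddings/.
    // Assumption: the workbook follows the usual XLSX (zip) layout.
    public static List<string> ListEmbeddedParts(Stream xlsx)
    {
        // leaveOpen: true so the caller stays responsible for the stream.
        using (var archive = new ZipArchive(xlsx, ZipArchiveMode.Read, leaveOpen: true))
        {
            return archive.Entries
                .Select(entry => entry.FullName)
                .Where(name => name.StartsWith("xl/embeddings/", StringComparison.OrdinalIgnoreCase))
                .OrderBy(name => name, StringComparer.Ordinal)
                .ToList();
        }
    }
}
```

Calling XlsxEmbeddings.ListEmbeddedParts(File.OpenRead(path)) on the template file should list the same .bin part that the screenshot shows, which is where the “oleObject1.bin” label comes from.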
@sfackrell2,
Please try our latest version/fix: Aspose.Cells for .NET v20.11.7 (attached)
Please note, FileFormatType.Pdf will be returned from objOle.FileFormatType if the embedded file is PDF.
Let us know your feedback.
Aspose.Cells20.11.7 For .Net2_AuthenticodeSigned.Zip (5.4 MB)
Aspose.Cells20.11.7 For .Net4.0.Zip (5.4 MB)
Hi @Amjad_Sahi, thank you for your quick response and your updated version v20.11.7. I have just re-run my integration test and I can see that the .pdf file extension is now correctly showing instead of .bin but the Label property should be showing “sample_content.pdf” not “oleObject1.pdf”. If you open the xls you will see these labels (also screenshot below). The text in these labels is what I’m after and they don’t appear to be mapped so I can’t access them through any property.
has_pdf_and_word.PNG (13.4 KB)
@sfackrell2,
We could not return the text “sample_content.pdf” because the text is in the image.
Wow. I’m amazed Microsoft only stores it as an image, but I searched the file for the encoded bytes of the string and can’t see them anywhere, so you are absolutely right. We will have to OCR it. Thank you.
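For anyone repeating the byte search mentioned above, here is a rough sketch of that kind of check. It is an illustration only: CaptionSearch and its methods are hypothetical names, the three encodings tried are an assumption, and the search only finds text stored verbatim and uncompressed in the bytes you pass in.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// CaptionSearch is a hypothetical helper name used only for this sketch.
public static class CaptionSearch
{
    // Returns the index of the first occurrence of needle in haystack, or -1.
    public static int IndexOf(byte[] haystack, byte[] needle)
    {
        if (needle.Length == 0) return -1; // nothing meaningful to search for
        for (int i = 0; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }

    // Returns the names of the encodings under which text occurs verbatim in data.
    public static List<string> FindEncodings(byte[] data, string text)
    {
        string[] names = { "UTF-8", "UTF-16LE", "UTF-16BE" };
        Encoding[] encodings = { Encoding.UTF8, Encoding.Unicode, Encoding.BigEndianUnicode };
        var hits = new List<string>();
        for (int k = 0; k < names.Length; k++)
        {
            // GetBytes does not emit a byte order mark, so only the text itself is matched.
            if (IndexOf(data, encodings[k].GetBytes(text)) >= 0) hits.Add(names[k]);
        }
        return hits;
    }
}
```

An empty result for the caption, for example from CaptionSearch.FindEncodings(File.ReadAllBytes(path), "sample_content.pdf") on an extracted part, is consistent with the caption being stored only as drawn text inside the icon image, though it does not prove it.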
@ahsaniqbalsidiqui, are you sure about this? If that’s the case, how is it working correctly in Aspose.Words?
I’m surprised given the similar OLE Object interface in Word and Excel that Word would store it as text that you can read and Excel as an image. The objects look identical and the way you add OLE Objects is identical in Word and Excel. Can you not use the same method used in Aspose.Words to expose the caption in Aspose.Cells?
Please see the attached sample. Here is the code showing that you can read the embedded OLE PDF’s caption as text:
var doc = new Aspose.Words.Document(path);
var first = (Shape)doc.GetChildNodes(NodeType.Shape, true).First();
Console.WriteLine(first.OleFormat.IconCaption); // sample_pdf_content_with_long_name_not_in_image.pdf
sample_content_with_pdf.zip (142.0 KB)
@sfackrell2,
We have noted your comments with the ticket and will share our feedback after detailed analysis.
@sfackrell2,
MS Excel and MS Word handle this differently.
MS Word stores the captions in the EMF image, and you can see them in MS Excel with the following steps:
“Right click” => “Adobe Acrobat object” => “Convert” => “Change icon”.
However, we could not find them in the EMF image stored by MS Excel, and they are not visible with the steps above.
The issues you have found earlier (filed as CELLSNET-47752) have been fixed in this update. This message was posted using Bugs notification tool by simon.zhao