Hi,
I am using Aspose.Words to extract OLE objects from a DOCX file. After extracting the OLE file, I get the error "Excel cannot open the file '*.xlsx' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file." I have tried everything, including changing the file extension. Please help me find out why the Excel file is corrupt.
Thanks in advance.
Krishna
Hi Krishna,
Thanks for your inquiry. Could you please attach your input Word document here for testing? I will investigate the issue on my side and provide you with more information.
Best Regards,
Hi Krishna,
Thanks for providing the sample document. I suggest opening your DOCX file in MS Word and using 'Save As' to save it again in DOCX format. Then load the newly saved DOCX document with Aspose.Words and extract the Excel file using the following code snippet:
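// Namespaces required by this snippet
using System;
using Aspose.Words;
using Aspose.Words.Drawing;

Document doc = new Document(@"c:\test\New.docx");
// Get collection of shapes
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
int i = 0;
// Loop through all shapes
foreach (Shape shape in shapes)
{
    if (shape.OleFormat != null)
    {
        // Map the ProgId of the OLE object to a file extension.
        // Anything that is not a known Excel ProgId keeps the generic "object" extension.
        string ext = "object";
        switch (shape.OleFormat.ProgId)
        {
            case "Excel.Sheet.8":
                ext = "xls";
                break;
            case "Excel.Sheet.12":
                ext = "xlsx";
                break;
        }
        // Save the embedded object under a unique, indexed name
        shape.OleFormat.Save(String.Format(@"C:\test\out_{0}.{1}", i, ext));
        i++;
    }
}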
I hope this helps.
Best Regards,
Hi Krishna,
Thanks for your request. The code provided in the thread mentioned by Awais is slightly outdated. It will work, but the current API provides a more convenient way to determine the file extension. Here is the updated code:
// Open document
Document doc = new Document(@"Test091\in.doc");
// OLE embedded objects are available through OleFormat property of Shape object
// So first we should extract shapes from the document
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
// Index will be used to generate unique name for each extracted object
int oleIndex = 0;
// Define variable that specifies base path
string basePath = @"Test091\obj_{0}.{1}";
// Loop through all shapes
foreach (Shape shape in shapes)
{
    // Check whether shape contains OLE object
    if (shape.OleFormat != null)
    {
        shape.OleFormat.Save(String.Format(basePath, oleIndex, shape.OleFormat.SuggestedExtension));
        // Increase index
        oleIndex++;
    }
}
Best regards,
Hi Alexey,
Thanks for the reply. After following the method you suggested, Excel is able to open the file. But I have millions of such documents, and opening each one manually to save it again as a *.docx file is not feasible. Is there any way to do this programmatically using C#?
Regards,
Krishna
Hi Krishna,
Thanks for your inquiry. In this case, the behavior of Aspose.Words is correct, and I do not think Aspose.Words is causing the problem when extracting the embedded Package from your document. I suspect that information is missing from the original file. Please see the following screenshot:
[Screenshot: the ProgId of the OLE object is shown as 'Package']
The OLE object is stored as a generic Package, not as an Excel sheet, so Aspose.Words is correctly returning what it reads from the document. When you open and save the document with MS Word, Word fills in the gaps, and Aspose.Words can then read the type of the object as 'Excel.Sheet.12' or 'Excel.Sheet.8'. I am afraid I cannot suggest any programmatic workaround.
If we can help you with anything else, please feel free to ask.
Best Regards,
Hi Awais,
Thanks for the reply. Using the same document as input, I am able to extract the OLE objects in the right format with another method, without Aspose.Words. I think the ProgId is only used to decide the extension of the OLE object, and I use other methods to determine that extension. So I have used 'Package' as the ProgId here to handle all types of OLE objects. How can you say that Aspose.Words is behaving correctly if I am able to extract the OLE object from the same document using other methods?
Best regards,
Krishna
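Krishna does not say which other method is used to determine the extension. One common approach, shown here only as a minimal sketch and not as what was actually done in this thread, is to ignore the ProgId and inspect the extracted bytes directly: legacy Office files start with the OLE2 compound file signature, while .xlsx/.docx/.pptx files are ZIP packages that contain a well-known part. The class and method names (OleSniffer, GuessExtension) and the returned labels are hypothetical; only System.IO and System.IO.Compression are used.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of Aspose.Words.
public static class OleSniffer
{
    // Signature of an OLE2 compound file (legacy .xls, .doc, .ppt and raw OLE storages)
    private static readonly byte[] Ole2Signature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    // Signature of a ZIP local file header (.xlsx, .docx, .pptx are ZIP packages)
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // Returns "xlsx", "docx", "pptx", "zip", "ole2" or "object" for the given file content.
    public static string GuessExtension(byte[] content)
    {
        if (StartsWith(content, Ole2Signature))
            return "ole2"; // telling .xls from .doc would require reading the compound file

        if (!StartsWith(content, ZipSignature))
            return "object";

        try
        {
            using (var archive = new ZipArchive(new MemoryStream(content), ZipArchiveMode.Read))
            {
                // Each Open XML format keeps its main part under a fixed path
                if (archive.GetEntry("xl/workbook.xml") != null) return "xlsx";
                if (archive.GetEntry("word/document.xml") != null) return "docx";
                if (archive.GetEntry("ppt/presentation.xml") != null) return "pptx";
            }
        }
        catch (InvalidDataException)
        {
            // Starts like a ZIP but cannot be read as one; fall through
        }
        return "zip";
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[i] != prefix[i]) return false;
        return true;
    }
}
```

A file saved by the extraction loop could then be checked with OleSniffer.GuessExtension(File.ReadAllBytes(path)) and renamed if the result differs from the extension it was saved with.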
Hi Krishna,
Thanks for the additional information. However, you should note that Aspose.Words reads the information present in the document and builds the DOM in memory based on that information. In my previous post, you can see that the 'ProgId' was originally set to 'Package'. When you manually open and save this document with MS Word, Word fixes this problem, and as seen in the screenshot below the 'ProgId' becomes 'Excel.Sheet.12'.
[Screenshot: after re-saving in MS Word, the ProgId is shown as 'Excel.Sheet.12']
Please let me know if I can be of any further assistance.
Best Regards,
Hi Awais,
Many thanks for the solution. I am now able to extract the OLE objects with your product Aspose.Words by changing the 'ProgId' to "Excel.Sheet.12".
I am really happy with your support.
Regards,
Krishna
Hi Krishna,
Thanks for your feedback. It is great that you managed to achieve what you were looking for. Please let us know any time you have further queries.
Best Regards,
Hi Awais,
I am currently facing a problem while extracting embedded objects that were created using "Object from file" instead of directly creating an "Object". The Excel file opens successfully, but no workbook is shown. When I manually change the visibility attribute at workbook.xml\workbook\bookViews\workbookView\visibility from "hidden" to "visible" or "veryHidden", the contents are shown. Do you have any idea why? Changing this attribute manually is not practical for a large number of files.
Please find attached file.
Regards,
Krishna
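The manual change Krishna describes can in principle be scripted as a post-processing step on each extracted .xlsx file. The following is only a sketch of that idea, not the fix later delivered for this issue: it opens the .xlsx as a ZIP package, loads xl/workbook.xml, and sets every workbookView visibility attribute to "visible". The class and method names (WorkbookVisibilityFixer, MakeVisible) are hypothetical, and the sketch assumes the attribute lives where Krishna describes it.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

// Hypothetical helper, not part of Aspose.Words.
public static class WorkbookVisibilityFixer
{
    // Main SpreadsheetML namespace used by xl/workbook.xml
    private static readonly XNamespace Main =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Sets visibility="visible" on every workbookView that has another value.
    // The stream must be readable, writable and seekable.
    // Returns the number of workbookView elements that were changed.
    public static int MakeVisible(Stream xlsx)
    {
        using (var archive = new ZipArchive(xlsx, ZipArchiveMode.Update, true))
        {
            ZipArchiveEntry entry = archive.GetEntry("xl/workbook.xml");
            if (entry == null) return 0; // not an Excel package

            int changed = 0;
            using (Stream part = entry.Open())
            {
                XDocument doc = XDocument.Load(part);
                foreach (XElement view in doc.Descendants(Main + "workbookView"))
                {
                    XAttribute visibility = view.Attribute("visibility");
                    if (visibility != null && visibility.Value != "visible")
                    {
                        visibility.Value = "visible";
                        changed++;
                    }
                }
                if (changed > 0)
                {
                    // Truncate the part and write the modified XML back
                    part.SetLength(0);
                    doc.Save(part);
                }
            }
            return changed;
        }
    }
}
```

For a file on disk, the stream would typically come from File.Open(path, FileMode.Open, FileAccess.ReadWrite), and the call could be placed in the extraction loop right after each object is saved.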
Hi Krishna,
Thanks for your inquiry.
I managed to reproduce this issue on my side. I have logged this issue in our bug tracking system as WORDSNET-6032. Your request has been linked to this issue and you will be notified as soon as it is resolved.
Sorry for the inconvenience.
Best Regards,
The issue you found earlier (filed as WORDSNET-6032) has been fixed in this .NET update and this Java update.
This message was posted using Notification2Forum from Downloads module by aspose.notifier.