Free Support Forum - aspose.com

PDF attachment extraction corrupted for Slides

Hello Team,
I am using aspose-slides-18.7-jdk16.jar to extract embedded files and attachments from PPT/PPTX files. Everything works except for PDF files, which are corrupted after extraction. Please let me know how I can resolve this.
Below is my code snippet:

	import com.aspose.slides.IShape;
	import com.aspose.slides.ISlide;
	import com.aspose.slides.OleObjectFrame;
	import com.aspose.slides.Presentation;
	
	import java.io.File;
	import java.io.FileOutputStream;
	import java.io.IOException;
	import java.util.List;
	
	// Enclosing method name is illustrative; inputFiles holds the .ppt/.pptx files to scan
	static void extractOleObjects(List<File> inputFiles) throws IOException {
		for (File file : inputFiles) {
			Presentation pres = new Presentation(file.toString());
			try {
				// Walk every slide and every shape on it
				for (int i = 0; i < pres.getSlides().size(); i++) {
					ISlide sld = pres.getSlides().get_Item(i);
					for (int j = 0; j < sld.getShapes().size(); j++) {
						IShape shape = sld.getShapes().get_Item(j);
						if (!(shape instanceof OleObjectFrame)) {
							continue;
						}
						// Cast the shape to OleObjectFrame
						OleObjectFrame oof = (OleObjectFrame) shape;
						String name = oof.getObjectName();
						String progId = oof.getObjectProgId();
						System.out.println(String.format("%s\t\t\t\t%s", name, progId));
						// Choose an output extension from the object name and ProgID
						String fileType;
						if (name.equals("Worksheet") && progId.equals("Excel.Sheet.12")) {
							fileType = ".xlsx";
						} else if (name.equals("Worksheet") && progId.equals("Excel.Sheet.8")) {
							fileType = ".xls";
						} else if (name.equals("Document") && progId.equals("Word.Document.12")) {
							fileType = ".docx";
						} else if (name.equals("Document") && progId.equals("Word.Document.8")) {
							fileType = ".doc";
						} else if (name.equals("Presentation") && progId.equals("PowerPoint.Show.12")) {
							fileType = ".pptx";
						} else if (name.equals("Presentation") && progId.equals("PowerPoint.Show.8")) {
							fileType = ".ppt";
						} else if (name.equals("Acrobat Document") && progId.equals("AcroExch.Document.11")) {
							fileType = ".pdf";
						} else {
							fileType = ".txt";
						}
						String outPath = "E:\\Project\\output\\" + file.getName() + "_" + i + " OleIndex_" + j + fileType;
						// Write the raw bytes returned by getObjectData(); try-with-resources closes the stream
						try (FileOutputStream fstr = new FileOutputStream(outPath)) {
							byte[] buf = oof.getObjectData();
							fstr.write(buf, 0, buf.length);
						}
					}
				}
			} finally {
				// Release the resources held by the presentation
				pres.dispose();
			}
		}
	}
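To see what getObjectData() actually returned, it helps to look at the first bytes of the output: a PDF file begins with the ASCII header `%PDF-`, while an OLE2 (Compound File Binary) container begins with the 8-byte signature `D0 CF 11 E0 A1 B1 1A E1`. The following is a minimal plain-Java sketch of that check; the class name `OleBytesCheck` is hypothetical and nothing here is part of Aspose.Slides.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not part of Aspose.Slides): classifies a byte array by its leading magic bytes.
final class OleBytesCheck {
    // 8-byte signature at offset 0 of every OLE2 / Compound File Binary container
    private static final byte[] CFB_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };
    // A PDF file starts with the ASCII header "%PDF-" (e.g. "%PDF-1.7")
    private static final byte[] PDF_SIGNATURE = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Returns "PDF", "OLE compound file" or "unknown". */
    static String describe(byte[] data) {
        if (startsWith(data, PDF_SIGNATURE)) {
            return "PDF";
        }
        if (startsWith(data, CFB_SIGNATURE)) {
            return "OLE compound file";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        byte[] ole = Arrays.copyOf(CFB_SIGNATURE, 512); // signature followed by zero padding
        System.out.println(describe(pdf)); // prints PDF
        System.out.println(describe(ole)); // prints OLE compound file
    }
}
```

Calling `OleBytesCheck.describe(oof.getObjectData())` inside the extraction loop would show whether an object's bytes are a bare PDF or an OLE container wrapped around it.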

@batuck420,

I have reviewed the sample code you shared. Please share the source presentation along with the corrupted extracted file, so that we can investigate the issue further on our end.

Hello,

I have attached all the required files. Please update me on your findings.

Files.zip (178.0 KB)

Also, I am now using the latest version, aspose-slides-18.8-jdk16.jar, for PPT/PPTX operations.

@batuck420,

Thank you for sharing the information. The data returned by the OleObjectFrame.getObjectData() method is not a genuine PDF document. Embedded documents are not stored in the presentation as the documents themselves, but inside an OLE container. We don't have a public API that extracts the document stream from this container as a ready-to-use PDF, Excel, or other file, but this can be done with a third-party open-source library such as OpenMcdf (a .NET library).
Please visit the following thread for a possible option. Unfortunately, we are not currently aware of an equivalent third-party tool for Java.
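Without an OLE parser, one rough workaround in plain Java is to carve the PDF out of the container bytes using the PDF's own markers: a PDF starts with `%PDF-` and normally ends with `%%EOF`. This is a heuristic sketch, not a compound-file parser and not an Aspose API; the class and method names (`PdfCarver`, `carvePdf`) are hypothetical. It assumes the PDF bytes are stored contiguously inside the container, which the Compound File Binary format does not guarantee, so a carved file can still be damaged.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not an Aspose API): carves a PDF out of raw container bytes
// by its own markers. Heuristic only: assumes the PDF bytes are stored contiguously.
final class PdfCarver {
    private static final byte[] PDF_START = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PDF_END = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Index of the first occurrence of pattern in data at or after from, or -1. */
    static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            for (int k = 0; k < pattern.length; k++) {
                if (data[i + k] != pattern[k]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /** Index of the last occurrence of pattern in data at or after from, or -1. */
    static int lastIndexOf(byte[] data, byte[] pattern, int from) {
        int last = -1;
        for (int i = indexOf(data, pattern, from); i >= 0; i = indexOf(data, pattern, i + 1)) {
            last = i;
        }
        return last;
    }

    /**
     * Returns the bytes from the first "%PDF-" through the last "%%EOF" (plus a trailing
     * CR, LF or CRLF if present), or null if either marker is missing. The last "%%EOF"
     * is used because incrementally updated PDFs contain several.
     */
    static byte[] carvePdf(byte[] container) {
        int start = indexOf(container, PDF_START, 0);
        if (start < 0) {
            return null;
        }
        int eof = lastIndexOf(container, PDF_END, start);
        if (eof < 0) {
            return null;
        }
        int end = eof + PDF_END.length;
        if (end < container.length && container[end] == '\r') {
            end++;
        }
        if (end < container.length && container[end] == '\n') {
            end++;
        }
        return Arrays.copyOfRange(container, start, end);
    }

    public static void main(String[] args) {
        // Synthetic stand-in for getObjectData(): header junk, a tiny "PDF", then padding
        byte[] fake = ("OLEHEADER" + "%PDF-1.4\nbody\n%%EOF\n" + "\0\0\0")
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] pdf = carvePdf(fake);
        System.out.println(pdf == null ? "no PDF found" : pdf.length + " bytes carved");
    }
}
```

In the extraction loop, the ".pdf" case would call `carvePdf(oof.getObjectData())` and write the result only when it is non-null. A real compound-file reader, such as the POIFS component of Apache POI, reads the container's streams directly and avoids the contiguity assumption.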

Thanks for your input.
Is there any way, apart from the OleObjectFrame.getObjectData() method, to get the PDF attachments from presentation files?

@batuck420,

I would like to share that this method returns the OLE object data, and there is no other option.