How reliable is OLE extraction from RTF files?

Hi,

I am interested in using a few methods from OleFormat to extract OLEs from RTF files. We need it to:

It is critical that both the extracted bytes and the file extension are valid, so that when we save the OutputStream to disk the user can open the embedded file in its original application.

Are there any known issues or limitations? Which file types can it reliably handle?

I ask because I noticed some prior work of ours in this area, and it includes fallback logic for cases where getSuggestedExtension() returns null or “.bin”, neither of which is correct. In those cases we use OleFormat#getProgId to pick a fallback extension based on the program ID. For example, “Excel.Sheet.12” maps to “.xlsx”.
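To make the fallback concrete, here is a minimal sketch of that logic in plain Java. The class name OleExtensionFallback, the method resolveExtension, and every table entry other than “Excel.Sheet.12” are illustrative assumptions, not our actual code and not part of the Aspose API. It also treats an empty string the same as null, which is an assumption on my part.

```java
import java.util.Map;

public class OleExtensionFallback {
    // Illustrative mapping only (hypothetical table): extend it with the
    // ProgIDs that actually occur in your documents.
    private static final Map<String, String> PROG_ID_TO_EXTENSION = Map.of(
            "Excel.Sheet.12", ".xlsx",
            "Excel.Sheet.8", ".xls",
            "Word.Document.12", ".docx",
            "Word.Document.8", ".doc",
            "PowerPoint.Show.12", ".pptx",
            "PowerPoint.Show.8", ".ppt");

    /**
     * Returns the suggested extension when it looks usable; otherwise looks
     * the ProgID up in the table, and returns ".bin" if nothing matches.
     */
    public static String resolveExtension(String suggested, String progId) {
        boolean usable = suggested != null
                && !suggested.isEmpty()
                && !suggested.equalsIgnoreCase(".bin");
        if (usable) {
            return suggested;
        }
        if (progId == null) {
            return ".bin";
        }
        return PROG_ID_TO_EXTENSION.getOrDefault(progId, ".bin");
    }

    public static void main(String[] args) {
        // Suggested extension is missing, so the ProgID decides.
        System.out.println(resolveExtension(null, "Excel.Sheet.12"));
        // Suggested extension is usable, so it wins over the ProgID.
        System.out.println(resolveExtension(".pdf", "Excel.Sheet.12"));
    }
}
```

In our case the two arguments would come from OleFormat#getSuggestedExtension and OleFormat#getProgId. An unknown ProgID still ends up as “.bin”, so the caller can detect and log that case.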

Thanks,
Chase

@apatter,

There should not be any problems extracting OLE objects from RTF files. However, please note that you can get the name of an object only when the embedded object in the Word file is linked. You cannot extract the name if the embedded object is displayed as an icon or as content. In those cases you can still get the object's extension using the OleFormat.getSuggestedExtension() method. Please check the following Java code:

import com.aspose.words.*;

// "doc" is assumed to be a com.aspose.words.Document that has already been loaded from the RTF file.
NodeCollection<Shape> shapes = (NodeCollection<Shape>) doc.getChildNodes(NodeType.SHAPE, true);

for (Shape shape : shapes) {
    // getOleFormat() returns null for shapes that are not OLE objects.
    OleFormat oleFormat = shape.getOleFormat();
    if (oleFormat == null) {
        continue;
    }
    if (oleFormat.isLink()) {
        System.out.println("Object is Linked");
        String fileName = oleFormat.getSourceFullName();
        System.out.println(fileName);
        // getSuggestedExtension() returns an empty value for linked objects, but you can take the extension from fileName.
    } else if (oleFormat.getOleIcon()) {
        System.out.println("Object displayed as icon");
        System.out.println(oleFormat.getSuggestedExtension());
    } else {
        System.out.println("Object displayed as Content");
        System.out.println(oleFormat.getSuggestedExtension());
    }
}

If you run into any trouble, please provide a sample file containing the problematic OLE object. We will then investigate the scenario on our end and provide you with more information.
