Hi, I’m trying to extract an embedded DOCX file from a DOCX file.
The code below shows how I collect the shapes that hold the inner (embedded) DOCX:
ArrayList<Shape> docxList = new ArrayList<>();
try
{
    // Get all shapes in the document
    NodeCollection<Shape> shapes = document.getChildNodes(NodeType.SHAPE, true);
    // Loop through the shapes
    for (Shape shape : shapes)
    {
        // Check if it's an OleObject
        if (shape.getOleFormat() != null)
        {
            // Check the file type and add to the corresponding list
            String progId = shape.getOleFormat().getProgId();
            // Constant-first equals() stays safe if progId is null
            if ("Word.Document.12".equals(progId))
            {
                docxList.add(shape);
            }
            else if ("JPEG_PROGID".equals(progId))
            { // Replace with actual ProgId for .jpeg
                jpegList.add(shape);
            }
            else if ("PNG_PROGID".equals(progId))
            { // Replace with actual ProgId for .png
                pngList.add(shape);
            }
        }
    }
    for (Shape docxShape : docxList)
    {
        try
        {
            trying_docx(document, analysis_data, docxShape);
        }
        catch (Exception e)
        {
            // Handle exceptions as needed
            e.printStackTrace();
        }
    }
}
catch (Exception e)
{
    // Closes the outer try block, which was left open in the snippet as posted
    e.printStackTrace();
}
Once I have the DOCX shape, I pass it to the function “trying_docx”. However, when I try to get the binary data of the shape, I get a java.lang.NullPointerException.
Here is the function:
public void trying_docx(Document document, JSONObject config, Shape oldShape) throws Exception {
    // Check if oldShape is null
    if (oldShape == null) {
        System.out.println("Old shape is null.");
        return;
    }
    // Load the embedded DOCX file from the shape if it has an OleFormat
    OleFormat oleFormat = oldShape.getOleFormat();
    if (oleFormat == null) {
        System.out.println("Shape does not have an OLE format.");
        return;
    }
    // Retrieve the embedded DOCX data
    String entryName = "\u0001CompObj";
    byte[] embeddedDocxData = oleFormat.getOleEntry(entryName);
    // Handle the embedded DOCX data
    // Create a temporary file to hold the embedded DOCX data
    File tempFile = File.createTempFile("embedded_docx", ".docx");
    try (FileOutputStream fileOutputStream = new FileOutputStream(tempFile)) {
        fileOutputStream.write(embeddedDocxData);
    }
    // Sanitize the embedded DOCX file
    File sanitizedFile = recursive_disarmer(tempFile, config);
    // Create a new Shape with the sanitized DOCX content
    Shape newShape = new Shape(document, ShapeType.OLE_OBJECT);
    OleFormat newOleFormat = newShape.getOleFormat();
    newOleFormat.setProgId("Word.Document.12");
    newOleFormat.setSourceFullName(sanitizedFile.getAbsolutePath());
    // Set the dimensions and position of the new shape to match the old shape
    newShape.setWidth(oldShape.getWidth());
    newShape.setHeight(oldShape.getHeight());
    newShape.setLeft(oldShape.getLeft());
    newShape.setTop(oldShape.getTop());
    // Replace the old shape with the new shape
    Node parent = oldShape.getParentNode();
    if (parent instanceof CompositeNode) {
        ((CompositeNode<?>) parent).insertBefore(newShape, oldShape);
        oldShape.remove();
    } else {
        System.out.println("Parent node is not a CompositeNode.");
    }
    // Delete the temporary files
    tempFile.delete();
    sanitizedFile.delete();
}
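One place the exception could come from (this is a guess, not something confirmed from a stack trace) is the write step: if the byte array handed to FileOutputStream.write is null, the JDK throws a NullPointerException. Below is a minimal sketch of a guard for that step, using only the JDK. The class name EmbeddedDataWriter and the method writeToTempFile are hypothetical helpers for illustration, not part of Aspose.Words.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper (not an Aspose.Words class): writes extracted bytes to a
// temporary file, but only when there is actually something to write.
final class EmbeddedDataWriter {
    private EmbeddedDataWriter() {}

    /**
     * Returns the path of a new temp file containing the data, or an empty
     * Optional when data is null or empty, so the caller can log and skip
     * the shape instead of failing with a NullPointerException.
     */
    static Optional<Path> writeToTempFile(byte[] data, String suffix) {
        if (data == null || data.length == 0) {
            return Optional.empty();
        }
        try {
            Path tempFile = Files.createTempFile("embedded_docx", suffix);
            Files.write(tempFile, data);
            return Optional.of(tempFile);
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to declare IOException
            throw new UncheckedIOException(e);
        }
    }
}
```

In trying_docx, the result of the extraction call would be passed to writeToTempFile, and an empty result would mean that no data was found for that shape.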
Is there another way to obtain the shape's content as a DOCX file, or some other technique for getting its binary data?
I know there is a way to get the binary data as an image, but that is not what I want.
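To check whether a given byte array really is the embedded DOCX, the first bytes can be inspected: a DOCX is a ZIP package and starts with "PK" 03 04, while a classic OLE compound file starts with D0 CF 11 E0 A1 B1 1A E1. The sketch below uses only the JDK. PayloadSniffer and its Kind enum are hypothetical names for illustration. Where the bytes come from is an assumption: I believe OleFormat.getRawData() and OleFormat.save(OutputStream) exist in Aspose.Words for Java, but that should be verified against the API reference for the installed version.

```java
// Hypothetical helper (not an Aspose.Words class): classifies extracted bytes
// by their leading magic number.
final class PayloadSniffer {
    enum Kind { ZIP_PACKAGE, OLE_COMPOUND_FILE, UNKNOWN }

    // Local file header signature of a ZIP archive (DOCX, XLSX, PPTX are ZIP packages)
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Header signature of an OLE compound file (legacy .doc, wrapped OLE objects)
    private static final byte[] OLE_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    private PayloadSniffer() {}

    /** Returns the detected container kind, or UNKNOWN for null, short, or unrecognized data. */
    static Kind sniff(byte[] data) {
        if (startsWith(data, ZIP_MAGIC)) {
            return Kind.ZIP_PACKAGE;
        }
        if (startsWith(data, OLE_MAGIC)) {
            return Kind.OLE_COMPOUND_FILE;
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A ZIP_PACKAGE result would suggest the bytes can be saved directly with a .docx extension, while OLE_COMPOUND_FILE would suggest the document is wrapped in an OLE container and needs a further extraction step.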