Word to PDF: how do I extract the attachments in a Word document and convert the Word and Excel attachments to PDF as well?

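When converting Word to PDF, how can I extract the attachments embedded in the Word document and also convert the Word- and Excel-type attachments to PDF?
Part of my PDF conversion code: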

private byte[] convertToPdf(Document document) throws Exception {
    try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
        PdfSaveOptions pdfOptions = new PdfSaveOptions();
        pdfOptions.setCompliance(PdfCompliance.PDF_17);
        pdfOptions.setUpdateFields(false);
        document.save(bos, pdfOptions);
        return bos.toByteArray();
    }
}

aspose-words version: 24.11
aspose-cells version: 25.1

@ztthu

Word → PDF and also convert the embedded Excel files to PDF (Java)

The sample below shows how to:

  1. Convert the main Word document to PDF with Aspose.Words.
  2. Enumerate every embedded object (OLE/Package) in the Word file.
  3. Detect the objects that are Excel workbooks (.xls / .xlsx).
  4. Convert each Excel workbook to PDF with Aspose.Cells.
  5. (Optional) Render the generated Excel‑PDF pages to images and insert them back into the Word document.

Prerequisites

  • Aspose.Words for Java ≥ 24.11
  • Aspose.Cells for Java ≥ 25.1
  • (Optional) Aspose.PDF for Java ≥ 23.10 – only needed if you want to embed the Excel‑PDF as an image.
  • The Aspose.Words and Aspose.Cells JARs must be on the project classpath.

1. Complete code

import com.aspose.words.*;
import com.aspose.cells.*;

import java.io.*;
import java.util.Iterator;

public class WordWithExcelAttachmentsToPdf {

    /** Convert a Word Document to PDF and return the PDF bytes. */
    private static byte[] convertWordToPdf(Document wordDoc) throws Exception {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            PdfSaveOptions pdfOpts = new PdfSaveOptions();
            pdfOpts.setCompliance(PdfCompliance.PDF_17);
            pdfOpts.setUpdateFields(false);          // keep fields as‑is
            wordDoc.save(bos, pdfOpts);
            return bos.toByteArray();
        }
    }

    /** Simple helper – true for .xls or .xlsx extensions. */
    private static boolean isExcel(String ext) {
        return ext != null && (ext.equalsIgnoreCase("xls") || ext.equalsIgnoreCase("xlsx"));
    }

    /** Convert an Excel byte array to PDF and return the PDF bytes. */
    private static byte[] convertExcelToPdf(byte[] excelData) throws Exception {
        try (ByteArrayInputStream bis = new ByteArrayInputStream(excelData);
             ByteArrayOutputStream bos = new ByteArrayOutputStream()) {

            // Aspose.Cells automatically detects the format (xls / xlsx)
            Workbook wb = new Workbook(bis);
            wb.save(bos, SaveFormat.PDF);
            return bos.toByteArray();
        }
    }

    public static void main(String[] args) throws Exception {
        // -------------------------------------------------
        // 1. Load the Word document (DOC or DOCX)
        // -------------------------------------------------
        Document wordDoc = new Document("input.docx");

        // -------------------------------------------------
        // 2. Convert the main document to PDF
        // -------------------------------------------------
        byte[] wordPdf = convertWordToPdf(wordDoc);
        try (FileOutputStream fos = new FileOutputStream("output_word.pdf")) {
            fos.write(wordPdf);
        }

        // -------------------------------------------------
        // 3. Process embedded objects
        // -------------------------------------------------
        EmbeddedObjectCollection embedded = wordDoc.getEmbeddedObjects();
        int excelIndex = 1;   // used for naming the generated PDFs

        // Use an iterator because we may want to remove objects while iterating
        for (Iterator<EmbeddedObject> it = embedded.iterator(); it.hasNext(); ) {
            EmbeddedObject eo = it.next();

            // Raw bytes of the embedded file
            byte[] data = eo.getEmbeddedData();

            // Try to determine the type – extension is the most reliable
            String ext = eo.getFileExtension();               // e.g. "xlsx"
            OLEFileFormat format = eo.getFileFormat();        // e.g. OLEFileFormat.XLSX

            if (isExcel(ext) ||
                format == OLEFileFormat.XLSX ||
                format == OLEFileFormat.XLS) {

                // -------------------------------------------------
                // 4. Excel → PDF
                // -------------------------------------------------
                byte[] excelPdf = convertExcelToPdf(data);

                String pdfName = "attachment_excel_" + excelIndex + ".pdf";
                try (FileOutputStream fos = new FileOutputStream(pdfName)) {
                    fos.write(excelPdf);
                }
                System.out.println("Excel attachment converted to PDF: " + pdfName);

                // -------------------------------------------------
                // 5. (Optional) Insert the PDF as an image back into the Word file
                // -------------------------------------------------
                // Uncomment the block below if you need a visual representation
                /*
                com.aspose.pdf.Document pdfDoc = new com.aspose.pdf.Document(
                        new ByteArrayInputStream(excelPdf));
                com.aspose.pdf.devices.Resolution res = new com.aspose.pdf.devices.Resolution(300);
                com.aspose.pdf.devices.PngDevice pngDev = new com.aspose.pdf.devices.PngDevice(res);
                ByteArrayOutputStream imgOut = new ByteArrayOutputStream();
                pngDev.process(pdfDoc.getPages().get_Item(1), imgOut);

                DocumentBuilder builder = new DocumentBuilder(wordDoc);
                builder.moveToDocumentEnd();
                builder.insertImage(imgOut.toByteArray());
                */

                excelIndex++;
            } else {
                // Save any other embedded file unchanged
                String otherName = "attachment_" + eo.getName();
                try (FileOutputStream fos = new FileOutputStream(otherName)) {
                    fos.write(data);
                }
                System.out.println("Non‑Excel attachment saved: " + otherName);
            }
        }

        // -------------------------------------------------
        // (Optional) Save the modified Word document if you inserted images
        // -------------------------------------------------
        // wordDoc.save("output_word_with_images.docx");
    }
}

Key API reference

Goal | API | Remarks
Load Word | new Document(String path) | Handles .doc and .docx.
Enumerate embedded objects | Document.getEmbeddedObjects() → EmbeddedObjectCollection | Iterable collection of EmbeddedObject.
Get raw bytes | EmbeddedObject.getEmbeddedData() | Returns the original file content.
Identify Excel | EmbeddedObject.getFileExtension() or EmbeddedObject.getFileFormat() | OLEFileFormat.XLS / XLSX.
Convert Excel → PDF | new Workbook(InputStream) + Workbook.save(OutputStream, SaveFormat.PDF) | Aspose.Cells automatically recognises the format.
Convert Word → PDF | Document.save(OutputStream, PdfSaveOptions) | PdfSaveOptions lets you set compliance, field updates, etc.
(Optional) Render PDF page to image | com.aspose.pdf.Document + PngDevice | Needed only if you want to embed a visual of the Excel‑PDF.
Insert image into Word | DocumentBuilder.insertImage(byte[]) | Places the image at the current cursor position.

2. Frequently asked questions & tips

Issue | Solution
Embedded object appears as “Package” | EmbeddedObject.getFileExtension() still returns the correct extension. If it is empty, parse the extension from eo.getName() (e.g., Object1.xlsx).
Excel workbook is password‑protected | Use LoadOptions when creating the Workbook: LoadOptions lo = new LoadOptions(); lo.setPassword("pwd"); Workbook wb = new Workbook(new ByteArrayInputStream(data), lo);
Large number of attachments – performance | Create a single PdfSaveOptions instance and reuse it. Re‑using the same ByteArrayOutputStream (clearing it each iteration) also reduces GC pressure.
Remove the original OLE object after conversion | wordDoc.getEmbeddedObjects().remove(eo); – this shrinks the final Word file if you no longer need the original Excel object.
Combine all generated Excel PDFs into one file | After the loop, load each PDF with Aspose.PDF, append its pages to a master Document, and save the master PDF.
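
The buffer re-use tip above can be illustrated in plain Java. This is only a minimal sketch using the standard library: the class name ReusableBufferDemo, the method convertAll, and the copy step that stands in for a real save call are hypothetical placeholders, not part of any Aspose API.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo: one ByteArrayOutputStream shared across many conversions. */
final class ReusableBufferDemo {

    /**
     * Runs a stand-in "conversion" for every input and returns one result per input.
     * In a real program the write call would be replaced by a library save call
     * that targets the same buffer.
     */
    static List<byte[]> convertAll(List<byte[]> inputs) {
        // Allocate the buffer once, outside the loop.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(64 * 1024);
        List<byte[]> results = new ArrayList<>();
        for (byte[] input : inputs) {
            // reset() drops the previous content but keeps the allocated array.
            buffer.reset();
            // Placeholder for the real conversion writing into the buffer.
            buffer.write(input, 0, input.length);
            // toByteArray() returns a copy, so later resets do not affect it.
            results.add(buffer.toByteArray());
        }
        return results;
    }
}
```

Each result is an independent copy, so the only saving is that the internal array of the stream is not reallocated for every attachment.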

3. High‑level workflow

Word (DOC/DOCX)
   │
   ├─► Aspose.Words → Word PDF (output_word.pdf)
   │
   └─► Enumerate EmbeddedObjectCollection
          ├─ Excel? → Aspose.Cells → Excel PDF (attachment_excel_*.pdf)
          └─ Other → Save original binary

Summary

  • Extract: Document.getEmbeddedObjects() → EmbeddedObject.getEmbeddedData()
  • Detect: getFileExtension() / getFileFormat() → Excel?
  • Convert: new Workbook(stream) → workbook.save(pdf, SaveFormat.PDF)
  • Persist: Write the PDF bytes wherever you need them; optionally render to an image and embed back into the Word document.

With the code above you can reliably convert a Word file to PDF and turn every embedded Excel workbook into its own PDF, all using Aspose libraries. If you need further customisation (e.g., merging all PDFs, preserving hyperlinks, or batch processing a folder), feel free to ask!

There is no EmbeddedObjectCollection class and no getEmbeddedObjects method.

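@ztthu In the Aspose.Words document object model, embedded objects are represented by Shape nodes whose Shape.getOleFormat property is set. You can therefore iterate over all shapes in the document and check whether getOleFormat returns a non-null value to identify embedded OLE objects. For more information on working with OLE objects, see our documentation: Working with Ole Objects|Aspose.Words for Java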

To convert the Excel files to PDF, you can use Aspose.Cells.
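
Once each OLE shape has been found as described above, the remaining decision is which converter should handle it. The following is a minimal sketch of that routing step in plain Java. The class OleAttachmentRouter and its classify method are hypothetical helpers, and the ProgID prefixes ("Excel.Sheet", "Word.Document") are assumptions based on common Office registrations. The commented wiring assumes OleFormat offers getProgId(), getSuggestedExtension(), isLink() and save(OutputStream); please verify these against the Aspose.Words API reference for your version.

```java
import java.util.Locale;

/** Hypothetical helper: decides which converter an embedded OLE object needs. */
final class OleAttachmentRouter {

    enum Kind { WORD, EXCEL, OTHER }

    /**
     * @param progId    e.g. "Excel.Sheet.12" (assumed to come from OleFormat.getProgId())
     * @param extension e.g. ".xlsx" (assumed to come from OleFormat.getSuggestedExtension())
     */
    static Kind classify(String progId, String extension) {
        String id = progId == null ? "" : progId.toLowerCase(Locale.ROOT);
        String ext = extension == null ? "" : extension.toLowerCase(Locale.ROOT);
        if (ext.startsWith(".")) {
            ext = ext.substring(1); // accept both ".xlsx" and "xlsx"
        }
        if (id.startsWith("excel.sheet")
                || ext.equals("xls") || ext.equals("xlsx") || ext.equals("xlsm")) {
            return Kind.EXCEL;
        }
        if (id.startsWith("word.document")
                || ext.equals("doc") || ext.equals("docx") || ext.equals("docm")) {
            return Kind.WORD;
        }
        return Kind.OTHER;
    }

    // Assumed wiring inside an Aspose.Words program (shown as comments, not compiled here):
    //
    //   for (Shape shape : (Iterable<Shape>) doc.getChildNodes(NodeType.SHAPE, true)) {
    //       OleFormat ole = shape.getOleFormat();
    //       if (ole == null || ole.isLink()) continue;        // not an embedded object
    //       ByteArrayOutputStream raw = new ByteArrayOutputStream();
    //       ole.save(raw);                                    // extract the embedded file
    //       switch (classify(ole.getProgId(), ole.getSuggestedExtension())) {
    //           case EXCEL: /* new com.aspose.cells.Workbook(stream).save(out, com.aspose.cells.SaveFormat.PDF) */ break;
    //           case WORD:  /* new com.aspose.words.Document(stream).save(out, pdfOptions) */ break;
    //           default:    /* keep the original bytes */ break;
    //       }
    //   }
}
```

Because Aspose.Words and Aspose.Cells both define classes named SaveFormat and PdfSaveOptions, fully qualified names (as in the comments) avoid ambiguous imports when both libraries are used in one file.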