Embedded Files in Word document (DOC/DOCX to PDF)

Hey Aspose Team,
I want to convert Word documents to PDF and also need to convert the files embedded in the Word document. It’s similar to MSG files and their attachments, but I couldn’t find a way to get to the “attachments” in a Word document. Could you please help with this?
My code snippet looks as follows:

Document doc = new Document(source);
PdfSaveOptions saveOptions = new PdfSaveOptions();
HandleDocumentWarnings callback = new HandleDocumentWarnings();
saveOptions.setWarningCallback(callback);
doc.save(target, saveOptions);
// get embedded files/attachments from document (need help on this)
// convert those attachments to pdf (no problem)
// merge pdf (no problem)

You can find an example file attached. In Word terminology, the embedded file is an “object”.
Thanks in advance!
Kind regards
Peter

Hi Peter,

Thanks for your inquiry. Please try using the following code snippet:

Document doc = new Document(@"C:\test\embeddedDocument.docx");
// Get collection of shapes
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
int i = 0;
// Loop through all shapes
foreach(Shape shape in shapes)
{
    if (shape.OleFormat != null)
    {
        if (!shape.OleFormat.IsLink)
        {
            // Extract OLE Word object
            if (shape.OleFormat.ProgId == "Word.Document.12")
            {
                MemoryStream stream = new MemoryStream();
                shape.OleFormat.Save(stream);
                // Rewind the stream so the document is read from the beginning
                stream.Position = 0;
                Document newDoc = new Document(stream);
                newDoc.Save(string.Format(@"C:\test\outEmbeded_{0}.pdf", i));
                i++;
            }
            // Extract OLE Excel object
            if (shape.OleFormat.ProgId == "Excel.Sheet.12")
            {
                // Here you can use Aspose.Cells component
                // to be able to convert MS Excel files to PDF
            }
        }
        else
        {
            string filePath = shape.OleFormat.SourceFullName;
            Document newDoc = new Document(filePath);
            newDoc.Save(string.Format(@"C:\test\outLinkedEmbeded_{0}.pdf", i));
            i++;
        }
    }
}
doc.Save(@"C:\test\out.pdf");

I hope this helps.

Best Regards,

Hello Awais,
thank you for your reply. The code snippet looks like C#, I will try to find a way in Java… it seems that the interface and classes are different…
Kind regards
Peter

Hello Awais,
thanks again for your help, I was able to write it down in Java.

Document doc = new Document("");
@SuppressWarnings("unchecked")
NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (int i = 0; i < shapes.getCount(); i++)
{
    Shape shape = (Shape) shapes.get(i);
    OleFormat oleFormat = shape.getOleFormat();
    if (oleFormat != null)
    {
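        if (oleFormat.isLink())
        {
            // Linked object: convert the file the link points to
            String filePath = oleFormat.getSourceFullName();
            // TODO: convert depending on file type
        }
        else if (oleFormat.getOleIcon())
        {
            // Embedded object shown as an icon: needs its own conversion
            String progId = oleFormat.getProgId();
            // TODO: convert depending on progId
        }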
    }
}

Is there a list of all ProgIDs with a reference to their file extensions?
Kind regards
Peter

Hi Peter,

Thanks for your inquiry.

ProgID stands for "programmatic identifier". I am afraid there is no such list. The ProgID is stored in the document binary as a string; we just extract it from there. However, I can provide a list of the OLE ProgIDs found in our test documents:

  • MS_ClipArt_Gallery
  • Equation.3
  • WPGraphic21
  • MIDFile
  • MSGraph.Chart.8
  • PBrush
  • MSPhotoEd.3
  • Excel.Sheet.8
  • Excel.Sheet.12
  • WordPad.Document.1
  • Package
  • Word.Document.8
  • Word.Document.12
  • Visio.Drawing.11

Maybe you could write your own program to list all the programmatic identifiers of the OLE objects embedded in your documents.
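Since no official list exists, one option is to keep a small lookup table of your own. The following is a minimal sketch in plain Java, not part of Aspose.Words: the class name `ProgIdExtensions`, the method `extensionFor`, and the extensions chosen for each ProgID are assumptions for illustration. It only covers a few entries from the list above, and a ProgID such as `Package` has no fixed extension, so the lookup returns an empty result for anything it does not know.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: maps a few OLE ProgIDs to a likely file extension. */
final class ProgIdExtensions {
    // Assumed mapping for illustration only; extend it with the ProgIDs
    // you actually find in your own documents.
    private static final Map<String, String> EXTENSIONS = Map.of(
        "Word.Document.8", ".doc",
        "Word.Document.12", ".docx",
        "Excel.Sheet.8", ".xls",
        "Excel.Sheet.12", ".xlsx",
        "Visio.Drawing.11", ".vsd");

    private ProgIdExtensions() {}

    /** Returns the assumed extension, or empty if the ProgID is unknown or null. */
    static Optional<String> extensionFor(String progId) {
        if (progId == null) {
            return Optional.empty();
        }
        // Exact (case-sensitive) match after trimming surrounding whitespace
        return Optional.ofNullable(EXTENSIONS.get(progId.trim()));
    }

    public static void main(String[] args) {
        System.out.println(extensionFor("Word.Document.12").orElse("(unknown)"));
        System.out.println(extensionFor("Package").orElse("(unknown)"));
    }
}
```

The string passed in would be the value returned by `oleFormat.getProgId()`. Unknown ProgIDs are best logged, so the table can grow as new object types turn up.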

Best Regards,

Hello Awais,
thank you for your reply.
I noticed that checking the oleFormat with:

if (!oleFormat.isLink() && oleFormat.getOleIcon())
{

is better, because shapes that already display their content are converted along with the document itself and don’t need an extra conversion. I will edit the code in my previous reply.
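The decision behind that check can be written as a small pure function. This is only a sketch: `OleHandling`, `Action`, and `decide` are hypothetical names, not Aspose.Words API, and it assumes that linked objects are always converted from their source file, as in the C# snippet earlier in this thread. The two booleans stand for the results of `oleFormat.isLink()` and `oleFormat.getOleIcon()`.

```java
/** Hypothetical helper mirroring the check above; not part of Aspose.Words. */
final class OleHandling {
    enum Action {
        CONVERT_LINKED_FILE,      // linked object: convert the source file
        CONVERT_EMBEDDED_OBJECT,  // embedded and shown as an icon: convert separately
        RENDERED_WITH_DOCUMENT    // embedded and shown with content: nothing extra to do
    }

    private OleHandling() {}

    static Action decide(boolean isLink, boolean shownAsIcon) {
        if (isLink) {
            // Assumption: linked objects are handled via their source path
            return Action.CONVERT_LINKED_FILE;
        }
        return shownAsIcon ? Action.CONVERT_EMBEDDED_OBJECT : Action.RENDERED_WITH_DOCUMENT;
    }
}
```

Keeping the three cases separate avoids treating an embedded object that is displayed with its content as if it were a linked file.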
I have also noticed that remarks/comments in Word documents are not included in conversion to pdf… is there a way to enable it? I need 1:1 conversion as much as possible.
Kind regards
Peter

Hi Peter,

Thanks for the additional information.

Peter:
I have also noticed that remarks/comments in Word documents are not included in conversion to pdf… is there a way to enable it? I need 1:1 conversion as much as possible.

Aspose.Words mimics the behaviour of MS Word, and I am able to render the comments in a Word document to PDF on my side using Aspose.Words v11.6.0. Could you please attach your input document here for testing? I will investigate the issue on my side and provide you with more information.

Best Regards,

Hello Awais,
thank you for your reply. You can find the document attached to this message. I’m not exactly sure what the red text is but it does not appear in the pdf.
Thank you in advance!
Kind regards
Peter

Hi Peter,

Thanks for your inquiry. The text in red is actually marked as ‘Hidden Text’. In MS Word 2007, you can turn this feature on or off with the following steps:

  1. Click on the ‘Office Button’.
  2. Click on ‘Word Options’ button.
  3. Select the ‘Display’ tab on the left.
  4. Check/Un-check ‘Hidden Text’ option.

Moreover, to be able to render that hidden text to PDF, please try using the following code snippet:

Document doc = new Document(@"C:\test\comments.docx");
// Un-hide every run so that hidden text is rendered
foreach (Run r in doc.GetChildNodes(NodeType.Run, true))
{
    r.Font.Hidden = false;
}

doc.Save(@"C:\test\out.pdf");

I hope this helps.

Best Regards,

Hey Awais,
thank you for your helpful reply. I didn’t expect it to be hidden text… I wrote it down in Java and it gets converted now, thank you. Here is the Java code snippet:

private void showHiddenText(Document document, boolean show)
{
    NodeCollection runs = document.getChildNodes(NodeType.RUN, true);
    for (int i = 0; i < runs.getCount(); i++)
    {
        // The collection holds plain nodes, so cast each one to Run
        Run run = (Run) runs.get(i);
        run.getFont().setHidden(!show);
    }
}

Have a nice day!
Kind regards
Peter

Hi Peter,

Thanks for the additional information. Please let us know whenever you have any further queries.

Best Regards,