Error when trying to embed a PDF document in a cell

What is the best way to embed a PDF document into a table cell in a Word template using C#?

I’ve tried using builder.InsertImage, but it fails with ‘cannot insert a null node’ (the same code works for other image files, such as TIF).

I’m using Aspose.Words version 22.2.0.0.

(partially pseudo code)

public void insertImage(string fileName, bool forceResize, int sHeight)
{
   Aspose.Words.DocumentBuilder builder = new DocumentBuilder(AsposeDocument);
   System.Drawing.Image dImage = null;
   System.Drawing.Image newImage = null;
   Aspose.Words.Drawing.Shape image = null;
   int targetHeight = 0;

   try
   {
      dImage = System.Drawing.Image.FromFile(fileName);
      PageSetup ps = builder.CurrentSection.PageSetup;
      if (ps.PageWidth > ps.PageHeight)
      {
         targetHeight = CalculateImageSize(builder, sHeight);
         newImage = cloneImage(dImage, Convert.ToInt32(ConvertUtil.PointToPixel(Cell.CellFormat.Width)), targetHeight);
         image = builder.InsertImage(newImage, Cell.CellFormat.Width, targetHeight);
         image.WrapType = Aspose.Words.Drawing.WrapType.Inline;
         image.BehindText = true;
      }
      else
      {
         image = builder.InsertImage(dImage);
         if (forceResize)
            resizeImage(image, sHeight);
      }
   }
   catch (Exception E)
   {

   }
   Cell.FirstParagraph.AppendChild(image);           
}


public void resizeImage(Aspose.Words.Drawing.Shape img, int sHeight)
{
   // Return if this shape is not an image.
   if (!img.HasImage)
      return;
   //System.Drawing.Image newImage = null;
   // Calculate the free space based on an inline or floating image. If inline we must take the page margins into account.
   PageSetup ps = img.ParentParagraph.ParentSection.PageSetup;
   double freePageWidth = ps.PageWidth - ps.LeftMargin - ps.RightMargin;
   double freePageHeight = ps.PageHeight - ps.TopMargin - ps.BottomMargin - ps.HeaderDistance - ps.FooterDistance;
   freePageHeight = freePageHeight - sHeight;
   Aspose.Words.Drawing.ImageSize size = img.ImageData.ImageSize;
   Boolean exceedsMaxPageSize = size.WidthPoints > freePageWidth || size.HeightPoints > freePageHeight
                   || img.Width > freePageWidth || img.Height > freePageHeight;

   if (exceedsMaxPageSize)
   {
      // Calculate the ratio to fit the page size based on which side is longer.
      Boolean widthLonger = (size.WidthPoints > size.HeightPoints);
      double ratio = widthLonger ? freePageWidth / size.WidthPoints : freePageHeight / size.HeightPoints;
      if (ratio > .90)
         ratio = ratio - .10;
      // Set the new size.
   }
   img.Width = freePageWidth;
}

Hi @conniem,

It is possible to embed PDF files using either DocumentBuilder.InsertOleObjectAsIcon or DocumentBuilder.InsertOleObject:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.StartTable();
Cell cell = builder.InsertCell();
builder.MoveTo(cell.FirstParagraph);
builder.InsertOleObjectAsIcon(@"embedded.pdf", false, null, "This is an embedded PDF");
builder.EndRow();
builder.EndTable();
doc.Save("pdfembeddedindocument.docx");

embedded.pdf (15.7 KB)
pdfembeddedindocument.docx (23.9 KB)

Well, I don’t get an error anymore…
However, I’m also not getting the PDF file embedded.

My code:
I added a check for the file extension, and if it’s PDF, it does this:

else
{
     builder.MoveTo(m_Cell.FirstParagraph);
     builder.InsertOleObject(fileName, true, false, null); // also tried with the second parameter set to false
}

I have also tried

else
{
    builder.MoveTo(m_Cell.FirstParagraph);
    builder.InsertOleObjectAsIcon(fileName, true, null, "inserted PDF"); // also toggled the second parameter
}

Attached are my results:
Instead of the contents of the PDF file (which is what I want), I get either an icon (when inserted as an icon, which I expect) or an Aspose picture (I assume because the last parameter is null).

What should I set that last parameter to (using InsertOleObject)?
asOle.pdf (28.6 KB)
AsIcon.pdf (27.0 KB)

Hi @conniem,

When a PDF is embedded in a document, Microsoft Word converts the first page of that PDF to an EMF and then inserts this EMF. You can achieve the same with the following code:

Document pdf = new Aspose.Words.Document(@"embedded.pdf");
MemoryStream pdfPreviewEmfStream = new MemoryStream();
ImageSaveOptions opts = new ImageSaveOptions(SaveFormat.Emf);
opts.Scale = 0.2f; // Scale image down, so it fits in the page
opts.PageSet = new PageSet(0); // Select the first page
pdf.Save(pdfPreviewEmfStream, opts);

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.StartTable();
Cell cell = builder.InsertCell();
builder.MoveTo(cell.FirstParagraph);
builder.InsertOleObject(@"embedded.pdf", false, false, pdfPreviewEmfStream);
builder.EndRow();
builder.EndTable();
doc.Save(@"pdfembeddedindocument.docx");

pdfembeddedindocument.docx (23.8 KB)
embedded.pdf (15.7 KB)

Better.
Next question: what is the best way to resize the preview based on the width of the cell (which will be nearly the width of the page)?

I don’t want to force it to fit on a page (because some of the real-life PDF files wouldn’t fit on a page), but I do want the width to match.
scaled.pdf (113.7 KB)
notScaled.pdf (115.1 KB)

@conniem, I would like to clarify what you are trying to achieve:

  1. Is the final width of the cell known before the image is resized?
  2. Is the one-cell table really necessary? Would a border around the image be enough?
  3. If the table is necessary, do you create it via the API, or do you have a template where the table is already created? Could you attach this template?

  1. Is the final width of the cell known before the image is resized?

Yes.

  2. Is the one-cell table really necessary? Would a border around the image be enough?

The one-cell table is required. The cell is used for placement, and other image types (e.g. TIF files) get placed there as well.

  3. If the table is necessary, do you create it via the API, or do you have a template where the table is already created? Could you attach this template?

bSpecs.zip (8.7 KB)

@conniem,

With the known cell width and the width of the PDF’s page, you can calculate the scale factor as shown in the code below:

Document doc = new Document(@"bSpecs.dot");
DocumentBuilder builder = new DocumentBuilder(doc);

// Move inside of the first paragraph in the first cell of the first table.
builder.MoveToCell(0, 0, 0, 0);

// CurrentNode is null when the cell's paragraph is empty, so go through CurrentParagraph instead.
Cell cell = (Cell)builder.CurrentParagraph.ParentNode;

double cellWidth = cell.CellFormat.Width;

// Optional: limit the height of the row. The PDF preview image will be truncated in this case.
const double MaxRowHeight = 500.0;
RowFormat rowFormat = cell.ParentRow.RowFormat;
rowFormat.HeightRule = HeightRule.Exactly;
rowFormat.Height = MaxRowHeight;

// Generate PDF preview by converting its first page to an EMF image.
Document pdf = new Document(@"embedded.pdf");
Aspose.Words.Rendering.PageInfo pageInfo = pdf.GetPageInfo(0);

// Scale the PDF preview so that it fits the width of the cell.
float scaleFactor = (float)(cellWidth / pageInfo.WidthInPoints);

ImageSaveOptions opts = new ImageSaveOptions(SaveFormat.Emf);
opts.Scale = scaleFactor;
opts.PageSet = new PageSet(0); // Select the first page

MemoryStream pdfPreviewEmfStream = new MemoryStream();
pdf.Save(pdfPreviewEmfStream, opts);

// Embed the PDF
builder.InsertOleObject(@"embedded.pdf", false, false, pdfPreviewEmfStream);
doc.Save(@"pdfembeddedindocument.scaled.docx");

pdfembeddedindocument.docx (30.2 KB)

Thank you, that’s what I needed.

So, after running tests, this works fine if the PDF file has only one page.
My code:

Aspose.Words.Document pdf = new Aspose.Words.Document(fileName);
MemoryStream pdfPreviewEmfStream = new MemoryStream();
ImageSaveOptions opts = new ImageSaveOptions(SaveFormat.Emf);
Aspose.Words.Rendering.PageInfo pageInfo = pdf.GetPageInfo(0);
float scaleFactor = (float)(m_Cell.CellFormat.Width/ pageInfo.WidthInPoints);
opts.Scale = scaleFactor;
opts.PageSet = new PageSet(0); // Select the first page
               
// Render the first page of the PDF to an EMF preview, scaled to the cell width.
pdf.Save(pdfPreviewEmfStream, opts);
builder.MoveTo(m_Cell.FirstParagraph);
builder.InsertOleObject(fileName, false, false, pdfPreviewEmfStream);

Attached are the PDF file and the result. Note that the PDF file has 2 pages, but only the first page shows in the result.
From your initial response:

When a PDF is embedded in a document, Microsoft Word converts the first page of that PDF to an EMF and then inserts this EMF. You can achieve the same with the following code:

Any ideas on how to insert multiple pages?

If I cannot embed multiple pages into a cell, is there another way to merge the PDF file into the document?

@conniem Could you please create the expected output document in MS Word and attach it here? We will check it and provide you with more information.
As an option, you can save each page as a thumbnail image; for example, see the Document.RenderToScale method.

I worked out how to deal with this (my Delphi code saves to a TIFF file, then splits the TIFF into separate pages and embeds those). I was just wondering whether there is a way to do it automatically, without all the processing I have to do to embed TIFF or PDF files.

I don’t have a Word document that shows that.

@conniem It is great that you managed to achieve what you need. And no, unfortunately, there is no way to do this automatically; you have to render each page of your PDF and then put the pages together.
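
For readers who want to try the "render each page and put them together" approach, here is a rough sketch that reuses only the calls already shown in this thread, plus Document.PageCount and DocumentBuilder.InsertImage(Stream). It has not been tested against version 22.2. The class and method names, the file names, and the choice to insert plain page images (rather than an OLE object) are assumptions made for illustration.

```csharp
using System.IO;
using Aspose.Words;
using Aspose.Words.Rendering;
using Aspose.Words.Saving;
using Aspose.Words.Tables;

public static class PdfPagesIntoCell
{
    // Hypothetical helper: renders every page of a PDF to EMF and inserts
    // the page images one after another into the first cell of the first table.
    public static void InsertAllPages(string templatePath, string pdfPath, string outputPath)
    {
        Document doc = new Document(templatePath);
        DocumentBuilder builder = new DocumentBuilder(doc);

        // Move into the first paragraph of the first cell of the first table.
        builder.MoveToCell(0, 0, 0, 0);
        Cell cell = (Cell)builder.CurrentParagraph.ParentNode;
        double cellWidth = cell.CellFormat.Width;

        Document pdf = new Document(pdfPath);
        for (int pageIndex = 0; pageIndex < pdf.PageCount; pageIndex++)
        {
            // Scale each page so that its width matches the cell width.
            PageInfo pageInfo = pdf.GetPageInfo(pageIndex);
            ImageSaveOptions opts = new ImageSaveOptions(SaveFormat.Emf);
            opts.PageSet = new PageSet(pageIndex);
            opts.Scale = (float)(cellWidth / pageInfo.WidthInPoints);

            using (MemoryStream pageStream = new MemoryStream())
            {
                pdf.Save(pageStream, opts);
                pageStream.Position = 0;

                // Inserts the rendered page as an inline image at the cursor.
                builder.InsertImage(pageStream);
            }
        }

        doc.Save(outputPath);
    }
}
```

It could be called as PdfPagesIntoCell.InsertAllPages(@"bSpecs.dot", @"embedded.pdf", @"allpages.docx"). Note that this inserts pictures of the pages only; the PDF itself is not embedded. If the row height is set to HeightRule.Exactly as in the earlier example, pages beyond that height would be cut off, so that setting is left out here.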