Converting specific pages from a Word document into a new PDF

I need the ability to take the first few pages (anywhere from one to five, for example) of a Word document and programmatically convert them into a new PDF document on the server.
Our current version of ASPOSE.Words for ASP.NET expired back in 2009, so we may need to upgrade to the newest one for this to work.
Can you give me some examples of how to accomplish this?
Thanks,
Paul

Hi Paul,
Thanks for your request. Sure, you can achieve this using Aspose.Words. Please see the following code example:

// Open document.
Document doc = new Document("in.doc");
// Save each page of the document as a separate PDF.
for (int pageIndex = 0; pageIndex < doc.PageCount; pageIndex++)
{
    PdfSaveOptions options = new PdfSaveOptions();
    options.PageIndex = pageIndex;
    options.PageCount = 1;
    doc.Save(string.Format("out_{0}.pdf", pageIndex), options);
}

Hope this helps.
Best regards,

Thank you Alexey, but I’m actually trying to create ONE pdf document which consists of the first few pages of a Word document. Your code will create a new PDF document for each page (I think).
I guess I need to first create a new Word document with the pages I want, and then create the PDF from the new Word document.
Will this work with my version of Aspose.Words?

Hi
Thanks for your request. Please see the following code that shows how to convert the first 3 pages of the document to PDF:

// Open document.
Document doc = new Document("in.doc");
// Save the first 3 pages of the document as a single PDF.
PdfSaveOptions options = new PdfSaveOptions();
// Specify the start page (zero-based).
options.PageIndex = 0;
// Specify the number of pages.
options.PageCount = 3;
doc.Save("out.pdf", options);
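Since the number of pages varies (one to five in the original question), the same two options can be wrapped in a small helper. This is a sketch assuming Aspose.Words 9.5 or later, where PdfSaveOptions has the PageIndex and PageCount properties used above; the class PdfPageExport and its methods are hypothetical names. The helper clamps the requested count to doc.PageCount so that a short document is never asked for pages it does not have.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Saving;

public static class PdfPageExport
{
    // Hypothetical helper: how many pages to export, never more than the document has.
    public static int PagesToExport(int requested, int available)
    {
        if (requested < 1)
            throw new ArgumentOutOfRangeException("requested", "Request at least one page.");
        return Math.Min(requested, available);
    }

    // Hypothetical helper: saves the first pageCount pages of inputPath into ONE PDF file.
    public static void SaveFirstPagesAsPdf(string inputPath, string outputPath, int pageCount)
    {
        Document doc = new Document(inputPath);

        PdfSaveOptions options = new PdfSaveOptions();
        options.PageIndex = 0; // zero-based index of the first page to save
        options.PageCount = PagesToExport(pageCount, doc.PageCount);

        doc.Save(outputPath, options);
    }
}
```

Usage, for example: PdfPageExport.SaveFirstPagesAsPdf(@"C:\Temp\in.doc", @"C:\Temp\out.pdf", 5); produces a single out.pdf containing at most the first five pages.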

Best regards,

I am getting an error.
“The type or namespace ‘PdfSaveOptions’ could not be found”
Also, I am unable to add the namespace "Aspose.Words.Saving".
Is this because I am using an older version? When was this class added?

Hi
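There were a few breaking changes to the API in versions 9.2/9.5. You can find full details of these changes in the migration article:
https://docs.aspose.com/words/net/aspose-words-for-net/
PdfOptions was replaced with PdfSaveOptions, which is in the Aspose.Words.Saving namespace. Neither the class nor the namespace exists in your older version, which is why the compiler cannot find them.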
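After upgrading, a quick way to confirm the project compiles against the new assembly is a tiny program that only constructs PdfSaveOptions. This is a sketch assuming the new Aspose.Words.dll is the one referenced; it also prints the loaded assembly version (standard .NET reflection, not an Aspose API), which can help when an old DLL is still being picked up, for example from a website project's Bin folder.

```csharp
using System;
using Aspose.Words;          // Document, SaveFormat
using Aspose.Words.Saving;   // PdfSaveOptions

class ReferenceCheck
{
    static void Main()
    {
        // Prints the version of the Aspose.Words assembly that was actually loaded.
        Console.WriteLine(typeof(Document).Assembly.GetName().Version);

        // This line compiles only against a version that includes PdfSaveOptions.
        PdfSaveOptions options = new PdfSaveOptions();
        Console.WriteLine(options.SaveFormat);
    }
}
```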
Best regards,

I downloaded the latest version and I’m getting an error while converting a Word document to PDF:
Unrecognized image type encountered during DOCX export
I have attached the Word document that I am trying to convert. I’m only converting the first few pages of it, which I think contain only the table of contents.

Hello
Thanks for your request. I cannot reproduce the problem on my side using the latest version of Aspose.Words (10.2.0) and the following code for testing:

Document doc = new Document("C:\\Temp\\Kane_International_Tax_Spring_2011.docx");
doc.Save("C:\\Temp\\out.pdf");

or the following code:

Document doc = new Document("C:\\Temp\\Kane_International_Tax_Spring_2011.docx");
PdfSaveOptions options = new PdfSaveOptions();
options.PageCount = 1;
for (int pageIndex = 0; pageIndex < doc.PageCount; pageIndex++)
{
    string outputFileName = string.Format("{0}\\{1}_{2}.pdf", "C:\\Temp", "Test", pageIndex + 1);
    options.PageIndex = pageIndex;
    doc.Save(outputFileName, options);
}

Please try removing the reference to the old version of Aspose.Words and adding a reference to the new version.
Best regards,

How do I remove the reference? I replaced the DLL, XML, and LIC files and then recompiled. Isn’t that enough? This is a website project, not a web application project.

It looks like the error is happening BEFORE I try to do the PDF conversion. Maybe my old code needs to be modified to work with the newer version of your software. Here is my code. The error happens during save (“Unrecognized image type encountered during DOCX export.Unrecognized image type encountered during DOCX export.”):

// Verbatim (@) strings keep backslashes literal; in a normal string "\t" would be a tab.
Document doc = new Document(@"C:\Kane_International_Tax_Spring_2011.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
builder.PageSetup.DifferentFirstPageHeaderFooter = false;
builder.PageSetup.OddAndEvenPagesHeaderFooter = false;
builder.MoveToHeaderFooter(HeaderFooterType.HeaderPrimary);
builder.ParagraphFormat.Alignment = ParagraphAlignment.Center;
builder.Font.Name = "Arial";
builder.Font.Underline = Underline.Single;
builder.Font.Size = 10;

builder.Writeln("Downloaded From mywebsite.com");

doc.BuiltInDocumentProperties["Title"].Value = "";
doc.BuiltInDocumentProperties["Subject"].Value = "";
doc.BuiltInDocumentProperties["Author"].Value = "";
doc.BuiltInDocumentProperties["Manager"].Value = "";
doc.BuiltInDocumentProperties["Company"].Value = "mywebsite.com";
doc.BuiltInDocumentProperties["Category"].Value = "";
doc.BuiltInDocumentProperties["Keywords"].Value = "";
doc.BuiltInDocumentProperties["Comments"].Value = "";
doc.BuiltInDocumentProperties["LastSavedBy"].Value = "mywebsite.com";
doc.Save(@"C:\temp\NewDoc.docx");

Once I save it with the new properties, I will then save a PDF version consisting of the first 5 pages.
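The plan described above (stamp the header, then produce a PDF of the first five pages) can also be sketched as a single pass that saves straight to PDF and skips the intermediate NewDoc.docx. This is an untested sketch assuming Aspose.Words 10.x with PageIndex/PageCount on PdfSaveOptions; the output path NewDoc.pdf is hypothetical, and the property block is abbreviated. Because the reported error is raised during DOCX export, and the same document converted to PDF without error in the support test earlier in this thread, the direct PDF save might avoid it, but that is an assumption, not a confirmed workaround.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Saving;

class StampAndExport
{
    static void Main()
    {
        Document doc = new Document(@"C:\Kane_International_Tax_Spring_2011.docx");

        // Add the centered, underlined header text, as in the code above.
        DocumentBuilder builder = new DocumentBuilder(doc);
        builder.PageSetup.DifferentFirstPageHeaderFooter = false;
        builder.PageSetup.OddAndEvenPagesHeaderFooter = false;
        builder.MoveToHeaderFooter(HeaderFooterType.HeaderPrimary);
        builder.ParagraphFormat.Alignment = ParagraphAlignment.Center;
        builder.Font.Name = "Arial";
        builder.Font.Underline = Underline.Single;
        builder.Font.Size = 10;
        builder.Writeln("Downloaded From mywebsite.com");

        // Overwrite built-in properties as before (abbreviated to two here).
        doc.BuiltInDocumentProperties["Author"].Value = "";
        doc.BuiltInDocumentProperties["Company"].Value = "mywebsite.com";

        // Save only the first five pages (fewer if the document is shorter) as ONE PDF.
        PdfSaveOptions options = new PdfSaveOptions();
        options.PageIndex = 0;
        options.PageCount = Math.Min(5, doc.PageCount);
        doc.Save(@"C:\temp\NewDoc.pdf", options);
    }
}
```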

Hi
Thank you for reporting this problem to us. I managed to reproduce the problem on my side. Your request has been linked to the appropriate issue. You will be notified as soon as it is resolved.
Best regards,

Thanks for the update. I look forward to the fix.

Any idea when this issue will be resolved?

Hello
Thanks for your request. At the moment, this issue is pending analysis. Once the responsible developer has analyzed it, we will be able to provide you with an estimate.
Best regards,

Still waiting here. I kind of need this ASAP, please!
Thanks

Hi
Thanks for your request. Unfortunately, the issue is still unresolved. I have asked the responsible developer to look at this issue soon. We will keep you informed of its status and let you know once it is resolved.
Best regards,

It’s been a month now, and I’m still waiting. If this isn’t resolved in the next few days I will have to find an alternative solution from someone else.

Hi
Thanks for your request. Unfortunately, the issue is still unresolved. We will let you know once it is fixed.
As a temporary workaround, the only thing I can suggest is removing the Shape and DrawingML nodes with unknown image types. Please see the code below:

using Aspose.Words;
using Aspose.Words.Drawing;   // Shape, DrawingML, ImageType

Document doc = new Document(@"Test001\test.docx");
// Get all shapes. ToArray() takes a snapshot, so removing nodes inside the loop is safe.
Node[] shapes = doc.GetChildNodes(NodeType.Shape, true).ToArray();
foreach (Shape shape in shapes)
{
    // Remove all shapes with an unknown image format.
    if (shape.ImageData != null && shape.ImageData.ImageType == ImageType.Unknown)
        shape.Remove();
}
// Do the same for DrawingML nodes.
Node[] dmls = doc.GetChildNodes(NodeType.DrawingML, true).ToArray();
foreach (DrawingML dml in dmls)
{
    // Remove all DrawingML objects with an unknown image format.
    if (dml.ImageData != null && dml.ImageData.ImageType == ImageType.Unknown)
        dml.Remove();
}
doc.Save(@"Test001\out.docx");

Best regards,

This works, but it results in a Word document that is missing key graphics. All we are trying to do is add a header, and then re-save it. Then, once that works, save the first few pages as a PDF.
But, as you know, the original re-saving with the header is not working due to those graphic elements (which only appear in this particular document, but still shouldn’t pose a problem).

Hello
Thank you for the additional information. Currently, I cannot suggest any other workaround for this problem. Unfortunately, you will need to wait for the original issue to be fixed. We will be sure to inform you of any developments regarding this issue.
Best regards,