How do I load a PDF file into Aspose.Words?

A few days after downloading the Total product for evaluation, I received a follow-up call from a person at Aspose. I explained that all I need to do is extract images from various file formats. I mentioned that the Aspose.Words link on the website says "comprehensive support of DOC, OOXML, RTF, WordprocessingML, HTML, OpenDocument and PDF formats". I asked if that meant Aspose.Words could read a PDF file and extract its images. She said yes, Aspose.Words can read and process PDF files but cannot write them, so it would not be necessary for me to purchase Aspose.PDF for what I wanted to do. So I purchased Aspose.Words and Aspose.Cells. But now I can't find a way to load a PDF file into Aspose.Words. I've installed version 5.3, which appears to be the latest. Can anyone help? Thanks.

Hi.

Thanks for your inquiry. No, Aspose.Words cannot read PDF documents. However, you can extract images from a PDF document using Aspose.Pdf.Kit.

http://www.aspose.com/documentation/file-format-components/aspose.pdf.kit-for-.net-and-java/product-overview.html

Here is a guide that shows how to do this:

http://www.aspose.com/documentation/file-format-components/aspose.pdf.kit-for-.net-and-java/extract-image-from-pdf-document.html

You can also use Aspose.Recognition to convert a PDF document to DOC format.

http://www.aspose.com/documentation/file-format-components/aspose.recognition-for-.net/introducing-aspose-recognition.html

However, Aspose.Recognition does not currently support images. See the list of limitations:

http://www.aspose.com/documentation/file-format-components/aspose.recognition-for-.net/limitations.html

Hope this helps.

Best regards.

OK, I've installed Aspose.PDF.Kit and asked my boss to order a copy of it. I have a couple more questions. I searched the web site but couldn't find answers to them.

Your example contains the code 'extractor.EndPage = 2'. Most of our PDF files are a lot larger than that. Is there a property that returns the number of pages in the document? Or if I say 'extractor.EndPage = 99999' will that process the entire PDF file without causing a problem?

Using the sample code, I'm getting the error 'Unknown Stream Filter' on a lot of PDF files. The error occurs at the line of code 'extractor.ExtractImage();'. I've processed several thousand PDF files from various sources, and hundreds of them returned this error.

Thanks.

Hi

Thanks for your request. I have moved this thread to the Aspose.Pdf.Kit forum. The team there will answer you shortly.

Best regards.

Hi,

Thank you for considering Aspose.Pdf.Kit.

For the page count, please refer to the PdfFileInfo.NumberofPages property. Its value can be assigned to extractor.EndPage so that the whole document is processed.

For the 'Unknown Stream Filter' error, please provide sample PDF files that reproduce it so that we can test them.
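
With several hundred failing files, it may help to pick samples by the stream filters each file declares. The sketch below does not use Aspose.Pdf.Kit at all. It is a rough standard-library Python script that scans the raw bytes of a PDF for /Filter entries and counts the filter names it finds. It assumes that the 'Unknown Stream Filter' message relates to the /Filter entry of a PDF stream dictionary, which this thread does not confirm. The function names declared_filters and nonstandard_filters are made up for this illustration. Filters declared inside compressed object streams (PDF 1.5 and later) are not visible to a raw scan like this, so treat the result as a hint only.

```python
import re
from collections import Counter

# Filter names defined by the PDF specification (ISO 32000-1, section 7.4).
STANDARD_FILTERS = {
    "ASCIIHexDecode", "ASCII85Decode", "LZWDecode", "FlateDecode",
    "RunLengthDecode", "CCITTFaxDecode", "JBIG2Decode", "DCTDecode",
    "JPXDecode", "Crypt",
}

# Matches "/Filter /Name" as well as "/Filter [/NameA /NameB]".
FILTER_ENTRY = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME = re.compile(rb"/([A-Za-z0-9]+)")


def declared_filters(pdf_bytes):
    """Count every filter name found in uncompressed /Filter entries."""
    counts = Counter()
    for entry in FILTER_ENTRY.finditer(pdf_bytes):
        for name in NAME.finditer(entry.group(1)):
            counts[name.group(1).decode("ascii")] += 1
    return counts


def nonstandard_filters(pdf_bytes):
    """Return declared filter names that are not in the standard list."""
    return sorted(set(declared_filters(pdf_bytes)) - STANDARD_FILTERS)
```

To use it, read a failing file with open(path, "rb").read() and pass the bytes to declared_filters. A file can fail even when every filter it declares is standard (for example JBIG2Decode or JPXDecode), so the per-file counts are usually more useful than the non-standard list when choosing which samples to attach.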