Problem in Extracting Image From PDF

lwdevs · January 27, 2012, 11:25pm

Hi,

I have two questions:

1.) I am new to this API. I am using the Aspose Beta version for PDF operations. I tried extracting an image from a PDF but could not get it to work. I have attached the PDF for your reference (Text.pdf). It contains text and an .eps image. Can anyone suggest how to overcome this and proceed?

2.) Will Aspose preserve the complete formatting of the text in either HTML or Word output? If so, which package should I use to do it?

Thanks in advance.

Regards,
LDev

babar.raza · January 29, 2012, 12:36am

Hi LDev,

Thank you for considering Aspose.

It seems that the right forum for your inquiry is the Aspose.Pdf forum, because the current implementation of Aspose.Imaging for .NET does not support extracting images or text from a PDF file.

I am moving this thread to the appropriate forum for further assistance. My colleagues from the concerned department should reply to your inquiry soon.

Regards.

codewarior · January 29, 2012, 1:42pm

lwdevs:

1.) I am new to this API. I am using the Aspose Beta version for PDF operations. I tried extracting an image from a PDF but could not get it to work. I have attached the PDF for your reference (Text.pdf). It contains text and an .eps image. Can anyone suggest how to overcome this and proceed?

Hi Dev,

Thanks for your interest in our products. I am afraid our web server recently encountered an issue in which the data of some forum posts was lost. Even though we managed to recover the data, I think the attachment from your recent post is lost. Can you please share the source PDF file again so that we can test the scenario at our end? We are really sorry for the inconvenience.

lwdevs:

2.) Will Aspose preserve the complete formatting of the text in either HTML or Word output? If so, which package should I use to do it?

I am pleased to share that Aspose.Pdf can convert a source PDF file into HTML as well as MS Word (.doc) format. Please try the following code snippets to accomplish your requirements.

[C#]

//Load the source PDF file into a Document object
Aspose.Pdf.Document htmlSource = new Aspose.Pdf.Document("sourcePDF.pdf");
//Save the PDF file in HTML format
htmlSource.Save("sourcePDF.html", Aspose.Pdf.SaveFormat.Html);

//Load the source PDF file into a second Document object
Aspose.Pdf.Document wordSource = new Aspose.Pdf.Document("sourcePDF.pdf");
//Save the PDF file in MS Word (.doc) format
wordSource.Save("sourcePDF.doc", Aspose.Pdf.SaveFormat.Doc);

codewarior · June 3, 2014, 6:47am

lwdevs: I am new to this API. I am using Aspose Beta version for doing PDF operations. I tried extracting an image from PDF but couldn’t get it. I attached the PDF for your reference (Text.pdf). It contains text and an .eps image. Can anyone suggest how to overcome this and proceed further?

Hi,

Thanks for your patience, and sorry for the delayed response. I have tested the scenario and can reproduce the problem: the image is not extracted from the PDF file when using the following code snippet. I have logged it in our issue tracking system as PDFNEWNET-37026 so that it can be corrected. We will investigate this issue in detail and keep you updated on the status of a correction.

We apologize for your inconvenience.

// Requires: using System.IO; using Aspose.Pdf;

// Open document
Document pdfDocument = new Document("c:/pdftest/Text.pdf");

// Console.WriteLine("Number of Images =" + pdfDocument.Pages[1].Resources.Images.Count);

// Extract a particular image (page and image indexes start at 1)
XImage xImage = pdfDocument.Pages[1].Resources.Images[1];

FileStream outputImage = new FileStream("c:/pdftest/Text_output.jpg", FileMode.Create);

// Save output image
xImage.Save(outputImage, System.Drawing.Imaging.ImageFormat.Jpeg);
outputImage.Close();

codewarior · August 6, 2014, 7:57am

Hi Dev,

Thanks for your patience. We have further investigated the issue PDFNEWNET-37026 reported earlier, and it does not seem to be an issue in our API. In fact, the source document contains no images; it contains only text and fonts.
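
For anyone hitting the same symptom, one way to confirm whether a document actually holds extractable images is to check the image count on each page before indexing into the collection. The following is only a minimal sketch, assuming the same Aspose.Pdf for .NET calls used in the snippet earlier in this thread (Document, Pages, Resources.Images, XImage.Save). The class name and the output file naming are hypothetical and chosen for illustration.

```csharp
using System;
using System.IO;
using Aspose.Pdf;

// Hypothetical helper class, not part of the Aspose API
class ImageCountCheck
{
    static void Main()
    {
        // Input path taken from the earlier snippet in this thread
        Document pdfDocument = new Document("c:/pdftest/Text.pdf");

        foreach (Page page in pdfDocument.Pages)
        {
            // A count of 0 means there is nothing to extract on this page
            int imageCount = page.Resources.Images.Count;
            Console.WriteLine("Page " + page.Number + ": " + imageCount + " image(s)");

            // Image indexes start at 1, as in the earlier snippet
            for (int i = 1; i <= imageCount; i++)
            {
                XImage xImage = page.Resources.Images[i];

                // Hypothetical output naming scheme
                string outputPath = "c:/pdftest/Text_page" + page.Number + "_image" + i + ".jpg";

                using (FileStream outputImage = new FileStream(outputPath, FileMode.Create))
                {
                    xImage.Save(outputImage, System.Drawing.Imaging.ImageFormat.Jpeg);
                }
            }
        }
    }
}
```

If every page reports 0 images, as was the case for Text.pdf, the extraction loop never runs and no out-of-range index is requested.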