Get PDF content from URL

Hi,
I want to get the contents of a PDF (as plain text, or with HTML tags for style information) from a URL (e.g., http://www.analysis.im/uploads/seminar/pdf-sample.pdf) and store them in a String variable. Could anybody help me do this using Aspose.Pdf for Java? Thanks in advance.


Regards,
Praveen J

Hi Praveen,

Thanks for contacting support.

As I understand it, you need to download a PDF file from a URL and then render it in HTML format. If that is the requirement, you can fetch the PDF file from the URL and then use Aspose.Pdf for Java to convert its contents to HTML. For more information, please visit Convert PDF to HTML format.

To get the file from the URL, you may consider using the code below.

[Java]

java.net.URL url = new java.net.URL("http://www.analysis.im/uploads/seminar/pdf-sample.pdf");

String path = "c:/pdftest/TempFile.pdf";

java.io.File file = new java.io.File(path);

file.deleteOnExit();

org.apache.commons.io.FileUtils.copyURLToFile(url, file);
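
If you would rather not depend on Commons IO, the same download step can be sketched with the standard library alone. This is only a minimal sketch and not part of the Aspose API: the class name UrlDownload and the method name download are hypothetical, chosen for illustration, and a temporary file is used in place of the fixed c:/pdftest path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, named for illustration only.
final class UrlDownload {

    /** Copies the bytes served at url into target and returns the number of bytes written. */
    static long download(URL url, Path target) throws IOException {
        // try-with-resources closes the stream even if the copy fails
        try (InputStream in = url.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.analysis.im/uploads/seminar/pdf-sample.pdf");
        // Assumption: a temp file is acceptable instead of a fixed path
        Path temp = Files.createTempFile("download-", ".pdf");
        temp.toFile().deleteOnExit();
        long bytes = download(url, temp);
        System.out.println("Saved " + bytes + " bytes to " + temp);
    }
}
```

The saved file can then be opened with whichever PDF library you use for the conversion step.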

Hi,

When exporting to HTML, I get an image copy of each page embedded in the HTML code. I need the actual content with HTML tags. I also need code for extracting the metadata of that PDF.



Regards,
Praveen J

Hi Praveen,


Do you mean each page appears as a Base64 image inside the HTML when rendering the input PDF (from the URL) to HTML format? Please share some further details and a code snippet so that we can test the scenario in our environment.

Yes, it embeds the image.


I have found a solution using the PDFBox API.

PDDocument pddDocument = PDDocument.load(new URL(pdfUrl));
PDFText2HTML stripper = new PDFText2HTML("UTF-8");
String pdfContent = stripper.getText(pddDocument);

I hope Aspose.Pdf is also capable of extracting the text with HTML tags.

Regards,
Praveen J

Hi Praveen,


To write the images and related resource files to a directory on disk during conversion, please try the NoEmbedding value from the HtmlSaveOptions.PartsEmbeddingModes enumeration instead of the EmbedAllIntoHtml value. With this approach, the images are saved in a separate folder and referenced from within the HTML file.

Regarding metadata, please visit the following link for further details: Remove Metadata from PDF.

Should you have any further queries, please feel free to contact us.

PDDocument pddDocument = PDDocument.load(new URL(pdfUrl));


What is this PDDocument?

Hi Prabhuram,

The code line quoted above is for a competitor's product and was shared by Praveen (who started this thread). We recommend that you try our API instead, and if you encounter any issues, please feel free to contact us.

All I want to do is display a PDF. How do I do this? Does Aspose have a viewer? My use case: my app has several links, and when one of the PDF links is clicked, I need to show the PDF within the Android app. Is this even possible? The more I dig into this product, the more I suspect it is only meant for editing PDFs from the local SD card, not displaying them.

Hi Prabhuram,

Thanks for sharing the details.

We have an API named Aspose.Pdf for Android, which can create new PDF files and manipulate existing ones on the Android platform. However, I am afraid it does not support displaying PDF documents. As a workaround, you may consider converting the PDF pages to images and then displaying them in an image control. For more information, please visit Convert PDF Document to Specified Images.

Regarding your query about accessing a PDF file from a link, it has been answered in your other forum thread.