Convert an entire website to PDF

Hi..

Is it possible to use Aspose.Pdf to convert an ENTIRE website (all pages) to a single PDF?

Or to convert the website's HTML source pages to a single PDF?

thanks

Hi Jon,

Thanks for using our products.

You may check the following documentation link for details and code snippets as per your requirement.

How to – Convert HTML to PDF using InLineHTML approach

Please do let us know if you need any further assistance.

Thanks & Regards,

Hi..

I don't see how this helps. I need to read multiple linked pages and save them as single PDF. How can that be done?

thx

Hi Jon,

The approach described in "How to – Convert HTML to PDF using InLineHTML approach" converts an individual file only, but you can extend it to multiple pages in different ways depending on your website. One approach is shown below for your reference.

[C#]

// Requires: using System.Data; using System.IO;

// Instantiate an object of the Pdf class
Aspose.Pdf.Generator.Pdf pdf = new Aspose.Pdf.Generator.Pdf();

// Add a section to the PDF document's sections collection
Aspose.Pdf.Generator.Section section = pdf.Sections.Add();

// Fill a DataTable with the paths of all the HTML pages
// (FillDataTable() is your own helper; it is not part of Aspose.Pdf)
DataTable dt = FillDataTable();

// Iterate over every HTML page
for (int i = 0; i < dt.Rows.Count; i++)
{
    // Read the contents of the i-th HTML file; the reader is disposed after use
    using (StreamReader r = File.OpenText(dt.Rows[i]["Path"].ToString()))
    {
        // Create a text paragraph containing the HTML text
        Aspose.Pdf.Generator.Text text2 = new Aspose.Pdf.Generator.Text(section, r.ReadToEnd());

        // Enable the property to display HTML contents with their own formatting
        text2.IsHtmlTagSupported = true;

        // Add the text paragraph containing HTML text to the section
        section.Paragraphs.Add(text2);
    }
}

// Save the PDF document
pdf.Save(@"D:/pdffiles/HTML2pdf.pdf");
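
The FillDataTable() call above is not defined in this thread. As a minimal sketch of one way to gather the pages to iterate over, the helper below pulls same-host links out of a page's HTML with a naive regular expression. The class and method names (LinkCollector, ExtractSameHostLinks) are hypothetical and not part of Aspose.Pdf, and a real HTML parser would likely be more robust than the pattern used here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of Aspose.Pdf.
public static class LinkCollector
{
    // Returns the absolute http(s) URLs found in href attributes of the given
    // HTML, resolved against baseUrl and restricted to the same host.
    // Duplicates are dropped; the order of first appearance is kept.
    public static List<string> ExtractSameHostLinks(string html, string baseUrl)
    {
        Uri baseUri = new Uri(baseUrl);
        List<string> links = new List<string>();
        HashSet<string> seen = new HashSet<string>();

        // Naive pattern: captures the value of href="..." or href='...',
        // stopping at a quote or a '#' fragment. It also matches non-anchor
        // tags such as <link>, so filter further if that matters.
        Regex hrefPattern = new Regex("href\\s*=\\s*[\"']([^\"'#]+)", RegexOptions.IgnoreCase);

        foreach (Match m in hrefPattern.Matches(html))
        {
            Uri resolved;

            // Resolve relative links against the page's own URL
            if (!Uri.TryCreate(baseUri, m.Groups[1].Value, out resolved))
                continue;

            // Skip mailto:, javascript:, and other non-web schemes
            if (resolved.Scheme != Uri.UriSchemeHttp && resolved.Scheme != Uri.UriSchemeHttps)
                continue;

            // Stay on the same website
            if (!string.Equals(resolved.Host, baseUri.Host, StringComparison.OrdinalIgnoreCase))
                continue;

            if (seen.Add(resolved.AbsoluteUri))
                links.Add(resolved.AbsoluteUri);
        }

        return links;
    }
}
```

Each URL returned could then be downloaded and added to the section as a Text paragraph in the same way as the loop above, with the list taking the place of the DataTable rows.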

Please do let us know if you need any further assistance.

Thanks & Regards,

Hi..

I tried this approach, but I get unformatted HTML. It looks as if I had opened the HTML doc in Word. How can I get each individual HTML page to render and then be added to the doc?

It did add all the pages.

thanks in advance

I also tried your sample code, but it just hangs at the Save call. Any ideas?

// The address of the web URL which you need to convert into PDF format
string WebUrl = "http://en.wikipedia.org/wiki/Main_Page";

// Create a Web Request object to connect to the remote URL
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(WebUrl);

// Set the Web Request timeout
request.Timeout = 10000; // 10 secs

// Send the request and retrieve the response
HttpWebResponse localWebResponse = (HttpWebResponse)request.GetResponse();

// Windows default Code Page (include the System.Text namespace in the project)
Encoding encoding = Encoding.GetEncoding(1252);

// Read the contents of the response into a StreamReader object
StreamReader localResponseStream = new StreamReader(localWebResponse.GetResponseStream(), encoding);

// Instantiate an object of the Pdf class
Aspose.Pdf.Generator.Pdf pdf = new Aspose.Pdf.Generator.Pdf();

// Add a section to the PDF document's sections collection
Aspose.Pdf.Generator.Section section = pdf.Sections.Add();

// Create a text paragraph containing the HTML text
Aspose.Pdf.Generator.Text text2 = new Aspose.Pdf.Generator.Text(section, localResponseStream.ReadToEnd());

// Enable the property to display HTML contents with their own formatting
text2.IsHtmlTagSupported = true;

// Add the text object containing HTML contents to the PDF section
section.Paragraphs.Add(text2);

// Specify the URL which serves as the images database
//pdf.HtmlInfo.ImgUrl = "http://en.wikipedia.org/";

// Save the PDF document
pdf.Save("D:/pdftest/DirectHTML2pdf.pdf");

// Release the reader and the response
localResponseStream.Close();
localWebResponse.Close();

Any ideas?

Thanks

Hi Jon,

The functionality discussed above supports HTML tags only, but the web URL you requested is not based on HTML alone; its source also contains PHP syntax. Kindly use pure HTML pages.

Please feel free to contact us in case any further assistance is required.

Thanks & Regards,