Error while converting html page to pdf

asadanwar · February 10, 2017, 12:43am

Hi Team,

I am trying to convert an HTML page to PDF from its URL, but this exception occurs:

An unhandled exception of type ‘System.ArgumentException’ occurred in Aspose.Pdf.dll

Additional information: At most 4 text fragments can be added in evaluation mode.

Code:

WebRequest request = WebRequest.Create("https://www.sec.gov/Archives/edgar/data/320193/000119312517003764/d307349ddefa14a.htm");
// If required by the server, set the credentials.
request.Credentials = CredentialCache.DefaultCredentials;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream dataStream = response.GetResponseStream();
StreamReader reader = new StreamReader(dataStream);
string responseFromServer = reader.ReadToEnd();
reader.Close();
dataStream.Close();
response.Close();

MemoryStream stream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(responseFromServer));
HtmlLoadOptions options = new HtmlLoadOptions("https://www.sec.gov/Archives/edgar/data/320193/000119312517003764/");
// Set page options before loading so that they take effect.
options.PageInfo.IsLandscape = true;
Document pdfDocument = new Document(stream, options);
pdfDocument.Save(@"C:\HTMLtoPDF_DOM.pdf");

tilal.ahmad · February 10, 2017, 11:53am

Hi Asad,

Thanks for your inquiry. The evaluation version of Aspose.Pdf has two limitations: it adds an evaluation watermark, and at most four elements of any collection can be viewed. Please request a 30-day temporary license to evaluate our product without any limitations; this will resolve the issue.
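Once a license file is available, it has to be applied before any document is created. A minimal sketch of that step follows; the `LicenseHelper` class, the `TryApplyLicense` method, and the file-existence check are illustrative assumptions rather than part of the Aspose API. Only `Aspose.Pdf.License` and `SetLicense` come from the library, and the project is assumed to reference Aspose.Pdf.

```csharp
using System;
using System.IO;

public static class LicenseHelper
{
    // Hypothetical helper: applies an Aspose.Pdf license if the file exists.
    // Returns false (and leaves the library in evaluation mode) otherwise.
    public static bool TryApplyLicense(string licensePath)
    {
        if (!File.Exists(licensePath))
        {
            Console.WriteLine("License file not found; running in evaluation mode.");
            return false;
        }

        Aspose.Pdf.License license = new Aspose.Pdf.License();
        license.SetLicense(licensePath);
        return true;
    }
}
```

Calling `LicenseHelper.TryApplyLicense("Aspose.Pdf.lic")` once at application startup, before constructing any `Document`, should be enough; the file name here is only an example.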

Please feel free to contact us for any further assistance.

Best Regards,

asadanwar · February 12, 2017, 11:33pm

Hi Team,

I am converting a web page using the 'url' in the code below, but the images contained in the page do not appear in the output PDF.

Aspose.Pdf.License license = new Aspose.Pdf.License();
license.SetLicense("Aspose.Pdf.lic");

WebRequest request = WebRequest.Create(url);
// If required by the server, set the credentials.
request.Credentials = CredentialCache.DefaultCredentials;
// Timeout in milliseconds before the request times out
// request.Timeout = 100;
// Get the response.
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
// Get the stream containing content returned by the server.
Stream dataStream = response.GetResponseStream();
// Open the stream using a StreamReader for easy access.
StreamReader reader = new StreamReader(dataStream);
// Read the content.
string responseFromServer = reader.ReadToEnd();
reader.Close();
dataStream.Close();
response.Close();

MemoryStream stream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(responseFromServer));
HtmlLoadOptions options = new HtmlLoadOptions("https://www.sec.gov/Archives/edgar/data/320193/000119312517003764/");
// Set page options before loading so that they take effect.
options.PageInfo.IsLandscape = true;
// Load HTML file
Document pdfDocument = new Document(stream, options);
// Save output as PDF format
pdfDocument.Save(@"C:\HTMLtoPDF_DOM.pdf");

asad.ali · February 13, 2017, 10:29am

Hi Asad,

Thanks for sharing more details. I tried to generate a PDF from the URL using the code snippet you shared and was able to produce an output containing all the images present on the page. I modified your code snippet slightly (i.e., I set the ExternalResourcesCredentials field of HtmlLoadOptions).

Please check the following code snippet and also the attached output file generated by the code.

System.Net.WebRequest request = System.Net.WebRequest.Create("https://www.sec.gov/Archives/edgar/data/320193/000119312517003764/d307349ddefa14a.htm");
// If required by the server, set the credentials.
request.Credentials = System.Net.CredentialCache.DefaultCredentials;
// Timeout in milliseconds before the request times out
// request.Timeout = 100;
// Get the response.
System.Net.HttpWebResponse response = (System.Net.HttpWebResponse)request.GetResponse();
// Get the stream containing content returned by the server.
Stream dataStream = response.GetResponseStream();
// Open the stream using a StreamReader for easy access.
StreamReader reader = new StreamReader(dataStream);

// Read the content.
string responseFromServer = reader.ReadToEnd();
reader.Close();
dataStream.Close();
response.Close();

MemoryStream stream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(responseFromServer));
HtmlLoadOptions options = new HtmlLoadOptions("https://www.sec.gov/Archives/edgar/data/320193/000119312517003764/");
options.ExternalResourcesCredentials = System.Net.CredentialCache.DefaultCredentials;
options.PageInfo.IsLandscape = true;
// Load HTML file
Document pdfDocument = new Document(stream, options);
// Save output as PDF format
pdfDocument.Save("HTMLtoPDF_DOM.pdf");

If you need any further assistance, please feel free to contact us.

Best Regards,

asadanwar · February 13, 2017, 11:10pm

Hi Team,
Converting this URL to PDF causes text to overlap in the output PDF.

System.Net.WebRequest request = System.Net.WebRequest.Create("https://www.sec.gov/Archives/edgar/data/789019/000119312516740765/d243670ddefa14a.htm");
// If required by the server, set the credentials.
request.Credentials = System.Net.CredentialCache.DefaultCredentials;
// Timeout in milliseconds before the request times out
// request.Timeout = 100;

// Get the response.
System.Net.HttpWebResponse response = (System.Net.HttpWebResponse)request.GetResponse();

// Get the stream containing content returned by the server.
Stream dataStream = response.GetResponseStream();
// Open the stream using a StreamReader for easy access.
StreamReader reader = new StreamReader(dataStream);
// Read the content.
string responseFromServer = reader.ReadToEnd();
reader.Close();
dataStream.Close();
response.Close();

MemoryStream stream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(responseFromServer));
HtmlLoadOptions options = new HtmlLoadOptions("https://www.sec.gov/Archives/edgar/data/789019/000119312516740765/");
options.ExternalResourcesCredentials = System.Net.CredentialCache.DefaultCredentials;

// Load HTML file
Document pdfDocument = new Document(stream, options);

options.PageInfo.IsLandscape = true;

// Save output as PDF format
pdfDocument.Save(@"C:\1223HTMLtoPDF_DOM.pdf");

asadanwar · February 13, 2017, 11:36pm

Hi team,
When I convert this URL to PDF, the text overlaps in the output. Please look into it.

System.Net.WebRequest request = System.Net.WebRequest.Create("https://www.sec.gov/Archives/edgar/data/789019/000119312516740765/d243670ddefa14a.htm");
// If required by the server, set the credentials.
request.Credentials = System.Net.CredentialCache.DefaultCredentials;
// Timeout in milliseconds before the request times out
// request.Timeout = 100;

// Get the response.
System.Net.HttpWebResponse response = (System.Net.HttpWebResponse)request.GetResponse();

// Get the stream containing content returned by the server.
Stream dataStream = response.GetResponseStream();
// Open the stream using a StreamReader for easy access.
StreamReader reader = new StreamReader(dataStream);
// Read the content.
string responseFromServer = reader.ReadToEnd();
reader.Close();
dataStream.Close();
response.Close();

MemoryStream stream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(responseFromServer));
HtmlLoadOptions options = new HtmlLoadOptions("https://www.sec.gov/Archives/edgar/data/789019/000119312516740765/");
options.ExternalResourcesCredentials = System.Net.CredentialCache.DefaultCredentials;

// Load HTML file
Document pdfDocument = new Document(stream, options);

options.PageInfo.IsLandscape = true;

// Save output as PDF format
pdfDocument.Save(@"C:\1223HTMLtoPDF_DOM.pdf");

asad.ali · February 14, 2017, 12:25pm

Hi Asad,

Thanks for contacting support.

I tried to generate a PDF using the code you shared and noticed that the content of the generated document did not render correctly (i.e., some text overlapped). I also tried to resize the content of the document using the following code snippet, but unfortunately without success.

pdfDocument.ProcessParagraphs();

// Resize contents of resultant PDF

int[] page_cnt1 = new int[pdfDocument.Pages.Count];

for (int i = 0; i < pdfDocument.Pages.Count; i++)
{
    page_cnt1[i] = i + 1;
}

PdfFileEditor pfe = new PdfFileEditor();

pfe.ResizeContents(pdfDocument, page_cnt1, PdfFileEditor.ContentsResizeParameters.PageResize(Aspose.Pdf.PageSize.A4.Width, Aspose.Pdf.PageSize.A4.Height));

I have logged this issue as PDFNET-42273 in our issue tracking system for the purpose of further investigation. We will look into the details of the issue and keep you updated on the status of its resolution within this thread. Please be patient and spare us a little time. We are sorry for the inconvenience.

Best Regards,