Aspose.Pdf.Document throws InvalidCastException from html stream

serdarb · October 30, 2015, 5:54am

Hi,

I am getting an InvalidCastException when creating a PDF document from a long HTML string, for example the HTML of a Wikipedia article.

(With a short HTML string it works. You can see the working sample in the commented-out lines.)

What might be the cause?

            const string url = "https://tr.wikipedia.org/wiki/T%C3%BCrk%C3%A7e";
            const string filePath = "test_turkish.pdf";

            var license = new License();
            license.SetLicense(@"C:\License\Aspose.Total.lic");
            
            var html = string.Empty;
            //html = @"
            //test heading
            //test paragraph... test turkish chars üğşiçöıIÜĞİŞÇÖ
            //";

            var request = WebRequest.Create(url);
            var response = (HttpWebResponse)request.GetResponse();
            var responseStream = response.GetResponseStream();
            if (responseStream != null)
            {
                var reader = new StreamReader(responseStream);
                html = reader.ReadToEnd();
                reader.Close();
                responseStream.Close();
                response.Close();
            }

            var stream = new MemoryStream(Encoding.UTF8.GetBytes(html));
            var options = new HtmlLoadOptions();
            
            var pdfDocument = new Document(stream, options);
            pdfDocument.Save(filePath);
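
To narrow down where the InvalidCastException comes from, it may help to print the whole exception chain rather than only the outermost exception. The following is a minimal sketch using only the .NET base class library; `ExceptionDump` and `Describe` are hypothetical names introduced for illustration and are not part of Aspose.Pdf.

```csharp
using System;
using System.Text;

// Hypothetical helper (not an Aspose API): lists every exception in a chain.
static class ExceptionDump
{
    // Returns one line per exception, outermost first: full type name and message.
    public static string Describe(Exception ex)
    {
        var sb = new StringBuilder();
        for (Exception e = ex; e != null; e = e.InnerException)
        {
            sb.AppendLine(e.GetType().FullName + ": " + e.Message);
        }
        return sb.ToString();
    }
}
```

In the code above, the `new Document(stream, options)` call could be wrapped in `try { ... } catch (Exception ex) { Console.WriteLine(ExceptionDump.Describe(ex)); }` to see which exception types are actually involved.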

codewarior · November 2, 2015, 9:01am

Hi Serdar,

Thanks for using our APIs, and sorry for the delayed response.

I have tested the scenario and, in my testing, an OutOfMemoryException is thrown. I have logged this problem as PDFNEWNET-39632 in our issue tracking system so that it can be fixed. We will look further into the details and keep you updated on the status of the fix. Please be patient and allow us a little time. We are sorry for this inconvenience.

tilal.ahmad · November 2, 2015, 9:13am

Hi Serdar,

In addition to the above reply, please use the following code snippet for web page to PDF conversion, as suggested in the documentation link. It should help you accomplish the task.

            // Create a request for the URL.
            WebRequest request = WebRequest.Create("https://tr.wikipedia.org/wiki/T%C3%BCrk%C3%A7e");
            // If required by the server, set the credentials.
            request.Credentials = CredentialCache.DefaultCredentials;
            // Timeout in milliseconds before the request times out.
            //request.Timeout = 100;
            // Get the response.
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();
            // Get the stream containing content returned by the server.
            Stream dataStream = response.GetResponseStream();
            // Open the stream using a StreamReader for easy access.
            StreamReader reader = new StreamReader(dataStream);
            // Read the content.
            string responseFromServer = reader.ReadToEnd();
            reader.Close();
            dataStream.Close();
            response.Close();

            MemoryStream stream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(responseFromServer));
            // Pass the page URL as the base path for the HTML.
            HtmlLoadOptions options = new HtmlLoadOptions("https://tr.wikipedia.org/wiki/T%C3%BCrk%C3%A7e");
            // Set page options before loading so they are in place when the HTML is converted.
            options.PageInfo.IsLandscape = true;
            // Load the HTML.
            Document pdfDocument = new Document(stream, options);
            // Save output as PDF (myDir is the output directory, defined elsewhere).
            pdfDocument.Save(myDir + "HTMLtoPDF_DOM.pdf");
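
As a possible variation on the snippet above, the response bytes can be copied straight into a MemoryStream instead of being decoded to a string and re-encoded as UTF-8. This is only a sketch using the .NET base class library; `StreamBuffering` and `ToMemoryStream` are hypothetical names, and whether this changes the conversion result for this particular page has not been tested.

```csharp
using System.IO;

// Hypothetical helper (not an Aspose API): buffers a stream without re-encoding it.
static class StreamBuffering
{
    // Copies all remaining bytes of source into a new MemoryStream and rewinds it,
    // so the consumer receives the bytes exactly as the server sent them.
    public static MemoryStream ToMemoryStream(Stream source)
    {
        var buffer = new MemoryStream();
        source.CopyTo(buffer);   // Stream.CopyTo is available since .NET Framework 4.0
        buffer.Position = 0;     // rewind so reading starts at the first byte
        return buffer;
    }
}
```

With this helper, `StreamBuffering.ToMemoryStream(response.GetResponseStream())` would stand in for the StreamReader and `Encoding.UTF8.GetBytes` pair. One reason to consider it: `new StreamReader(stream)` decodes as UTF-8 by default, so a page served in another encoding could be altered by the decode and re-encode round trip.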

Please feel free to contact us for any further assistance.

Best Regards,