Hello all, I am getting a little stuck trying to convert a web page to PDF.
The web page could be anything, public, wherever. As an example I tried the BBC News home page (“Home - BBC News”).
I have no idea where its CSS files are or anything like that. I think the problem is with my HtmlLoadOptions. The other examples I see online seem to assume that the “basePath” is a known entity, but here it is not. All I can do is pass in the same value as the original web page. So in the following code, sFile holds the URL of that page.
oRequest = Net.HttpWebRequest.Create(requestUri:=New Uri(sFile))
If sFile.ToLower.IndexOf("https://") = 0 Then
    Try
        oRequest.UseDefaultCredentials = True
    Catch ex As Exception
    End Try
End If
oResponse = oRequest.GetResponse()
I’m leaving out the code that reads the response text with a StreamReader and puts it into a MemoryStream named oStream. All of that is working properly, and I’m taking care of closing and disposing. So we have oStream:
oHTML = New Aspose.Pdf.HtmlLoadOptions(basePath:=sFile)
oPDF = New Aspose.Pdf.Document(input:=oStream, options:=oHTML)
oPDF.Save(outputFileName:=sToFileName, format:=Aspose.Pdf.SaveFormat.Pdf)
Again, I’m taking care of cleaning up after myself. Suffice it to say that this code DOES produce an output PDF, and the text in the PDF matches what I can see on “Home - BBC News”, but that is where the similarities end. The layout and styling of the output are pretty much garbage.
As I said already, I think my main problem is in this line:
oHTML = New Aspose.Pdf.HtmlLoadOptions(basePath:=sFile)
Is there a way to “do that better”? I would NOT be intimidated by the notion of having to parse the response string to extract the CSS locations.
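For anyone who wants to try the parsing idea mentioned above, here is a minimal sketch of pulling stylesheet links out of the response text and resolving them against the page URL. It is written in Python rather than VB.NET purely for illustration. The names `StylesheetLinkParser` and `stylesheet_urls` and the sample HTML are hypothetical and not taken from the real page. It only lists where the CSS lives; it makes no claim about how Aspose.PDF resolves `basePath` internally.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StylesheetLinkParser(HTMLParser):
    """Collects the href of every <link rel="stylesheet"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, e.g. "alternate stylesheet"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "stylesheet" in rel_tokens and href:
            self.hrefs.append(href)


def stylesheet_urls(html_text, page_url):
    """Return absolute stylesheet URLs, resolved against the URL the page was fetched from."""
    parser = StylesheetLinkParser()
    parser.feed(html_text)
    return [urljoin(page_url, href) for href in parser.hrefs]


# Made-up sample markup standing in for the downloaded response text.
sample_html = """
<html><head>
<link rel="stylesheet" href="/static/site.css">
<link rel="stylesheet" href="css/page.css">
<link rel="icon" href="/favicon.ico">
<link rel="stylesheet" href="//cdn.example.com/theme.css">
</head><body></body></html>
"""

urls = stylesheet_urls(sample_html, "https://bbc.co.uk/news")
for url in urls:
    print(url)
```

One thing the sketch makes visible: a relative href such as `css/page.css` resolves against the parent path of the page URL, not against `/news/`, and a protocol-relative href such as `//cdn.example.com/theme.css` points at a different host altogether. A single base path therefore cannot describe every stylesheet location on a page like this.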
Would you please share SSCCE code along with the generated file so that we may try to reproduce and investigate the issue in our environment? Before sharing the requested data, please make sure you are using Aspose.PDF for .NET 19.10, and see Convert Web Page to PDF for reference.
OK. With regard to providing an SSCCE, I cannot share the code that gets our Licence, but that part should be self-explanatory. Suffice it to say we are loading a valid Licence.
Other than that, sFile is a string that contains
"https://bbc.co.uk/news"
I looked at your page “Convert HTML to PDF in .NET | Aspose.PDF for .NET” and, as far as I can tell, my code is an almost literal copy of the code there, except that I code in VB.NET. The following is my actual code:
Dim sFile As String
sFile = "https://bbc.co.uk/news"
'Perform Web Request
Dim oRequest As System.Net.HttpWebRequest = Nothing
Dim oResponse As System.Net.HttpWebResponse = Nothing
Dim oStream As IO.MemoryStream = Nothing
Dim oStreamReader As IO.StreamReader = Nothing
Dim sResponse As String
Dim oHTML As Aspose.Pdf.HtmlLoadOptions = Nothing
Dim oPDF As Aspose.Pdf.Document = Nothing
Try
    oRequest = Net.HttpWebRequest.Create(requestUri:=New Uri(sFile))
    If sFile.ToLower.IndexOf("https://") = 0 Then
        Try
            oRequest.UseDefaultCredentials = True
        Catch ex As Exception
        End Try
    End If
    oResponse = oRequest.GetResponse()
    oStreamReader = New IO.StreamReader(oResponse.GetResponseStream())
    sResponse = oStreamReader.ReadToEnd()
    If Not oStreamReader Is Nothing Then
        Try
            oStreamReader.Close()
        Catch
        End Try
        Try
            oStreamReader.Dispose()
        Catch
        End Try
        oStreamReader = Nothing
    End If
    Try
        oResponse.Close()
    Catch
    End Try
    Try
        oResponse.Dispose()
    Catch
    End Try
    oResponse = Nothing
    oStream = New IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes(sResponse))
    'oResult.Action = "Load into PDF"
    Try
        oHTML = New Aspose.Pdf.HtmlLoadOptions(basePath:=sFile)
    Catch ex As Exception
        'This is part of a DLL - for your testing you can replace this with a simple Return False
        Return False
        'oResult.Data = ex.Message
        'oResult.Response = ConverterResponse.unknownerror
        'Return oResult
    End Try
    Try
        oPDF = New Aspose.Pdf.Document(input:=oStream, options:=oHTML)
    Catch ex As Exception
        'This is part of a DLL - for your testing you can replace this with a simple Return False
        Return False
        'oResult.Data = ex.Message
        'oResult.Response = ConverterResponse.unknownerror
        'Return oResult
    End Try
    Try
        oPDF.Save(outputFileName:=sToFileName, format:=Aspose.Pdf.SaveFormat.Pdf)
    Catch ex As Exception
        'This is part of a DLL - for your testing you can replace this with a simple Return False
        Return False
        'oResult.Data = ex.Message
        'oResult.Response = ConverterResponse.unknownerror
        'Return oResult
    End Try
Catch ex As Exception
Finally
    If Not oPDF Is Nothing Then
        Try
            oPDF.Dispose()
        Catch
        End Try
        oPDF = Nothing
    End If
    If Not oHTML Is Nothing Then
        oHTML = Nothing
    End If
    If Not oStream Is Nothing Then
        Try
            oStream.Close()
        Catch
        End Try
        Try
            oStream.Dispose()
        Catch
        End Try
        oStream = Nothing
    End If
    If Not oStreamReader Is Nothing Then
        Try
            oStreamReader.Close()
        Catch
        End Try
        Try
            oStreamReader.Dispose()
        Catch
        End Try
        oStreamReader = Nothing
    End If
    If Not oResponse Is Nothing Then
        Try
            oResponse.Close()
        Catch
        End Try
        Try
            oResponse.Dispose()
        Catch
        End Try
        oResponse = Nothing
    End If
    If Not oRequest Is Nothing Then
        oRequest = Nothing
    End If
End Try
We were able to replicate the issue in our environment using Aspose.PDF for .NET 19.10 and have logged it as PDFNET-47185 in our issue tracking system so that it can be corrected. We will look further into the details of the issue and keep you posted on the status of the fix. Please be patient and spare us a little time.