Converting web page to PDF?

rozeboosje · October 25, 2019, 2:02pm

Hello all… I am getting a little stuck trying to convert a web page to PDF.

The web page could be anything, public, wherever. As an example I tried https://bbc.co.uk/news

I have no idea where its CSS files are or anything like that. I think the problem is with my HtmlLoadOptions. Other examples I see online seem to assume that the “basePath” is a known entity, but here it is not. All I can do is pass in the URL of the original web page. So in the following code, sFile = "https://bbc.co.uk/news"

                oRequest = Net.HttpWebRequest.Create(requestUri:=New Uri(sFile))
                If sFile.ToLower.IndexOf("https://") = 0 Then
                    Try
                        oRequest.UseDefaultCredentials = True
                    Catch ex As Exception

                    End Try
                End If
                oResponse = oRequest.GetResponse()

… I’m leaving out the code that reads the response text with a StreamReader and puts it into a MemoryStream named oStream. All of that is working properly, and I’m taking care of closing and disposing. So we have oStream:

                oHTML = New Aspose.Pdf.HtmlLoadOptions(basePath:=sFile)
                oPDF = New Aspose.Pdf.Document(input:=oStream, options:=oHTML)
                oPDF.Save(outputFileName:=sToFileName, format:=Aspose.Pdf.SaveFormat.Pdf)

Again, I’m taking care of cleaning up after myself. Suffice it to say that this code DOES produce an output PDF, and I can match the text in the PDF to what I can see on https://bbc.co.uk/news … but that is where the similarities end. The output formatting is pretty much garbage…

As I said already, I think my main problem is in this line:

                oHTML = New Aspose.Pdf.HtmlLoadOptions(basePath:=sFile)

Is there a way to “do that better”? I would NOT be intimidated by the notion of having to parse the response string to extract “css” locations.
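
For reference, parsing the response string for stylesheet locations could look roughly like the sketch below. It uses only System.Text.RegularExpressions and System.Uri; ExtractStylesheetUrls is a hypothetical helper name, not an Aspose.PDF API, and the regular expressions are an assumption that covers quoted href attributes only (unquoted values, @import rules and inline style blocks are not handled).

```vb
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

Module StylesheetLinks
    ' Hypothetical helper (not part of Aspose.PDF): returns the absolute URL of
    ' every <link rel="stylesheet" href="..."> tag found in sHtml.
    Function ExtractStylesheetUrls(sHtml As String, sPageUrl As String) As List(Of String)
        Dim oResult As New List(Of String)
        Dim oBase As New Uri(sPageUrl)
        ' Look at each <link ...> tag; its attributes may appear in any order.
        For Each oTag As Match In Regex.Matches(sHtml, "<link\b[^>]*>", RegexOptions.IgnoreCase)
            Dim sTag As String = oTag.Value
            ' Skip icons, preloads and other non-stylesheet links.
            If Not Regex.IsMatch(sTag, "\brel\s*=\s*[""']?[^""'>]*stylesheet", RegexOptions.IgnoreCase) Then Continue For
            ' Assumption: the href value is quoted.
            Dim oHref As Match = Regex.Match(sTag, "\bhref\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase)
            If Not oHref.Success Then Continue For
            ' Decode entities such as &amp; before resolving the URL.
            Dim sHref As String = WebUtility.HtmlDecode(oHref.Groups(1).Value)
            ' Resolve relative and protocol-relative hrefs against the page URL.
            Dim oAbsolute As Uri = Nothing
            If Uri.TryCreate(oBase, sHref, oAbsolute) Then
                oResult.Add(oAbsolute.AbsoluteUri)
            End If
        Next
        Return oResult
    End Function
End Module
```

Calling ExtractStylesheetUrls(sResponse, sFile) would give a list of absolute CSS URLs that can be inspected or downloaded separately. Whether feeding those to the converter improves the output is not something this sketch establishes.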

Farhan.Raza · October 25, 2019, 8:16pm

@rozeboosje

Thank you for contacting support.

Would you please share SSCCE code along with the generated file so that we can try to reproduce and investigate the issue in our environment? Before sharing the requested data, please make sure you are using Aspose.PDF for .NET 19.10, and see Convert Web Page to PDF for reference.

rozeboosje · October 25, 2019, 8:51pm

Ok. With regard to providing an SSCCE: I cannot share the code that loads our Licence, but that part should be self-explanatory. Suffice it to say we are loading a valid Licence.

Other than that, sFile is a string that contains

"https://bbc.co.uk/news"

I looked at your page Convert HTML to PDF in .NET|Aspose.PDF for .NET and, as far as I can tell, my code is an almost literal copy of the example there, except that I code in VB.NET. The following is my actual code:

            Dim sFile As String
            sFile = "https://bbc.co.uk/news"
            'Perform Web Request
            Dim oRequest As System.Net.HttpWebRequest = Nothing
            Dim oResponse As System.Net.HttpWebResponse = Nothing
            Dim oStream As IO.MemoryStream = Nothing
            Dim oStreamReader As IO.StreamReader = Nothing
            Dim sResponse As String
            Dim oHTML As Aspose.Pdf.HtmlLoadOptions = Nothing
            Dim oPDF As Aspose.Pdf.Document = Nothing
            Try
                oRequest = Net.HttpWebRequest.Create(requestUri:=New Uri(sFile))
                If sFile.ToLower.IndexOf("https://") = 0 Then
                    Try
                        oRequest.UseDefaultCredentials = True
                    Catch ex As Exception

                    End Try
                End If
                oResponse = oRequest.GetResponse()
                oStreamReader = New IO.StreamReader(oResponse.GetResponseStream())
                sResponse = oStreamReader.ReadToEnd()
                If Not oStreamReader Is Nothing Then
                    Try
                        oStreamReader.Close()
                    Catch

                    End Try
                    Try
                        oStreamReader.Dispose()
                    Catch

                    End Try
                    oStreamReader = Nothing
                End If
                Try
                    oResponse.Close()
                Catch

                End Try
                Try
                    oResponse.Dispose()
                Catch

                End Try
                oResponse = Nothing
                oStream = New IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes(sResponse))
                'oResult.Action = "Load into PDF"
                Try
                    oHTML = New Aspose.Pdf.HtmlLoadOptions(basePath:=sFile)
                Catch ex As Exception
                    'This is part of a DLL - for your testing you can replace this with a simple Return False
                    Return False
                    'oResult.Data = ex.Message
                    'oResult.Response = ConverterResponse.unknownerror
                    'Return oResult
                End Try
                Try
                    oPDF = New Aspose.Pdf.Document(input:=oStream, options:=oHTML)
                Catch ex As Exception
                    'This is part of a DLL - for your testing you can replace this with a simple Return False
                    Return False
                    'oResult.Data = ex.Message
                    'oResult.Response = ConverterResponse.unknownerror
                    'Return oResult
                End Try
                Try
                    oPDF.Save(outputFileName:=sToFileName, format:=Aspose.Pdf.SaveFormat.Pdf)
                Catch ex As Exception
                    'This is part of a DLL - for your testing you can replace this with a simple Return False
                    Return False
                    'oResult.Data = ex.Message
                    'oResult.Response = ConverterResponse.unknownerror
                    'Return oResult
                End Try

            Catch ex As Exception
            Finally
                If Not oPDF Is Nothing Then
                    Try
                        oPDF.Dispose()
                    Catch

                    End Try
                    oPDF = Nothing
                End If
                If Not oHTML Is Nothing Then
                    oHTML = Nothing
                End If
                If Not oStream Is Nothing Then
                    Try
                        oStream.Close()
                    Catch

                    End Try
                    Try
                        oStream.Dispose()
                    Catch

                    End Try
                    oStream = Nothing
                End If
                If Not oStreamReader Is Nothing Then
                    Try
                        oStreamReader.Close()
                    Catch

                    End Try
                    Try
                        oStreamReader.Dispose()
                    Catch

                    End Try
                    oStreamReader = Nothing
                End If
                If Not oResponse Is Nothing Then
                    Try
                        oResponse.Close()
                    Catch

                    End Try
                    Try
                        oResponse.Dispose()
                    Catch

                    End Try
                    oResponse = Nothing
                End If
                If Not oRequest Is Nothing Then
                    oRequest = Nothing
                End If
            End Try
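
As an aside on the download step above, a shorter variant is possible. The sketch below copies the raw response bytes straight into a MemoryStream, so the page is not decoded as UTF-8 and re-encoded along the way, and it leaves cleanup to Using blocks. DownloadToMemory is a hypothetical helper name, and the UseDefaultCredentials handling from the code above is left out. This only tidies the fetch; it is not expected to change how the page is rendered.

```vb
Imports System
Imports System.IO
Imports System.Net

Module PageFetcher
    ' Hypothetical helper: downloads sUrl and returns the raw response bytes in
    ' a MemoryStream positioned at 0. The caller disposes the returned stream.
    Function DownloadToMemory(sUrl As String) As MemoryStream
        Dim oRequest As WebRequest = WebRequest.Create(New Uri(sUrl))
        Dim oMemory As New MemoryStream()
        ' Using disposes the response and its stream even if CopyTo throws.
        Using oResponse As WebResponse = oRequest.GetResponse()
            Using oBody As Stream = oResponse.GetResponseStream()
                ' Copy bytes as-is; no text decoding happens here.
                oBody.CopyTo(oMemory)
            End Using
        End Using
        ' Rewind so the next reader starts at the first byte.
        oMemory.Position = 0
        Return oMemory
    End Function
End Module
```

The returned stream could then take the place of oStream in the New Aspose.Pdf.Document(input:=oStream, options:=oHTML) call, wrapped in its own Using block.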

rozeboosje · October 25, 2019, 8:54pm

news.pdf (1.4 MB)

asad.ali · October 26, 2019, 9:34am

@rozeboosje

We were able to replicate the issue in our environment using Aspose.PDF for .NET 19.10, and have logged it as PDFNET-47185 in our issue tracking system. We will look into the details and keep you posted on the status of the fix. Please be patient and spare us a little time.

We are sorry for the inconvenience.

rozeboosje · October 26, 2019, 11:12am

Hello Asad,

Thank you… Looking forward to that!