I'm getting an InvalidOperationException: “Nullable object must have a value.” when running the following code:
```csharp
var httpClient = new HttpClient();
var url = @"https://manuals.health.mil/pages/DisplayManualHtmlFile/2022-12-05/AsOf/TOT5/C1TOC.html";
var cancellationToken = new CancellationToken();
var request = new HttpRequestMessage(HttpMethod.Get, url);
// Download the page; ResponseContentRead returns only after the whole body has been read
using var result = httpClient.Send(request, HttpCompletionOption.ResponseContentRead, cancellationToken);
result.EnsureSuccessStatusCode();
using var stream = result.Content.ReadAsStream(cancellationToken);
stream.Position = 0;
// Copy the HTML into a MemoryStream and rewind it before handing it to Aspose.PDF
var memoryStream = new MemoryStream();
stream.CopyTo(memoryStream);
memoryStream.Position = 0;
// LINQPad: print the buffered length and position
(new { Length = memoryStream.Length, Position = memoryStream.Position }).Dump();
// Render the HTML with print media styles onto portrait US Letter pages, scaled to page width
var options = new HtmlLoadOptions(url)
{
    HtmlMediaType = HtmlMediaType.Print,
    PageLayoutOption = HtmlPageLayoutOption.ScaleToPageWidth,
    PageInfo = new PageInfo { Width = PageSize.PageLetter.Width, Height = PageSize.PageLetter.Height, IsLandscape = false },
};
using var pdf = new Document(memoryStream, options);
var docInfo = new DocumentInfo(pdf)
{
    Title = "title"
};
```
The InvalidOperationException you’re encountering (“Nullable object must have a value.”) may be related to how the HTML content is fetched and processed, in particular to the way HttpClient is used to retrieve the HTML stream.
One thing to try is making sure the HTTP request has fully completed before the stream is read. Your code uses the synchronous Send and ReadAsStream methods, and blocking patterns such as calling GetAwaiter().GetResult() on async methods can cause problems in some environments. Using the asynchronous methods with await avoids blocking the calling thread.
Here’s a modified version of your code that uses await so that the request completes before the stream is processed:
```csharp
// Must run in an async context (e.g. top-level statements or an async Main method)
var httpClient = new HttpClient();
var url = @"https://manuals.health.mil/pages/DisplayManualHtmlFile/2022-12-05/AsOf/TOT5/C1TOC.html";
var cancellationToken = CancellationToken.None;
var request = new HttpRequestMessage(HttpMethod.Get, url);
// Await the request; ResponseContentRead completes only after the whole body has been read
using var result = await httpClient.SendAsync(request, HttpCompletionOption.ResponseContentRead, cancellationToken);
result.EnsureSuccessStatusCode();
using var stream = await result.Content.ReadAsStreamAsync(cancellationToken);
// Rewind only if the stream supports seeking; otherwise setting Position throws NotSupportedException
if (stream.CanSeek)
{
    stream.Position = 0;
}
// Buffer the HTML into a seekable MemoryStream and rewind it before loading
using var memoryStream = new MemoryStream();
await stream.CopyToAsync(memoryStream, cancellationToken);
memoryStream.Position = 0;
var options = new HtmlLoadOptions(url)
{
    HtmlMediaType = HtmlMediaType.Print,
    PageLayoutOption = HtmlPageLayoutOption.ScaleToPageWidth,
    PageInfo = new PageInfo { Width = PageSize.PageLetter.Width, Height = PageSize.PageLetter.Height, IsLandscape = false },
};
using var pdf = new Document(memoryStream, options);
var docInfo = new DocumentInfo(pdf)
{
    Title = "title"
};
```
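In this code, I’ve replaced the synchronous calls with their asynchronous counterparts, so the calling thread is no longer blocked while the response is downloaded and copied. Note that with HttpCompletionOption.ResponseContentRead, both Send and SendAsync return only after the whole response body has been read, so this change may not by itself resolve the InvalidOperationException you are experiencing.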
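To make the buffering step explicit, and safe for source streams that cannot seek (for example a response read with HttpCompletionOption.ResponseHeadersRead), one option is a small helper that copies the source into a MemoryStream and rewinds only the copy. This is a minimal sketch assuming .NET 6 or later. `StreamBuffering.ToSeekableAsync` is a hypothetical name and is not part of HttpClient or Aspose.PDF:

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; not part of HttpClient or Aspose.PDF.
public static class StreamBuffering
{
    // Copies everything from the source's current position into a new
    // MemoryStream and rewinds the copy. The result is seekable and starts
    // at position 0 even when the source (e.g. a network stream) cannot
    // seek. The source itself is never repositioned.
    public static async Task<MemoryStream> ToSeekableAsync(
        Stream source, CancellationToken cancellationToken = default)
    {
        var buffer = new MemoryStream();
        await source.CopyToAsync(buffer, cancellationToken);
        buffer.Position = 0;
        return buffer;
    }
}
```

In the async version above, this would replace the manual copy-and-rewind steps: `using var memoryStream = await StreamBuffering.ToSeekableAsync(stream, cancellationToken);`.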
If the problem persists, please check if the HTML content being fetched is valid and complete, as any issues with the content itself could also lead to exceptions during processing.
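One rough way to check that the downloaded page looks like complete HTML before converting it is to compare the body with the response headers. This is a heuristic sketch, not an Aspose API. `HtmlResponseCheck.LooksLikeCompleteHtmlAsync` is a hypothetical name, and the presence of a closing `</html>` tag is used only as a hint that the page was not truncated, since HTML parsers do not require that tag:

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper for a rough completeness check; not an Aspose API.
public static class HtmlResponseCheck
{
    // Heuristics only: the response must declare text/html, the byte count must
    // match Content-Length when the server sent one, and the body must contain
    // a closing </html> tag (valid HTML may legitimately omit it).
    public static async Task<bool> LooksLikeCompleteHtmlAsync(HttpResponseMessage response)
    {
        var mediaType = response.Content.Headers.ContentType?.MediaType;
        if (!string.Equals(mediaType, "text/html", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        // Read the body once as raw bytes so it can be compared with Content-Length
        byte[] body = await response.Content.ReadAsByteArrayAsync();
        long? declaredLength = response.Content.Headers.ContentLength;
        if (declaredLength.HasValue && declaredLength.Value != body.Length)
        {
            return false;
        }

        // "</html>" is plain ASCII, so decoding as UTF-8 is enough to find it
        // in UTF-8 or other ASCII-compatible encodings
        string text = Encoding.UTF8.GetString(body);
        return text.Contains("</html>", StringComparison.OrdinalIgnoreCase);
    }
}
```

A false result does not prove that the page is broken. It only suggests that the content is worth inspecting before it is passed to the converter.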
Could you please confirm whether you have tried the latest version of the API? Also, please share the complete stack trace with us so that we can investigate further.
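For context, “Nullable object must have a value.” is the message .NET uses when code reads `.Value` on a `Nullable<T>` that is null. The stack trace is therefore what shows where inside the conversion this happens. Below is a minimal sketch of capturing the full details. `ExceptionReport` is a hypothetical helper. It relies on `Exception.ToString()`, which includes the exception type, message, stack trace and any inner exceptions:

```csharp
#nullable enable
using System;

// Hypothetical helper for collecting full exception details.
public static class ExceptionReport
{
    // Runs an action and returns the full exception text (type, message,
    // stack trace and inner exceptions), or null if nothing was thrown.
    public static string? Capture(Action action)
    {
        try
        {
            action();
            return null;
        }
        catch (Exception ex)
        {
            return ex.ToString();
        }
    }

    // Reproduces the same exception type and message without Aspose:
    // reading Value on an empty Nullable<T> throws InvalidOperationException.
    public static int ReadEmptyNullable()
    {
        int? missing = null;
        return missing.Value;
    }
}
```

In the code from the question, the call to wrap would be the `new Document(memoryStream, options)` line, with the returned text pasted into the reply.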
```csharp
using Aspose.Html;
using Aspose.Html.Converters;
using Aspose.Html.Saving;
using PdfTemp; // project namespace that presumably defines MemoryStreamProvider (an ICreateStreamProvider implementation)

var httpClient = new HttpClient();
var url = @"https://manuals.health.mil/pages/DisplayManualHtmlFile/2022-12-05/AsOf/TOT5/C1TOC.html";
//var url = @"https://manuals.health.mil/pages/DisplayManualHtmlFile/2024-10-24/AsOf/TOT5/FOREWORD.html";
var cancellationToken = new CancellationToken();
var request = new HttpRequestMessage(HttpMethod.Get, url);
// Download the whole page before converting it
using var result = httpClient.Send(request, HttpCompletionOption.ResponseContentRead, cancellationToken);
result.EnsureSuccessStatusCode();
using var stream = result.Content.ReadAsStream(cancellationToken);
stream.Seek(0, SeekOrigin.Begin);
// Collects the output streams produced by the converter in memory
using var streamProvider = new MemoryStreamProvider();
// Parse the HTML, using the page URL as the base for relative links
using var doc = new HTMLDocument(stream, url);
Converter.ConvertHTML(doc, new PdfSaveOptions(), streamProvider);
// Write the generated PDF stream to disk
var pdfStream = streamProvider.Streams.First();
using (var fileStream = File.Create("file.pdf"))
{
    pdfStream.Seek(0, SeekOrigin.Begin);
    pdfStream.CopyTo(fileStream);
}
```
So V24 fixed that issue, but the PDF formatting is horrendous. Adding a base href to the HTML does not fix the formatting, and passing the base href to the HtmlLoadOptions constructor causes a stack overflow:
```csharp
var baseHref = "https://manuals.health.mil";
var options = new HtmlLoadOptions(baseHref)
{
    PageLayoutOption = HtmlPageLayoutOption.ScaleToPageWidth,
    PageInfo = new PageInfo { Width = PageSize.PageLetter.Width, Height = PageSize.PageLetter.Height, IsLandscape = false },
};
// htmlStream: the downloaded HTML (defined elsewhere), positioned at 0
using var pdf = new Document(htmlStream, options); // <--- boom! (stack overflow)
```
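One thing worth checking is which URL relative links in the page are resolved against. The first snippet passes the full page URL to HtmlLoadOptions, while the snippet above passes only the site root. A plain relative path such as a stylesheet or image reference would then resolve to a different location, which may affect how the page is styled. The sketch below illustrates this with System.Uri. `BaseUrlDemo` is a hypothetical name, and the file names `C1S1.html` and `/Content/site.css` are made up for illustration rather than taken from the actual page:

```csharp
using System;

// Illustration only: shows how references resolve against different base URLs.
public static class BaseUrlDemo
{
    public const string PageUrl =
        "https://manuals.health.mil/pages/DisplayManualHtmlFile/2022-12-05/AsOf/TOT5/C1TOC.html";
    public const string SiteRoot = "https://manuals.health.mil";

    // Resolves a reference against a base URL using System.Uri (RFC 3986 rules):
    // a plain relative path replaces the last segment of the base path, while a
    // path starting with '/' is resolved against the site root.
    public static string Resolve(string baseUrl, string reference) =>
        new Uri(new Uri(baseUrl), reference).AbsoluteUri;
}
```

With the page URL as the base, `C1S1.html` resolves to `https://manuals.health.mil/pages/DisplayManualHtmlFile/2022-12-05/AsOf/TOT5/C1S1.html`. With only the site root, it resolves to `https://manuals.health.mil/C1S1.html`. Root-relative paths such as `/Content/site.css` resolve the same way against either base.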
Could you please confirm whether you are currently using only Aspose.PDF to convert HTML to PDF? Also, could you please share the generated output PDF for our reference?