Accessibility not available when converting from HTML to PDF

DonSpaghetti · April 6, 2022, 1:32pm

I'm having trouble converting an HTML document to PDF so that images etc. are accessible to a program like NVDA (NV Access | Download NVDA). I’m using Aspose with .NET 5.

I have made a small example where I’m creating a simple HTML string with images that have alt text, loading it from a byte[], and converting it to PDF.

Code:

var temp = @"<!DOCTYPE html>
<html>
<body>

<h1>My First Heading</h1>

<p>My first paragraph.</p>

<figure class=""image"" contenteditable=""false""><img src=""https://media.istockphoto.com/photos/hot-air-balloons-flying-over-the-botan-canyon-in-turkey-picture-id1297349747?b=1&k=20&m=1297349747&s=170667a&w=0&h=oH31fJty_4xWl_JQ4OIQWZKP8C6ji9Mz7L4XmEnbqRU="" alt=""A random bird BOBBY BOBBY BOOBY"" width=""366"" height=""366""></figure>

<p>My second paragraph.</p>

<img src=""https://media.istockphoto.com/photos/hot-air-balloons-flying-over-the-botan-canyon-in-turkey-picture-id1297349747?b=1&k=20&m=1297349747&s=170667a&w=0&h=oH31fJty_4xWl_JQ4OIQWZKP8C6ji9Mz7L4XmEnbqRU="" alt=""A random bird gives an alt text"" width=""366"" height=""366"">
</body>
</html>";

    // Load the HTML string into a stream and let Aspose.PDF parse it.
    byte[] bytes = Encoding.ASCII.GetBytes(temp);
    MemoryStream htmlStringStreamTemp = new MemoryStream(bytes);
    htmlStringStreamTemp.Position = 0;
    Aspose.Pdf.HtmlLoadOptions htmlLoadOptionsTemp = new Aspose.Pdf.HtmlLoadOptions();
    Document htmlDocumentTemp = new Document(htmlStringStreamTemp, htmlLoadOptionsTemp);

    // Copy the converted pages into a new PDF document.
    var pdfDocumentTemp = new Document();
    pdfDocumentTemp.Pages.Add(htmlDocumentTemp.Pages);

    // Save the plain PDF.
    pdfDocumentTemp.Save(@"C:\Temp\temp03.pdf");

    // Convert to PDF/A-2a, deleting elements that cannot be converted.
    var options = new PdfFormatConversionOptions(PdfFormat.PDF_A_2A);
    options.ErrorAction = ConvertErrorAction.Delete;
    pdfDocumentTemp.Convert(options);

    // Save the PDF/A-2a version.
    pdfDocumentTemp.Save(@"C:\Temp\temp04.pdf");

Can you see what I’m doing wrong, or is this feature not supported yet?

tahir.manzoor · April 6, 2022, 4:27pm

@DonSpaghetti

We have tested the scenario and have not found any issue with the output PDF. Could you please share more detail about your issue and your requirement? We will then provide more information about your query.

DonSpaghetti · April 6, 2022, 8:17pm

After the HTML has been converted, I would like to be able to have a program read the alternate text. I can't make it work with the code above.

Can you please send me the PDF file you converted so I can test it on my computer?

tahir.manzoor · April 7, 2022, 3:34am

@DonSpaghetti

Please check the attached PDF files generated by your code example.
temp03.pdf (481.3 KB)
temp04.pdf (482.5 KB)

DonSpaghetti · April 7, 2022, 6:31am

When I use NVDA to read the document, it only registers the paragraphs and the heading, not the alternate text. What tool are you using?

See the attached file for confirmation:
speechViewer.jpg (38.1 KB)

tahir.manzoor · April 7, 2022, 5:18pm

@DonSpaghetti

We have logged this problem in our issue tracking system as PDFNET-51611. You will be notified via this forum thread once this issue is resolved.

We apologize for the inconvenience.

DonSpaghetti · April 12, 2022, 7:23am

Do you know when the issue will be fixed?

tahir.manzoor · April 12, 2022, 3:45pm

@DonSpaghetti

Your issue is currently in the queue, pending analysis. Once we have completed the analysis, we will be able to provide you with an estimate.