PowerPoint to PDF in Java - Accessibility Alt-Text Not Read by Screen Reader

Hi,

I have encountered a problem while converting a PowerPoint presentation to PDF with PDF/UA compliance using Aspose.Slides for Java (25.4). When testing the result, a screen reader is not able to detect the pictures or their alt-texts. Inspecting the accessibility tree in the Firefox inspector shows clear differences between the structure generated by Aspose and the one generated by PowerPoint.

Following the documentation I use the following code to transform the presentation:

import com.aspose.slides.*;
import java.io.FileInputStream;
import java.io.FileOutputStream;

Presentation pres = null;
PdfOptions options = new PdfOptions();

// Request PDF/UA-compliant output
options.setCompliance(PdfCompliance.PdfUa);

try (FileInputStream is = new FileInputStream(fileIn);
	 FileOutputStream os = new FileOutputStream(fileOut)) {
	pres = new Presentation(is);

	// Fallback: give picture frames without a shape type a rectangle geometry
	for (var slide : pres.getSlides()) {
		for (var shape : slide.getShapes()) {
			if (shape instanceof PictureFrame) {
				PictureFrame picture = (PictureFrame) shape;
				if (ShapeType.NotDefined == picture.getShapeType()) {
					picture.setShapeType(ShapeType.Rectangle);
				}
			}
		}
	}

	pres.save(os, SaveFormat.Pdf, options);
} catch (Exception e) {
	e.printStackTrace();
} finally {
	if (pres != null) pres.dispose();
}

Additionally, I had to add fallback code for the shape type: it is not set on the image by default, and I found that Aspose does not write the alternative text when the shape type is not defined.

I have attached both resulting PDFs, one generated by Aspose and one by PowerPoint.

clean-sample-by-aspose.pdf (829.2 KB)
clean-sample-by-powerpoint.pdf (173.1 KB)

Does anyone know how to enforce a more PowerPoint-like structure, or at least a structure that lets a screen reader pick up the alt-text on the images? Currently they seem to be ignored completely.

Thanks in advance!

@Nobl

It seems you are experiencing issues with the accessibility of images when converting a PowerPoint presentation to PDF using Aspose.Slides for Java, specifically regarding the screen reader’s ability to detect alt-text for pictures.

From your description, it appears that the alt-text is not being recognized due to the shape type of the images not being defined. You have already implemented a workaround by setting the shape type to ShapeType.Rectangle for PictureFrame instances where the shape type is not defined. This is a good step, as the library requires a defined shape type to associate the alt-text correctly.

To further improve the accessibility of the generated PDF and ensure that screen readers can detect the alt-text, consider the following suggestions:

  1. Ensure Alt-Text is Set: Make sure that the alt-text for each image is explicitly set before saving the presentation. You can do this by checking if the PictureFrame has alt-text and setting it if it doesn’t.

  2. Review PdfOptions Settings: While you are already using PdfCompliance.PdfUa, ensure that all other relevant settings in PdfOptions are configured correctly to enhance accessibility.

  3. Check for Other Shape Types: If there are other shape types that might be affecting the alt-text detection, consider handling those as well. For example, if you have other shapes that are not pictures but still need alt-text, ensure they are also configured correctly.

  4. Testing with Different Screen Readers: Sometimes, different screen readers may interpret the PDF differently. Testing with various screen readers can help identify if the issue is specific to one tool.

Here’s a modified snippet of your code that ensures alt-text is set for each image:

for (var slide : pres.getSlides()) {
    for (var shape : slide.getShapes()) {
        if (shape instanceof PictureFrame) {
            PictureFrame picture = (PictureFrame) shape;
            if (ShapeType.NotDefined == picture.getShapeType()) {
                picture.setShapeType(ShapeType.Rectangle);
            }
            // Ensure alt-text is set
            if (picture.getAlternativeText() == null || picture.getAlternativeText().isEmpty()) {
                picture.setAlternativeText("Your alt text here"); // Set appropriate alt text
            }
        }
    }
}

By ensuring that the alt-text is explicitly set and that the shape types are correctly defined, you should see an improvement in how screen readers interpret the PDF.

If you continue to experience issues, consider reaching out to Aspose support for more tailored assistance regarding accessibility features in your generated PDFs.


Yes, the alt-texts are properly set in the PowerPoint. Both PDF variants contain the alt-texts, but a screen reader is not able to read them in the PDF generated by Aspose. The only real difference is the structure of the resulting PDF.

@Nobl,
Thank you for contacting free support. Could you please share a sample presentation file and clarify which screen reader you mean? You can zip the presentation file and upload the archive here.

Hi @andrey.potapov ,

Thank you for your quick reply. I was using the native Windows screen reader and NVDA. Neither seems to see the generated accessibility structure: they do not even recognize the images, let alone their alt-texts. Only the free license terms are read, I suppose because they are in the text layer. I will attach the zipped presentation I converted.

clean-sample.zip (820.9 KB)

@Nobl,
Thank you for the additional information. I need some time to check the issue. I will get back to you as soon as possible.

@Nobl,
Thank you for your patience. I’ve reviewed the issue carefully, but I still don’t fully understand it.

Currently, I can only confirm that the alt text for the images is missing when converting the PowerPoint presentation to a PDF document.

I ran both code examples, with and without the fallback code, and the result is the same: the alt text is not present in the output PDF. Could you please clarify in more detail what the difference is?

Hi @andrey.potapov ,

At first I just ran the normal “Presentation pres; pres.save()” code that only converted the PPTX to a PDF using UA compliance. There I observed that the alt-text on the first image (left) was missing. The second image (right) on the presentation is just that image copied in PowerPoint with another alt-text. But on the converted PDF, the second image actually had the “ABCDEFG” alt text set.

For the inspection of the PDF I used the online tool: Inspect PDF Online - PDFCrowd which shows the internal structure and if there is some “alt” text associated with the picture. There I noticed that the first image was not embedded like the second image was.

To investigate further, I opened the presentation in its XML view by saving the .pptx as .zip and then opening the “ppt/slides/slide1.xml” file. Comparing both images, I found only one real difference: the first image, whose alt text is not saved to the PDF, is missing the <a:prstGeom prst="rect"> element (located under <p:pic>/<p:spPr>), which does exist on the second image I copied in PowerPoint. Seemingly, Microsoft does not always add that element to the picture frame. PowerPoint is able to export the presentation as a PDF containing the alt-texts; Aspose.Slides, though, seems to have a problem when the <a:prstGeom> tag is not set. That's what my fallback code is for. When the tag is missing, no shape type is set, and so, I assume, Aspose renders the PDF differently for such a picture frame. Setting it to “Rectangle” helps.

Here is a screenshot from the XML of the first slide, where the first picture does not have the tag but the second does: xml-structure.png (170.0 KB). When I add that “<a:prstGeom>…” section to the first image and repack the zip as a .pptx, Aspose picks up the shape type as a rectangle for both images, so I am pretty sure that this is one of the problems.
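For anyone who wants to run the same check without unzipping by hand, here is a minimal sketch that uses only the JDK. It opens the .pptx as a zip archive, parses “ppt/slides/slide1.xml”, and counts the <p:pic> elements that have no <a:prstGeom> descendant. The class and method names are made up for illustration, and it assumes the standard (transitional) OOXML namespaces; it does not use or depend on Aspose.Slides.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of any library.
class PrstGeomCheck {
    // Standard (transitional) OOXML namespaces for PresentationML and DrawingML.
    static final String P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main";
    static final String A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";

    /** Counts <p:pic> elements in a slide's XML that contain no <a:prstGeom> element. */
    static int countPicturesWithoutGeometry(InputStream slideXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the namespace-based lookups below
        Document doc = factory.newDocumentBuilder().parse(slideXml);

        NodeList pics = doc.getElementsByTagNameNS(P_NS, "pic");
        int missing = 0;
        for (int i = 0; i < pics.getLength(); i++) {
            Element pic = (Element) pics.item(i);
            if (pic.getElementsByTagNameNS(A_NS, "prstGeom").getLength() == 0) {
                missing++;
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PrstGeomCheck <file.pptx>");
            return;
        }
        // A .pptx file is a zip archive; each slide is stored as its own XML entry.
        try (ZipFile pptx = new ZipFile(args[0])) {
            ZipEntry entry = pptx.getEntry("ppt/slides/slide1.xml");
            if (entry == null) {
                System.err.println("ppt/slides/slide1.xml not found in " + args[0]);
                return;
            }
            try (InputStream in = pptx.getInputStream(entry)) {
                System.out.println("Pictures without <a:prstGeom>: "
                        + countPicturesWithoutGeometry(in));
            }
        }
    }
}
```

Run against the attached sample, a non-zero count would point at the picture frames that need the shape-type fallback. The sketch only looks at the first slide; looping over all “ppt/slides/slide*.xml” entries would be the obvious extension.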

For my tests I first checked that the resulting PDF (the one transformed by Aspose and one exported by PowerPoint) does follow the UA compliance using: Your PDF Accessibility Checker which validated the PDF to be compliant.

The second check was to use a screen reader (the native Windows one and NVDA) to verify that the alt-texts are actually recognized and read aloud. That test failed on the PDF converted by Aspose. So I used Firefox to view the accessibility structure of the PDF and noticed differences between Aspose and PowerPoint. I am attaching a screenshot of both structures so you can see for yourself.

Result from PowerPoint: accessibility-structure-powerpoint.png (40.9 KB) - you can see that each figure has an “image” tag with the alt-text.
Result from Aspose:
accessibility-structure-aspose.png (62.1 KB) - you can see that the figures are on the same level of hierarchy and do not contain an “image” tag. I also tried to remove the text layer of the PDF so only the images stay, but this did not help either.

Based on my inspections, I assume that either the screen reader cannot understand the generated PDF structure, or the missing “image” tag is actually required for the alt-text to be recognized. I do not understand why you were not able to convert the PowerPoint to a PDF with alt texts, but hopefully this longer explanation helps to find the problem and a way to solve it.

@Nobl,
Thank you for the issue details. I need some time to check the issue. I will get back to you as soon as possible.

@Nobl,
Thank you for your patience.

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): SLIDESJAVA-39682

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

The issues you found earlier (filed as SLIDESJAVA-39682) have been resolved in Aspose.Slides for Java 25.8 (Maven, JAR).
You can check all fixes on the Release Notes page.
You can also find the latest version of our library on the Product Download page.

@Nobl,
With Aspose.Slides for Java 25.8, please use the following code example:

PdfOptions pdfOptions = new PdfOptions();
pdfOptions.setCompliance(PdfCompliance.PdfUa);
Presentation presentation = new Presentation("clean-sample.pptx");
try {
    presentation.save("output.pdf", SaveFormat.Pdf, pdfOptions);
} finally {
    presentation.dispose();
}

I tested it, and the test image now does indeed have an alt-text, and the native Windows screen reader is able to read it. Thank you very much for your time and the quick fix.

@Nobl,
Thank you for helping make Aspose.Slides better.