Clarify meaning of 7.1.5 Spoken subtitles
7.1.5 reads as follows:
Where ICT displays video with synchronized audio, it shall have a mode of operation to provide a spoken output of the available captions, except where the content of the displayed captions is not programmatically determinable.
For a video player embedded in a web page, it is relatively easy to have a screen reader speak closed captions as they appear. For example, the closed captions can be rendered into a container with the aria-live
attribute - activated only if the user has indicated that this is what they want. See, for example, how AblePlayer can delegate rendering of WebVTT audio descriptions to screen readers.
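As a rough sketch of this approach, the snippet below mirrors video captions into an aria-live region so that a running screen reader announces each new cue. The element ids and track setup are illustrative assumptions, not taken from AblePlayer or any real player:

```javascript
// Pure helper: write cue text into a live-region-like object so a screen
// reader (if running) announces each new caption as it appears.
function announceCue(liveRegion, cueText) {
  // Clearing first helps some screen readers re-announce repeated text.
  liveRegion.textContent = "";
  liveRegion.textContent = cueText;
  return liveRegion.textContent;
}

// Browser wiring (runs only where a DOM is available). Assumes markup like
// <div id="caption-live-region" aria-live="assertive"></div> near the player.
if (typeof document !== "undefined") {
  const video = document.querySelector("video");
  const region = document.getElementById("caption-live-region");
  const track = video.textTracks[0];
  track.mode = "hidden"; // suppress native rendering; we announce cues ourselves
  track.addEventListener("cuechange", () => {
    const cue = track.activeCues[0];
    if (cue) announceCue(region, cue.text);
  });
}
```

The live region could be toggled by a player preference, so users who do not want captions spoken are not affected.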
That would help users who have a screen reader and who, for example, understand the language of the subtitles but not the language of the main audio track. This seems to be the primary objective of clause 7.1.5, judging by, for example, how IBM Accessibility describes the clause.
But would such a solution qualify as a "mode of operation"? Can a "mode of operation" require that the user has access to appropriate assistive technology such as a screen reader?
Other groups, such as persons with dyslexia, may not know how to use a screen reader, yet may be unable to visually follow some subtitles. They could benefit from spoken subtitles if the subtitles can be spoken without the help of a screen reader.
This could be implemented either with server-side TTS that offers an extra audio track the user can activate via player controls, or with TTS locally in the browser, for example via the Web Speech API (see how AblePlayer can render WebVTT audio descriptions using the Speech API). Unfortunately, both of these solutions pose considerable technical challenges:
- Server-side TTS with extra audio tracks, while possible, requires quite a lot of technical infrastructure, which I have only seen in audio book environments, never on the web. As far as I know, very few (if any) online video platforms currently offer such functionality.
- The Web Speech API is widely but not universally implemented by web browsers, and may not be available at all for non-web content.
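To make the browser-local option concrete, here is a minimal sketch of speaking cues with the Web Speech API. The `synth` parameter is injected so the helper can be exercised outside a browser; the language tag and the browser wiring are illustrative assumptions:

```javascript
// Speak one caption cue aloud, no screen reader needed.
function speakCue(cueText, synth, lang) {
  // SpeechSynthesisUtterance only exists in browsers; fall back to a plain
  // object so the helper still works in other environments.
  const utterance =
    typeof SpeechSynthesisUtterance !== "undefined"
      ? new SpeechSynthesisUtterance(cueText)
      : { text: cueText, lang: undefined };
  utterance.lang = lang;
  synth.cancel(); // drop any still-playing previous cue to keep roughly in sync
  synth.speak(utterance);
  return utterance;
}

// Browser wiring (illustrative):
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  const track = document.querySelector("video").textTracks[0];
  track.mode = "hidden";
  track.addEventListener("cuechange", () => {
    const cue = track.activeCues[0];
    if (cue) speakCue(cue.text, window.speechSynthesis, "sv-SE");
  });
}
```

Note that `synth.cancel()` before each cue is a simplistic sync strategy: a cue that is still being spoken when the next one arrives is simply cut off.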
In other words, the screen reader option is technically quite simple but not as useful as the more complex options.
(All of them share the challenge of synchronising the spoken subtitles with the video content, and balancing the sound used for subtitles with the main video soundtrack.)
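These two shared challenges can be illustrated with pure helpers: looking up the cue that should be spoken at the current playback time, and ducking the main soundtrack while speech is active. The cue shape ({start, end, text}) and the 0.3 ducking level are illustrative choices, not taken from any particular player:

```javascript
// Return the cue whose time span covers currentTime, or null if none does.
// Assumes cues do not overlap, which holds for typical subtitle tracks.
function activeCueAt(cues, currentTime) {
  return cues.find((c) => currentTime >= c.start && currentTime < c.end) || null;
}

// Lower the main soundtrack while subtitles are being spoken, restore it after.
// The 0.3 factor is an arbitrary example value.
function duckedVolume(baseVolume, speaking) {
  return speaking ? baseVolume * 0.3 : baseVolume;
}
```

A player would call `activeCueAt(cues, video.currentTime)` on time updates and set `video.volume = duckedVolume(userVolume, speechIsActive)` around each spoken cue.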
Just as the comment about 7.1.5 Spoken subtitles from a TV perspective states, the clause is a great challenge to existing products and services. As far as I know, there are very few video players currently in use on the web that offer spoken subtitles in a convenient way. The only option in many cases is to use a screen reader, locate the position in the player where the subtitles are, and move the screen reader focus back and forth over this position in order to hear every new line of text that appears. (This is a far from ideal solution.)
But these challenges can be overcome, as illustrated by AblePlayer and a few others. A good first step towards getting useful solutions more widespread would be to clarify what, exactly, the clause actually requires.