Susanne Meyer
5 min read · Feb 8, 2021


The Art of Audio Description

Recently, I was tasked with explaining the relatively novel assistive technology of audio description to a group of people. My “students” were experienced professionals in video production and marketing, well versed in the tools of their trades, but they had not yet needed to integrate audio description into any of the videos they produced. The concept of audio description is simple: add an audio track to a video, any video, that describes what is happening on the screen so that people who are blind or low vision can follow the action. This differs from closed captioning or subtitles: audio descriptions do not track the dialogue, but rather the visual happenings on the screen. It’s a game changer for blind people and their friends and families. Audio description means that people who are blind can now go to the cinema or watch Netflix Original series (all of which now include audio description) without running the risk of arousing the wrath of their fellow moviegoers with continuous whispering. Both the person who is blind and their companion can now focus on the film rather than on filling in the gaps left by what can be seen but not heard.

Although the concept is simple, audio description is an art. The video doesn’t stop. Writers have to pace audio descriptions so that they fit naturally into the pauses in dialogue and in other relevant sounds that accompany the action on screen. Audio descriptions cannot interfere with or obscure the video’s soundtrack, or be disruptive in any other way. At the same time, they have to convey *all* the relevant visual information that is not otherwise conveyed by sound, but they also cannot convey *too much* information, because verbosity would distract the blind viewer and dilute their experience with useless chatter. Seamlessly integrating audio descriptions into a pre-existing audio track is technically challenging. But deciding on the content of the descriptions is even more difficult, and calls for both restraint and comprehensiveness. In effect, the writer has to think like a sighted person and a blind person at the same time. As a sighted person, the writer has to decide what, within the visual tapestry of the film, conveys relevant information in addition to the soundtrack. This is not always obvious, and is usually a judgment call. For example, does the messy state of a room on screen convey something that supplements what is communicated in the dialogue and audio track, or is it merely a visual representation of what can be heard? In essence, the writer must distinguish between information and artistic visualization, a distinction that is not always clear-cut. From the (assumed) perspective of the blind viewer, the writer must decide what type of information would likely interest the average blind viewer. The problem, of course, is that there is no such average viewer, only a crowd of individual viewers, each with their own preferences. Audio descriptions should contain enough information to satisfy all of them, but not bore any of them. Writing them is truly an art.

In the course of our conversation, our group discussed several different video formats and whether or not they required audio description. In most cases, the basic purpose of audio description was enough to suggest an answer, but one type of recording raised questions: the “talking head” video, in which a person is shown on screen while they talk, and their speech constitutes the soundtrack. Unless the background behind the speaker in some way adds information to what is said, audio description is not customary in such cases; that is, the viewer who is blind or low vision is not told about the background image or the appearance of the speaker. Not only is it customary to omit audio description in “talking head” videos, it can be argued that adding it would in fact be problematic: when writers attempt to convey impressions rather than just information, they inherently run the risk of tinging the description with their own biases. This is particularly true when describing people. But the point of audio description is to convey the action on the screen, not a particular writer’s impression of a speaker.

And yet, the idea of not describing the background behind a speaker and, more importantly, the appearance of that speaker struck many participants in the conversation as unsatisfactory. Without audio description, they argued, blind viewers miss part of the video. They don’t miss any factual information, but they *do* miss something else of significance: the visual context. They are not wrong, of course. This is, in fact, just what it *means* to be blind: to be unable to form first-hand visual impressions of people and places. For people who can’t see, this is not only the case with video recordings; it is the case in everyday life, too. The realization makes sighted people uncomfortable, and they want to fix it. But they can’t.

For some participants in the discussion, I think this small insight represented a moment of understanding of the true implications of vision loss for a person’s life. These tiny epiphanies are a good thing because they foster mutual understanding and empathy. But they should not be the end of the story. What the sighted people who were uncomfortable with the lack of audio description in “talking head” videos must remember is that people who can’t see are masters at filling in the gaps left by their lack of vision; they do it all the time. They might ask a friend whose biases they trust to describe the visual context, or they might rely on cues from their other senses to paint a richer picture. They might be particularly astute at picking up contextual clues from the audio track that go unnoticed by those focused on the visual track. A lack of visual context does not mean a lack of context altogether. It means that this context needs to be constructed in a different way, and that one person’s context often differs from another’s. But that is not unique to videos; it is true in everyday life, too.


Susanne Meyer

Philosopher, reader, disability inclusion advocate. Unpeeling the layers of meaning.