Pairing Text with Audio

There are numerous learning & development products that use audio as their primary means of transmitting information: lectures, webinars, eCourses and videos with voice-over narration. Choosing just the right text and other visual elements to pair with that audio is one of the most common challenges content developers face.

You want your visuals to drive engagement and reinforce retention, but far too often they end up distracting or overwhelming your audience. When this occurs, the culprit is almost always too much text.

  • When the human brain encounters text, it naturally wants to read it
  • But if the brain tries to read text while it's also trying to listen to audio, the two streams compete for attention; the likely result is cognitive overload, with neither channel processed properly or fully
  • This overload is amplified if the audio and text are communicating different messages
  • Too much text can also tempt an audience into believing they can gather all the applicable information from the text, causing them to tune out the audio and miss key information that is only relayed through it

What to Pair with Audio

These cognitive tendencies lead us to a few questions:

  • What text should I pair with audio?
  • How much text should I pair with audio?
  • What visual elements, beyond text, should I pair with audio?

Mayer's 12 principles of multimedia learning offer a helpful set of guidelines for addressing these questions. How you apply them will vary depending on your project and content, but in general, the principles suggest the following:

  • People tend to stay more engaged when audio is paired with graphics rather than with text
    • And when audio and text are paired together, people tend to stay more engaged when a graphic is included as well
  • If text is paired with audio, try to minimize the text, capturing — and perhaps highlighting — the essentials, while excluding as much unnecessary text as possible
    • This "keep it simple/minimal" principle can apply to graphics as well: a single graphic may help drive retention and engagement, whereas too many graphics presented together may distract the audience
  • Use segmenting or sectioning to avoid displaying too much text at any given time
    • E.g., instead of a single minute-long slide or segment that captures two points, consider two 30-second slides or segments that each capture one of those points

What About Review Purposes?

One of the most common arguments in favor of pairing audio with larger amounts of text that paraphrases the audio is that it helps facilitate review of the material:

  • When reviewing, many learners prefer to read rather than re-listen; it's faster and more easily allows one to skim and skip around
  • Text makes it easier to find specific topics or information during a review
    • In some cases, you can even use Search/Find functionality to locate specific words

Text may indeed be the better format for reviewing material, for the reasons stated above. Even so, large amounts of it shouldn't be included in audio-driven products for review purposes if doing so will undermine engagement and retention.

Instead, take the text that would've gone in the audio-driven product and put it in a review document or web page that's shared with learners.

Creating two products instead of one — e.g., a text-light eCourse and a text-based reference guide, instead of just a text-heavy eCourse — might seem like it'd require more work, but:

  1. It often doesn't
  2. And when it does, the amount is usually negligible and/or justified by being able to provide learners with products that are better able to achieve their specific purposes; e.g.:
    • An eCourse that is 100% designed to maximize engagement and retention when learners first encounter the material
    • A reference guide that is 100% designed to facilitate review when learners revisit the material
    • Versus a text-heavy eCourse that is 50% designed for engagement/retention and 50% designed for review but, because those two purposes counteract each other, isn't able to achieve either purpose optimally