Video & Audio Accessibility Compliance
Ensuring video accessibility is essential for creating an inclusive digital experience. WCAG 2.1 Level AA sets standards so that people with disabilities (including people who are Deaf or hard of hearing, blind or low vision, or who use assistive technologies) can fully understand and interact with video content. These requirements address both the audio and visual components of a video, so that important information is never limited to a single sensory channel.
Together, accessible captions, audio descriptions, transcripts, safe visual design, and accessible video players support equitable access, meet legal compliance expectations, and improve usability for all viewers.
What is required for accessible video and audio?
Captions (required)
Captions provide access to audio content for users who are Deaf or hard of hearing, as well as for users who cannot use sound in their environment. All pre-recorded videos must include accurate, synchronized captions.
Captions must include:
- All spoken dialogue
- Speaker identification when not visually obvious
- Meaningful non-speech sounds (e.g., [music], [laughter], [applause], [phone ringing])
- Correct timing and synchronization with the audio
Auto-generated captions may be used as a starting point but must be reviewed and edited for accuracy. Unedited auto-captions (including YouTube’s automatic captions) are not WCAG compliant on their own.
Audio description (required when visuals add meaning)
Audio description is required when important information is presented visually and is not spoken aloud. Provide audio description if the video includes:
- On-screen text that is not read aloud
- Meaningful actions, gestures, or facial expressions
- Demonstrations or visual instructions
- Charts, graphs, or diagrams
If the video is primarily a "talking head" and the visuals do not add new meaning, audio description may not be necessary.
A written transcript (required for audio-only content; strongly recommended for video)
A transcript offers a flexible, readable alternative for screen reader users and for anyone who prefers reading. A complete transcript includes:
- All spoken dialogue
- Speaker identification when unclear
- Descriptions of key visuals and actions
- Relevant non-speech sounds
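Because a reviewed caption track already contains the dialogue, speaker names, and sound cues, it can seed a draft transcript. The Python sketch below shows one way to do that, assuming the captions are available as a list of cues with a start time, text, and optional speaker. The `Cue` class and `draft_transcript` function are hypothetical, not part of any captioning tool. It merges consecutive lines from the same speaker and puts sound cues on their own lines; descriptions of key visuals and actions still have to be written by hand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    """Hypothetical caption cue: speech cues carry a speaker, sound cues do not."""
    start: float                   # seconds from the start of the video
    text: str
    speaker: Optional[str] = None  # None marks a non-speech cue such as "[applause]"


def draft_transcript(cues: List[Cue]) -> str:
    """Build a draft transcript: one paragraph per speaker turn or sound cue."""
    paragraphs: List[str] = []
    current_speaker: Optional[str] = None
    for cue in sorted(cues, key=lambda c: c.start):
        if cue.speaker is None:
            # Non-speech sound: give it its own line and end the current speaker turn.
            paragraphs.append(cue.text)
            current_speaker = None
        elif cue.speaker == current_speaker:
            # Same speaker still talking: continue the existing paragraph.
            paragraphs[-1] += " " + cue.text
        else:
            # New speaker turn: label it so the reader knows who is speaking.
            paragraphs.append(f"{cue.speaker}: {cue.text}")
            current_speaker = cue.speaker
    return "\n\n".join(paragraphs)


cues = [
    Cue(0.0, "Welcome to the lab.", "Professor Lee"),
    Cue(2.5, "Today we calibrate the sensor.", "Professor Lee"),
    Cue(5.0, "[applause]"),
    Cue(7.0, "Thanks for having us.", "Student"),
]
print(draft_transcript(cues))
```

The result is only a draft: review it against the video and add descriptions of any meaningful visuals before publishing.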
Flashing and strobe effect (not allowed)
To prevent seizures, videos must not include content that flashes more than three times in any one-second period, unless the flashes fall below WCAG’s general flash and red flash thresholds (for example, because the flashing area is very small or the change in brightness is minimal).
*Note: Flashing is not the same as motion. Animation or movement that users can pause or stop may still be accessible, but unsafe flashing is not permitted.
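To make the three-flashes rule concrete, the sketch below counts flashes in a series of per-frame relative luminance values (0.0 for black, 1.0 for white) using WCAG’s general flash definition: a flash is a pair of opposing changes in relative luminance of at least 10% of the maximum, where the darker state is below 0.80. It assumes the luminance values have already been measured for each frame (for example, averaged over the flashing region), and it deliberately ignores WCAG’s area threshold and the separate red-flash rule. It is a rough screening aid, not a substitute for a dedicated photosensitivity analysis tool. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def luminance_changes(lum: Sequence[float]) -> List[Tuple[int, float, float]]:
    """Split a per-frame luminance series into monotonic runs.

    Returns (end_frame, start_value, end_value) for each run; flat stretches
    are absorbed into the surrounding run.
    """
    if len(lum) < 2:
        return []
    changes = []
    start_value = prev = lum[0]
    direction = 0  # +1 rising, -1 falling, 0 not yet known
    for i in range(1, len(lum)):
        delta = lum[i] - prev
        if delta != 0:
            step = 1 if delta > 0 else -1
            if direction != 0 and step != direction:
                # Direction reversed: the previous run ended at frame i - 1.
                changes.append((i - 1, start_value, prev))
                start_value = prev
            direction = step
        prev = lum[i]
    if direction != 0:
        changes.append((len(lum) - 1, start_value, prev))
    return changes


def is_qualifying(start: float, end: float) -> bool:
    """A change of >= 0.10 relative luminance whose darker state is below 0.80."""
    return abs(end - start) >= 0.10 and min(start, end) < 0.80


def max_flashes_per_second(lum: Sequence[float], fps: float) -> int:
    """Largest number of flashes found in any one-second window."""
    times = [frame / fps for frame, s, e in luminance_changes(lum) if is_qualifying(s, e)]
    best = 0
    for i, t in enumerate(times):
        in_window = sum(1 for u in times[i:] if u < t + 1.0)
        best = max(best, in_window // 2)  # a flash is a pair of opposing changes
    return best


def fails_general_flash(lum: Sequence[float], fps: float) -> bool:
    """True when more than three flashes occur in some one-second period."""
    return max_flashes_per_second(lum, fps) > 3


# A black/white strobe that changes every frame at 30 fps is flagged.
print(fails_general_flash([0.0, 1.0] * 15, 30))  # True
```

Treating the whole frame (or region) as one luminance value keeps the sketch short, but it means small flashing areas can be missed or over-counted; a real review should use a tool that also applies the area and red-flash rules.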
An accessible video player
The video player itself must be accessible and support:
- Full keyboard navigation
- Visible focus indicators
- Play, pause, stop, and volume controls that work without a mouse
- No auto-play, or the ability to pause immediately
- Compatibility with assistive technologies such as screen readers
Accessible on-screen text
If text appears within the video:
- Maintain a minimum 4.5:1 color contrast ratio (3:1 for large text)
- Ensure text is large enough and displayed long enough to read
- Avoid flashing, rapidly moving, or distracting text
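The 4.5:1 figure is the WCAG contrast ratio, computed from the relative luminance of the text color and the background color. The Python sketch below implements that formula for 8-bit sRGB colors; the function names are hypothetical. Because video backgrounds change from frame to frame, it checks a single text and background color pair, so text over moving footage should be checked against the least favorable background it actually appears over.

```python
from typing import Tuple

RGB = Tuple[int, int, int]  # 8-bit sRGB channels, 0-255


def _linearize(channel: int) -> float:
    """Convert one sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    """Weighted sum of linear R, G, B: 0.0 for black, 1.0 for white."""
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05), ranging from 1:1 to 21:1."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg: RGB, bg: RGB, large_text: bool = False) -> bool:
    """WCAG AA: at least 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


white = (255, 255, 255)
print(meets_aa((0x76, 0x76, 0x76), white))  # True
print(meets_aa((0x77, 0x77, 0x77), white))  # False
```

For example, mid-gray #767676 on white just passes 4.5:1, while the slightly lighter #777777 falls short for normal text but still passes the 3:1 large-text threshold.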
Why auto-captions alone are not enough
While auto-captioning tools (such as YouTube’s) are helpful for efficiency, they are not sufficient for accessibility compliance. Auto-captions are often inaccurate, especially with technical terms, names, accents, or background noise, and they frequently omit meaningful non-speech sounds.
Auto-captions should be used as a starting point, then manually edited to meet accessibility standards.
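One common way to put a number on caption accuracy is word error rate (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the captions into a correct reference transcript, divided by the number of words in the reference. WCAG itself does not define a numeric accuracy threshold, so treat this as a review aid only. The sketch below assumes you have a human-verified transcript to compare against; `word_error_rate` is a hypothetical name, and punctuation and capitalization are ignored so that only word choice is scored.

```python
import re
from typing import List


def words(text: str) -> List[str]:
    """Lowercase word tokens; punctuation is dropped so only word choice is scored."""
    return re.findall(r"[\w']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + insertions + deletions) / number of reference words."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # Before processing reference word i, prev[j] is the edit distance
    # between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # reference word missing from captions (deletion)
                cur[j - 1] + 1,      # extra word in captions (insertion)
                prev[j - 1] + cost,  # matching or misheard word (substitution)
            )
        prev = cur
    return prev[-1] / len(ref)


# One misheard name in a five-word line.
print(round(word_error_rate("Professor Lee starts the demo.", "professor leaf starts the demo"), 2))  # 0.2
```

A low WER does not guarantee compliant captions: it does not check timing, speaker labels, or non-speech sounds, and a single wrong technical term can change the meaning.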
YouTube captioning best practice
- Start with auto-captions
  - Enable YouTube’s automatic captions and use them as a draft.
- Edit for accuracy
  - Correct spelling, grammar, and punctuation.
  - Fix misheard words, names, and technical terms.
  - Synchronize timing with the audio.
- Include non-speech elements. Examples:
  - [music playing]
  - [applause]
  - [laughter]
- Identify speakers when not obvious. Example:
  - (Professor Lee):
- Meet caption quality standards. Captions must be:
  - Accurate: close to 100% correct.
  - Synchronized: appear when the words are spoken.
  - Complete: include all spoken content and key sounds.
  - Readable: use sensible line breaks and avoid overly long lines.
- Upload and verify
  - Upload a reviewed .srt or .vtt file and confirm it is set as the default caption track.
- Test
  - Watch the video with captions enabled and check readability and accuracy.
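If captions are edited outside YouTube, the reviewed captions have to be saved in a format such as WebVTT (.vtt) before uploading. The Python sketch below writes a minimal WebVTT file from a list of cues and runs a few basic checks first. The `Cue` class and helper names are hypothetical, speaker labels follow the "(Professor Lee):" convention above, and the 42-character line limit is an assumed readability guideline, not a WCAG requirement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    """Hypothetical caption cue with times in seconds."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None  # shown as a "(Name): " prefix when set


def display_text(cue: Cue) -> str:
    """Caption text as viewers will read it, including any speaker label."""
    return f"({cue.speaker}): {cue.text}" if cue.speaker else cue.text


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def check_cues(cues: List[Cue], max_line_chars: int = 42) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for i, cue in enumerate(cues):
        if cue.end <= cue.start:
            problems.append(f"cue {i}: end time is not after start time")
        if i > 0 and cue.start < cues[i - 1].start:
            problems.append(f"cue {i}: out of order")
        for line in display_text(cue).splitlines():
            if len(line) > max_line_chars:
                problems.append(f"cue {i}: line longer than {max_line_chars} characters")
    return problems


def escape_vtt(text: str) -> str:
    """Escape characters that WebVTT cue text would otherwise treat as markup."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def to_webvtt(cues: List[Cue]) -> str:
    """Serialize cues as WebVTT: a header, then blank-line-separated cues."""
    blocks = ["WEBVTT"]
    for cue in cues:
        timing = f"{vtt_timestamp(cue.start)} --> {vtt_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{escape_vtt(display_text(cue))}")
    return "\n\n".join(blocks) + "\n"


cues = [
    Cue(1.0, 4.0, "[music playing]"),
    Cue(4.5, 7.25, "Welcome to the lab.", "Professor Lee"),
]
if not check_cues(cues):
    with open("captions.vtt", "w", encoding="utf-8") as f:
        f.write(to_webvtt(cues))
```

Passing these checks only means the file is well formed; the captions still need to be watched against the video, as in the final step above.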
What you do not need to provide
You typically do not need to caption filler words such as “um,” “uh,” stutters, or repeated false starts. WCAG captioning focuses on meaning, not every sound. These fillers can be omitted to improve clarity and readability.
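When editing many auto-caption drafts, a small script can strip the most common fillers before the human review pass. The sketch below rests on loud assumptions: `clean_fillers` is a hypothetical helper, the filler pattern covers only "um" and "uh" variants plus immediately repeated words, and it cannot tell a stutter from grammatical repetition such as "had had". It should not be run on the rare content where fillers carry meaning, and every change still needs to be reviewed.

```python
import re

# Assumed filler list: "um", "umm", "uh", "uhh", ... plus an optional trailing comma.
FILLERS = re.compile(r"\b(?:um+|uh+)\b,?\s*", re.IGNORECASE)
# Immediately repeated word, e.g. "we we" -> "we".
REPEATED_WORD = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def clean_fillers(text: str) -> str:
    """Remove simple fillers and repeated words, then tidy the spacing."""
    text = FILLERS.sub("", text)
    text = REPEATED_WORD.sub(r"\1", text)
    return re.sub(r"\s{2,}", " ", text).strip()


print(clean_fillers("Um, so uh we we start here."))  # so we start here.
```

Capitalization is not repaired (the example output starts with a lowercase "so"), which is one more reason to treat the result as a draft.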
What should be captioned instead
- All meaningful spoken content
- Relevant non-speech sounds
- Speaker identification
- Tone or emotion when necessary for understanding
Rare exceptions
Include filler sounds only if they are meaningful to the content, such as:
- Speech or linguistic studies
- Therapy or training videos
- Intentional comedic or dramatic effect