How Screen Readers Work – a11y examples

A screen reader is a software application that translates on-screen content into synthesized speech or braille output, enabling people who are blind or have severe visual impairments to use computers and smartphones. Screen readers don’t simply read text — they interpret the entire user interface, describing the type of each element, its current state, and its relationships to other elements.

Screen reader usage spans a wide demographic. While blind users are the most common group, screen readers are also used by people with low vision, cognitive disabilities, learning differences like dyslexia, and sighted developers who want to verify their accessibility implementations.

The most widely used screen readers on the web, according to the WebAIM Screen Reader User Survey (2024), are:

Screen Reader	Share of respondents
JAWS (Job Access With Speech)	~40.5%
NVDA (NonVisual Desktop Access)	~37.7%
VoiceOver (Apple, built-in)	~9.7%
Narrator (Microsoft, built-in)	~5.2%
TalkBack (Android, built-in)	~4.1%

JAWS is a commercial product from Freedom Scientific that has been the industry standard for decades, particularly in enterprise and government settings. NVDA is a free, open-source alternative that has grown significantly in market share. VoiceOver ships built-in with all Apple devices (macOS, iOS, iPadOS) and is the dominant screen reader on mobile. This site’s emulator models NVDA behavior as a representative middle ground — JAWS and NVDA behave similarly enough that learning one transfers to the other.

Browse Mode vs. Focus Mode

One of the most important concepts for developers to understand is that screen readers operate in at least two distinct modes, and the mode determines how keyboard input is interpreted.

Browse Mode (also called Reading Mode or Virtual Cursor Mode)

When a screen reader user arrives at a web page, they are typically in browse mode. In this mode, the screen reader creates an internal representation of the page — called the virtual buffer — and the user navigates through it using a special cursor that the screen reader controls, not the browser.

In browse mode, single letter keypresses are intercepted by the screen reader and used as navigation shortcuts. Pressing H moves to the next heading. Pressing B moves to the next button. Pressing F moves to the next form field. These keystrokes never reach the web page itself — the screen reader consumes them.

This has a critical implication: keyboard shortcuts that use a single letter (as many JavaScript keyboard handlers do) are silently swallowed by the screen reader in browse mode. Shortcuts that combine a letter with a modifier key such as Control or Alt are generally passed through to the page in both browse and focus modes. Avoid Insert, which NVDA and JAWS reserve as their own modifier key.
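
As a minimal sketch of a shortcut that survives browse mode, the markup below pairs a visible button with an optional accelerator. The saveDraft() routine, the element ids, and the Alt+Shift+S combination are assumptions made up for this illustration, not taken from a real application.

```html
<button type="button" id="save-button">Save draft</button>
<p id="save-status" role="status"></p>

<script>
  // Hypothetical save routine, used only for this illustration.
  function saveDraft() {
    document.getElementById("save-status").textContent = "Draft saved";
  }

  // The visible button works in every screen reader mode and with every input method.
  document.getElementById("save-button").addEventListener("click", saveDraft);

  // Optional accelerator: Alt+Shift+S. A bare "s" would be consumed by the
  // screen reader in browse mode, so the handler requires modifier keys.
  document.addEventListener("keydown", (event) => {
    if (event.altKey && event.shiftKey && event.code === "KeyS") {
      event.preventDefault();
      saveDraft();
    }
  });
</script>
```

The button remains the primary control. The shortcut is only a convenience, so nothing is lost if a particular browser or screen reader claims that key combination for itself.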

Focus Mode (also called Forms Mode or Application Mode)

When a user lands on an interactive element — a text field, a <select>, a widget with role="application" — the screen reader typically switches to focus mode automatically (and usually plays an audible chime or says “forms mode”). In focus mode, all keystrokes are passed through to the browser and the focused element, so the user can type text or use arrow keys to select options.

Mode switching also happens in reverse: pressing Escape from a text field typically returns to browse mode.

Why This Matters for Developers

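Never rely on single-key shortcuts as the only way to reach functionality in a web application
Custom widgets that implement their own keyboard navigation (like date pickers, tab lists, or tree views) need the matching ARIA widget role, such as grid, tablist, or tree. Screen readers switch to focus mode for these roles, so arrow keys reach the widget. Reserve role="application" for the rare cases where no widget role fits: it is not a landmark, and it turns off browse mode navigation for everything inside it
ARIA live regions work in both modes: updates are announced wherever the user’s cursor happens to be, although polite regions wait until the screen reader has finished speaking
VoiceOver’s Rotor, NVDA’s Elements List, and the equivalent list dialogs in JAWS give users an overview of the page structure without needing to read everything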
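
To illustrate the live region point, here is a small sketch of a status message that is announced without moving focus. The cart scenario, ids, and wording are hypothetical.

```html
<button type="button" id="add-to-cart">Add to cart</button>

<!-- role="status" is a polite live region. aria-live="polite" repeats that
     explicitly, which is redundant but harmless. -->
<div id="cart-status" role="status" aria-live="polite"></div>

<script>
  let itemCount = 0;

  document.getElementById("add-to-cart").addEventListener("click", () => {
    itemCount += 1;
    // Changing the text of an existing live region triggers the announcement.
    document.getElementById("cart-status").textContent =
      itemCount + (itemCount === 1 ? " item" : " items") + " in cart";
  });
</script>
```

The region is present in the page before any update happens. Live regions that are injected together with their message tend to be announced less reliably.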

How Users Actually Navigate

Most developers assume that screen reader users read pages top-to-bottom, word by word, like listening to a long audiobook. In reality, this is the exception, not the rule.

According to the WebAIM Screen Reader User Survey, 71.6% of respondents say that navigating by headings is the first thing they do when trying to find information on a lengthy web page. The heading structure functions as a table of contents: users jump to the section that sounds relevant and begin reading from there.

The most common navigation strategies, in rough order of frequency:

Headings: Used to scan page structure and jump to relevant sections. This is why proper heading hierarchy (h1 → h2 → h3, not skipping levels) matters so much.
Landmarks: HTML5 landmark elements (<main>, <nav>, <header>, <footer>, <aside>, and a <form> that has an accessible name) and their ARIA equivalents create named regions. Users can pull up a list of all landmarks and jump directly to any of them. A page without landmarks leaves users without this shortcut to regions such as the main content.
Links: Users frequently pull up a list of all links on a page to find what they’re looking for. Link text that says “click here” or “read more” is useless in this context, so every link should be distinguishable by its text alone.
Form fields: On pages with forms, users navigate directly to form controls.
Tab key: The Tab key only moves between focusable interactive elements (links, buttons, inputs, etc.). It does not move through all content. Many developers believe that testing keyboard accessibility means only pressing Tab, which misses the majority of how screen reader users actually navigate.
Linear reading: Used on short content or when skimming hasn’t found what the user needs. Arrow keys move through the virtual buffer element by element.
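
The strategies above can be supported with very little markup. The page below is a sketch with made-up content and URLs. It has a heading outline that does not skip levels, native landmarks, and links that make sense out of context.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Quarterly report</title>
</head>
<body>
  <header>
    <!-- A label distinguishes this nav from any other nav in the landmark list -->
    <nav aria-label="Main">
      <a href="/reports">All reports</a>
      <a href="/contact">Contact the finance team</a>
    </nav>
  </header>

  <main>
    <h1>Quarterly report</h1>

    <h2>Revenue</h2>
    <p>Revenue grew this quarter.
      <a href="/reports/revenue">Read the full revenue breakdown</a>.</p>

    <h2>Costs</h2>
    <h3>Staffing</h3>
    <p>Staffing costs were flat.</p>

    <h2>Subscribe to updates</h2>
    <!-- A form is exposed as a landmark only when it has an accessible name -->
    <form aria-label="Subscribe to updates">
      <label for="subscriber-email">Email address</label>
      <input type="email" id="subscriber-email" name="email">
      <button type="submit">Subscribe</button>
    </form>
  </main>

  <footer>
    <p>Published by the finance team.</p>
  </footer>
</body>
</html>
```

A user pressing H on this page hears the outline in order, and the links list reads “All reports”, “Contact the finance team”, and “Read the full revenue breakdown” rather than a column of “read more”.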

The Accessibility Tree

Web browsers maintain two parallel representations of a page: the DOM tree (the full HTML structure) and the accessibility tree (a filtered, semantically enriched version built from the DOM).

The accessibility tree is what screen readers actually read. Browsers construct it by:

Taking the DOM
Filtering out elements that are not semantically meaningful (like generic <div> and <span> elements with no role or accessible name)
Adding computed properties for each node: role, name, state, description, and relationships

Semantic HTML vs. Div Soup

When you write <button>Save</button>, the accessibility tree node for this element has:

Role: button
Name: “Save”
State: (focusable, not pressed, not disabled)

When you write <div class="btn" onclick="save()">Save</div>, the accessibility tree node has:

Role: (none — generic container)
Name: (none)

The screen reader announces the button as “Save button” and the user knows they can press Enter or Space to activate it. The <div> is read as plain text, “Save” (some screen readers add “clickable”), it cannot be reached with the Tab key, and the user has little or no indication that it’s interactive.

What CSS Cannot Do

A common mistake is thinking that styling conveys meaning. If an element is visually styled to look like a heading, it is not a heading unless it uses <h1>–<h6> or role="heading" with aria-level. If text is styled red to indicate an error, screen reader users won’t know it’s an error unless there’s also a text label or ARIA state.

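The accessibility tree is built from HTML semantics and ARIA attributes, not from visual styling. Apart from CSS that hides content outright (display: none and visibility: hidden remove an element from the tree), visual presentation such as color, size, and position is invisible to assistive technology.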
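
A short sketch of the difference, using made-up form content: each pair below can be styled to look identical, but only the second version of each carries meaning in the accessibility tree.

```html
<!-- Looks like a heading, but is exposed as plain text -->
<div style="font-size: 2em; font-weight: bold;">Billing address</div>

<!-- Exposed as a level 2 heading, however it is styled -->
<h2>Billing address</h2>

<!-- Error conveyed by color alone: nothing reaches the screen reader -->
<label for="zip-a">ZIP code</label>
<input id="zip-a" type="text" style="border-color: red;">

<!-- Error conveyed by text and state as well as color -->
<label for="zip-b">ZIP code</label>
<input id="zip-b" type="text" style="border-color: red;"
       aria-invalid="true" aria-describedby="zip-b-error">
<p id="zip-b-error" style="color: red;">Error: enter a 5-digit ZIP code.</p>
```

In the last example, aria-invalid exposes the state and aria-describedby ties the message to the field, so the error text is read when the input receives focus.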

Inspecting the Accessibility Tree

Modern browsers let you inspect the accessibility tree directly:

Chrome DevTools: Elements panel → Accessibility tab
Firefox DevTools: Accessibility panel (Inspector → Accessibility)
Safari Web Inspector: Accessibility tab in the Node panel

Use these tools to verify what screen readers will actually see when they encounter your components.

Accessible Name Computation

Every interactive element and many non-interactive elements need an accessible name: the text that a screen reader announces to identify the element. Browsers compute this name with a specific algorithm (the Accessible Name and Description Computation, or AccName, specification), which defines a priority order:

aria-labelledby — References the text content of one or more other elements by ID. Takes highest priority because it explicitly points to labeling content that may already be visible on screen.
aria-label — A direct string attribute on the element. Used when there’s no visible label text, such as icon-only buttons.
Native HTML <label> — For form controls, a <label> element associated via for/id or wrapping the control.
alt attribute — For <img> elements.
Element text content — For buttons, links, and headings, the visible text content becomes the accessible name.
title attribute — Fallback of last resort. Creates a tooltip visually but is unreliable for accessible names across screen readers. Avoid relying on it.
placeholder attribute — Also a poor fallback. Disappears when the user types, so it’s not a substitute for a real label.

Practical Examples


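<!-- Name from aria-labelledby (highest priority) -->
<!-- aria-labelledby replaces the button's own text, so the button lists its own id first to keep "Edit" in the name -->
<h2 id="section-title">User Profile</h2>
<button id="edit-button" aria-labelledby="edit-button section-title">Edit</button>
<!-- Announced as: "Edit User Profile, button" -->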
 
<!-- Name from aria-label -->
<button aria-label="Close dialog">✕</button>
<!-- Announced as: "Close dialog, button" -->
 
<!-- Name from label element -->
<label for="email">Email address</label>
<input type="email" id="email">
<!-- Announced as: "Email address, edit text" -->
 
<!-- Name from text content -->
<button>Submit form</button>
<!-- Announced as: "Submit form, button" -->
 
<!-- No accessible name: BAD -->
<button><img src="search-icon.png"></button>
<!-- Often announced as just "button", or with the image file name read out; the user has no idea what it does -->
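
The unnamed icon button at the end of the examples above can be repaired in more than one way. The variants below are a sketch. The name “Search” is an assumption about what the button does, and the visually-hidden class is a hypothetical utility class, defined here with a commonly used set of rules.

```html
<style>
  /* Hypothetical utility class: hides text visually but keeps it in the accessibility tree */
  .visually-hidden {
    position: absolute;
    width: 1px;
    height: 1px;
    overflow: hidden;
    clip-path: inset(50%);
    white-space: nowrap;
  }
</style>

<!-- Option 1: the image's alt text becomes the button's name -->
<button type="button"><img src="search-icon.png" alt="Search"></button>

<!-- Option 2: name the button directly and mark the image as decorative -->
<button type="button" aria-label="Search"><img src="search-icon.png" alt=""></button>

<!-- Option 3: real text that is hidden visually but still exposed -->
<button type="button">
  <img src="search-icon.png" alt="">
  <span class="visually-hidden">Search</span>
</button>
```

All three should be announced along the lines of “Search, button”. Options 1 and 3 rely on ordinary text content rather than ARIA.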

Common Misconceptions

“Tab key navigates all content”: FALSE

The Tab key only moves focus between interactive elements: links, buttons, form inputs, and elements with tabindex. It skips all non-interactive content like headings, paragraphs, and images. Screen reader users access non-interactive content through browse mode navigation (arrow keys and letter shortcuts), not Tab.

This is why “keyboard accessibility” testing that only uses Tab is insufficient — it never exercises the majority of how screen reader users actually move around a page.

“Screen reader users never use a mouse”: NOT ALWAYS

Most blind users don’t use a mouse, but “screen reader user” doesn’t exclusively mean blind. Some users with motor impairments use both screen readers and pointing devices. Some low-vision users combine magnification software with a screen reader and still use a mouse to orient themselves spatially.

For web accessibility purposes, the safe assumption is: any functionality available via mouse must also be available via keyboard. If you build an interaction that requires hover, drag, or precision pointing, you need a keyboard-accessible equivalent.

“ARIA fixes everything”: NO

ARIA (Accessible Rich Internet Applications) attributes can add and modify semantics, but they cannot compensate for poor fundamental structure. Common mistakes:

Adding role="button" to a <div> makes the screen reader announce it as a button, but doesn’t automatically make it focusable or respond to keyboard events — you still need tabindex="0" and keyboard event handlers.
Adding aria-label to an image doesn’t make a broken layout navigable.
The first rule of ARIA is: if a native HTML element or attribute exists that does what you need, use it instead of ARIA.
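
The gap between a native button and a div with role="button" can be sketched as follows. The save() function is a placeholder, and the id is made up for this example.

```html
<!-- Preferred: focusable, keyboard-operable, and announced as a button by default -->
<button type="button" onclick="save()">Save</button>

<!-- If a <div> must be used, everything the browser provided has to be rebuilt by hand -->
<div id="fake-save" role="button" tabindex="0">Save</div>

<script>
  // Placeholder action for this illustration.
  function save() {
    console.log("Saved");
  }

  const fakeSave = document.getElementById("fake-save");

  // Mouse and touch activation.
  fakeSave.addEventListener("click", save);

  // Keyboard activation: native buttons respond to Enter and Space,
  // so the imitation has to handle both.
  fakeSave.addEventListener("keydown", (event) => {
    if (event.key === "Enter" || event.key === " ") {
      event.preventDefault(); // stops Space from scrolling the page
      save();
    }
  });
</script>
```

Even with all of this in place, the div still lacks native behavior such as the disabled attribute and form submission.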

“If it works with a keyboard, it’s accessible”: NECESSARY BUT NOT SUFFICIENT

Keyboard accessibility is a prerequisite, not the definition of accessibility. A page can be fully keyboard-operable and still fail for screen reader users if:

Interactive elements have no accessible names
Dynamic content changes aren’t announced via live regions
Focus order doesn’t match visual order and creates confusion
Error messages appear visually but aren’t programmatically associated with their inputs
Custom widgets don’t communicate state changes (expanded/collapsed, selected, checked)
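
The last point, communicating state changes, can be illustrated with a disclosure control. This is a sketch with hypothetical ids and content. The key detail is that aria-expanded changes in step with the panel’s visibility.

```html
<button type="button" id="filters-toggle"
        aria-expanded="false" aria-controls="filters-panel">
  Filters
</button>

<div id="filters-panel" hidden>
  <label><input type="checkbox" name="in-stock"> In stock only</label>
</div>

<script>
  const filtersToggle = document.getElementById("filters-toggle");
  const filtersPanel = document.getElementById("filters-panel");

  filtersToggle.addEventListener("click", () => {
    const isOpen = filtersToggle.getAttribute("aria-expanded") === "true";
    // Update the programmatic state and the visual state together.
    filtersToggle.setAttribute("aria-expanded", String(!isOpen));
    filtersPanel.hidden = isOpen;
  });
</script>
```

A screen reader typically announces this as “Filters, button, collapsed” and then “expanded” after activation. Without aria-expanded, the button would work with a keyboard but give no feedback that anything changed.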

Deaf vs. Blind Accessibility

Accessibility is not monolithic. The needs of users who are Deaf or hard of hearing are substantially different from the needs of users who are blind or have low vision, and both differ from users with motor impairments or cognitive disabilities.

Users who are Deaf primarily need:

Captions and transcripts for audio and video content
Visual alternatives to audio alerts and notifications
Clear, simple language (for many Deaf users, a sign language such as American Sign Language is their first language and English is their second)
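
For the first item, a minimal sketch of captioned video with a linked transcript might look like this. The file names are placeholders.

```html
<video controls>
  <source src="product-demo.mp4" type="video/mp4">
  <!-- WebVTT captions; "default" turns them on unless the user chooses otherwise -->
  <track kind="captions" src="product-demo.en.vtt" srclang="en" label="English" default>
</video>

<p><a href="product-demo-transcript.html">Read the transcript of the product demo</a></p>
```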

Users who are blind primarily need:

Screen reader-compatible markup (the focus of this site)
Keyboard accessibility
Text alternatives for images and non-text content
Logical, navigable document structure

DeafBlind users face unique challenges. They typically use screen readers combined with refreshable braille displays, hardware devices that render text as raised braille dots. Braille displays present content in short lines (40–80 cells typically) and cannot convey spatial layout. This means concise accessible names matter even more for braille users, because verbose ARIA labels and redundant announcements take many lines to read. Animations and purely spatial UI patterns (like dragging items to reorder) also need text-based alternatives.

This site focuses on the blind/low-vision screen reader experience, since that’s the set of requirements most directly tied to HTML semantics and ARIA. The patterns demonstrated here also benefit keyboard-only users, users with cognitive disabilities who benefit from clear structure, and users on slow connections who may rely on text content when images fail to load.

For comprehensive accessibility coverage, always consult the Web Content Accessibility Guidelines (WCAG) and consider testing with representative users from different disability groups.
