Level 5 — Advanced (CEFR: C1)

Unit 19 — Simultaneous Interpretation: Introduction

Lesson 2 — The Ear-Voice Span


Lesson Overview

Level: 5 — Advanced
Unit: 19 — Simultaneous Interpretation: Introduction
Lesson: 2 of 6
Estimated Time: 90 minutes

What this lesson covers:

  • What the ear-voice span (EVS) is and why it is the central variable in simultaneous interpretation
  • The 2–5 second range and what determines where you fall in that range
  • What happens when the EVS is too short vs. too long
  • The delayed shadowing protocol: building a controlled EVS
  • The transition from Spanish-to-Spanish to Spanish-to-English
  • EVS calibration by speaker type and content density
  • Common EVS failures and how to correct them
  • The bridge to full simultaneous interpretation

What the Ear-Voice Span Is

From the curriculum:

The ear-voice span (EVS) is the delay between when you hear something and when you say it in the other language. In simultaneous interpretation, this delay is typically 2–5 seconds. Too short and you make errors; too long and you fall behind.

The EVS is the gap between the speaker’s mouth and the interpreter’s mouth. It is the buffer — the amount of source language the interpreter is holding in working memory while producing earlier content in the target language.

Example:

A speaker says: Dios no te ha abandonado — aunque en este momento no lo sientas, Él está contigo.

The interpreter begins producing English approximately 3 seconds after the first word. By the time the interpreter says “God has not abandoned you,” the speaker has already moved on to “although right now you may not feel it.” The interpreter is always processing the phrase that came 3 seconds before while simultaneously receiving the phrase currently being spoken.
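
The timing in this example can be sketched as a simple timeline. This is only an illustration: the phrase onset times are assumed values rather than measurements from a real recording, and `output_time` and `timeline` are hypothetical helper names.

```python
# Illustrative timeline for a fixed ear-voice span (EVS).
# Onset times are assumed values, not measurements from a real recording.
EVS_SECONDS = 3.0

# (time the speaker starts the phrase, Spanish phrase)
source_phrases = [
    (0.0, "Dios no te ha abandonado"),
    (3.0, "aunque en este momento no lo sientas"),
    (6.0, "Él está contigo"),
]


def output_time(heard_at, evs=EVS_SECONDS):
    """Time at which the interpreter starts rendering a phrase heard at `heard_at`."""
    return heard_at + evs


def timeline(phrases, evs=EVS_SECONDS):
    """Pair each phrase with the moment the interpreter begins rendering it."""
    return [(output_time(heard_at, evs), phrase) for heard_at, phrase in phrases]


for start, phrase in timeline(source_phrases):
    print(f"t={start:4.1f}s  interpreter begins rendering: {phrase}")
```

With these assumed onsets, the interpreter begins rendering the first phrase at the same moment the speaker begins the second, which is the overlap described above.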

The EVS is not fixed. It fluctuates during interpretation based on:

  • Content density (more information = longer EVS needed)
  • Speaker pace (faster speaker = shorter available EVS)
  • Vocabulary difficulty (unfamiliar terms = processing slows = EVS extends)
  • Sentence structure (long dependent clauses = interpreter must hold more in working memory)

The trained interpreter manages the EVS — neither too short nor too long.


Too Short: The Premature Start Problem

When the EVS is too short (less than 2 seconds), the interpreter begins producing English before enough of the Spanish has been received to extract reliable meaning.

What happens:

  • The interpreter starts a sentence without knowing how it ends in Spanish
  • The sentence ends differently than expected; the interpreter must self-correct mid-sentence
  • Or the interpreter commits to an incorrect reading and delivers a distorted rendering
  • Exact-content items (numbers, names) are especially vulnerable — the interpreter starts the English sentence before the number has arrived and then must insert it awkwardly

Example of EVS-too-short failure:

Spanish: Dios nos promete que en todo cooperan para bien los que a Él aman. (Translation: God promises us that all things work together for good for those who love Him.)

EVS-too-short rendering: “God promises us that in everything… that all things… work together for good for those who love Him.”

The interpreter committed to “that in everything” before hearing “cooperan para bien” — then had to restart.

The fix: deliberately extend the EVS by training the working memory buffer. This is the purpose of the delayed shadowing drill.


Too Long: The Falling Behind Problem

When the EVS is too long (beyond roughly 5 seconds), the interpreter is holding more content in working memory than can be retained reliably.

What happens:

  • Content from the beginning of the buffer begins to degrade — key details become less reliable
  • The interpreter falls further and further behind the speaker
  • If the speaker does not slow down, the gap grows until the interpreter loses coherent tracking entirely
  • The audience experiences a confusing lag — the English they are hearing refers to something the speaker said many seconds ago

Example of EVS-too-long failure:

With a 6-second EVS, the interpreter is still rendering what the speaker said 6 seconds earlier, and the speaker has introduced two new ideas in the meantime. By the time the interpreter is ready for those ideas, the oldest buffered content has partially degraded and the newest has not yet been processed.

The fix: manage the EVS actively. When the buffer is growing too large, the interpreter may compress content (rendering with less detail, accepting a slight accuracy cost) to reduce the EVS before it becomes critical.
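
How quickly the gap grows, and how compression shrinks it, can be illustrated with a deliberately simple linear model. This is a sketch under stated assumptions, not a description of real interpreter behavior: `render_rate` (seconds of source speech rendered per second of interpreting) is a hypothetical quantity, and both function names are invented for illustration.

```python
def evs_after(elapsed, start_evs, render_rate):
    """EVS after `elapsed` seconds under a simple linear model (an assumption).

    render_rate is the seconds of source speech rendered per second of
    interpreting: below 1.0 the interpreter falls behind, above 1.0
    (for example, while compressing) the interpreter catches up.
    """
    return max(0.0, start_evs + (1.0 - render_rate) * elapsed)


def seconds_until(limit, start_evs, render_rate):
    """Seconds until the EVS reaches `limit`, or None if it never does."""
    if start_evs >= limit:
        return 0.0
    if render_rate >= 1.0:
        return None  # keeping pace or catching up: the gap does not grow
    return (limit - start_evs) / (1.0 - render_rate)


# Falling behind: starting at 3 s and rendering at 0.75 of the speaker's rate.
print(seconds_until(5.0, start_evs=3.0, render_rate=0.75))

# Compressing: starting at 7 s and rendering at 1.5 times the speaker's rate.
print(evs_after(4.0, start_evs=7.0, render_rate=1.5))
```

Under these assumptions, an interpreter who starts at a 3-second EVS and renders at three quarters of the speaker's rate reaches the 5-second limit after 8 seconds, while 4 seconds of compression at 1.5 times the speaker's rate brings a 7-second EVS back to 5 seconds.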


The Target Range: 2–5 Seconds

The curriculum target is a 2–5 second EVS. More precisely:

  • 2–3 seconds: ideal for simple, predictable content (prayer, familiar Bible passages, formulaic announcements)
  • 3–4 seconds: standard for moderate-density preaching
  • 4–5 seconds: appropriate for dense theological exposition where the interpreter needs more structure before committing to English output

The interpreter learns to calibrate — shortening the EVS for simple content and extending it for complex content — without allowing either extreme to occur.
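
As a rough self-check, a measured EVS can be compared against the 2–5 second target. The sketch below is illustrative only: `classify_evs` is a hypothetical helper, and treating the boundaries themselves (exactly 2 or exactly 5 seconds) as in range is an assumption.

```python
def classify_evs(evs_seconds, low=2.0, high=5.0):
    """Classify a measured EVS against the curriculum's 2-5 second target.

    The boundary values themselves are treated as in range (an assumption).
    """
    if evs_seconds < low:
        return "too short"
    if evs_seconds > high:
        return "too long"
    return "in range"


for measured in (1.5, 3.0, 6.5):
    print(measured, classify_evs(measured))
```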


The Delayed Shadowing Protocol

From the curriculum:

EVS training drill: Listen to a Spanish speaker. Begin shadowing in Spanish with a deliberate 3-second delay (not the simultaneous echo of basic shadowing, but a true delayed repetition). Build the habit of holding incoming content in working memory while producing earlier content. When this becomes comfortable, switch from Spanish-to-Spanish delayed shadowing to Spanish-to-English simultaneous interpretation.

Phase 1: Spanish-to-Spanish delayed shadowing

Setup: Use a recorded Spanish speaker. Start the recording. Count silently to 3. Then begin repeating in Spanish what the speaker said 3 seconds ago.

The dual challenge:

  • You are listening to the speaker at the current moment
  • You are producing in Spanish what was said 3 seconds ago
  • These two streams of Spanish content must not bleed into each other

Success criteria: You are always 3 seconds behind the speaker, accurately reproducing the Spanish, with no mixing of current and buffered content.

Common failure: the current input intrudes on the buffered output. The interpreter begins producing the current Spanish (not the buffered one) — losing the deliberate delay. When this happens, stop, reset to silence, count to 3, and restart.

Building the buffer: Start with simple, slow Spanish (a children’s Bible lesson, a slowly-delivered devotional). As the 3-second buffer becomes comfortable, move to normal preaching pace.

Duration target: sustain 3-second delayed Spanish shadowing for 3 minutes without collapse. This is a working memory workout. It will feel mentally fatiguing; that is expected.
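
The buffer discipline in Phase 1 resembles a delay line: everything heard goes into a queue, and nothing comes out until it is at least 3 seconds old. The sketch below is an analogy for the mental task, not a model of actual cognition, and the class `DelayBuffer` and its method names are hypothetical.

```python
from collections import deque
from typing import List


class DelayBuffer:
    """Holds heard segments and releases each one only after `delay` seconds."""

    def __init__(self, delay=3.0):
        self.delay = delay
        self._pending = deque()  # (heard_at, text) pairs, oldest first

    def hear(self, heard_at, text):
        """Store a segment the speaker has just said."""
        self._pending.append((heard_at, text))

    def speak(self, now) -> List[str]:
        """Return, in order, every segment that is at least `delay` seconds old."""
        ready = []
        while self._pending and now - self._pending[0][0] >= self.delay:
            ready.append(self._pending.popleft()[1])
        return ready

    def held(self):
        """Number of segments still waiting in the buffer."""
        return len(self._pending)


buffer = DelayBuffer(delay=3.0)
buffer.hear(0.0, "Dios no te ha abandonado")
buffer.hear(3.0, "aunque en este momento no lo sientas")
print(buffer.speak(now=2.0))  # nothing is old enough yet
print(buffer.speak(now=3.0))  # only the first segment is released
```

Because segments leave the queue strictly in arrival order, current input can never overtake buffered output. That ordering is the property the drill trains.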

Phase 2: Transition to English output

When Phase 1 is stable (3 minutes clean at normal pace), switch the output language.

The same protocol: listen to Spanish, hold 3 seconds in the buffer, produce — but now produce the English equivalent rather than the Spanish.

What changes: the output now requires meaning extraction and language switching simultaneously. This is the full simultaneous interpretation task.

Early sessions: allow the EVS to extend naturally — if 4 or 5 seconds feels more comfortable, take it. Do not force a 3-second EVS when meaning extraction is still slow.

Progress markers:

  • Week 1: Phase 1 stable for 3 minutes at slow pace
  • Week 2: Phase 1 stable for 3 minutes at natural pace
  • Week 3: Phase 2 sustained for 1 minute at slow pace
  • Week 4: Phase 2 sustained for 2 minutes at moderate pace
  • Week 5: Phase 2 sustained for 2 minutes at natural preaching pace

EVS Calibration by Speaker Type

Different speakers require different EVS calibration.

Slow, structured speakers

A preacher who speaks slowly and uses clear discourse structure (numbered points, transitional phrases) allows the interpreter to work with a short EVS — content arrives at a pace that does not overload the buffer.

EVS target: 2–3 seconds

Fast speakers

A preacher who speaks rapidly compresses available processing time. The interpreter cannot extend the EVS much (or they fall behind permanently) but cannot shorten it much either (or they make errors). The interpreter must compress content — accepting some detail loss to maintain tracking.

EVS target: 2–3 seconds (forced by pace); manage by accepting compression

Dense theological exposition

A theological lecture with sustained complex argument requires more processing time per unit of content. The interpreter needs the full structure of a sentence before committing to the English.

EVS target: 4–5 seconds

Emotionally charged speech (altar calls, testimony climax)

Emotionally charged speech is often simple in vocabulary but complex in prosodic meaning. The interpreter must track emotional register, not just propositional content.

EVS target: 2–3 seconds (the content is often simpler; the prosody carries the weight)
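
The four calibration targets above can be collected into a small lookup table. This is a minimal sketch: the targets come from this lesson, but the key names, the `SpeakerProfile` class, and the `adjustment` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakerProfile:
    low: float                 # lower EVS target, in seconds
    high: float                # upper EVS target, in seconds
    expect_compression: bool   # True when pace forces some detail loss


# Targets taken from this lesson; the key names are illustrative.
PROFILES = {
    "slow_structured": SpeakerProfile(2.0, 3.0, False),
    "fast": SpeakerProfile(2.0, 3.0, True),
    "dense_exposition": SpeakerProfile(4.0, 5.0, False),
    "emotionally_charged": SpeakerProfile(2.0, 3.0, False),
}


def adjustment(speaker_type, measured_evs):
    """Suggest how to move the EVS toward the target for this speaker type."""
    profile = PROFILES[speaker_type]
    if measured_evs < profile.low:
        return "lengthen"
    if measured_evs > profile.high:
        return "shorten"
    return "hold"


print(adjustment("dense_exposition", 3.0))
print(adjustment("fast", 4.0))
```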


EVS Failure Modes and Recovery

Failure mode 1: Buffer overflow

The EVS has extended too far. The buffer is too full. Content at the beginning of the buffer is degrading.

Recovery: compress the oldest content in the buffer. Accept a less detailed rendering of what has been waiting longest, in order to reduce the buffer size and return to a sustainable EVS.

Failure mode 2: Content bleed

Current input has intruded on buffered output. The interpreter is no longer sure what came when.

Recovery: pause very briefly (1–2 seconds of silence), reset the buffer, and resume with what is currently being said, accepting a gap in the rendering.

Failure mode 3: Frozen output

The interpreter stops producing English because meaning extraction has failed (an unknown word, an unexpected turn).

Recovery: continue to say something — even a general English phrase that is probably correct — to keep the output stream moving. Do not freeze in silence for more than 1–2 seconds. A slightly imprecise output is better than silence.

Failure mode 4: Correct but laggy

The interpretation is accurate but has drifted to a 7–8 second EVS. The English is now significantly behind the Spanish.

Recovery: when the speaker hits a natural pause (breath, rhetorical pause, congregational response), catch up aggressively — render the buffered content quickly and compress where possible until the EVS is back in range.
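
The four recoveries can be summarized as a decision rule. This is a sketch under assumptions: the lesson does not rank the failure modes, so the order of the checks is a guess, and the thresholds (5 seconds of EVS, 2 seconds of silence) and all names are illustrative.

```python
from enum import Enum


class Recovery(Enum):
    NONE = "continue as normal"
    RESET = "pause 1-2 seconds, reset the buffer, resume from the current moment"
    FILLER = "say a general, probably-correct phrase to keep output moving"
    COMPRESS = "compress the oldest buffered content"
    CATCH_UP = "catch up aggressively during the speaker's pause"


def choose_recovery(evs, silent_seconds, content_bleed, speaker_pausing,
                    max_evs=5.0, max_silence=2.0):
    """Pick a recovery action; the order of the checks is an assumption."""
    if content_bleed:                  # failure mode 2: content bleed
        return Recovery.RESET
    if silent_seconds > max_silence:   # failure mode 3: frozen output
        return Recovery.FILLER
    if evs > max_evs:                  # failure modes 1 and 4: EVS too long
        return Recovery.CATCH_UP if speaker_pausing else Recovery.COMPRESS
    return Recovery.NONE


print(choose_recovery(evs=7.5, silent_seconds=0.0,
                      content_bleed=False, speaker_pausing=True).value)
```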


Practice Exercises

Exercise 1 — EVS Measurement

Use a recording tool to establish your current natural EVS. Play a Spanish sermon recording and record yourself interpreting it simultaneously into English. On playback, a partner notes the time at which the speaker says a distinct keyword (one every 5–10 seconds) and the time at which that keyword's meaning appears in your English output. The average gap between the two times is your natural EVS.

Goal: understand your baseline. If your natural EVS is consistently above 5 seconds, the buffer training is the priority.
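
Once the keyword times are written down, the averaging step is simple arithmetic. The sketch below assumes the times are noted in seconds from the start of the recording; `natural_evs` is a hypothetical helper and the sample times are made up for illustration.

```python
from statistics import mean


def natural_evs(source_times, output_times):
    """Average gap, in seconds, between hearing a keyword and rendering it.

    source_times[i] is when the speaker says keyword i; output_times[i] is
    when its meaning appears in the English output.
    """
    if not source_times or len(source_times) != len(output_times):
        raise ValueError("need one output time for every source time")
    gaps = [out - src for src, out in zip(source_times, output_times)]
    if any(gap < 0 for gap in gaps):
        raise ValueError("an output time precedes its source time")
    return mean(gaps)


# Made-up sample data: three keywords with gaps of 3, 4, and 5 seconds.
print(natural_evs([5.0, 12.0, 20.0], [8.0, 16.0, 25.0]))
```

With the sample data the gaps are 3, 4, and 5 seconds, so the estimated natural EVS is 4 seconds.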

Exercise 2 — Three-Second Buffer Training (Phase 1)

Sustained 3-second delayed Spanish shadowing:

  1. Choose a 5-minute Spanish sermon recording.
  2. Count to 3 in silence before beginning.
  3. Reproduce in Spanish everything said 3 seconds ago.
  4. Sustain for 3 minutes.
  5. Log: How many times did current input intrude? How many seconds elapsed before the first intrusion?

Repeat daily until 3 minutes is clean.
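
The log in step 5 can be kept on paper, but a small record type makes the two logged values explicit. This is a minimal sketch: `ShadowingSession` and its field names are hypothetical, and "clean" is interpreted here as a full 3 minutes with no intrusions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

TARGET_SECONDS = 180.0  # the 3-minute duration target


@dataclass
class ShadowingSession:
    duration_seconds: float
    # Times (seconds into the session) at which current input intruded.
    intrusion_times: List[float] = field(default_factory=list)

    @property
    def intrusion_count(self) -> int:
        return len(self.intrusion_times)

    @property
    def first_intrusion(self) -> Optional[float]:
        """Seconds elapsed before the first intrusion, or None if there was none."""
        return min(self.intrusion_times) if self.intrusion_times else None

    @property
    def is_clean(self) -> bool:
        """Assumed definition: reached the target duration with no intrusions."""
        return self.duration_seconds >= TARGET_SECONDS and not self.intrusion_times


session = ShadowingSession(duration_seconds=180.0, intrusion_times=[42.0, 95.0])
print(session.intrusion_count, session.first_intrusion, session.is_clean)
```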

Exercise 3 — EVS Calibration Drill

Run simultaneous interpretation on three different speaker types back-to-back:

  • 2 minutes: slow, structured speaker
  • 2 minutes: fast speaker
  • 2 minutes: dense theological exposition

After each segment, estimate your average EVS. Did it shift appropriately with the speaker type? Which type was hardest to calibrate?

Exercise 4 — Recovery Drills

Run the three recovery scenarios deliberately:

  1. Intentionally allow the EVS to grow to 7+ seconds, then practice aggressive catch-up compression.
  2. Introduce a deliberate pause in your output for 2 seconds, then practice resetting and re-entering the stream from the current moment.
  3. Encounter an unknown vocabulary word (have a partner introduce an unfamiliar term mid-passage). Practice producing something rather than freezing.

Key Takeaways for This Lesson

Before moving to Lesson 3:

  • The EVS is the buffer between speaker and interpreter — typically 2–5 seconds in effective simultaneous interpretation
  • Too short: premature commitment to English output before enough Spanish is received → errors and self-corrections
  • Too long: buffer overflow, content degradation, falling behind the speaker
  • The delayed shadowing protocol is the primary training drill: Spanish → 3-second buffer → Spanish output, then switch to English output
  • EVS calibration varies by speaker type: 2–3 seconds for simple/slow; 4–5 seconds for dense exposition
  • Recovery modes: compress to reduce EVS; brief pause to reset; produce something to avoid frozen output

Daily Practice

10 minutes of Phase 1 delayed shadowing daily. When Phase 1 is clean for 3 consecutive sessions, add 5 minutes of Phase 2 (English output) to each session. Log your EVS estimate after each session — tracking improvement over the unit.
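
The daily practice rule can be written as a short check against the session log. The sketch assumes that Phase 2 stays in the routine once three consecutive clean Phase 1 sessions have been logged (the lesson does not say what happens after a later unclean session); the function names are hypothetical.

```python
def phase2_unlocked(clean_history):
    """True once any three consecutive Phase 1 sessions were clean.

    clean_history is a list of booleans, oldest session first.
    Assumption: once unlocked, Phase 2 stays unlocked.
    """
    run = 0
    for clean in clean_history:
        run = run + 1 if clean else 0
        if run >= 3:
            return True
    return False


def session_minutes(clean_history):
    """Return (phase 1 minutes, phase 2 minutes) for the next session."""
    return (10, 5 if phase2_unlocked(clean_history) else 0)


print(session_minutes([True, False, True, True, True]))
```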