Gemini 3.5 Live Translate: Features, Limits, How It Works (2026)

Name: Gemini 3.5 Live Translate
Author: Google

Quick Answer: How do you use Gemini 3.5 Live Translate?

As of June 2026, Gemini 3.5 Live Translate is available through three surfaces: (1) the Google Translate app on Android and iOS, where you open the app, tap Conversation, and let it auto-detect among 70+ languages; (2) Google AI Studio and the Gemini Live API, for developers building custom apps; and (3) Google Meet, in private preview for select Workspace customers, where it replaces the previous 5-language limit with 70+ language coverage and 2,000+ language-pair combinations in a single meeting. The model streams speech-to-speech end to end and preserves the original speaker's intonation and pacing, which older cascaded STT→translate→TTS stacks lose. There is no public web UI at Google.com yet.

1. What Gemini 3.5 Live Translate Is

Gemini 3.5 Live Translate is a streaming speech-to-speech translation model that Google announced on June 9, 2026. Two characteristics set it apart from earlier translation products.

First, it is audio-to-audio rather than the older cascaded pipeline of speech-to-text, machine translation, and text-to-speech. The model accepts streamed source audio in 100-millisecond chunks and produces translated speech as output. Text transcripts are available, but only as a sidecar to the spoken output; there is no streaming text mode and no speaker attribution in the translated audio.

Second, the generated voice is designed to preserve speaker prosody. Google's announcement describes output that retains the speaker's intonation, pacing, and pitch. In practice this produces a translated voice that sounds substantially more natural than a generic text-to-speech engine reading a translation aloud — a real advantage over speech-translation systems whose audio output runs through a standard TTS layer.

The model is built on Gemini 3 Pro. According to the Gemini 3.5 Audio model card published by Google DeepMind, it accepts audio input with up to a 128K-token context window and produces audio + text output up to 64K tokens. It auto-detects over 70 languages, including rapid language switches between speakers, though that detection has documented weaknesses (covered in Section 4).

The launch covers three product surfaces in parallel: developer access via the Gemini Live API and Google AI Studio (public preview from June 9, 2026); consumer access through the Google Translate app on Android and iOS, rolling out globally starting that day, with a new "listening mode" on Android; and enterprise access through Google Meet in private preview for select Google Workspace customers, where it expands Meet's translation coverage from 5 languages to 70+ and supports over 2,000 source/target combinations within a single meeting (per Google's launch announcement).
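
The relationship between the "70+ languages" figure and the "2,000+ combinations" figure can be checked with simple counting. The sketch below is an illustration only: Google's announcement does not say how it counts combinations, `pair_counts` is a hypothetical helper rather than part of any Google API, and the value 70 is used as an assumed lower bound.

```python
from math import comb


def pair_counts(n_languages: int) -> tuple[int, int]:
    """Return (unordered pairs, ordered source->target pairs) for n languages."""
    unordered = comb(n_languages, 2)             # {A, B} counted once
    ordered = n_languages * (n_languages - 1)    # A->B and B->A counted separately
    return unordered, ordered


# Assumption: exactly 70 languages, the lower bound of "70+".
unordered, ordered = pair_counts(70)
print(unordered, ordered)  # 2415 4830
```

Under that assumption, 70 languages yield 2,415 unordered pairs, which fits the 2,000+ claim; counting each direction separately would give 4,830.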

2. How It Works: Audio-to-Audio Architecture and Prosody Preservation

Three architectural choices distinguish Gemini 3.5 Live Translate from earlier streaming-translation systems.

Speech-to-speech, not speech-to-text-to-speech

Traditional pipelines run audio through a streaming speech-to-text model, feed the transcript to a machine-translation model, then synthesize the translation through a separate text-to-speech model. Each stage adds latency and accumulates errors. Gemini 3.5 Live Translate folds these steps into one audio model. The trade-off: the output is permanent audio, not editable text — once a word is spoken, it cannot be revised mid-utterance.

Continuous streaming, not turn-based

Google's announcement frames the model as one that "balances the trade-off between waiting for context to improve quality and translating immediately to stay in sync with the speaker." Earlier consumer products like Google Translate's previous Conversation mode were turn-based: tap, speak, wait for the system to finalize and emit the translation, then let the other party tap. Gemini 3.5 Live Translate emits translated speech continuously while the source speaker is still talking, with Google describing a lag of "a few seconds."

Prosody transfer

The model is designed to carry the source speaker's vocal characteristics — intonation, pacing, emphasis, pitch — into the translated audio. This is the main technical reason the output sounds natural rather than robotic. It is also the source of the voice-consistency limitations Google's model card discloses (Section 4).

On the developer surface, each session uses raw 16-bit PCM audio at 16 kHz mono as input and produces 24 kHz mono PCM audio as output, sent in 100-millisecond chunks (per MarkTechPost's launch coverage of Google's developer documentation; see Google AI Studio for the canonical reference). All generated audio carries Google's SynthID watermark — an imperceptible signature woven into the waveform that allows downstream systems to identify the audio as machine-generated.
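
The audio format above implies fixed chunk sizes, which the following minimal sketch works out. It is plain arithmetic over a byte buffer and makes no call to the Gemini Live API; `chunk_bytes` and `iter_chunks` are hypothetical helper names, and applying the 100 ms chunk length to the 24 kHz output side is an assumption.

```python
INPUT_RATE_HZ = 16_000    # input sample rate stated in the text
OUTPUT_RATE_HZ = 24_000   # output sample rate stated in the text
BYTES_PER_SAMPLE = 2      # 16-bit PCM, mono
CHUNK_MS = 100            # chunk length stated in the text


def chunk_bytes(sample_rate_hz: int, chunk_ms: int = CHUNK_MS) -> int:
    """Size in bytes of one mono 16-bit PCM chunk of the given duration."""
    return sample_rate_hz * chunk_ms // 1000 * BYTES_PER_SAMPLE


def iter_chunks(pcm: bytes, size: int):
    """Yield consecutive slices of `pcm`; the last one may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


# One second of silence at the input rate, split into 100 ms chunks.
one_second = bytes(INPUT_RATE_HZ * BYTES_PER_SAMPLE)
chunks = list(iter_chunks(one_second, chunk_bytes(INPUT_RATE_HZ)))
print(chunk_bytes(INPUT_RATE_HZ), chunk_bytes(OUTPUT_RATE_HZ), len(chunks))
# 3200 4800 10
```

Each 100 ms input chunk is therefore 1,600 samples (3,200 bytes); a 100 ms slice of 24 kHz output would be 2,400 samples (4,800 bytes).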

3. Where Gemini 3.5 Live Translate Is Strongest

Five product strengths show up immediately when comparing Gemini 3.5 Live Translate to its peers.

Natural-sounding translated speech. The prosody-preserving voice is the clearest advantage over speech-translation systems whose audio output goes through a generic TTS engine. If you have used a voice-translation app whose translated audio sounds like a flat narrator reading a string of words, the contrast is immediate. Gemini 3.5 Live Translate is materially better here, and the difference is audible on the first sentence.

Audio-to-audio simplicity. Building a speech-translation application has traditionally meant chaining a streaming STT model (Whisper-large, Google Cloud Speech-to-Text, Azure Speech), a translation model, and a TTS engine — and managing the partial-emit semantics of each. Gemini 3.5 Live Translate replaces that chain with one API call, simplifying both the application code and the failure surface.

Auto language detection at scale. The model auto-detects 70+ languages, so users do not need to set a language pair in advance. Google's positioning emphasizes use cases like multi-party meetings where speakers switch languages mid-conversation.

Distribution. Built directly into the Google Translate consumer app and Google Meet. For end users, the install and discovery cost is near zero — they already have the app. For Meet customers, translation arrives as a feature toggle inside a workflow that is already in use.

Watermarked output. SynthID watermarking makes the generated speech identifiable as AI-generated for downstream compliance use cases, which is useful in regulated industries that need to track AI-generated content.

4. What Google's Own Model Card Admits as Limitations

The Gemini 3.5 Audio model card published by Google DeepMind documents specific known limitations of Gemini 3.5 Live Translate. Quoting the card directly:

Language detection

"Language detection can struggle with non-native accents, similar languages, or rapid language switches." Practical implication: if a speaker has a strong accent, or the source language is close to a related language (Portuguese vs. Spanish, Norwegian vs. Swedish), or the conversation switches languages quickly, the detector may pick the wrong source language and translate accordingly.

Voice consistency in multi-speaker sessions

"Voices can be inconsistent, and voices may shift after long pauses, change gender, or get stuck on one voice during rapid multi-speaker sessions." This is the most practically significant limitation for many use cases. In a meeting with several speakers taking rapid turns, the model may produce all translated output in one voice — losing the speaker attribution that listeners rely on to follow the conversation.

Noise filtering

"Designed to filter out background noise, but not all background audio may be ignored." Real-world environments will still leak through under some conditions.

Translation-mode constraints (developer API)

Per MarkTechPost's launch coverage, which cites Google's developer documentation: "text input is not supported in translation mode" and the model "drops tool use and system instructions in this mode." For developers, the translation API call is a constrained surface — you cannot send text, you cannot use the broader Gemini tool ecosystem, and you cannot inject system prompts. Translation in, translation out.

5. Independent Measurements From the LiveLingo 2026 Benchmark

What we measured (and what we did not)

The numbers below are for the raw Gemini Live API endpoint, accessed programmatically with the same energy-VAD utterance boundaries applied uniformly to every API-tier system in the LiveLingo benchmark. We did not measure the Google Translate consumer app or Google Meet integration separately. Both are built on the same Gemini 3.5 Live Translate model but the consumer / Meet surfaces add their own client-side VAD, conversation state, UI rendering, and may apply server-side smoothing we have no programmatic access to. A Google Translate user or a Meet participant may see different perceived latency, code-switching behavior, and voice consistency than the API-tier numbers report. Where this section cites specific behaviors (multi-speaker drift, code-switch silence), treat them as the developer-experience floor on the Live API endpoint, not the consumer ceiling.
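
For readers unfamiliar with energy-based voice activity detection (VAD), the sketch below shows the general technique of marking utterance boundaries by frame energy. It is not the LiveLingo harness: the function names, the RMS threshold of 500, the 100 ms frame size, and the three-frame hangover are all assumptions chosen for illustration.

```python
import math
import struct


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of one frame of 16-bit little-endian PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def utterance_spans(pcm: bytes, frame_bytes: int = 3200,
                    threshold: float = 500.0, hangover_frames: int = 3):
    """Return (start_frame, end_frame) spans of speech; end is exclusive.

    An utterance closes once `hangover_frames` consecutive frames fall
    below the energy threshold.
    """
    spans, start, quiet = [], None, 0
    n_frames = len(pcm) // frame_bytes
    for i in range(n_frames):
        frame = pcm[i * frame_bytes:(i + 1) * frame_bytes]
        if frame_rms(frame) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= hangover_frames:
                spans.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:  # audio ended while still inside an utterance
        spans.append((start, n_frames - quiet))
    return spans


# Synthetic input: 2 silent frames, 3 loud, 3 silent, 2 loud (100 ms each).
silent = bytes(3200)
loud = struct.pack("<1600h", *([1000] * 1600))
print(utterance_spans(silent * 2 + loud * 3 + silent * 3 + loud * 2))
# [(2, 5), (8, 10)]
```

Applying one boundary rule to every system, as the benchmark describes, keeps utterance segmentation from becoming a hidden variable in the latency comparison.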

Reproducibility

Every number in this section reproduces from the same three 120-second VOA public-domain audio clips, the same Gemini Live API endpoint, and the same Python harness used for the original four-system benchmark. The audio (audio.zip), raw per-utterance JSON (gemini-live-results.json), and methodology are published at livelingo.io/research/benchmark-2026.

Conflict caveat: LiveLingo Research evaluated a direct commercial competitor on the day Google released it. We have a financial interest in the comparison's framing. Treat this section as one data point alongside Google's own announcement and third-party launch coverage; do not treat it as the definitive third-party benchmark.

With those scopes in mind: LiveLingo Research evaluated Gemini 3.5 Live Translate on its launch day (June 9, 2026) against the same protocol used for the original benchmark of Google Cloud STT v2 + Translation v3, Azure Speech Translation, and Whisper-large + GPT-4o-mini. The full addendum (including the later June 10, 2026 OpenAI gpt-realtime-translate addendum) is published at livelingo.io/research/benchmark-2026#comprehension-gemini-live; the headline numbers are below.

Comprehension fidelity composite: 4.93 / 5 across 120 utterances and four language pairs (en→es, en→zh-CN, en→ja, en→de). In the comparison table below, this is the highest score among the systems other than LiveLingo (4.96).

First-audio latency: median 2,947 ms from start of speech to first translated audio (p10–p90: 2,859–3,104 ms). This is a constant ~3-second speaking delay, consistent with Google's "a few seconds behind" framing.
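
A median with a p10–p90 range can be derived from per-utterance latencies along the following lines. This is a sketch using Python's standard library, not the benchmark's own analysis code; `latency_summary` is a hypothetical name, the sample values are synthetic placeholders rather than benchmark data, and the "inclusive" interpolation method is an assumption.

```python
from statistics import median, quantiles


def latency_summary(latencies_ms):
    """Return the median plus the 10th and 90th percentiles of a sample."""
    # n=10 yields nine cut points; the first is p10 and the last is p90.
    deciles = quantiles(latencies_ms, n=10, method="inclusive")
    return {"p10": deciles[0], "median": median(latencies_ms), "p90": deciles[-1]}


# Synthetic values for illustration only; these are NOT benchmark results.
sample_ms = [2860, 2900, 2930, 2950, 2960, 3000, 3050, 3100, 3120]
print(latency_summary(sample_ms))
```

Reporting p10–p90 alongside the median shows how tightly the delay clusters, which is what supports describing it as roughly constant.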

Six-system comparison on the LiveLingo 2026 benchmark: 120 utterances, four language pairs, 2-judge composite (GPT-4o + Gemini 2.5 Flash). The latency column reports a single apples-to-apples metric: time from the speaker's end of utterance to translation arrival (committed transcript for streaming-text systems; spoken-translation arrival for audio-to-audio systems). The separate "speed to first output" metric uses a different baseline (it starts at speech onset) and appears in the column to its right, for audio-to-audio systems only. See methodology at /research/benchmark-2026.

System	Comprehension (0–5)	Utterance-end → translation arrival	Speed to first output (audio-to-audio only)	Output surface
LiveLingo	4.96	1,518 ms	—	Streaming text + audio
Gemini 3.5 Live Translate	4.93	~3,100 ms (drifts up to 13.9 s)	2,947 ms	Audio (text sidecar)
Google Cloud STT v2 + Translate v3	4.77	~26,736 ms	—	Transcript
Azure Speech Translation	4.65	~4,755 ms	—	Transcript
Whisper + GPT-4o-mini (DIY)	4.63	2,720 ms	—	Transcript
OpenAI gpt-realtime-translate	4.53	~3,800 ms (drifts up to 20.3 s)	711 ms	Audio + transcript

Output is translated speech only. The API has no streaming text mode and no per-speaker attribution. Text transcripts are available as a sidecar to the spoken output. Spoken output cannot be revised after it is emitted.

Code-switched audio. On a Mandarin news clip that switches to English street interviews at 86 seconds, the LiveLingo benchmark recorded that translation output stops at the switch in every run: speech already in the output language is neither translated nor transcribed, so the final 34 seconds of content (~28% of the clip) silently disappear for the listener with no error surfaced. OpenAI's gpt-realtime-translate shows the same behavior on the same clip, and OpenAI documents skipping output-language speech as intended; this is a structural limit of current speech-to-speech translators on mixed-language audio.
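
The "~28%" figure follows directly from the clip timing. The short check below assumes the clip is one of the 120-second clips described under Reproducibility; `dropped_share` is a hypothetical helper name.

```python
def dropped_share(clip_seconds: float, switch_at_seconds: float):
    """Return (seconds lost, fraction of clip lost) when output stops at the switch."""
    lost = clip_seconds - switch_at_seconds
    return lost, lost / clip_seconds


lost_seconds, lost_fraction = dropped_share(clip_seconds=120, switch_at_seconds=86)
print(lost_seconds, f"{lost_fraction:.1%}")  # 34 28.3%
```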

Factual inversion on late-resolving syntax. On a Mandarin business-speech clip, a sentence describing a 15% sales increase rendered in English as a goal to increase sales by 15%. This is the error class that irreversible mid-sentence audio commitment produces when the source language postpones the meaning-carrying element (the polarity, the time reference, the subject) until late in the sentence.

These are independent measurements, not Google's own numbers; methodology and raw per-utterance data are in the published addendum.

6. How to Access Gemini 3.5 Live Translate

Consumer — Google Translate app

Update the Google Translate app to its latest version on Android or iOS. Live Translate mode is rolling out globally starting June 9, 2026 — availability depends on the store rollout schedule in your region. On Android, a new "listening mode" lets you hear translated speech directly through your device's earpiece.

Developer — Gemini Live API + Google AI Studio

The model is available in public preview through the Gemini Live API and through Google AI Studio. Per the launch coverage, the integration constraints are specific: audio input only (no text input in translation mode), no tool use or system instructions, raw 16-bit PCM 16 kHz mono input chunked at 100 ms, 24 kHz PCM output. Current quotas and pricing are on Google's Gemini API pricing page; Google AI Studio is the developer console for testing and key management.
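
Before streaming a local recording, it can help to confirm that it matches the stated input format. The sketch below checks a WAV file's header with Python's standard `wave` module; it does not call the Gemini Live API, `check_input_wav` and `make_wav` are hypothetical helpers, and the WAV container is an assumption about how the source audio is stored (the API itself is described as taking raw PCM).

```python
import io
import wave

EXPECTED_CHANNELS = 1       # mono
EXPECTED_SAMPLE_WIDTH = 2   # bytes per sample, i.e. 16-bit
EXPECTED_RATE_HZ = 16_000


def check_input_wav(wav_bytes: bytes) -> list[str]:
    """Return a list of mismatches against the stated input format."""
    problems = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"expected 16-bit, got {wav.getsampwidth() * 8}-bit")
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"expected 16000 Hz, got {wav.getframerate()} Hz")
    return problems


def make_wav(rate_hz: int, channels: int, seconds: float = 0.1) -> bytes:
    """Build a silent 16-bit WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(bytes(int(rate_hz * seconds) * channels * 2))
    return buf.getvalue()


print(check_input_wav(make_wav(16_000, 1)))   # []
print(check_input_wav(make_wav(44_100, 2)))   # two mismatches reported
```

A file that fails the check would need resampling or downmixing before its PCM frames are sent.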

Enterprise — Google Meet

Gemini 3.5 Live Translate is in private preview for select Google Workspace customers as of June 9, 2026. Where enabled, it expands Meet's translation coverage from 5 languages to 70+ languages and supports 2,000+ source/target combinations within a single meeting. Availability is rolling, not universal.

7. When to Use Gemini 3.5 — and When Another Tool Fits Better

When Gemini 3.5 Live Translate is the right choice

You want translated speech, not translated text. The natural-voice output is the product's biggest advantage.
You are already in the Google Translate app or Google Meet. Integration is zero-cost to discover and use.
Your conversations are one-to-one, or have clear turn-taking with pauses between speakers. The voice-consistency limitations that Google's model card discloses are less likely to surface in these contexts.
You are building a developer application where simplifying the STT → MT → TTS chain into a single API matters more than fine-grained control over each stage.
You can live without speaker attribution in the audio output, and without streaming text transcripts.

When you might prefer a different tool

You need streaming text alongside or instead of audio. Streaming text is what most production interfaces show on screen during live captioning, conference translation, and accessibility scenarios. Gemini 3.5 Live Translate's text is sidecar-only.
You need per-speaker attribution in the translated output. The model card's "may get stuck on one voice during rapid multi-speaker sessions" disclosure makes this a real risk for meetings.
You translate conversations where stability matters more than expressiveness. Audio output cannot be revised mid-utterance, so on languages with late-resolving syntax (Mandarin polarity at the sentence end, Japanese verb at the sentence end), an early commitment can invert the meaning. The benchmark addendum documents one such case.
You need translated phone calls — dialing a PSTN number with translation running on the line. The Gemini Live API is a building block for developers, not a phone-call provider.

An honest concession. LiveLingo (publishing this guide) sits in the "different tool" category for several of those dimensions — streaming text + audio output, per-speaker attribution, gated-commit displayed transcripts that never retract, translated outbound phone calls. LiveLingo's audio output runs through the host platform's default text-to-speech engine (iOS native on Apple devices), which sounds materially less natural than Gemini 3.5 Live Translate's generated voice. That is a real advantage Google has shipped today. Side-by-side specs: /compare/google-translate. Benchmark numbers: /research/benchmark-2026. OpenAI's comparable surfaces: /guides/openai-live-translation.

8. Inside the Google Translate App: What's Live as of June 2026

The rollout into Google Translate is what most readers will encounter first. The model went live on June 9, 2026 — Google's launch announcement states: "The model is also rolling out on the Google Translate app globally, on both Android and iOS." Below is what is actually present, named, and observable in the app at the time of writing.

Five named Live Translate modes. Per the Google Translate Live Translate help page, Live Translate is the umbrella feature in the app, with five named sub-modes: Listening, Conversation, Text only, Custom settings, and Face to face. Each carries different default behavior for whether the translation is read aloud, displayed silently, or split across the screen for two-party use.

Listening mode is Android-only at launch. Google's announcement adds one new mode on top of the existing four: "For Android users, we're also starting to roll out a new 'listening mode' with 3.5 Live Translate that lets you hear translations directly through your phone's earpiece." No earbuds are required: hold the phone to your ear and hear the translation through the earpiece. iOS does not have this mode at the time of writing.

iOS rollout is phased. In our June 2026 iPhone testing, Gemini 3.5 Live Translate availability inside the Translate app is not yet uniform across devices — some iPhones surface the new behavior, others continue to show the prior Conversation mode unchanged. The global Android-and-iOS framing in Google's announcement is best read as a phased rollout, not a flag-day flip.

Hallucination shows up in both transcript and translation. In the same iPhone testing we observed hallucinated content in both the source-language transcript and the translated output: words appearing on screen that were not in the recorded source audio, and translated phrases that diverge from the source meaning rather than merely mistranslating it. This is in line with the audio-to-audio trade-off described in Section 5, where irreversible mid-utterance commitment produced a factual inversion on late-resolving syntax.

What Google has not disclosed. Slator's launch coverage notes that as of release, "the company has not released benchmark scores, language-pair results, latency figures, or comparative performance data, limiting independent assessment of the model's performance." The numbers in Section 5 of this guide and the full benchmark addendum are independent measurements that begin to fill that gap, subject to the conflict caveat stated in Section 5.

Open question. What model powered Google Translate's Conversation mode before Gemini is not named in any of Google's June 2026 announcements; the December 2025 announcement of Gemini-backed text translation does not specify the prior system either, and the support documentation does not address the transition.

9. Frequently Asked Questions

What is Gemini 3.5 Live Translate?

Gemini 3.5 Live Translate is a streaming speech-to-speech translation model released by Google on June 9, 2026. It is built on Gemini 3 Pro, generates translated audio that preserves the speaker's intonation, pacing, and pitch, and auto-detects 70+ languages. It is available to developers via the Gemini Live API and Google AI Studio (public preview), to consumers via the Google Translate app on Android and iOS, and to select Google Workspace customers via Google Meet (private preview).

What languages does Gemini 3.5 Live Translate support?

Over 70 languages, auto-detected. In Google Meet specifically, this expands previous coverage from 5 languages to 70+ languages and supports more than 2,000 source/target combinations within a single meeting.

How much does Gemini 3.5 Live Translate cost?

For consumers, the Google Translate app is free. Developer access via the Gemini Live API and Google AI Studio is priced per Google's standard API rates; check Google's Gemini API pricing page (listed in Sources) for current pricing. Enterprise access via Google Meet is gated to select Google Workspace customers in private preview as of June 9, 2026.

How does Gemini 3.5 Live Translate handle multiple speakers?

Per the Gemini 3.5 Audio model card published by Google DeepMind: "Voices can be inconsistent, and voices may shift after long pauses, change gender, or get stuck on one voice during rapid multi-speaker sessions." Practically: one-to-one conversations and turn-taking discussions with clear pauses work well; rapid multi-speaker scenarios are a documented weakness. There is no per-speaker attribution in the translated audio output.

Does Gemini 3.5 Live Translate output text?

The primary output is translated speech. Text transcripts are available, but only as a sidecar of the spoken output — there is no streaming text mode, and the translation-mode API does not accept text input.

What is Gemini 3.5 Live Translate's measured latency?

Google describes the system as staying "a few seconds behind the speaker." Independent measurement by LiveLingo Research on launch day recorded a median first-audio latency of 2,947 ms (p10–p90: 2,859–3,104 ms) across 120 test utterances — a roughly 3-second constant speaking delay. Source: livelingo.io/research/benchmark-2026.

When was Gemini 3.5 Live Translate released?

Google announced and began rolling out Gemini 3.5 Live Translate on June 9, 2026, across the Gemini Live API and Google AI Studio (developer public preview), the Google Translate app on Android and iOS (global rollout starting that day), and Google Meet (private preview for select Workspace customers).

Which tool keeps working when the audio switches languages mid-conversation?

This is where speech-to-speech translators like Gemini 3.5 Live Translate have a structural blind spot: on the benchmark's Mandarin-to-English code-switch clip, translation output stops the moment the source crosses into the target language, silently dropping the final ~28% of the content. LiveLingo runs speech-to-text-to-speech with a never-revised written transcript, so mixed-language audio is passed through and stays on screen instead of disappearing. For conversations where speakers mix two languages, that is the difference between following along and losing more than a quarter of what was said. See the benchmark.

10. Sources

Google. Fluid, natural voice translation with Gemini 3.5 Live Translate. Google blog, June 9, 2026. blog.google
Google DeepMind. Gemini 3.5 Audio (Live Translate) — Model Card. deepmind.google
Google. Gemini API pricing. ai.google.dev/gemini-api/docs/pricing
Google. Live Translate — Google Translate Help. support.google.com/translate/answer/6142474
Slator. Google Expands AI Live Speech Translation with Gemini 3.5 Live Translate, June 2026. slator.com
Google. Google AI Studio (developer console). aistudio.google.com
MarkTechPost. Google Releases Gemini 3.5 Live Translate, a Streaming Speech-to-Speech Audio Model Covering 70+ Languages, June 9, 2026. marktechpost.com
Thurrott. New Gemini 3.5 Live Translate Model Provides Near Real-time Translation in Over 70 Languages. thurrott.com
Android Headlines. Google Drops Gemini 3.5 Live Translate for Real-Time Conversations, June 9, 2026. androidheadlines.com
StartupHub.ai. Google Rolls Out Gemini 3.5 Live Translate, 2026. startuphub.ai
LiveLingo Research. Real-Time Voice Translation Benchmark 2026 — Gemini 3.5 Live Translate addendum, June 9, 2026. livelingo.io/research/benchmark-2026
LiveLingo. OpenAI Live Translation (2026): ChatGPT Voice, gpt-realtime-translate, and Whisper+GPT Compared. livelingo.io/guides/openai-live-translation

Release date, language coverage, model card disclosures, and consumer/enterprise rollout details verified against the Google blog, Google DeepMind model card, and Gemini API documentation linked above on June 10, 2026. Google may change tiers, regional rollout, Workspace access, and model behavior; consult the linked sources for current state before relying on any specific number.

Conflict of interest