[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"blog-post-es-/real-time-meeting-translation":3},{"page":4,"surround":405},{"id":5,"title":6,"authors":7,"badge":10,"body":11,"date":394,"description":395,"extension":396,"heroOrder":397,"image":398,"meta":399,"navigation":400,"path":401,"seo":402,"stem":403,"__hash__":404},"blog/blog/real-time-meeting-translation.md","Real-time meeting translation: how it works, and how to evaluate one",[8],{"name":9},"The Mind.com Team","Guide",{"type":12,"value":13,"toc":377},"minimark",[14,18,26,29,32,35,40,47,85,92,94,98,101,132,142,144,148,151,171,185,187,191,194,199,232,236,239,243,261,265,283,287,299,301,305,312,331,341,343,347,371,374],[15,16,6],"h1",{"id":17},"real-time-meeting-translation-how-it-works-and-how-to-evaluate-one",[19,20,21,25],"p",{},[22,23,24],"strong",{},"Real-time meeting translation"," means a live meeting where each participant speaks, types, and listens in their own language — and the platform translates between them as the meeting happens, not afterwards. No human interpreter in a booth, no \"let's just switch to English,\" no transcript you read the next morning.",[19,27,28],{},"The category is full of tools that sound like they do this and don't. AI notetakers record and summarise. Caption add-ons subtitle the speaker. General-purpose models translate a block of text when you paste it in. Real-time meeting translation is a narrower, harder thing: every word, every chat message, every shared note, rendered into each listener's language fast enough that the conversation keeps flowing.",[19,30,31],{},"This is the foundational guide to that category — what the term actually means, what happens under the hood, and the questions worth asking before you sign anything. 
It's the hub the rest of our writing branches off, so where a topic deserves its own deep dive, we link to it.",[33,34],"hr",{},[36,37,39],"h2",{"id":38},"what-real-time-actually-rules-out","What \"real-time\" actually rules out",[19,41,42,43,46],{},"The hard constraint is latency. A live multilingual conversation works only if the translation arrives fast enough that people don't start talking over it. Past roughly ",[22,44,45],{},"1.2 seconds end-to-end",", the meeting drifts — participants hesitate, double back, and eventually default to a shared second language. So real-time meeting translation has a sub-second budget that quietly disqualifies most of the tools marketed near it:",[48,49,50,73,79],"ul",{},[51,52,53,56,57,62,63,67,68,72],"li",{},[22,54,55],{},"AI notetakers"," (think ",[58,59,61],"a",{"href":60},"/compare/fireflies","Fireflies"," or ",[58,64,66],{"href":65},"/compare/otter","Otter",") are built to transcribe and summarise a meeting — usually English-first, and most usefully ",[69,70,71],"em",{},"after"," it ends. They are excellent at \"what did we decide.\" They do not translate speech live so that a German speaker and a Japanese speaker each hear the other in their own language. That's a different job with a different clock.",[51,74,75,78],{},[22,76,77],{},"General-purpose LLM translation"," is good prose translation with no latency contract. Fine for a document; wrong tool for a live audio channel where the model has under a second to respond and can't pause to \"think.\"",[51,80,81,84],{},[22,82,83],{},"Caption/subtitle plugins"," display what the speaker said as text, often in one target language for everyone. 
That's a captioning feature, not per-participant translation.",[19,86,87,88,91],{},"If a tool can't translate ",[69,89,90],{},"voice"," live, per listener, into each listener's chosen language, it isn't doing real-time meeting translation — whatever the homepage says.",[33,93],{},[36,95,97],{"id":96},"the-four-things-translation-means-in-a-meeting","The four things \"translation\" means in a meeting",[19,99,100],{},"The second trap is treating translation as one feature. A live meeting actually has at least four translation jobs running at once, and they pull in incompatible directions:",[102,103,104,110,116,122],"ol",{},[51,105,106,109],{},[22,107,108],{},"Voice"," — audio in, translated audio out, under a second, every viewer in their own language. Constraint: latency.",[51,111,112,115],{},[22,113,114],{},"Chat"," — short messages translated as they're sent, with edits that read like edits, not re-translations.",[51,117,118,121],{},[22,119,120],{},"Shared notes"," — collaborative typing translated character-by-character, with lists, headings and checkboxes surviving intact.",[51,123,124,127,128,131],{},[22,125,126],{},"Documents"," — a 40-page PDF dropped into chat, translated as a ",[69,129,130],{},"file"," with its tables, fonts and page breaks preserved. Constraint: fidelity, not speed.",[19,133,134,135,141],{},"No single engine is good at all four — the latency budget that makes voice work is the opposite of the fidelity budget a document needs. Any honest platform runs several pipelines behind one language picker. 
We pulled ours apart in detail in ",[58,136,138],{"href":137},"/blog/inside-the-translation-pipelines",[69,139,140],{},"Inside the four translation pipelines","; the short version is that \"one engine for everything\" is a marketing simplification, not an architecture.",[33,143],{},[36,145,147],{"id":146},"how-a-real-time-voice-pipeline-actually-works","How a real-time voice pipeline actually works",[19,149,150],{},"Trace one sentence from a French speaker to a German, a Brazilian and a Japanese listener:",[102,152,153,159,165],{},[51,154,155,158],{},[22,156,157],{},"Speech recognition runs in the speaker's browser",", locally — not on a central server. This shaves a network round-trip off the very first step and produces the source transcript with the lowest possible delay.",[51,160,161,164],{},[22,162,163],{},"The transcript fans out to the translation engine over one connection per target language present in the room."," If three people picked German, all three share one German stream. If nobody picked Arabic, no Arabic stream opens, and idle streams drop after a few minutes. A four-language meeting costs what four languages cost — not forty.",[51,166,167,170],{},[22,168,169],{},"Each listener gets their own synthesised audio track",", mixed against the original speaker's video. Two people in the same physical room can wear headphones and hear different languages from the same meeting.",[19,172,173,174,177,178,184],{},"The part that matters for procurement: ",[22,175,176],{},"the engine doing the translating is the platform's own, running on the platform's own infrastructure"," — not a general-purpose third-party model the platform is quietly reselling. The sub-second budget rules those out, and so does the data-residency story for anyone regulated. 
(Where every byte of a meeting physically runs is its own question; we mapped it vendor-by-vendor in ",[58,179,181],{"href":180},"/blog/where-one-intermind-meeting-actually-runs",[69,182,183],{},"Where one InterMIND meeting actually runs",".)",[33,186],{},[36,188,190],{"id":189},"how-to-evaluate-a-real-time-meeting-translation-tool","How to evaluate a real-time meeting translation tool",[19,192,193],{},"Most of this category competes on a single inflated number — \"200+ languages,\" \"99% accurate.\" Those tell you nothing about the meeting you're about to run. Here's what actually separates one tool from another.",[195,196,198],"h3",{"id":197},"_1-per-pair-quality-on-real-traffic-not-an-aggregate","1. Per-pair quality on real traffic, not an aggregate",[19,200,201,202,205,206,212,213,219,220,226,227,231],{},"\"200 languages\" means a model emits text in 200 languages. Quality ranges from production-grade on major pairs to unusable on rare ones. Ask for ",[22,203,204],{},"per-language-pair quality, measured on real traffic, with the distribution"," — median, worst 10%, sample size — not one averaged headline. We argued why the whole category dodges this in ",[58,207,209],{"href":208},"/blog/why-translation-quality-marketing-is-broken",[69,210,211],{},"Why translation-quality marketing is broken",", and we publish our own numbers at ",[58,214,216],{"href":215},"/benchmark",[217,218,215],"code",{},": every live pair, every month, scored against ",[58,221,225],{"href":222,"rel":223},"https://github.com/facebookresearch/flores",[224],"nofollow","FLORES-200"," by a ",[58,228,230],{"href":229},"/benchmark/methodology","named judge",". You don't have to take a vendor's word — including ours.",[195,233,235],{"id":234},"_2-latency-you-can-feel-not-a-spec-sheet-number","2. Latency you can feel, not a spec-sheet number",[19,237,238],{},"Sub-second on voice is the threshold for a conversation that flows. 
Test it on a real call with real cross-talk, not a demo script.",[195,240,242],{"id":241},"_3-honest-language-counts-per-surface","3. Honest language counts, per surface",[19,244,245,246,249,250,253,254,260],{},"A platform's voice languages, text languages and document languages are rarely the same set, and a single number hides that. Ours, for example: ",[22,247,248],{},"21 languages live on voice, chat and notes; 30 on documents; 24 the raw engine ceiling."," A French participant can request a contract PDF in Estonian even if they can't ",[69,251,252],{},"listen"," to the meeting in Estonian — and we flag that in the picker rather than smoothing it into one figure. The reasoning is in ",[58,255,257],{"href":256},"/blog/how-many-languages-do-you-support",[69,258,259],{},"How many languages do you support?",".",[195,262,264],{"id":263},"_4-where-the-data-runs","4. Where the data runs",[19,266,267,268,271,272,276,277,260],{},"For regulated buyers, ",[69,269,270],{},"where"," the meeting is processed is part of the spec, not a footnote. Ask which vendors touch the audio, where they execute, and which are merely reselling a US model. Our full runtime map — and the one post-meeting step that's still US-domiciled, named plainly — is in ",[58,273,274],{"href":180},[69,275,183],{}," and ",[58,278,280],{"href":279},"/blog/multilingual-compliance-meetings",[69,281,282],{},"Multilingual compliance meetings",[195,284,286],{"id":285},"_5-live-translation-vs-a-notetaker","5. Live translation vs. a notetaker",[19,288,289,290,294,295,298],{},"Be clear which problem you're solving. If you need a record and a summary, an ",[58,291,293],{"href":292},"/compare","AI notetaker"," is the right buy. If you need people who don't share a language to actually ",[69,296,297],{},"talk",", you need live translation. 
Some teams want both; few tools do both well.",[33,300],{},[36,302,304],{"id":303},"where-intermind-fits","Where InterMIND fits",[19,306,307,308,311],{},"We built InterMIND for the live-translation job specifically: real-time voice, chat, notes and documents across ",[22,309,310],{},"21 languages",", on our own engine hosted in the EU, with the translation quality published openly instead of asserted. It runs as a web app (no install) with up to 1080p video, up to 1,500 participants, and cloud and local recording. It is not the best tool for \"transcribe and summarise my English standup\" — that's what notetakers are for, and we say so on the comparison pages rather than pretending otherwise:",[48,313,314,320,326],{},[51,315,316,319],{},[58,317,318],{"href":60},"InterMIND vs. Fireflies.ai"," — notetaker vs. live multilingual meeting",[51,321,322,325],{},[58,323,324],{"href":65},"InterMIND vs. Otter.ai"," — same axis, honestly compared",[51,327,328],{},[58,329,330],{"href":292},"All platform comparisons",[19,332,333,334,340],{},"If you worry that fluent-sounding output can hide mistranslations, ",[58,335,337],{"href":336},"/blog/false-fluency-trap",[69,338,339],{},"The false-fluency trap"," is the one to read — fast translation that's confidently wrong is worse than slow translation that's right.",[33,342],{},[36,344,346],{"id":345},"try-it-yourself","Try it yourself",[48,348,349,357,364],{},[51,350,351,356],{},[58,352,354],{"href":353},"/demo",[217,355,353],{}," — runs the production voice pipeline on your own audio, in any of the 21 live languages, and scores it against the same judge that scores the public benchmark.",[51,358,359,363],{},[58,360,361],{"href":215},[217,362,215],{}," — per-pair, per-month quality on real traffic, including the pairs we deliberately hide from the picker.",[51,365,366,370],{},[58,367,368],{"href":229},[217,369,229],{}," — exactly what the numbers measure, what they don't, and who the judge is.",[19,372,373],{},"Real-time meeting translation is a narrow promise: 
everyone in their own language, live, fast enough to keep talking. The honest way to evaluate it is to stop reading homepages and start measuring. That's what the links above are for.",[19,375,376],{},"— The Mind.com Team",{"title":378,"searchDepth":379,"depth":380,"links":381},"",2,3,[382,383,384,385,392,393],{"id":38,"depth":379,"text":39},{"id":96,"depth":379,"text":97},{"id":146,"depth":379,"text":147},{"id":189,"depth":379,"text":190,"children":386},[387,388,389,390,391],{"id":197,"depth":380,"text":198},{"id":234,"depth":380,"text":235},{"id":241,"depth":380,"text":242},{"id":263,"depth":380,"text":264},{"id":285,"depth":380,"text":286},{"id":303,"depth":379,"text":304},{"id":345,"depth":379,"text":346},"2026-06-08","Real-time meeting translation lets everyone hear and read a call in their own language, live. What it is, how it actually works under the hood, and the questions to ask before you buy one.","md",null,"/blog/real-time-meeting-translation.svg",{},true,"/blog/real-time-meeting-translation",{"title":6,"description":395},"blog/real-time-meeting-translation","jdjc37D0A8O0cxljbHfvx36oRWjFxQs2iKtBNIeoROQ",[406,411],{"title":407,"path":408,"stem":409,"description":410,"children":-1},"Google Meet live translation: how it works, and where it stops","/blog/google-meet-live-translation","blog/google-meet-live-translation","Google Meet can translate a live meeting two ways — translated captions and the newer Gemini speech translation. Here's how each works, what they cost, and the one limit that decides whether they fit your meeting.",{"title":412,"path":413,"stem":414,"description":415,"children":-1},"What one InterMIND meeting is built from","/blog/what-one-intermind-meeting-is-built-from","blog/what-one-intermind-meeting-is-built-from","A companion to our runtime map: not where your meeting runs, but what it's built from. 
The layer-by-layer stack — where we run our own code or open-source software, where we're pragmatic about proprietary SaaS, and why the engine most of your data flows through is our own code, with a public, BSD-licensed client SDK."]