[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"blog-post-en-/why-translation-quality-marketing-is-broken":3},{"page":4,"surround":339},{"id":5,"title":6,"authors":7,"badge":10,"body":11,"date":329,"description":330,"extension":331,"image":332,"meta":333,"navigation":334,"path":335,"seo":336,"stem":337,"__hash__":338},"blog/blog/why-translation-quality-marketing-is-broken.md","Why translation-quality marketing is broken — and what we publish instead",[8],{"name":9},"The Mind.com Team","Methodology",{"type":12,"value":13,"toc":314},"minimark",[14,18,22,38,41,44,47,74,84,87,92,97,106,111,113,117,120,157,160,162,166,179,182,187,196,200,211,215,218,220,224,248,250,254,257,260,279,282,284,288,311],[15,16,6],"h1",{"id":17},"why-translation-quality-marketing-is-broken-and-what-we-publish-instead",[19,20,21],"p",{},"Open any live-translation vendor's site. You will see the same kinds of numbers:",[23,24,25,29,32,35],"ul",{},[26,27,28],"li",{},"\"200+ languages\"",[26,30,31],{},"\"6,000+ language pairs\"",[26,33,34],{},"\"World's first\" / \"Highest accuracy\"",[26,36,37],{},"\"99% accurate\"",[19,39,40],{},"Now try to find — on any of those vendor pages — what those numbers mean for a meeting you are about to run. Per-language quality. Reproducible methodology. Sample size. Score over time. Honest disclosure of where the model is weak.",[19,42,43],{},"You will not find it. Not in the marketing copy, and rarely in the docs.",[19,45,46],{},"This is the equilibrium of the category. It exists because of three things:",[48,49,50,57,68],"ol",{},[26,51,52,56],{},[53,54,55],"strong",{},"Most vendors do not own their translation engine."," They route through OpenAI, Google, DeepL, Microsoft, or some combination. Publishing per-pair quality data would be benchmarking someone else's model — there is no marketing value in that.",[26,58,59,62,63,67],{},[53,60,61],{},"Honest quality data is hard to put on a billboard."," A single score is noisy. 
A distribution is more useful but harder to compress. A ",[64,65,66],"code",{},"last-six-months trend"," is more useful still, and even harder.",[26,69,70,73],{},[53,71,72],{},"Procurement has not pushed back yet."," Buyers accept the marketing numbers at face value, and so the equilibrium holds.",[19,75,76,77,83],{},"The equilibrium will not hold. The next class of buyer — pharma, legal, financial, audit, public sector — is going to ask harder questions than \"how many languages.\" We built ",[78,79,81],"a",{"href":80},"/benchmark",[64,82,80],{}," because we think they should not have to take a vendor's word for it.",[85,86],"hr",{},[88,89,91],"h2",{"id":90},"what-the-marketing-numbers-dont-tell-you","What the marketing numbers don't tell you",[19,93,94,96],{},[53,95,28],{}," means a vendor has a model that emits text in 200 languages. Quality across those languages ranges from production-grade for major pairs (EN↔DE, EN↔ES, EN↔FR) to barely usable for low-resource pairs. Without a per-pair breakdown, you cannot tell which side of that line your meeting will land on.",[19,98,99,101,102,105],{},[53,100,31],{}," is ",[64,103,104],{},"N × N"," combinatorics on 80 source languages. Saying you support 6,000 pairs is the easy part. Saying any specific pair is good enough for a CAPA review, a contract negotiation, or an earnings call — that is the part not in the brochure.",[19,107,108,110],{},[53,109,37],{},", without specifying what was measured, against what reference, on what sample, by what judge — is content-free. Translation quality has no universal scalar. 
It has a distribution that depends on language pair, content domain, audio quality (for voice), latency budget, and what \"good enough\" means for the specific use case.",[85,112],{},[88,114,116],{"id":115},"what-a-buyer-actually-needs-to-know","What a buyer actually needs to know",[19,118,119],{},"The questions that show up in real DPA reviews and procurement evaluations:",[48,121,122,128,134,139,145,151],{},[26,123,124,127],{},[53,125,126],{},"Per-pair quality"," — how does this perform on DE↔EN, EN↔AR, JA↔KO, specifically?",[26,129,130,133],{},[53,131,132],{},"Sample size"," — how many runs is your reported number based on? Ten? Ten thousand?",[26,135,136,138],{},[53,137,10],{}," — who is judging the translations, against what reference, with what rubric?",[26,140,141,144],{},[53,142,143],{},"Distribution, not average"," — what does the worst-case 10% look like? The best 10%? The median?",[26,146,147,150],{},[53,148,149],{},"Drift over time"," — has a given pair gotten better or worse since you last published a number?",[26,152,153,156],{},[53,154,155],{},"What you don't measure"," — what does your benchmark explicitly not capture?",[19,158,159],{},"None of these are unanswerable. They are just not on anyone's marketing page.",[85,161],{},[88,163,165],{"id":164},"what-we-publish","What we publish",[19,167,168,172,173,178],{},[78,169,170],{"href":80},[64,171,80],{}," is our answer. The methodology is at ",[78,174,176],{"href":175},"/benchmark/methodology",[64,177,175],{}," — written before we knew you'd be reading this.",[19,180,181],{},"Three things separate it from category norms.",[183,184,186],"h3",{"id":185},"_1-real-traffic-not-a-curated-suite","1. Real traffic, not a curated suite",[19,188,189,190,195],{},"Every score in the public benchmark comes from a real ",[78,191,193],{"href":192},"/demo",[64,194,192],{}," test run. We do not pre-select pairs that perform well. 
The same pipeline that serves a buyer's demo is the one being measured.",[183,197,199],{"id":198},"_2-the-judge-is-named","2. The judge is named",[19,201,202,203,206,207,210],{},"Primary: ",[64,204,205],{},"google/gemini-2.5-flash",". Fallback: ",[64,208,209],{},"anthropic/claude-sonnet-4-20250514",". Both via Vercel AI Gateway. The judge is part of the methodology — disclosed by name. If we change the judge in the future, historical rows will carry the original judge identifier; old scores never get silently re-scored.",[183,212,214],{"id":213},"_3-the-distribution-is-the-data-not-the-average","3. The distribution is the data, not the average",[19,216,217],{},"Every published row shows median, p10, p90, min, max, and sample size — not a single number. A single number for a translation pair is noise. The shape of the distribution is the signal.",[85,219],{},[88,221,223],{"id":222},"practices-the-category-hasnt-adopted","Practices the category hasn't adopted",[23,225,226,236,242],{},[26,227,228,231,232,235],{},[53,229,230],{},"Low-score pairs are not hidden."," The public index is gated on ",[64,233,234],{},"≥ 10 distinct IPs, ≥ 10 runs, median ≥ 60"," — but anyone can deep-link to any pair directly and see the real numbers, including the pairs that are doing badly this month.",[26,237,238,241],{},[53,239,240],{},"Known issues are documented."," The chat-test harness was broken for a few weeks earlier in 2026; that period is suppressed from the index and noted in writing on the methodology page. History does not get silently rewritten.",[26,243,244,247],{},[53,245,246],{},"What we deliberately do NOT claim"," is a full section on the methodology page. We say where the LLM judge itself is imperfect. We say what we do not measure (latency, cost, user satisfaction, ASR-side errors before translation even runs). 
We disclose that our own automated smoke tests are part of the traffic.",[85,249],{},[88,251,253],{"id":252},"a-filter-for-the-next-vendor-evaluation","A filter for the next vendor evaluation",[19,255,256],{},"If you are evaluating any multilingual meeting platform — ours or another — the methodology is the page worth reading. The numbers themselves are the easy part.",[19,258,259],{},"A practical filter for any vendor in this category:",[23,261,262,268,273],{},[26,263,264,267],{},[53,265,266],{},"Ask for per-language-pair, per-month quality data on real traffic."," Not a curated benchmark. Not an aggregate.",[26,269,270],{},[53,271,272],{},"Ask what their judge is, what they explicitly do not measure, and what has changed in the last six months.",[26,274,275,278],{},[53,276,277],{},"Ask what happens when a pair's score drops"," — do they tell anyone, or do they fix it silently?",[19,280,281],{},"If the vendor has all three answers in writing, evaluate them seriously. If they don't, you are buying marketing — not translation quality.",[85,283],{},[88,285,287],{"id":286},"try-it-yourself","Try it yourself",[23,289,290,297,304],{},[26,291,292,296],{},[78,293,294],{"href":192},[64,295,192],{}," — runs the production translation pipeline on your audio, scores it against the same judge that scores the public benchmark, and shows you the output.",[26,298,299,303],{},[78,300,301],{"href":80},[64,302,80],{}," — every published language pair, every month, with the full distribution.",[26,305,306,310],{},[78,307,308],{"href":175},[64,309,175],{}," — how the numbers are computed, what they include, what they do not.",[19,312,313],{},"You will not need to take our word for any of it. 
That is the point.",{"title":315,"searchDepth":316,"depth":317,"links":318},"",2,3,[319,320,321,326,327,328],{"id":90,"depth":316,"text":91},{"id":115,"depth":316,"text":116},{"id":164,"depth":316,"text":165,"children":322},[323,324,325],{"id":185,"depth":317,"text":186},{"id":198,"depth":317,"text":199},{"id":213,"depth":317,"text":214},{"id":222,"depth":316,"text":223},{"id":252,"depth":316,"text":253},{"id":286,"depth":316,"text":287},"2026-05-13","Every translation vendor publishes language counts. None publishes verifiable per-pair quality on real traffic. Why that gap matters in your next procurement evaluation — and what we publish instead.","md","/blog/why-translation-quality-marketing-is-broken.svg",{},true,"/blog/why-translation-quality-marketing-is-broken",{"title":6,"description":330},"blog/why-translation-quality-marketing-is-broken","y3R92G4bq59SMqQhEPFjqTCm6QSgcZON4uNSZ4vP7p4",[340,345],{"title":341,"path":342,"stem":343,"description":344,"children":-1},"Inside the four translation pipelines that run InterMIND","/blog/inside-the-translation-pipelines","blog/inside-the-translation-pipelines","There is no \"the translation\" in InterMIND. There are four pipelines — voice, chat, notes, documents — each with its own engine, latency budget, and quality envelope. This is what actually happens between the moment you speak and the moment a participant in another language understands you.",{"title":346,"path":347,"stem":348,"description":349,"children":-1},"The Meeting Room That Doesn't Switch to English","/blog/intermind-v1-2-release","blog/intermind-v1-2-release","Six weeks of shipping took InterMIND from a translation demo to a multilingual workspace. Per-viewer voice, chat, shared notes, and edit history — in 21 languages, with the audit trail to match. Free for everyone until June 2026."]