The llms.txt spec doesn't account for multi-language sites. How do you handle it?
Okay, so I've been digging into the llms.txt spec, and here's what's been bugging me: we're basically treating the web like it's monolingual, and that's a HUGE blind spot. I've got three production sites running in 7+ languages each, and the current spec just... doesn't know what to do with `/es/about` vs `/en/about`. We're either duplicating our entire LLM config per language or gaming the URL structure, and neither solution is elegant.
What really gets me is that we solved this YEARS ago in other standards. hreflang annotations have been around for over a decade. The web already knows how to express language relationships! So why are we pretending llms.txt is language-agnostic when it clearly isn't? I'm seeing teams just pick one language and call it a day, which means non-English users get a degraded experience. That's not okay.
Here's my wild idea, though. What if we made it open-source? No wait, hear me out past that catchphrase (😄). I'm thinking we need a lightweight language metadata layer: optional `language` and `language-variants` fields that map alternates without bloating the spec. We could keep it simple: `language: en` and then `language-variants: es=/es/, fr=/fr/, ja=/ja/`. Dead simple, backward compatible, and suddenly every indexer knows what it's working with.
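To make this concrete, here's a rough Python sketch of how an indexer might consume those two fields. Everything in it is hypothetical: the fields aren't in the llms.txt spec today, I'm assuming each one sits on its own `key: value` line, I'm assuming the default language lives at the site root, and the function names (`parse_language_fields`, `resolve_prefix`, `hreflang_links`) are made up for illustration. The last function just shows that the same mapping lines up one-to-one with hreflang alternates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LanguageInfo:
    default: Optional[str] = None                           # value of `language:`
    variants: Dict[str, str] = field(default_factory=dict)  # lang -> path prefix


def parse_language_fields(text: str) -> LanguageInfo:
    """Extract the proposed (hypothetical) language fields from llms.txt text.

    Assumes each field sits on its own `key: value` line; every other line
    (headings, blockquotes, link lists) is ignored.
    """
    info = LanguageInfo()
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # no colon, so not a field line
        key, value = key.strip().lower(), value.strip()
        if key == "language":
            info.default = value.lower()
        elif key == "language-variants":
            # "es=/es/, fr=/fr/" -> {"es": "/es/", "fr": "/fr/"}
            for pair in value.split(","):
                lang, eq, prefix = pair.strip().partition("=")
                if eq and lang.strip() and prefix.strip():
                    info.variants[lang.strip().lower()] = prefix.strip()
    return info


def resolve_prefix(info: LanguageInfo, requested: str) -> str:
    """Map a requested language tag to a path prefix.

    Tries the exact tag, then its primary subtag ("es-MX" -> "es"), and
    otherwise falls back to "/" (assumption: default language at the root).
    """
    tag = requested.strip().lower().replace("_", "-")
    if tag in info.variants:
        return info.variants[tag]
    primary = tag.split("-")[0]
    return info.variants.get(primary, "/")


def hreflang_links(info: LanguageInfo, origin: str) -> List[str]:
    """Render the same mapping as HTML hreflang alternate links."""
    links = []
    if info.default:
        links.append(f'<link rel="alternate" hreflang="{info.default}" href="{origin}/" />')
    for lang, prefix in info.variants.items():
        links.append(f'<link rel="alternate" hreflang="{lang}" href="{origin}{prefix}" />')
    return links


if __name__ == "__main__":
    sample = """# Example Docs
> Product documentation.
language: en
language-variants: es=/es/, fr=/fr/, ja=/ja/
"""
    info = parse_language_fields(sample)
    print(resolve_prefix(info, "es-MX"))  # /es/
    print(resolve_prefix(info, "de"))     # /
```

The fallback order (exact tag, then primary subtag, then default) is a deliberately simple choice; a real proposal would probably want to spell out matching rules instead of leaving every indexer to guess.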
**@Rex Holloway**, have you hit this in your crawlers? **@Sage Nakamura**, does this mess with your LLM evaluation datasets? I'm genuinely curious whether I'm overthinking this or we've just been lucky that most llms.txt adoptions have been English-first.
The real question: are we building for a global web or just pretending? Right now the spec feels like it's whispering English-only, and I think we need to be loud about supporting multiple languages. What does your actual setup look like?