The llms.txt spec doesn't account for multi-language sites. How do you handle it?
Ok so I've been diving into llms.txt implementations and I keep running into this *wall* with multi-language sites, and honestly it's driving me nuts. The spec treats language as an afterthought when it should be a first-class citizen. Like, we're building infrastructure for LLMs to discover and understand our sites, right? But if you're running a site in 5 languages, the current approach is either "pick one llms.txt file" or "duplicate your entire metadata across language variants," and neither of those scales.
Here's what I'm seeing in the wild: most sites either ignore non-English entirely (which feels wrong?) or they're hacking together language tags that aren't in the spec. I've been experimenting with a content-negotiation approach where you serve different llms.txt based on Accept-Language headers, similar to how REST APIs handle it. It works, but it feels improvised. The bigger question: should language metadata live *inside* the llms.txt file itself, or should the protocol assume language-aware discovery at the infrastructure level?
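To make the content-negotiation idea concrete, here's a minimal sketch: parse the `Accept-Language` header's q-values and pick a per-language llms.txt variant. The filenames (`llms.en.txt`, etc.) and the language map are my own assumptions, not anything in the spec.

```python
# Hypothetical mapping from primary language tag to a per-language
# llms.txt variant. These filenames are an assumption, not spec.
LLMS_FILES = {
    "en": "llms.en.txt",
    "de": "llms.de.txt",
    "ja": "llms.ja.txt",
}
DEFAULT = "llms.en.txt"

def parse_accept_language(header: str) -> list[str]:
    """Return language tags from an Accept-Language header,
    sorted by q-value, highest first."""
    tags = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        if ";q=" in part:
            tag, q = part.split(";q=", 1)
            try:
                weight = float(q)
            except ValueError:
                weight = 0.0
        else:
            tag, weight = part, 1.0
        tags.append((tag.lower(), weight))
    return [t for t, _ in sorted(tags, key=lambda x: -x[1])]

def pick_llms_file(accept_language: str) -> str:
    """Pick the best-matching llms.txt variant for the request."""
    for tag in parse_accept_language(accept_language):
        primary = tag.split("-")[0]  # "en-US" -> "en"
        if primary in LLMS_FILES:
            return LLMS_FILES[primary]
    return DEFAULT
```

So a request with `Accept-Language: de-DE,de;q=0.9,en;q=0.5` would get served `llms.de.txt`, and an unsupported language falls back to the default. The improvised part is exactly what the spec leaves undefined: whether crawlers for LLMs will even send meaningful `Accept-Language` headers.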
What really gets me is that this could be a chance to build something elegant. We could define a `languages` array, scope permissions and training policies per-language, maybe even declare which models have been trained on which language versions — talk about transparency! But right now I'm either maintaining separate files or embedding language logic in my crawler, and that's not sustainable.
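To show what I mean by elegant, here's a purely hypothetical sketch of a single language-aware llms.txt. None of this exists in the current spec: the comment-based metadata block, the `languages` list, the per-language `training` policy, and the `(en)`/`(de)` section suffixes are all strawman inventions for discussion.

```markdown
# Example Project

> Docs for Example Project, available in English, German, and Japanese.

<!-- Hypothetical extension block: NOT part of the llms.txt spec -->
<!-- languages: en, de, ja -->
<!-- training: en=allowed, de=allowed, ja=disallowed -->

## Docs (en)

- [Getting started](https://example.com/en/start.md)

## Docs (de)

- [Erste Schritte](https://example.com/de/start.md)
```

One file, one crawl, and the language scoping lives in the document instead of in my crawler.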
@Rex Holloway, @Sage Nakamura: have either of you hit this in production? I'm curious whether you've just fallen back to single-language files or there's a pattern I'm missing. And honestly, what if we made the solution open-source? What if we started a working group to propose language-aware extensions to the spec? I feel like this affects way more than just us.
How are *you* handling this? Am I overthinking it?