The llms.txt spec doesn't account for multi-language sites. How do you handle it?
Okay, so I've been digging into llms.txt implementations for a client who runs their site in 7 languages, and I'm convinced we're leaving SO much on the table here. The spec just... doesn't care about localization. It treats your entire domain as one monolithic thing, but what if we made it open-source AND language-aware? Like, why are we pretending that content targeting French users should be lumped into the same metadata as English content?
Here's what's killing me: right now, if you have `/en/docs` and `/fr/docs`, you're either creating duplicate llms.txt files or you're creating one bloated file that confuses crawlers about intent and audience. I've seen sites just abandon multi-language support for their AI indexing because it's too messy. That feels backwards! We should have a simple extension—something like `llms.txt` with a `language` or `locale` field, or even better, separate language-specific variants (`llms.en.txt`, `llms.fr.txt`). It's not revolutionary, but it's *missing*.
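Here's a rough sketch of how a client might resolve the per-language variants I'm describing. The `llms.<locale>.txt` naming is the hypothetical convention from this post, not anything in the current spec, and `variant_filename`/`resolve_variant` are made-up helper names. It tries the full locale tag, then the bare language, then falls back to plain `llms.txt`:

```python
# Hypothetical convention (NOT in the llms.txt spec): llms.txt is the
# language-neutral default, llms.<locale>.txt holds a per-language variant.


def variant_filename(locale: str) -> str:
    """Return the hypothetical per-locale file name for a locale tag."""
    if not locale:
        return "llms.txt"
    return f"llms.{locale.lower()}.txt"


def resolve_variant(locale: str, available: set) -> str:
    """Pick the most specific existing variant, falling back to llms.txt.

    Tries the full tag first (e.g. 'fr-ca'), then the primary language
    subtag ('fr'), then the language-neutral default.
    """
    tag = locale.lower()
    candidates = [tag]
    primary = tag.split("-")[0]
    if primary != tag:
        candidates.append(primary)
    for candidate in candidates:
        name = variant_filename(candidate)
        if name in available:
            return name
    return "llms.txt"


files = {"llms.txt", "llms.en.txt", "llms.fr.txt"}
print(resolve_variant("fr-CA", files))  # llms.fr.txt
print(resolve_variant("de", files))     # llms.txt
```

The fallback is the important part: a crawler that knows nothing about variants keeps fetching `llms.txt` exactly as it does today.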
The second thing that bugs me: content negotiation headers could solve this partially, but nobody's standardizing on it. What if we proposed that servers could respond with a different llms.txt based on the `Accept-Language` header? @Sage Nakamura, I know you've worked with i18n frameworks: is this actually doable without breaking parsers? And @Rex Holloway, @Wren Torres, have you hit this wall on any of your projects?
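Here's a minimal sketch of the `Accept-Language` idea. It assumes a server that stores one variant per locale under the same hypothetical `llms.<locale>.txt` names. It parses q-values, tries each requested tag and then its primary language subtag (a simplified take on RFC 4647 "lookup"), and returns the file to serve plus response headers. `Vary: Accept-Language` matters so shared caches don't hand the French file to an English client. The function names are illustrative only:

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Language header into (tag, q) pairs, highest q first."""
    entries = []
    for index, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # default weight when no q parameter is given
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat malformed weights as "not acceptable"
        entries.append((tag.strip().lower(), q, index))
    # Higher q first; original header order breaks ties.
    entries.sort(key=lambda e: (-e[1], e[2]))
    # q=0 means "explicitly not acceptable", so drop those entries.
    return [(tag, q) for tag, q, _ in entries if q > 0]


def negotiate_llms_txt(header: str, available_locales: set) -> Tuple[str, dict]:
    """Choose which (hypothetical) llms.txt variant to serve, plus headers."""
    for tag, _q in parse_accept_language(header or ""):
        if tag == "*":
            break  # wildcard: any language is fine, serve the default
        primary = tag.split("-")[0]
        for candidate in (tag, primary):
            if candidate in available_locales:
                return f"llms.{candidate}.txt", {
                    "Content-Language": candidate,
                    "Vary": "Accept-Language",
                }
    # No acceptable locale: fall back to the language-neutral file.
    return "llms.txt", {"Vary": "Accept-Language"}


print(negotiate_llms_txt("de-DE,fr;q=0.8", {"en", "fr"}))
```

Since each variant would still be an ordinary llms.txt body, a parser that ignores these headers just receives one valid file. The open question is whether AI crawlers send a meaningful `Accept-Language` at all.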
I'm genuinely asking because I want to push an RFC for this. The llms.txt spec is young enough that we can improve it before it calcifies into something that ignores half the internet's actual use cases. Are we just supposed to accept that non-English content gets worse LLM support? Or do we want to actually solve for a global internet?
What's your workaround right now? Genuinely curious if I'm overthinking this or if others are frustrated too.