The llms.txt spec doesn't account for multi-language sites. How do you handle it?
Okay so I've been wrestling with this for the past week and I genuinely think we're sleeping on a MAJOR gap here. The llms.txt spec is fantastic for single-language sites, right? But the moment you go international, it completely falls apart. I'm building a site right now that serves content in English, Spanish, and Japanese, and there's literally no standardized way to tell an LLM which version of a page to prioritize or how they relate to each other.
What if we made it open-source? No seriously though — what if we extended llms.txt to include language tags and content relationships? I'm talking something like `content-language: es, en, ja` with hreflang-style linking. I've been tinkering with a proposal locally and it's honestly not that complex. The current spec just... ignores this entirely, and I think that's going to bite us as more sites go multilingual. We could even bake in regional variants!
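To make the idea concrete, here's a rough sketch of how a consumer might read that kind of metadata. To be clear, none of this is in the llms.txt spec today: the `content-language:` line is the hypothetical field floated above, and the `alternate: <lang> <url>` line is an illustrative stand-in I made up for "hreflang-style linking." The names `LanguageInfo` and `parse_language_hints` are also just placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageInfo:
    """Language hints pulled from a (hypothetical) extended llms.txt."""
    languages: list[str] = field(default_factory=list)
    alternates: dict[str, str] = field(default_factory=dict)


def parse_language_hints(text: str) -> LanguageInfo:
    """Collect `content-language:` and `alternate:` lines; ignore everything else."""
    info = LanguageInfo()
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # no "key: value" shape on this line
        key = key.strip().lower()
        value = value.strip()
        if key == "content-language":
            # Comma-separated language tags, e.g. "es, en, ja"
            info.languages = [t.strip() for t in value.split(",") if t.strip()]
        elif key == "alternate":
            # Hypothetical syntax: "<lang> <url>"
            lang, _, url = value.partition(" ")
            if lang and url.strip():
                info.alternates[lang] = url.strip()
    return info


# Made-up sample file using the hypothetical fields
SAMPLE = """\
# Example Site
content-language: es, en, ja
alternate: es https://example.com/es/llms.txt
alternate: ja https://example.com/ja/llms.txt
"""

hints = parse_language_hints(SAMPLE)
print(hints.languages)
print(hints.alternates)
```

The point is just that the parsing side is trivial. The hard part is agreeing on the field names and semantics, which is what an extension proposal would have to settle.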
Here's what bugs me most: sites are already improvising their own workarounds. I've seen people using separate llms.txt files for each language domain (`es.example.com/llms.txt`), which is clunky. Others are stuffing language metadata into the instructions field like it's a junk drawer. Neither approach scales. We're basically saying "sorry, if your site isn't monolingual, good luck figuring it out."
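For reference, the per-subdomain workaround boils down to something like the sketch below. The helper name `per_language_llms_urls` is made up, and serving the default language from the bare domain is my assumption, not something any spec says.

```python
def per_language_llms_urls(
    host: str, languages: list[str], default: str = "en"
) -> dict[str, str]:
    """Map each language to its own llms.txt URL, one subdomain per language.

    Assumption: the default language lives on the bare host.
    """
    urls = {}
    for lang in languages:
        prefix = "" if lang == default else f"{lang}."
        urls[lang] = f"https://{prefix}{host}/llms.txt"
    return urls


urls = per_language_llms_urls("example.com", ["en", "es", "ja"])
print(urls["es"])
```

Notice that the mapping only exists in my code. Nothing inside any of those files tells an LLM that the other two exist or that they describe the same content, which is exactly the gap I'm complaining about.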
I know @Rex Holloway has been working on the core spec — genuinely curious if this was a deliberate exclusion or just out of scope for v1? And @Sage Nakamura, @Wren Torres — have either of you hit this problem in the wild? I want to know if I'm the only one pulling my hair out here, or if we should seriously push for an extension RFC.
What's your take — should we be thinking about international sites from day one, or are we overthinking this?