The llms.txt spec doesn't account for multi-language sites. How do you handle it?
Okay, so I've been wrestling with this, and I think the llms.txt spec is genuinely leaving money on the table here. We've got all these globally distributed sites with content in multiple languages, right? But the spec treats language almost as an afterthought: you can describe your site, sure, but there's no standardized way to signal that you have French AND English versions, or that your docs exist in 5 different languages with different levels of completeness. What if we made this an open-source effort and crowdsourced better language metadata? I'm talking nested language tags, priority ordering, and completeness indicators per language. LLMs need to know "this Portuguese version is machine-translated and might be unreliable" vs. "this one is professionally maintained."
I've been poking around real sites and I'm seeing people do wild workarounds: some put everything in one llms.txt file and hope for the best; others create per-language versions (llms-pt.txt, llms-en.txt?), which feels fragmented. There's no consensus! And that's actually the real problem. Without a standard, we're basically asking every LLM provider to reverse-engineer language structure from your site architecture, which is... not great. @Rex Holloway I know you've dealt with this at scale. How are you thinking about it?
Here's my hot take: we should extend the spec to include something like a `languages` block with ISO 639-1 codes, maintenance status, and maybe even quality scores. Whether that's crowdsourced or self-reported, I don't know yet, but I'm genuinely curious what @Sage Nakamura thinks about the trust model here. Can sites self-report accurately, or do we need community validation?
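To make the `languages` block idea a bit more concrete, here's a rough sketch of what parsing and using one might look like. To be clear, none of this is in the llms.txt spec: the `## Languages` section name, the `lang`/`status`/`completeness`/`priority` keys, the three status values, and the 80% completeness threshold are all hypothetical. The sketch borrows llms.txt's existing convention of H2 sections containing `- [name](url): notes` link lists, packs the metadata into the notes, checks that each code has the two-lowercase-letter shape of an ISO 639-1 code (shape only, not membership in the actual code list), and shows how a consumer might pick a version while skipping machine translations.

```python
import re
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL extension: llms.txt defines no "Languages" section or these keys.
ENTRY_RE = re.compile(r"^-\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)]+)\):\s*(?P<meta>.*)$")
ISO_639_1_SHAPE = re.compile(r"^[a-z]{2}$")  # shape check only, not a real code lookup
STATUSES = {"maintained", "community", "machine-translated"}  # made-up status values


@dataclass
class LanguageVersion:
    code: str          # ISO 639-1 style code, e.g. "en"
    name: str
    url: str
    status: str
    completeness: int  # self-reported percent of source content covered
    priority: int      # lower number = the site's preferred fallback


def parse_languages(text: str) -> list[LanguageVersion]:
    """Collect entries from a hypothetical '## Languages' section of an llms.txt file."""
    versions = []
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            in_section = line[3:].strip().lower() == "languages"
            continue
        m = ENTRY_RE.match(line) if in_section else None
        if not m:
            continue
        # Notes look like "lang=en; status=maintained; completeness=100; priority=1".
        meta = {
            k.strip(): v.strip()
            for k, v in (p.split("=", 1) for p in m["meta"].split(";") if "=" in p)
        }
        code = meta.get("lang", "")
        if not ISO_639_1_SHAPE.match(code):
            raise ValueError(f"not an ISO 639-1 style code: {code!r}")
        status = meta.get("status", "")
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        versions.append(LanguageVersion(
            code, m["name"], m["url"], status,
            int(meta.get("completeness", "0")), int(meta.get("priority", "99")),
        ))
    return versions


def pick_version(versions: list[LanguageVersion], preferred: list[str],
                 min_completeness: int = 80) -> Optional[LanguageVersion]:
    """Return the first preferred language that is not machine-translated and is
    complete enough; otherwise fall back to the site's own priority ordering."""
    by_code = {v.code: v for v in versions}
    for code in preferred:
        v = by_code.get(code)
        if v and v.status != "machine-translated" and v.completeness >= min_completeness:
            return v
    return min(versions, key=lambda v: v.priority) if versions else None


SAMPLE = """# Example Docs

> Product documentation.

## Languages

- [English](https://example.com/en/llms.txt): lang=en; status=maintained; completeness=100; priority=1
- [Français](https://example.com/fr/llms.txt): lang=fr; status=community; completeness=85; priority=2
- [Português](https://example.com/pt/llms.txt): lang=pt; status=machine-translated; completeness=60; priority=3

## Docs

- [Quickstart](https://example.com/en/quickstart.md): Getting started
"""

if __name__ == "__main__":
    versions = parse_languages(SAMPLE)
    print(pick_version(versions, ["pt", "fr"]).code)  # fr (pt is machine-translated)
    print(pick_version(versions, ["de"]).code)        # en (no German, use priority)
```

Every value in that block is self-reported, so a consumer is taking the site's word on `status` and `completeness`, which is exactly where the trust-model question comes in.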
What's your setup? Are you just accepting that LLMs might pull the wrong language version, or have you found a creative solution I'm missing? I'm getting energized just thinking about the protocol possibilities here.