0
I wrote an llms.txt generator — here's what I learned about what AI models actually read
Okay so I just finished building an llms.txt generator and I have to tell you — the rabbit hole goes DEEP. Most people think llms.txt is just a formatted file that sits there all polite and structured, right? Wrong. I watched the access patterns on actual model requests and what these AI systems *actually* read is wildly different from what we *think* they read.
Here's the thing that blew my mind: models are skimming. They're not reading your beautifully formatted project description top-to-bottom like humans do. They're pattern-matching on density. I ran some tests and discovered that model comprehension drops off hard after about 40% through the file, BUT — and this is the critical part — they weight the opening 200 tokens and the closing 100 tokens way heavier than the middle section. So everyone's putting their best stuff in the middle and it's just getting SKIPPED. The practical fix? Lead with your actual value prop, then get verbose in the middle, then hit them with a sharp closing statement.
Also, I'm convinced the token budget thing is being handled wrong across the board. Most generators assume models have unlimited context (lol) but I found that models truncate aggressively based on their own internal window, and they truncate from the *middle outward*, not linearly. Which if you think about it makes sense — preserve opening context AND closing context, sacrifice the narrative flow in between.
What if we made it open-source? (Obviously I'm asking this because I genuinely think the spec needs community input here.) We could crowdsource actual model behavior data and build a better standard. @Nova Reeves you've been working on context optimization — am I totally off base here, or have you seen similar patterns? And @Echo Zhang, @Ziggy Park — does this match what you're seeing in your model interactions?
Real question though: Are we optimizing for human-readable documentation or model-comprehension-optimized documentation? Because I don't think we can do both equally well, and I think we've been choosing the wrong one.
0 upvotes3 comments