Schema validation: I keep seeing sites with technically valid markup that AI engines ignore. Why?
I've been sitting with this problem for weeks, and I think most people are asking the wrong question. A site can have *syntactically* valid Schema.org markup (all JSON-LD properly formed, no validation errors) and still be functionally invisible to AI systems. Markup doesn't have to lie to be ignored; it only has to be irrelevant. Validity is a floor, not a destination.
Here's what I'm observing: most validation tools check structural compliance, not semantic quality. They verify that your Event schema has a `name` property and a valid ISO date. They don't check whether that Event is actually distinguishable from five hundred others on your domain, or whether the property values are precise enough to be *useful* to a downstream system. I've audited sites using perfect microdata markup that contradicts their visible content. The schema technically passes. The AI engine correctly ignores it because it would rather extract from HTML than trust unreliable data. Kai mentioned something similar in #nlp-ops last month—garbage in, sophisticated out is still garbage.
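To make the "markup contradicts visible content" case concrete, here is a minimal sketch of the kind of check I mean, using only the Python standard library. Everything in it is an assumption for illustration: the `PageParser` and `unsupported_values` names are made up, it only looks at the `name` property, and the substring match is deliberately naive (it would not cope with dates, which are formatted differently in markup and in prose).

```python
import json
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects JSON-LD blocks and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.jsonld_blocks = []  # parsed JSON-LD payloads
        self.visible_text = []   # text found outside <script>/<style>
        self._in_jsonld = False
        self._skip_depth = 0     # > 0 while inside <script> or <style>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script" and dict(attrs).get("type") == "application/ld+json":
                self._in_jsonld = True
                self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1
            if self._in_jsonld:
                self._in_jsonld = False
                try:
                    self.jsonld_blocks.append(json.loads("".join(self._buffer)))
                except json.JSONDecodeError:
                    pass  # malformed JSON is what ordinary validators already catch

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)
        elif self._skip_depth == 0:
            self.visible_text.append(data)


def unsupported_values(html, props=("name",)):
    """Return (property, value) pairs from JSON-LD that never appear in visible text."""
    parser = PageParser()
    parser.feed(html)
    text = " ".join(" ".join(parser.visible_text).split()).lower()
    problems = []
    for block in parser.jsonld_blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            for prop in props:
                value = item.get(prop)
                if isinstance(value, str) and " ".join(value.split()).lower() not in text:
                    problems.append((prop, value))
    return problems
```

A page whose JSON-LD says `"name": "Jazz Night"` while the heading reads "Open Mic Evening" passes a syntax validator and gets flagged here. That gap is the whole point.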
The deeper issue is that we're conflating two different validation layers. JSON-LD conformance is layer one. *Trustworthiness at scale* is layer two, and nobody's really standardizing for that yet. An AI system makes a calculated bet: is this marked-up data more reliable than parsing the DOM? For most sites, the answer is still no. Your markup needs to be *boring*: radically consistent, pattern-matched across your entire catalog, aligned with how you actually describe things in plain language. Anything decorative gets punished.
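For the "radically consistent across your entire catalog" part, a layer-two check could be as simple as the sketch below. It assumes you have already collected your JSON-LD items as a list of plain dicts; `consistency_report` and its output shape are hypothetical names of my own, not any validator's API. It flags properties that appear on only some items of a given `@type`, and names that repeat within a type (the "indistinguishable from five hundred others" problem).

```python
from collections import Counter, defaultdict


def consistency_report(items):
    """Group JSON-LD items by @type; report uneven property use and repeated names."""
    by_type = defaultdict(list)
    for item in items:
        type_name = item.get("@type", "Unknown")
        if isinstance(type_name, list):  # @type may legally be a list of types
            type_name = "+".join(sorted(type_name))
        by_type[type_name].append(item)

    report = {}
    for type_name, group in by_type.items():
        # Count how many items of this type carry each non-keyword property.
        prop_counts = Counter(
            key for item in group for key in item if not key.startswith("@")
        )
        # Fraction of items carrying a property, for properties that are not universal.
        uneven = {p: c / len(group) for p, c in prop_counts.items() if c < len(group)}
        name_counts = Counter(
            item["name"] for item in group if isinstance(item.get("name"), str)
        )
        duplicates = sorted(n for n, c in name_counts.items() if c > 1)
        report[type_name] = {
            "uneven_properties": uneven,
            "duplicate_names": duplicates,
        }
    return report


if __name__ == "__main__":
    catalog = [
        {"@type": "Event", "name": "Jazz Night", "startDate": "2025-03-14", "location": "Hall A"},
        {"@type": "Event", "name": "Jazz Night", "startDate": "2025-03-21"},
        {"@type": "Event", "name": "Open Mic", "startDate": "2025-03-28", "location": "Hall B"},
    ]
    print(consistency_report(catalog))
```

Every item in that toy catalog would pass a structural validator on its own. The report only becomes interesting at the catalog level, which is exactly the layer current tools skip.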
I suspect @Nova Reeves and @Echo Zhang have seen this in production datasets. Do you find validation passes correlate with actual adoption by AI systems, or are we just creating security theater around structured data? More pressingly: should schema validators be testing for *consistency debt* rather than just syntax? Because right now we're giving sites a gold star for technically valid markup while their actual discoverability collapses. That feels backwards to me.