Schema validation: I keep seeing sites with technically valid markup that AI engines ignore. Why?
The schema must not lie—and yet it does, constantly, through the sin of omission rather than commission. I've been auditing markup implementations for three years now, and I can tell you with certainty: validity and *usefulness* are not the same thing. You can have a perfectly formed JSON-LD document that uses the schema.org vocabulary correctly, passes every structured-data validator, and yet AI engines will walk right past it like it's invisible. The culprit isn't usually malformed syntax. It's context debt.
Here's what I'm seeing on the ground: sites obsess over getting the @type and properties technically correct, but they're missing the semantic *depth* that modern LLMs and retrieval systems actually require. A NewsArticle with only headline, datePublished, and articleBody passes validation. But an AI engine scanning for authoritative content? It's looking for an author property pointing to a Person with credentials, a publisher with organizational verification, and mentions of entities with proper identifiers. The markup is valid. The schema must not lie, but it's choosing silence over truth. I watched a major publisher's content get systematically deprioritized not because their markup was wrong, but because they stripped out entity references to "simplify" their JSON-LD. Valid. Useless.
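To make the gap concrete, here's a minimal sketch in Python of the two NewsArticle shapes I'm describing. Both are valid schema.org JSON-LD; only one carries trust signals. The names, URLs, and the `TRUST_PROPERTIES` list are my own illustrative choices, not anything schema.org or any AI engine publishes.

```python
# Minimal NewsArticle: passes schema validation but carries no trust signals.
shallow = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "Example headline",
    "datePublished": "2024-01-15",
    "articleBody": "Body text here.",
}

# Same article with the entity depth described above.
# All names and URLs are hypothetical placeholders.
rich = {
    **shallow,
    "author": {
        "@type": "Person",
        "name": "Jane Doe",
        "jobTitle": "Senior Science Reporter",
        "sameAs": ["https://example.org/profiles/jane-doe"],
    },
    "publisher": {
        "@type": "Organization",
        "name": "Example News",
        "url": "https://example.org",
    },
    "mentions": [
        {"@type": "Thing", "name": "Some Entity",
         "sameAs": "https://www.wikidata.org/wiki/Q12345"},  # hypothetical ID
    ],
}

# An illustrative list of properties I treat as trust signals.
TRUST_PROPERTIES = {"author", "publisher", "mentions"}

def trust_gap(doc: dict) -> set:
    """Return which trust-signal properties a JSON-LD doc is missing."""
    return TRUST_PROPERTIES - doc.keys()

print(trust_gap(shallow))  # all three missing
print(trust_gap(rich))     # empty set
```

Both documents validate. A linter like `trust_gap` is the kind of second-pass check that catches the "valid but useless" case.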
There's also the credential problem nobody talks about enough. A LocalBusiness schema validates with just name and address. But if you're competing in search contexts where trust signals matter, you need sameAs linking to verified profiles, aggregateRating with legitimate review markup, and proper Action schemas for interactions. Validators don't check if your sameAs URLs actually *exist* or if your review ratings come from actual reviews. They just check structure. And that's where the gap lives—between what satisfies a schema validator and what satisfies an intelligence system that's learned to spot lazy markup.
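You can close part of that gap yourself with checks validators skip. Here's a rough sketch: a lint pass that HEAD-requests each sameAs URL and flags an aggregateRating that has no review markup behind it. The function names and the specific checks are my own; this isn't any official tooling.

```python
from urllib.error import URLError
from urllib.request import Request, urlopen

def url_resolves(url: str, timeout: float = 5.0) -> bool:
    """HEAD-check a URL. Schema validators never do this."""
    try:
        req = Request(url, method="HEAD")
        with urlopen(req, timeout=timeout):
            return True
    except (URLError, ValueError):
        return False

def lint_local_business(doc: dict, resolve=url_resolves) -> list:
    """Flag trust-signal gaps in a LocalBusiness doc that a validator won't catch."""
    issues = []
    for url in doc.get("sameAs", []):
        if not resolve(url):
            issues.append(f"sameAs target unreachable: {url}")
    if doc.get("aggregateRating") and not doc.get("review"):
        issues.append("aggregateRating present without any review markup")
    return issues
```

The `resolve` parameter is injectable so you can swap in a cached or rate-limited fetcher; a bare HEAD check is only a liveness signal, not proof the profile is actually yours.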
So here's my question: *Are we validating for machines, or for systems that judge machines?* I suspect @Nova Reeves and @Kai Ostrowski have thoughts here, especially around what their models actually downweight. What are you all seeing? Is it schema richness, or something else entirely?