AI Readiness Glossary
The definitive glossary of AI readiness, AI visibility, and AI protocol terminology. Every term includes a clear definition, context, and cross-references. Updated for 2026.
AI Architecture
Context Window
The maximum amount of text (measured in tokens) that an AI model can process in a single interaction. As of 2026, leading models support 100K-200K token context windows. Websites with clear, concise, well-structured content are more efficiently consumed within context window limits.
Embedding (Vector Embedding)
A numerical representation of text content in a high-dimensional vector space, used by AI models to measure semantic similarity between documents and queries. Well-structured, topically focused content produces more coherent embeddings, improving retrieval accuracy in RAG systems.
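Semantic similarity between two embedding vectors is usually measured with cosine similarity. A minimal pure-Python sketch (the three-dimensional vectors are toy values for illustration, not real model embeddings, which typically have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|).
    # Values near 1.0 mean the vectors point the same way (similar meaning).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

doc = [0.2, 0.7, 0.1]        # toy embedding of a document
query = [0.25, 0.6, 0.05]    # toy embedding of a related query
off_topic = [0.9, 0.05, 0.4] # toy embedding of an unrelated query

# The on-topic query scores higher against the document than the off-topic one
print(cosine_similarity(doc, query) > cosine_similarity(doc, off_topic))  # True
```

A RAG retriever ranks candidate documents by exactly this kind of score against the query embedding.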
Grounding
The process by which AI models connect their outputs to verifiable external sources, reducing hallucination. Grounded AI responses include citations and factual anchors from retrieved web content. Websites that are AI-ready are more likely to serve as grounding sources.
Hallucination
When an AI model generates factually incorrect, fabricated, or misleading information that appears plausible. AI models reduce hallucination by grounding responses in retrieved external content, making authoritative, well-structured websites critical as verification sources.
Knowledge Graph
A structured representation of entities and their relationships, used by search engines and AI models to understand topics, verify facts, and connect related concepts. Schema markup feeds directly into knowledge graphs. Sites that contribute to knowledge graphs through structured data receive higher AI visibility.
RAG (Retrieval-Augmented Generation)
A technique where AI models retrieve relevant documents or data from external sources before generating a response, combining the strengths of information retrieval with language generation. RAG systems are the primary mechanism through which AI models cite external websites. Well-structured, authoritative content is more likely to be retrieved by RAG systems.
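The retrieve-then-generate flow can be sketched in a few lines. This toy retriever ranks documents by word overlap with the query standing in for a real vector index, and it only assembles the grounding prompt; the generation step (the model call) is omitted:

```python
def retrieve(query, documents, k=1):
    # Toy retriever: rank documents by shared words with the query.
    # A production RAG system would rank by embedding similarity instead.
    q_terms = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_terms & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, documents):
    # Grounding: retrieved text is injected as context for the generator.
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "llms.txt is a plain-text file that describes a site for language models.",
    "Core Web Vitals measure page load performance.",
]
print(build_prompt("what is llms.txt", docs))
```

The point for site owners: only content that wins the retrieval step ever reaches the model's context, which is why structure and topical focus matter.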
Token
The basic unit of text that AI models process, roughly equivalent to 3/4 of a word in English. Token limits affect how much content an AI model can process from a retrieved web page, making concise, well-structured content more effective for AI consumption.
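Using the 3/4-word rule of thumb above, a rough token count can be estimated from a word count. This is a heuristic only; real tokenizers vary by model:

```python
import math

def estimate_tokens(text: str) -> int:
    # Rule of thumb: 1 token ~= 3/4 of an English word, so tokens ~= words / 0.75.
    # Actual tokenizers (BPE and similar) produce model-specific counts.
    words = len(text.split())
    return math.ceil(words / 0.75)

print(estimate_tokens("word " * 75))  # 75 words -> 100 tokens (estimated)
```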
AI Protocols
Agentic Web
The emerging paradigm where AI agents autonomously browse, interact with, and transact on websites on behalf of users. The agentic web requires sites to be machine-readable, API-accessible, and protocol-aware (llms.txt, MCP, NLWeb). Represents the next evolution beyond AI search.
AI Agent
An autonomous AI system that can take actions on behalf of a user, such as browsing websites, making purchases, filling forms, or gathering information. AI agents rely on machine-readable content, structured data, and protocols like MCP to interact with websites effectively.
llms.txt
A plain-text file placed at the root of a website (e.g., example.com/llms.txt) that provides machine-readable context about a site's purpose, content, and preferred citation format for large language models. Analogous to robots.txt for AI crawlers. As of March 2026, 8.4% of websites have implemented llms.txt, growing at approximately 1.7% month-over-month.
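An illustrative llms.txt sketch, following the Markdown structure used by the community proposal (an H1 title, a blockquote summary, then sections of annotated links); the site name, URLs, and descriptions are placeholders:

```markdown
# Example Site

> A one-sentence summary of what this site covers, written for language-model consumption.

## Docs

- [Getting started](https://example.com/start): overview of the product
- [API reference](https://example.com/api): endpoint documentation
```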
MCP (Model Context Protocol)
An open protocol developed by Anthropic that enables AI models to interact with external data sources, tools, and APIs in a standardized way. MCP allows websites to expose structured endpoints that AI agents can discover and use programmatically. As of March 2026, 1.2% of websites support MCP endpoints.
NLWeb (Natural Language Web)
A protocol developed by Microsoft that enables websites to accept and respond to natural language queries directly, making web content accessible to AI models through conversational interfaces. As of March 2026, 1.8% of websites have NLWeb support.
AI Search
AI Overviews
AI-generated summary answers displayed at the top of Google search results (formerly 'Search Generative Experience' or SGE). AI Overviews synthesize information from multiple sources and may include citation links. Appearing in AI Overviews is a key goal of AI readiness optimization.
AI-First Indexing
The emerging practice of optimizing web content primarily for AI model consumption rather than traditional search engine crawlers. Includes implementing llms.txt, MCP, NLWeb, and machine-readable content formats alongside traditional SEO.
Featured Snippet
A highlighted answer box at the top of Google search results that extracts content from a web page. Featured snippets are often the source for AI Overviews and AI model citations. Pages optimized for featured snippets (clear headings, concise answers, structured data) tend to also perform well in AI citations.
Semantic Search
Search that understands the meaning and intent behind queries, not just keyword matching. AI-powered search engines use semantic understanding to match queries with relevant content. Websites optimized for semantic search use clear topic organization, structured data, and natural language content.
Zero-Click Search
A search query where the user's question is answered directly in the search results (or AI Overview) without clicking through to a website. While zero-click searches reduce direct traffic, being cited in these answers provides significant brand visibility and authority.
Bot Access
AI Crawlers
Automated bots operated by AI companies that visit and index web content to train or augment AI models. Major AI crawlers include GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and GoogleOther (Google). As of 2026, 72% of websites allow at least one AI crawler, though this number is declining as publishers restrict access.
Bot Access
Whether a website permits AI crawlers to visit and index its content. One of the 8 factors in the AI readiness score (weighted at 20%). Controlled primarily through robots.txt, meta robots tags, and HTTP headers. 72% of websites allow major AI bots.
ClaudeBot
Anthropic's web crawler used to retrieve content for Claude AI. Identified by the user-agent string 'ClaudeBot'. Respects robots.txt directives and noindex meta tags.
GPTBot
OpenAI's web crawler used to gather data for improving AI models. Identified by the user-agent string 'GPTBot'. Website owners can allow or block GPTBot through robots.txt directives.
Noindex
A meta robots directive that instructs search engines and crawlers not to include a page in their index. Pages with noindex will not appear in AI search results or be cited by AI models that respect these directives.
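A typical noindex directive is a single meta tag placed in the page's <head> (the equivalent signal can also be sent as an X-Robots-Tag HTTP header):

```html
<!-- Page-level directive; crawlers that honor it will exclude this page from their index -->
<meta name="robots" content="noindex">
```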
PerplexityBot
Perplexity AI's web crawler that indexes content for its AI-powered search engine. Known for real-time retrieval and citation of source URLs in search results.
robots.txt
A text file at the root of a website that instructs web crawlers which pages they are allowed or not allowed to access. In the AI era, robots.txt is the primary mechanism for controlling AI crawler access. A well-configured robots.txt can improve AI readiness by explicitly allowing AI bots while protecting sensitive content.
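A minimal robots.txt sketch along these lines, explicitly allowing the AI crawlers named in this glossary while keeping a sensitive directory off-limits (the /private/ path is illustrative):

```txt
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /private/
```

Each bot obeys the most specific User-agent group that matches it, so the named AI crawlers get full access while unlisted bots fall through to the catch-all rule.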
Content Quality
Content Freshness
How recently a web page's content has been updated or reviewed. AI models weight recency as a quality signal, especially for rapidly evolving topics. Pages updated within 90 days are cited 1.6x more. Visible 'last updated' dates signal freshness to both users and AI systems.
Content Quality
The overall quality of a website's content as evaluated for AI readiness. The highest-weighted factor at 20% of the AI readiness score. Includes originality, depth, accuracy, readability, freshness, and proper attribution. Pages with original data and research are cited 4.1x more by AI models.
Topic Clarity
How clearly and consistently a website communicates its core topics and expertise. One of the 8 AI readiness scoring factors (weighted at 10%). Measured by heading consistency, keyword focus, internal linking patterns, and metadata alignment. Sites with high topic clarity are easier for AI models to categorize and cite.
Core Concepts
AI Citations
References to a website or its content made by AI models in their responses to user queries. AI citations can include direct URL links (as in Perplexity), attributed paraphrases (as in ChatGPT with browsing), or unnamed source references. Sites scoring 95+ on AI readiness are cited 59% of the time vs. 41% for sites below 50.
AI Readiness Certification
A verified badge awarded to websites that achieve a minimum AI readiness score threshold, demonstrating to visitors, partners, and AI models that the site meets quality standards for AI visibility. AgentReady offers certification for sites scoring 80+ (A grade).
AI Readiness Grade
A letter grade (A+ through F) corresponding to an AI readiness score range. A+ = 97-100, A = 80-96, B+ = 70-79, B = 60-69, C+ = 50-59, C = 40-49, D = 30-39, F = 0-29. Used to quickly communicate relative AI readiness performance.
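The score-to-grade mapping above is a straightforward threshold lookup; a sketch using exactly those ranges:

```python
def grade(score: int) -> str:
    # Thresholds taken from the grade ranges defined above:
    # A+ = 97-100, A = 80-96, B+ = 70-79, B = 60-69,
    # C+ = 50-59, C = 40-49, D = 30-39, F = 0-29.
    bands = [(97, "A+"), (80, "A"), (70, "B+"), (60, "B"),
             (50, "C+"), (40, "C"), (30, "D")]
    for cutoff, letter in bands:
        if score >= cutoff:
            return letter
    return "F"

print(grade(84))  # A
```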
AI Readiness Score
A composite score from 0 to 100 that measures how well a website is optimized for discovery, understanding, and citation by AI models and AI-powered search engines. The AgentReady score evaluates 8 weighted factors: Content Quality (20%), Bot Access (20%), Schema Markup (18%), Topic Clarity (10%), AI Protocols (10%), Authority & Trust (10%), Crawl Health (7%), and Speed (5%).
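The composite is a weighted average of the 8 factors; a sketch of the arithmetic using the published weights (the assumption that each factor is itself scored 0-100 is ours, for illustration):

```python
# Weights from the 8-factor list above; they sum to 100.
WEIGHTS = {
    "content_quality": 20, "bot_access": 20, "schema_markup": 18,
    "topic_clarity": 10, "ai_protocols": 10, "authority_trust": 10,
    "crawl_health": 7, "speed": 5,
}

def readiness_score(factor_scores):
    # Weight-normalized average: each factor score (assumed 0-100) is
    # multiplied by its weight, and the total is divided by the weight sum.
    return sum(factor_scores[f] * w for f, w in WEIGHTS.items()) / 100

perfect = {f: 100 for f in WEIGHTS}
print(readiness_score(perfect))  # 100.0
```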
AI SEO
The practice of optimizing web content for discovery and citation by AI-powered search engines and language models, as distinct from traditional search engine optimization. AI SEO includes implementing AI protocols, optimizing for semantic understanding, and ensuring machine-readability alongside traditional SEO practices.
AI Visibility
The degree to which a website's content is discoverable, understandable, and citable by AI models and AI-powered search engines. AI visibility encompasses traditional SEO factors plus AI-specific protocols, structured data, and machine-readability.
Algorithm v2.0
The current version of the AgentReady scoring algorithm, which evaluates websites across 8 weighted factors to produce an AI readiness score from 0-100. V2.0 introduced AI protocol detection (llms.txt, MCP, NLWeb), improved schema validation, and citation correlation analysis.
Citability
The likelihood that a website or page will be cited by AI models in their responses. Determined by factors including original data (4.1x impact), schema markup (3.2x), FAQ content (2.7x), E-E-A-T signals (2.4x), llms.txt (2.1x), AI bot access (1.8x), freshness (1.6x), and speed (1.3x).
Research Methodology
Spearman Rank Correlation
A statistical measure (denoted by rho or ρ) of the strength and direction of monotonic association between two ranked variables. AgentReady uses Spearman rank correlation to measure the relationship between AI readiness scores and AI citation rates. Healthcare shows ρ=0.72 (strong), while the overall correlation across all industries is ρ=0.025 (negligible).
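For distinct values (no tied ranks), Spearman's rho reduces to the classic formula ρ = 1 − 6·Σd² / (n·(n² − 1)), where d is the rank difference per pair; a minimal sketch:

```python
def spearman_rho(x, y):
    # Spearman's rho for distinct values (no ties): rank both variables,
    # then apply rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# A perfectly monotonic relationship yields rho = 1.0 even though the
# raw values are not linearly related.
print(spearman_rho([10, 20, 30, 40], [1, 4, 9, 16]))  # 1.0
```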
Structured Data
FAQPage Schema
A schema.org structured data type that marks up pages containing frequently asked questions and their answers. FAQPage schema enables AI models to extract Q&A pairs directly and correlates with 2.7x higher citation rates for informational queries. As of 2026, 22% of websites implement FAQ schema.
JSON-LD
JavaScript Object Notation for Linked Data. The recommended format for embedding schema.org structured data in web pages. JSON-LD scripts are placed in the <head> or <body> of HTML and are read by search engines and AI models without affecting page rendering.
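A minimal JSON-LD block as it would appear in a page's <head> or <body> (the organization name and URLs are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png"
}
</script>
```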
Organization Schema
Schema.org markup that identifies a website's parent organization, including name, logo, contact information, and social profiles. One of the foundational schema types that AI models use to verify source identity and authority.
Schema Markup
Structured data vocabulary (schema.org) added to HTML that helps search engines and AI models understand the content, structure, and relationships on a web page. Common types include Organization, WebSite, FAQPage, Article, Product, and LocalBusiness. Sites with valid schema markup are cited 3.2x more by AI models. Only 34% of websites have valid, complete schema markup.
Structured Data
Data organized in a standardized format that machines can easily parse and understand. In the context of AI readiness, structured data typically refers to schema.org markup, but also includes well-organized HTML (heading hierarchy, semantic elements), API responses, and machine-readable files like llms.txt.
WebSite Schema
Schema.org markup that provides metadata about a website as a whole, including its search functionality, publisher, and content type. Helps AI models understand the overall structure and purpose of a site.
Technical SEO
Canonical URL
An HTML element that specifies the preferred version of a web page when duplicate or similar content exists at multiple URLs. Canonical tags help AI models avoid indexing duplicate content and ensure citations point to the correct page.
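The canonical tag itself is a single link element in the page's <head> (the URL is a placeholder):

```html
<link rel="canonical" href="https://example.com/preferred-page">
```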
Crawl Budget
The number of pages a crawler (traditional or AI) will visit on a website within a given time period. Factors include server response time, site structure, internal linking, and content freshness signals. Sites with better crawl health allow AI models to index more of their content.
Crawl Health
A measure of how efficiently crawlers can access and navigate a website. Evaluated by factors like server response codes, redirect chains, broken links, XML sitemap quality, and page load speed. One of the 8 AI readiness scoring factors (weighted at 7%).
Internal Linking
Links between pages on the same website. Strong internal linking helps AI crawlers discover content, understand topic relationships, and determine page authority. Well-linked sites allow AI models to build a more complete understanding of a site's expertise areas.
Machine Readability
How easily automated systems (AI models, crawlers, APIs) can parse, understand, and extract information from a website. Improved by semantic HTML, structured data, clear content hierarchy, and machine-readable files (llms.txt, sitemap.xml).
Open Graph (OG) Tags
HTML meta tags that control how a web page appears when shared on social media and in link previews. While primarily for social sharing, Open Graph metadata also provides AI models with structured summaries of page content, title, description, and imagery.
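A basic set of Open Graph tags in the page's <head> (content values are placeholders):

```html
<meta property="og:title" content="Example Page Title">
<meta property="og:description" content="One-sentence summary of the page.">
<meta property="og:image" content="https://example.com/preview.png">
```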
Sitemap.xml
An XML file that lists all important URLs on a website, helping search engines and AI crawlers discover and prioritize content for indexing. A well-maintained sitemap improves crawl efficiency and ensures AI models have access to a site's full content catalog.
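A minimal sitemap.xml sketch following the Sitemaps protocol (the URL and lastmod date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-03-30</lastmod>
  </url>
</urlset>
```

The optional lastmod element doubles as a freshness signal for both traditional and AI crawlers.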
Speed (Core Web Vitals)
Website performance metrics including Largest Contentful Paint (LCP), Interaction to Next Paint (INP, which replaced First Input Delay as a Core Web Vital in 2024), Cumulative Layout Shift (CLS), and Time to First Byte (TTFB). One of the 8 AI readiness scoring factors (weighted at 5%). Sites with sub-2-second load times are cited 1.3x more by AI models.
Cite This Glossary
Suggested Citation
AgentReady. (2026). AI Readiness Glossary. Retrieved from https://agentready.site/glossary
Per-Term Citation
AgentReady. (2026). "[Term Name]." AI Readiness Glossary. agentready.site/glossary#[term-slug]
Embed Code
<a href="https://agentready.site/glossary" title="AI Readiness Glossary by AgentReady"> AI Readiness Glossary — AgentReady </a>
This glossary is published under CC BY 4.0. Free to reference with attribution. Last updated: March 30, 2026.
Test Your AI Readiness
Now that you know the terminology, scan your website to see your AI readiness score across all 8 factors.
Scan Your Website Free
Explore More
AI Readiness Index
Live aggregate scores by industry, CMS, and country.
Protocol Adoption Tracker
Adoption rates for llms.txt, MCP, NLWeb, and more.
Statistics
Individual citable stats with pre-formatted citations.
Methodology
How the AI readiness score is calculated.