The AI Citation Index: Which Websites Get Referenced Most by ChatGPT and Perplexity
We analyzed 12,000 AI-generated responses to build the first AI Citation Index. The concentration is staggering: 50 domains capture 34% of all citations. Here is who gets cited, why, and how the data maps to AI readiness scores.
Founder & CEO at AgentReady
Building the First AI Citation Index
The question that launched this study was simple: which websites do AI systems actually cite, and is there a pattern? To answer it, we needed data at a scale that did not previously exist.
Between February 1 and March 15, 2026, we submitted 4,000 queries to each of three major AI platforms: ChatGPT (with web search enabled), Perplexity, and Claude (with web search enabled). That is 12,000 total queries spanning 22 topic categories including technology, health, finance, e-commerce, travel, education, legal, cooking, fitness, home improvement, and more.
For every AI-generated response, we extracted all cited URLs — the links that appear in footnotes, inline citations, or source cards. We recorded the domain, the specific URL, the query category, and the AI platform. This produced a database of 47,800 individual citations across 8,400 unique domains.
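The study's actual storage schema is not published; as a rough sketch, each extracted citation can be reduced to a record like the following (field and value names are illustrative, not the study's own):

```python
from dataclasses import dataclass
from urllib.parse import urlparse

@dataclass
class Citation:
    """One cited URL extracted from an AI-generated response.

    Field names are illustrative; the study's real schema is unpublished.
    """
    url: str
    query_category: str   # e.g. "health", "finance"
    platform: str         # "chatgpt", "perplexity", or "claude"

    @property
    def domain(self) -> str:
        # Normalize to the host, dropping a leading "www."
        host = urlparse(self.url).netloc.lower()
        return host.removeprefix("www.")

c = Citation("https://www.healthline.com/nutrition/foo", "health", "perplexity")
```

Aggregating by the `domain` property rather than the full URL is what turns 47,800 citations into 8,400 unique domains.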
We then cross-referenced every cited domain against our AgentReady scanning database to correlate AI readiness scores with citation frequency. This is, to our knowledge, the first study to empirically link AI readiness metrics with actual AI citation behavior at this scale.
A note on limitations: AI citation behavior changes constantly as models are updated and retrained. This data represents a 6-week snapshot, not a permanent ranking. We will update the index quarterly.
The Concentration Problem: 50 Domains Capture 34% of Citations
The most striking finding is the extreme concentration of AI citations. The top 50 most-cited domains account for 34% of all 47,800 citations. The top 200 domains account for 58%. The remaining 8,200+ domains share the other 42%.
This concentration mirrors (and amplifies) the power law distribution seen in traditional search. But in AI citation, the effect is more extreme because AI systems typically cite 2-5 sources per response rather than presenting 10 blue links. Fewer citation slots means more concentration at the top.
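The concentration figures above reduce to a simple computation over the citation log. A minimal sketch, using a toy sample rather than the study's data:

```python
from collections import Counter

def top_share(citations: list[str], n: int) -> float:
    """Fraction of all citations captured by the n most-cited domains."""
    counts = Counter(citations)
    top = sum(count for _, count in counts.most_common(n))
    return top / len(citations)

# Toy sample: one dominant domain, a thin long tail.
sample = ["wikipedia.org"] * 6 + ["reddit.com"] * 3 + ["a.com", "b.com", "c.com"]
top_share(sample, 2)  # 9 of 12 citations -> 0.75
```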
Wikipedia dominates with 11.2% of all citations — roughly 5,350 citations across our sample. But Wikipedia's dominance is concentrated in factual and reference queries ("What is quantum computing?", "When was the Eiffel Tower built?"). For commercial queries, product queries, and how-to queries, Wikipedia's citation share drops below 3%.
Rounding out the top 10 after Wikipedia are: Reddit (4.8%), Healthline (2.3%), Wirecutter (2.1%), CNET (1.9%), Investopedia (1.8%), WebMD (1.6%), Forbes (1.5%), TechCrunch (1.4%), and NerdWallet (1.3%). Together with Wikipedia, these 10 domains account for 29.9% of all citations.
The pattern is clear: AI systems overwhelmingly cite established, authoritative, well-structured content sources. These are not random selections. They are the sites that AI models have learned to trust through training data, reinforcement learning from human feedback, and real-time search evaluation.
Top 10 Most-Cited Domains in AI Responses
The AI Readiness-Citation Correlation: r = 0.68
Here is the finding that matters most for anyone investing in AI readiness: AI readiness scores correlate with citation frequency at r = 0.68 (p < 0.001). This is a strong positive correlation, meaning sites with higher AI readiness scores are significantly more likely to be cited by AI systems.
Among the top 200 most-cited domains, the average AI readiness score is 74/100 — 17 points above the web average of 57. Among the top 50, the average is 79/100. Among domains cited zero times across all 12,000 queries, the average AI readiness score is 41/100.
The correlation is not perfect, and it should not be. AI citation depends on multiple factors beyond technical readiness: topical authority, content quality, brand recognition from training data, and query-specific relevance. A site can have a perfect AI readiness score and still not be cited if it lacks authoritative content on the queried topic.
But the data is unambiguous on the inverse: low AI readiness virtually guarantees low citation frequency. Of the 3,100 domains in our scan database that score below 40 on AI readiness, only 12 (0.4%) appeared in our citation index at all. Technical readiness is a necessary condition, even if not sufficient.
The categories that drive the correlation most strongly are Schema Markup (r = 0.61 with citation frequency), Bot Access (r = 0.58), and Content Quality (r = 0.54). AI Protocols show a weaker correlation (r = 0.31), but this likely reflects the low adoption rate rather than low importance: with only ~7% of sites having llms.txt, the sample of protocol-adopting sites is too small to draw definitive conclusions.
How Citation Patterns Differ Across ChatGPT, Perplexity, and Claude
The three AI platforms show meaningfully different citation behaviors, creating different opportunities depending on which platform your audience uses.
Perplexity cites the most sources per response — an average of 5.8 citations per answer. It pulls from a wider range of domains and is the most likely to cite niche or specialized sites. Perplexity's citation pattern favors recency (recently published content) and topical depth (detailed, expert-level coverage). Of the three platforms, Perplexity is where a new site with strong AI readiness has the best chance of appearing.
ChatGPT cites fewer sources — an average of 3.2 per response when web search is engaged — but its citations carry more traffic weight due to its larger user base. ChatGPT favors established brands and domains that appeared frequently in its training data. New sites face a chicken-and-egg problem: they need citations to build training data presence, but they need training data presence to earn citations. The workaround is Bing indexing; ChatGPT's web search uses Bing, so strong Bing SEO performance is the path to ChatGPT citations.
Claude cites an average of 4.1 sources per response and shows the strongest preference for sites with AI protocols. In our data, sites with llms.txt files are cited by Claude at 2.3x the rate of comparable sites without llms.txt, a stronger protocol effect than either ChatGPT or Perplexity show. This makes sense given Anthropic's investment in AI protocols including MCP.
The practical takeaway: optimize for all three, but know that each rewards different strengths. Perplexity rewards depth and freshness. ChatGPT rewards brand authority and Bing visibility. Claude rewards technical AI readiness and protocol adoption.
- Perplexity: 5.8 avg citations/response, favors recency and depth, widest domain diversity
- ChatGPT: 3.2 avg citations/response, favors established brands, Bing-dependent
- Claude: 4.1 avg citations/response, strongest llms.txt preference (2.3x boost), protocol-sensitive
Category Champions: Who Wins in Each Topic Area
AI citation dominance is category-specific. A site can be invisible for general queries but dominate a niche. This is the most actionable insight for businesses: you do not need to be Wikipedia. You need to be the best source for your specific topic.
Here are the category leaders from our data:
Health & Medical: Healthline (18.4% of category citations), WebMD (14.2%), Mayo Clinic (8.7%), Cleveland Clinic (6.1%). The medical category shows the strongest concentration — four domains capture nearly half of all health citations. These sites combine medical authority, comprehensive structured data, and accessible writing. Notably, all four score above 72 on AI readiness.
Technology: CNET (9.1%), TechCrunch (7.8%), The Verge (6.4%), Ars Technica (5.9%). Tech citations are more distributed than health, reflecting the breadth of technology topics and the number of credible sources.
Finance: Investopedia (12.7%), NerdWallet (9.3%), Bankrate (6.8%), The Motley Fool (4.2%). Financial citations strongly favor sites with clear author credentials and editorial review processes.
Product Reviews: Wirecutter (15.3%), CNET (8.9%), RTINGS (6.1%), Tom's Guide (4.7%). Product recommendation queries show the clearest path for e-commerce: AI systems prefer independent review sites over retailer product pages. The implication is that earning reviews and mentions on these sites is a viable AI visibility strategy.
Cooking & Food: Serious Eats (8.2%), Bon Appétit (6.9%), Allrecipes (5.4%), Food Network (4.8%). Recipe citations favor sites with Recipe schema and detailed cooking instructions.
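To make the Recipe schema point concrete, here is a minimal schema.org Recipe block generated as JSON-LD. Every value is a placeholder; a real page embeds the serialized output in a `<script type="application/ld+json">` tag:

```python
import json

# Minimal schema.org Recipe JSON-LD (all values are placeholders).
recipe_jsonld = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Classic Banana Bread",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "prepTime": "PT15M",        # ISO 8601 duration
    "cookTime": "PT60M",
    "recipeYield": "1 loaf",
    "recipeIngredient": ["3 ripe bananas", "2 cups flour"],
    "recipeInstructions": [
        {"@type": "HowToStep", "text": "Mash the bananas."},
        {"@type": "HowToStep", "text": "Mix and bake for 60 minutes."},
    ],
}

print(json.dumps(recipe_jsonld, indent=2))
```

Structured steps (`HowToStep` items rather than one prose blob) are exactly the kind of extractable detail the category leaders above provide.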
The lesson across all categories: authority is earned per topic, not per domain. Your AI citation strategy should focus on becoming the definitive source for your specific subjects, not competing with Wikipedia across all knowledge.
- Health: Healthline 18.4%, WebMD 14.2%, Mayo Clinic 8.7%
- Tech: CNET 9.1%, TechCrunch 7.8%, The Verge 6.4%
- Finance: Investopedia 12.7%, NerdWallet 9.3%, Bankrate 6.8%
- Products: Wirecutter 15.3%, CNET 8.9%, RTINGS 6.1%
- Food: Serious Eats 8.2%, Bon Appétit 6.9%, Allrecipes 5.4%
The Long Tail: How Smaller Sites Earn AI Citations
While the top 50 domains dominate overall citation volume, the data reveals encouraging patterns for smaller and niche sites. The remaining 66% of citations fall outside the top 50, including the 42% that go to the 8,200+ domains outside the top 200. This long tail is where most businesses will compete.
We identified 340 domains with fewer than 50,000 monthly organic visits that appeared in our citation index at least 5 times. Analyzing these "small site winners" revealed a consistent profile:
Deep topical authority in a narrow niche. A site about mechanical keyboards that publishes comprehensive switch comparisons, build guides, and original typing tests. A regional law firm that publishes exhaustive guides to local employment law. A home brewing blog with detailed recipe databases and equipment reviews. These sites are not broadly famous, but they are the best source for their specific topic.
High AI readiness scores. The average AI readiness score of our 340 small site winners is 71/100 — significantly above the web average of 57. They are technically optimized for AI consumption even if they lack brand recognition.
Structured, extractable content. Small site winners average 1,800 words per cited page with 6.2 H2/H3 headings per article, compared to the web average of 850 words and 2.1 headings. Their content is designed (intentionally or accidentally) for AI extraction.
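The structure metrics above (word count, H2/H3 count) can be approximated with nothing beyond the standard library. A crude sketch, not the study's actual measurement pipeline:

```python
from html.parser import HTMLParser

class PageStats(HTMLParser):
    """Rough word and H2/H3 counts for an HTML page: a crude proxy
    for the 'extractable structure' profile described above."""

    def __init__(self):
        super().__init__()
        self.headings = 0
        self.words = 0
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self.headings += 1
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words += len(data.split())

p = PageStats()
p.feed("<h2>Setup</h2><p>Two words</p><h3>Steps</h3><p>one two three</p>")
```

Comparing your own pages against the 1,800-word / 6.2-heading profile is a quick self-audit.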
Original data or unique expertise. 78% of small site citations reference pages containing original research, proprietary data, first-person testing, or unique expert analysis not available elsewhere. AI systems cite small sites when they offer something that large sites do not: specificity and originality.
The playbook for earning AI citations as a smaller site is clear: pick a niche, go deeper than anyone else, optimize technically for AI readiness, and produce original work that cannot be found on Wikipedia or a large publisher.
Strategic Implications: What This Data Means for Your Business
This data supports four strategic conclusions that should inform how you invest in AI visibility.
First, AI readiness is table stakes, not a competitive edge. The correlation between readiness and citation is strong, but readiness alone does not guarantee citation. It gets you into the game. Content authority, topical depth, and brand recognition determine whether you win. Invest in AI readiness as infrastructure, then compete on content quality and expertise.
Second, niche authority beats broad coverage. You are more likely to earn AI citations by being the definitive source on one topic than by being an adequate source on many topics. AI models pick the best source per query. Being best in a narrow category is achievable; being best broadly is not.
Third, multi-platform optimization matters. ChatGPT, Perplexity, and Claude cite different sources and reward different signals. A strategy that works for one platform may underperform on another. At minimum, ensure your site is indexed by Bing (for ChatGPT), submit to Perplexity's index, and implement llms.txt (which Claude weights most heavily).
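For the llms.txt step, the format proposed at llmstxt.org is a short markdown file served from the site root: an H1 name, a blockquote summary, then H2 sections of annotated links. A minimal placeholder example (all names and URLs are invented):

```markdown
# ExampleCo
> ExampleCo makes widgets. This file points AI systems at our most
> useful pages.

## Docs
- [Product overview](https://example.com/overview.md): what we sell
- [Pricing](https://example.com/pricing.md): current plans

## Optional
- [Company history](https://example.com/history.md)
```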
Fourth, citation concentration will decrease over time. As more sites optimize for AI readiness and as AI models improve their source diversity, the top 50's 34% share will erode. The question is whether your site is positioned to capture share as the distribution broadens. The sites investing in AI readiness now are building the technical and content foundation that will earn citations as the market expands.
We will update this citation index quarterly. The data will change. The fundamental principle will not: the sites that make it easiest for AI to understand, trust, and cite them will be the sites that get cited.
Frequently Asked Questions
How was the AI Citation Index compiled?
We submitted 4,000 queries each to ChatGPT (with search enabled), Perplexity, and Claude over 6 weeks. Queries spanned 22 topic categories from product recommendations to medical information to technical tutorials. We extracted every cited URL from each response and built a frequency database of 8,400+ unique domains.
Which website is cited most by AI?
Wikipedia is the most-cited domain across all three AI platforms, appearing in 11.2% of all responses that contain citations. However, Wikipedia's citations are concentrated in factual/reference queries. For commercial and product queries, specialized sites like Wirecutter, CNET, and Healthline dominate their respective categories.
Does a high AI readiness score guarantee AI citations?
No, but it strongly correlates. Among the top 200 most-cited domains, the average AI readiness score is 74/100 compared to the web average of 57. A high score means your site is technically capable of being cited. Whether you actually get cited also depends on content authority, topical relevance, and the competitive landscape for specific queries.
How often do AI systems cite small or niche websites?
More often than you might expect. While the top 50 domains capture 34% of citations, the remaining 66% is distributed across 8,350+ domains. Niche authority sites with deep expertise in specific topics regularly appear in AI responses for relevant queries. A site does not need to be Wikipedia-sized to earn AI citations — it needs to be the best source for its specific topic.