TL;DR
llms.txt is a plain Markdown file at your domain root that tells AI engines which pages on your site are worth reading and why. It is neither robots.txt (which controls crawling) nor sitemap.xml (which lists URLs). Proposed by Jeremy Howard of Answer.AI in September 2024, it had reached roughly 10% adoption in an SE Ranking analysis of 300,000 domains as of early 2026, with confirmed support from Perplexity and Anthropic/Claude and observable (but unconfirmed) effects on ChatGPT citations. It takes under an hour to create, costs nothing to host, and compounds over time as more platforms adopt it, so there is little reason not to publish one.
llms.txt is a plain Markdown file you publish at your domain root to tell AI engines which pages on your site are worth reading and why. It is not a standard yet, not universally supported, and will not single-handedly get you cited by ChatGPT overnight — but it takes under an hour to create, costs nothing to maintain, and is already confirmed to influence citation behavior on Perplexity and Anthropic's Claude. That risk/reward ratio is hard to argue against.
Here is everything you actually need to know to create one that works.
What llms.txt is — and what it is not
Jeremy Howard, co-founder of Answer.AI, published the original proposal at llmstxt.org in September 2024. The premise is simple: AI answer engines struggle with full websites. They have to parse HTML, navigate JavaScript, skip ads and boilerplate, and guess which of your two hundred pages actually answers the question being asked. llms.txt solves that by giving them a structured shortcut.
Think of it as a curated index — a document you write that says: here is what my site does, here are the pages that matter, here is a one-line explanation of what each page answers.
Three files, three completely different jobs:
| File | Job | What it controls |
|---|---|---|
| robots.txt | Crawl control | Which URLs bots may or may not fetch |
| sitemap.xml | URL index | All your pages with metadata (modified date, priority) |
| llms.txt | Curation guide | Which pages are most relevant for AI retrieval, with context |
robots.txt tells crawlers where they are allowed to go. sitemap.xml lists every destination. llms.txt tells them which stops are actually worth making. You need all three; they do not replace each other.
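Because the three files work together, it can be worth checking that your robots.txt does not accidentally block the very bots you want reading llms.txt. The following is a minimal sketch using Python's standard `urllib.robotparser`; the user-agent name `ExampleBot`, the domain, and both robots.txt bodies are hypothetical placeholders, not real crawler names or recommended rules.

```python
from urllib.robotparser import RobotFileParser


def can_fetch_llms_txt(robots_txt: str, user_agent: str, site: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch <site>/llms.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, f"{site}/llms.txt")


# Hypothetical robots.txt that only blocks a private folder.
ROBOTS = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical robots.txt that blocks one bot from the whole site.
BLOCKING = """\
User-agent: ExampleBot
Disallow: /
"""

if __name__ == "__main__":
    print(can_fetch_llms_txt(ROBOTS, "ExampleBot", "https://example.com"))
    print(can_fetch_llms_txt(BLOCKING, "ExampleBot", "https://example.com"))
```

If the second case applies to a bot you care about, publishing llms.txt will not help with that bot, because it is not permitted to fetch the file in the first place.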
What the format looks like
The spec calls for plain Markdown with a specific structure. Here is a working skeleton:
> # Your Site or Brand Name
>
> > One-paragraph summary of what your site does and who it serves. This is what AI engines read to decide if your domain is relevant to a query.
>
> ## Core Pages
>
> - [Page Title](https://yourdomain.com/page): What this page answers in one direct sentence.
> - [Service Name](https://yourdomain.com/service): Who this service is for and what problem it solves.
>
> ## Resources
>
> - [Guide Title](https://yourdomain.com/guide): What the reader learns from this guide.
The H1 is your site name. The blockquote is your site summary — write it as if answering "what does this website do?" in one paragraph. H2 sections group related pages. Each bullet is a Markdown link followed by a colon and a short, honest annotation.
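If you keep your page list in a spreadsheet or CMS export, the structure described above (H1, blockquote summary, H2 groups, link bullets) can be generated rather than hand-typed. This is a minimal sketch, not an official tool: the `Page` class, the `render_llms_txt` function, and all site data below are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    note: str  # one-sentence annotation: what the page answers


def render_llms_txt(site_name: str, summary: str, sections: dict[str, list[Page]]) -> str:
    """Render an llms.txt body: H1, blockquote summary, H2 sections of link bullets."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            # Each bullet is a Markdown link, a colon, then the annotation.
            lines.append(f"- [{page.title}]({page.url}): {page.note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical site data, for illustration only.
    print(render_llms_txt(
        "Example Plumbing Co.",
        "Licensed plumbing service for homeowners, with guides on common repairs.",
        {
            "Core Pages": [
                Page("Emergency Repairs", "https://example.com/emergency",
                     "Who the 24-hour repair service is for and what it covers."),
            ],
            "Resources": [
                Page("Water Heater Guide", "https://example.com/guides/water-heater",
                     "How to tell whether a water heater needs repair or replacement."),
            ],
        },
    ))
```

Write the output to a file named llms.txt and upload it as described in the hosting section. Generating the file keeps the bullet format consistent, but the annotations still need to be written by a person who knows what each page answers.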
What makes a good annotation: state exactly what the page answers, not what it sells. "Step-by-step checklist for setting up Google Business Profile for a service-area business" is useful to an AI engine deciding whether to retrieve your page. "Our comprehensive guide to dominating local search" is not.
One optional companion: if a page is long or complex, you can also publish llms-full.txt with the actual content of your key pages concatenated. This is more effort and only useful for documentation-heavy sites — for most small businesses, a clean llms.txt is sufficient.
Where to host it
One rule: https://yourdomain.com/llms.txt — at the root of your primary domain, not in a subfolder, not on a subdomain.
- WordPress: Drop the file in your site root via FTP or your host's file manager (same directory as wp-config.php).
- Next.js: Place it at /public/llms.txt. Next.js serves everything in /public as static files.
- Webflow / Squarespace / Wix: Use the custom code or file upload feature to place a static file at the root.
- Static site generators: Put it in the build output root (same level as index.html).
Verify by loading the URL in a browser. You should see plain text — not HTML, not a file download prompt. If you see a download dialog, your server is sending the wrong Content-Type header; set it to text/plain.
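The same check can be scripted, which is handy if you manage several sites. The sketch below uses only Python's standard library; the function names are hypothetical, and it assumes, as this article does, that `text/plain` is the Content-Type you want.

```python
import urllib.request


def content_type_ok(status: int, content_type: str) -> bool:
    """True if the response looks like a plain-text file served successfully."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    return status == 200 and media_type == "text/plain"


def check_llms_txt(url: str) -> bool:
    """Fetch the URL and apply the check above (needs network access).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    so a missing file surfaces as an exception rather than False.
    """
    with urllib.request.urlopen(url, timeout=10) as resp:
        return content_type_ok(resp.status, resp.headers.get("Content-Type", ""))


if __name__ == "__main__":
    print(content_type_ok(200, "text/plain; charset=utf-8"))  # True
    print(content_type_ok(200, "application/octet-stream"))   # False
```

To test a live site, call `check_llms_txt("https://yourdomain.com/llms.txt")` with your own domain in place of the placeholder.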
Does it actually work today?
Honest answer: partially, and unevenly across platforms.
Confirmed support:
- Perplexity has publicly confirmed it retrieves llms.txt and uses it to prioritize which pages to read when assembling answers.
- Anthropic's Claude (Claude.ai and Claude Desktop retrieval workflows) respects llms.txt when a site publishes one.
- IDE agents — Cursor, Continue, Cline, Aider — actively look for llms.txt when you point them at a documentation site. This is the highest-confidence use case today.
Observable but unconfirmed:
- ChatGPT / OpenAI has not officially documented llms.txt support. Practitioners who publish well-formed files report correlated improvements in ChatGPT citation patterns, but this cannot be verified against OpenAI's retrieval internals.
No confirmed support:
- Google has not confirmed that Gemini or AI Overviews use llms.txt. Google's preferred signal stack remains robots.txt, sitemap.xml, and structured data.
An SE Ranking analysis of 300,000 domains found roughly 10% adoption as of early 2026, meaning nine out of ten sites have not published one yet. That is a window: in categories where most competitors have not published llms.txt, your file may be the only hand-curated signal an AI engine has when comparing sources.
This is a proposed convention gaining real traction, not an official standard. Do not expect it to single-handedly fix weak content or an untrustworthy domain. But as a low-cost, additive signal on top of solid GEO foundations, it earns its hour of setup time.
The actual impact on citation rates
Treat llms.txt as a directional signal, not a citation guarantee. Brands publishing well-curated files report modest but measurable citation rate improvements on Perplexity and Claude — particularly for niche queries where the annotated page is the clearest answer available.
The bigger compounding effect is in developer and agentic contexts: when someone uses an AI coding assistant or research agent and points it at your documentation or service pages, llms.txt is often the first thing the agent reads to understand your site. That is a different kind of citation — not a consumer search result, but a business tool that now treats your content as authoritative.
The bar to publish llms.txt is low enough that the right question is not "should I?" — it is "what is taking me more than an hour?" Write it for the platforms that already support it, file it, and revisit quarterly as the standard matures.
Frequently asked questions
Is llms.txt an official standard like robots.txt?
No. llms.txt is a community-driven proposal, not an IETF or W3C standard. Jeremy Howard published the original spec at llmstxt.org in September 2024 and it gained traction through voluntary adoption, much the same path robots.txt took in the 1990s before it became a de facto standard. As of mid-2026, no formal standardization body has ratified it.
Do ChatGPT and Perplexity actually read llms.txt today?
Perplexity has publicly confirmed it retrieves llms.txt and uses it to prioritize page selection. Anthropic's Claude (in Claude.ai and Claude Desktop retrieval workflows) also respects it. OpenAI has not officially confirmed support, but practitioners report correlated changes in ChatGPT citation patterns after publishing a well-formed llms.txt. Google has not confirmed support — its preferred signals remain robots.txt, sitemap, and structured data.
What is the difference between llms.txt, robots.txt, and sitemap.xml?
Three different jobs: robots.txt is a crawl-control file (it tells bots which URLs they may or may not fetch). sitemap.xml is an index of URLs with metadata (last modified, priority). llms.txt is a curation file — it tells AI engines which pages on your site are most useful, gives them context about what your site does, and can include short annotations explaining why each page matters. You need all three; they do not overlap.
Where exactly do I put llms.txt on my site?
At the root of your domain: https://yourdomain.com/llms.txt — not in a subfolder, not on a subdomain. The file must be publicly accessible without authentication. Serve it as plain text (Content-Type: text/plain). Many hosting platforms let you drop a static file directly into the web root; for Next.js, place it in the /public directory.
How often should I update llms.txt?
Update it whenever you publish a major new page or resource that AI engines should know about. It is not a real-time feed — quarterly or whenever content significantly changes is adequate for most small business sites. More important than frequency is accuracy: a stale llms.txt that points to outdated or low-quality pages is worse than none.
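A quarterly review is easier if you can list exactly which URLs the file points to. The sketch below pulls every link bullet out of an llms.txt body so you can scan for outdated pages. The regular expression assumes the bullet format described earlier (a Markdown link, then a colon and an annotation); the function name and sample content are hypothetical.

```python
import re

# Matches "- [Title](URL): annotation"; the annotation part is optional.
LINK_BULLET = re.compile(r"^- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def extract_links(llms_txt: str) -> list[tuple[str, str, str]]:
    """Return (title, url, annotation) for each link bullet in the file."""
    found = []
    for line in llms_txt.splitlines():
        match = LINK_BULLET.match(line.strip())
        if match:
            found.append((match.group(1), match.group(2), match.group(3) or ""))
    return found


# Hypothetical file content, for illustration only.
SAMPLE = """\
# Example Site

> Hypothetical summary.

## Core Pages

- [Pricing](https://example.com/pricing): What each plan costs and includes.
- [Old Promo](https://example.com/promo-2023): Discontinued offer.
"""

if __name__ == "__main__":
    for title, url, note in extract_links(SAMPLE):
        print(f"{title}\t{url}\t{note}")
```

The script only lists the entries. Deciding whether a page is still accurate and worth recommending remains a manual judgment.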