06 GEO TECHNIQUE · GUGUBRAND PLAYBOOK

llms.txt & AI Crawler Access

Before an AI engine can cite you, its crawler must be able to fetch your pages and understand your site. That access layer is configuration, not content — and most sites get it wrong by accident.

TECHNIQUE 01

Decide your AI-crawler policy deliberately

Letting AI engines read your marketing site is how you become citable; blocking them is how you disappear from AI answers. For most businesses the trade is obvious — but make it a decision, not an accident.

Audit robots.txt today: confirm GPTBot, ClaudeBot, PerplexityBot, and Google-Extended are not disallowed from your public marketing pages.
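
One way to run this audit is with Python's standard `urllib.robotparser`. The sketch below is illustrative only: `audit_robots` and the `SAMPLE` file are hypothetical names and data, and the paths you pass in should be your own marketing URLs.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the audit step above.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def audit_robots(robots_txt: str, paths: list[str]) -> dict[str, list[str]]:
    """Return, per AI agent, the paths that this robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = {}
    for agent in AI_AGENTS:
        denied = [p for p in paths if not parser.can_fetch(agent, p)]
        if denied:
            blocked[agent] = denied
    return blocked


# Hypothetical robots.txt that blocks one crawler, perhaps by accident.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

if __name__ == "__main__":
    # Paths are placeholders; substitute your public marketing pages.
    print(audit_robots(SAMPLE, ["/", "/services/"]))
```

An empty result means none of the listed agents is disallowed from the paths you checked. Any entry is a block you should either remove or be able to justify.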

TECHNIQUE 02

Whitelist AI bots past your rate limiter

Hosting firewalls and DDoS protection often throttle exactly the crawlers you want. The site looks fine to you while every AI fetch silently times out.

If your host or CDN rate-limits traffic (Hostinger, Cloudflare, etc.), add user-agent exceptions for the major AI crawlers, then verify in its logs that their fetches return 200.

TECHNIQUE 03

Ship an llms.txt

llms.txt is a plain-Markdown file at your domain root that tells language models what your site is, what you do, and which pages matter. Think of it as the AI-era counterpart to a sitemap. It takes about an hour to write and removes much of the ambiguity about your entity.

Write /llms.txt with: one-paragraph description, your services, locations, languages, and links to your 10 most important pages with one-line summaries. Keep it in sync with your actual positioning.
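
Generating the file from structured data is one way to keep it in sync with your positioning. The sketch below assumes a layout commonly used for llms.txt (a title, a blockquote summary, then sections of linked pages). The `Page` class, the `render_llms_txt` function, the section names, and all example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    summary: str  # one-line summary shown next to the link


def render_llms_txt(name, description, services, locations, languages, pages):
    """Render a Markdown llms.txt body from plain Python values."""
    lines = [f"# {name}", "", f"> {description}", ""]
    lines += ["## Services", ""]
    lines += [f"- {service}" for service in services]
    lines += ["", "## At a glance", ""]
    lines.append(f"- Locations: {', '.join(locations)}")
    lines.append(f"- Languages: {', '.join(languages)}")
    lines += ["", "## Key pages", ""]
    lines += [f"- [{p.title}]({p.url}): {p.summary}" for p in pages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder business data; replace with your own.
    text = render_llms_txt(
        name="Example Co",
        description="Example Co installs and repairs heating systems.",
        services=["Boiler installation", "Emergency repair"],
        locations=["Madrid", "Toledo"],
        languages=["English", "Spanish"],
        pages=[Page("Services", "https://example.com/services/",
                    "What we install and repair, with pricing.")],
    )
    print(text)
```

Write the returned string to `llms.txt` at the web root and regenerate it whenever the services or key pages change.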

TECHNIQUE 04

Serve real HTML, not a JavaScript shell

Most AI crawlers do not execute JavaScript. A client-side-rendered page reads as empty to them, whatever your Lighthouse score says.

Fetch your key pages with curl and confirm the full text content is present in the response body. Static export or SSR solves this permanently.
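
To make the check repeatable, save the response (for example `curl -s https://example.com/ > page.html`, with your own URL) and test it for the phrases a crawler should see. This is a rough sketch using only the standard library; `VisibleText`, `visible_text`, and `missing_phrases` are hypothetical helper names, and ignoring `<script>`, `<style>`, and `<template>` content is a simplifying assumption about what a non-JavaScript crawler reads.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, ignoring script, style, and template content."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text present in the raw HTML without running any scripts."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def missing_phrases(html: str, phrases: list[str]) -> list[str]:
    """Return the phrases that do not appear in the page's static text."""
    text = visible_text(html).lower()
    return [p for p in phrases if p.lower() not in text]


if __name__ == "__main__":
    # page.html is the file saved by curl; the phrases are placeholders.
    with open("page.html", encoding="utf-8") as fh:
        print(missing_phrases(fh.read(), ["Boiler installation", "Madrid"]))
```

Any phrase returned here exists only after JavaScript runs, or not at all. The same check works for the entity facts on your homepage and contact page.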

TECHNIQUE 05

Keep entity facts in crawlable text

Phone numbers in images, addresses in footers rendered by JS, services listed only in a PDF — all invisible to the systems deciding whether to recommend you.

Put name, services, cities, phone, and languages in plain HTML text on the homepage and contact page. Boring, crawlable, quotable.

TECHNIQUE 06

Monitor AI-crawler traffic

Your server logs show which AI bots visit, how often, and what they fetch — a direct read on whether your access layer works and which pages the engines care about.

Once a month, grep your access logs for GPTBot, ClaudeBot, and PerplexityBot. Zero visits from all three usually points to an access problem, not a content problem.
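
If you want counts rather than raw grep output, a short script can tally hits per bot and status code. The sketch below assumes the widely used "combined" access-log format; the `LINE` pattern and `summarize` function are hypothetical, the sample log lines in the usage section are made up, and other log formats will need a different pattern.

```python
import re
from collections import Counter

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Pulls the request path, status code, and final quoted field (the user
# agent) out of a "combined" format log line. Assumed format, not universal.
LINE = re.compile(
    r'"(?:[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) .*"(?P<ua>[^"]*)"$'
)


def summarize(lines):
    """Count requests per (bot, status code) for the AI crawlers above."""
    hits = Counter()
    for line in lines:
        match = LINE.search(line.rstrip())
        if not match:
            continue  # skip lines in an unexpected format
        agent = match["ua"].lower()
        for bot in AI_BOTS:
            if bot.lower() in agent:
                hits[(bot, int(match["status"]))] += 1
    return hits


if __name__ == "__main__":
    # Made-up sample lines; in practice iterate over your access log file.
    sample = [
        '203.0.113.5 - - [01/May/2025:10:00:00 +0000] "GET /services/ HTTP/1.1" '
        '200 5123 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
        '203.0.113.9 - - [01/May/2025:10:05:00 +0000] "GET / HTTP/1.1" '
        '429 0 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    ]
    for (bot, status), count in sorted(summarize(sample).items()):
        print(bot, status, count)
```

Bots that appear only with 403 or 429 statuses suggest a firewall or rate-limiter issue. Bots that never appear suggest a robots.txt or reachability issue. User-agent strings can be spoofed, so treat the counts as a signal rather than proof.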

The checklist

robots.txt explicitly allows major AI crawlers on marketing pages
Rate-limiter exceptions configured and verified for AI bots
/llms.txt live, accurate, and aligned with current positioning
Key pages fully readable via curl (no JS-only content)
Entity facts (name, services, cities, phone, languages) in plain text
Monthly log check: AI crawlers fetching and receiving 200s

Common questions

Does llms.txt actually get used by AI engines?

Not universally. llms.txt is a proposed convention rather than a formal standard, and support varies across AI crawlers and agent tools. The cost is trivial, though, and writing it forces you to produce the canonical one-page description of your business, which is valuable even before every engine reads it.

Should I block AI crawlers to protect my content?

For proprietary content, maybe. For a marketing site, blocking AI crawlers means AI assistants describe your market without you in it. Most businesses should allow them on public pages.
