TECHNIQUE 01
Decide your AI-crawler policy deliberately
Letting AI engines read your marketing site is how you become citable; blocking them is how you disappear from AI answers. For most businesses the trade is obvious — but make it a decision, not an accident.
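Audit robots.txt today: confirm that GPTBot, ClaudeBot, PerplexityBot, and Google-Extended are not disallowed from your public marketing pages. Note that Google-Extended is a robots.txt control token rather than a separate crawler, so it appears in robots.txt rules but not as a user agent in your logs.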
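One way to run that audit is with Python's standard-library robots.txt parser. This is a minimal sketch, not a definitive tool: the helper name blocked_agents, the sample rules, and the example.com URL are illustrative assumptions, and the standard-library parser may interpret edge cases differently from each crawler's own implementation.

```python
from urllib.robotparser import RobotFileParser

# Tokens named in the audit step above.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_agents(robots_txt: str, url: str, agents=AI_AGENTS) -> list[str]:
    """Return the agents that the given robots.txt text disallows for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks one AI crawler by accident.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

print(blocked_agents(SAMPLE, "https://example.com/services"))
```

To check a live site instead of a pasted string, RobotFileParser also accepts a robots.txt URL and fetches it with read(); run the same can_fetch loop against each public page you care about.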
TECHNIQUE 02
Whitelist AI bots past your rate limiter
Hosting firewalls and DDoS protection often throttle exactly the crawlers you want. The site looks fine to you while every AI fetch silently times out.
If your host or CDN rate-limits (Hostinger, Cloudflare, etc.), add user-agent exceptions for the major AI crawlers, then confirm in the access logs that their fetches return 200.
TECHNIQUE 03
Ship an llms.txt
llms.txt is a proposed convention: a plain-markdown map at your domain root that tells language models what your site is, what you do, and which pages matter. Think of it as the AI-era counterpart to a sitemap. It takes about an hour to write and removes much of the ambiguity about your entity.
Write /llms.txt with: one-paragraph description, your services, locations, languages, and links to your 10 most important pages with one-line summaries. Keep it in sync with your actual positioning.
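The contents listed above can be rendered from one small data structure, which makes the file easier to keep in sync with your positioning. The sketch below assumes the commonly described llms.txt layout (a title heading, a quoted one-paragraph summary, then link lists with one-line descriptions); the function name render_llms_txt and the business "Example Studio" with its facts and URLs are hypothetical.

```python
def render_llms_txt(name, summary, facts, pages):
    """Build llms.txt text: title, quoted summary, fact list, key-page links."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for label, values in facts.items():
        lines.append(f"- {label}: {', '.join(values)}")
    lines += ["", "## Key pages", ""]
    for title, url, blurb in pages:
        lines.append(f"- [{title}]({url}): {blurb}")
    return "\n".join(lines) + "\n"


# Hypothetical business details, for illustration only.
llms_txt = render_llms_txt(
    name="Example Studio",
    summary="Example Studio builds marketing websites for small clinics.",
    facts={
        "Services": ["Web design", "SEO"],
        "Locations": ["Austin", "Dallas"],
        "Languages": ["English", "Spanish"],
    },
    pages=[
        ("Services", "https://example.com/services", "What we build and for whom."),
        ("Contact", "https://example.com/contact", "Phone, address, and hours."),
    ],
)
print(llms_txt)
```

Write the returned string to a file served at /llms.txt, and regenerate it whenever the underlying facts change.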
TECHNIQUE 04
Serve real HTML, not a JavaScript shell
Most AI crawlers do not execute JavaScript. A client-side-rendered page reads as empty to them, whatever your Lighthouse score says.
Fetch your key pages with curl and confirm the full text content is present in the response body. Static export or SSR solves this permanently.
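The same check can be scripted so it runs against every key page. The sketch below extracts the text a non-JavaScript client would see (ignoring script, style, and template contents) and reports which expected phrases are missing. The helper names, the sample pages, and the phone number are illustrative assumptions; fetch_html is included for live use but is not exercised in the demo.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping content a non-JS reader would not see as text."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())  # collapse whitespace


def missing_phrases(html: str, phrases) -> list[str]:
    """Return the phrases that do not appear in the page's static text."""
    text = visible_text(html).casefold()
    return [p for p in phrases if p.casefold() not in text]


def fetch_html(url: str, user_agent: str = "content-check/0.1") -> str:
    """Fetch raw HTML without executing JavaScript (like curl would)."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Hypothetical pages: a client-rendered shell and a server-rendered page.
CSR_SHELL = '<html><body><div id="root"></div><script>window.DATA={"phone":"+1 555 0100"}</script></body></html>'
SSR_PAGE = "<html><body><h1>Example Studio</h1><p>Call +1 555 0100</p></body></html>"
EXPECTED = ["Example Studio", "+1 555 0100"]

print(missing_phrases(CSR_SHELL, EXPECTED))
print(missing_phrases(SSR_PAGE, EXPECTED))
```

For a live page, pass fetch_html(url) as the first argument. A non-empty result means those phrases only exist after JavaScript runs, or not at all.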
TECHNIQUE 05
Keep entity facts in crawlable text
Phone numbers in images, addresses in footers rendered by JS, services listed only in a PDF — all invisible to the systems deciding whether to recommend you.
Put name, services, cities, phone, and languages in plain HTML text on the homepage and contact page. Boring, crawlable, quotable.
TECHNIQUE 06
Monitor AI-crawler traffic
Your server logs show which AI bots visit, how often, and what they fetch — a direct read on whether your access layer works and which pages the engines care about.
Once a month, grep your access logs for GPTBot, ClaudeBot, and PerplexityBot. Zero visits from all three usually points to an access problem rather than a content problem.
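If you want counts rather than raw grep output, a short script can tally visits and non-200 responses per bot. This sketch assumes the common "combined" access-log format used by Nginx and Apache; the regular expression, the summarize helper, and the sample log lines (including their user-agent strings) are illustrative, not taken from real traffic.

```python
import re
from collections import Counter

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Combined log format: ... "METHOD /path HTTP/x" STATUS BYTES "referer" "user agent"
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<ua>[^"]*)"\s*$'
)


def summarize(lines):
    """Return (visits, non_200) Counters keyed by bot name."""
    visits = Counter({bot: 0 for bot in AI_BOTS})
    non_200 = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ua = match["ua"].lower()
        for bot in AI_BOTS:
            if bot.lower() in ua:
                visits[bot] += 1
                if match["status"] != "200":
                    non_200[bot] += 1
    return visits, non_200


# Made-up sample lines for illustration.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Mar/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.5 - - [01/Mar/2025:10:00:05 +0000] "GET /services HTTP/1.1" 429 0 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '198.51.100.7 - - [01/Mar/2025:11:30:00 +0000] "GET /contact HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.9 - - [01/Mar/2025:12:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"',
]

visits, non_200 = summarize(SAMPLE_LOG)
for bot in AI_BOTS:
    print(f"{bot}: {visits[bot]} visits, {non_200[bot]} non-200")
```

To run it on a real log, pass an open file object to summarize; the log path depends on your server setup. A bot with visits but mostly non-200 responses suggests throttling or blocking rather than lack of interest.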