TL;DR
Perplexity runs its own web crawler (PerplexityBot), builds a separate index, and uses Sonar models to retrieve roughly 10 candidate pages per query — then surfaces only 3-4 as cited sources. Pages that win citations answer the specific question in the first 100 words, carry structured data (FAQ/Article schema), show a visible author and date, and come from domains with cross-platform authority signals. Add those four elements to your best pages and you are competing for the slot.
Because only 3-4 of the roughly 10 candidate pages Perplexity reviews per query end up cited, the question is not whether your page exists on the web; it is whether it clears every filter in that retrieval and ranking pipeline. Most pages fail at one of four checkpoints: they are not crawled, they do not answer the question quickly enough, they carry no structural signals, or the domain has thin authority.
Here is how the system actually works, what to put on your pages, and the common reasons you are currently not cited.
How Perplexity picks its sources
Perplexity runs its own web crawler called PerplexityBot — a dedicated indexing bot that is explicitly not used to train AI foundation models, only to build the retrieval index behind Perplexity's real-time answers. That distinction matters: Perplexity's index is separate from Google's, Bing's, and any other search engine. Ranking on Google is correlated with Perplexity citations but is neither necessary nor sufficient.
When a query comes in, Perplexity's Sonar models run retrieval, ranking, synthesis, and attribution as a single pipeline. The system retrieves candidate documents, ranks them by query relevance and source authority, synthesizes a combined answer, and then exposes only the subset of sources whose content actually survived the synthesis step. That last filter — the attribution step — is why a page can be retrieved and ranked and still not appear as a citation.
What the pipeline rewards:
| Signal | Effect on citation rate |
|---|---|
| Direct answer in first 100 words | ~90% of top-cited pages follow this pattern |
| FAQ Schema markup | ~42% higher citation rate for question queries |
| HowTo Schema markup | ~38% lift for process queries |
| Article Schema with author + date | ~23% lift for informational content |
| Original proprietary data | ~3.2× more citations than derivative content |
| Freshness (updated ≤30 days) | Significantly higher retrieval frequency |
These figures come from third-party GEO research and practitioner experiments — Perplexity does not publish an official ranking factor list. Treat them as directional, not exact.
What to put on the page
Answer first, every time. Perplexity's models decide whether a passage is relevant by reading the opening. If the first 100 words are a preamble, the system moves to the next candidate. If the first two sentences state the direct answer, you stay in the pool.
This is editing, not a rewrite. Take your current introduction, move the actual answer to the top, and compress the preamble into one transitional sentence below it.
Structure that helps the model parse you:
- Section headings as questions. Format `##` headings as the actual question the section answers ("How does Perplexity rank sources?" not "Source Ranking"). This maps directly to query matching.
- Self-contained 200-400 word sections. Each `##` section should be readable as a standalone answer, with no dangling references and no "as we discussed above." Perplexity often cites a single passage, not the full page.
- Tables for comparisons and data. Structured data in Markdown tables is easier for models to parse and synthesize than the same information in prose.
- A visible FAQ block. Add 3-5 Q&A pairs at the bottom of your page, formatted as actual questions with complete answers. This is the same content you add FAQ Schema to, and it gives the model a clean, parseable question-answer index.
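The heading and section-length guidance above can be checked mechanically. The following is a minimal sketch, assuming your page source is Markdown with `## ` section headings; the function name `audit_sections` and the report fields are hypothetical, and the 200-400 word bounds are simply the targets named in the list.

```python
import re

def audit_sections(markdown: str, min_words: int = 200, max_words: int = 400):
    """Return one report dict per '## ' section of a Markdown document."""
    reports = []
    # Split on level-2 headings; drop the first chunk (text before any heading).
    chunks = re.split(r"^## +", markdown, flags=re.MULTILINE)[1:]
    for chunk in chunks:
        # The first line of each chunk is the heading, the rest is the body.
        heading, _, body = chunk.partition("\n")
        heading = heading.strip()
        word_count = len(body.split())
        reports.append({
            "heading": heading,
            "is_question": heading.endswith("?"),
            "word_count": word_count,
            "length_ok": min_words <= word_count <= max_words,
        })
    return reports

# Hypothetical two-section page: one question heading, one label heading.
sample = (
    "Intro paragraph.\n"
    "## How does Perplexity rank sources?\n" + "word " * 250 + "\n"
    "## Source Ranking\nToo short.\n"
)
for report in audit_sections(sample):
    print(report)
```

Sections that fail `is_question` or `length_ok` are candidates for editing, not automatic failures; a short definition section can still be a good standalone answer.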
Freshness is not optional for competitive queries. Pages updated within the last 30 days are retrieved significantly more often than the same content sitting untouched since 2023. For evergreen pages — service descriptions, guides, comparison pages — schedule a quarterly refresh: update one statistic, add one new section, and update your dateModified schema field.
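To keep a refresh schedule honest, you can flag pages whose `dateModified` has aged past a threshold. This is a minimal sketch, assuming you can export each page's `dateModified` as an ISO date string; the `is_stale` helper, the example URLs, and the dates are hypothetical, and the 30-day and 90-day thresholds mirror the figures in the paragraph above.

```python
from datetime import date

def is_stale(date_modified: str, today: date, max_age_days: int = 30) -> bool:
    """True when an ISO 8601 date (YYYY-MM-DD) is older than max_age_days."""
    modified = date.fromisoformat(date_modified)
    return (today - modified).days > max_age_days

# Hypothetical inventory: URL -> dateModified taken from each page's schema.
pages = {
    "https://example.com/pricing-guide": "2025-01-10",
    "https://example.com/services": "2025-02-20",
}
today = date(2025, 3, 1)  # fixed date so the example is reproducible

# Evergreen pages get the quarterly (90-day) threshold instead of 30 days.
due_now = [url for url, d in pages.items() if is_stale(d, today)]
due_quarterly = [url for url, d in pages.items() if is_stale(d, today, max_age_days=90)]
print(due_now)
print(due_quarterly)
```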
Structured data that helps
Perplexity's Sonar models read schema markup before they read your page body. Think of it as a header that tells the model what type of content follows and who is responsible for it.
The three schemas worth adding:
FAQ Schema (JSON-LD) — Add this to any page with a Q&A section. It is the highest-leverage markup for informational queries, where Perplexity is specifically trying to match a question to a clear answer.
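As an illustration, FAQ markup can be generated from the same Q&A pairs that appear on the page. The sketch below builds a schema.org `FAQPage` object and wraps it in a JSON-LD script tag; the helper names `faq_jsonld` and `as_script_tag` are hypothetical, and the sample question and answer are taken from this article's own FAQ.

```python
import json

def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

def as_script_tag(data) -> str:
    """Serialize to a JSON-LD script tag for the page's HTML."""
    # Escape "</" so text inside the JSON cannot close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"

pairs = [
    ("Does Perplexity use Google's index?",
     "No. Perplexity maintains its own index built by PerplexityBot."),
]
print(as_script_tag(faq_jsonld(pairs)))
```

The answers in the markup should match the visible FAQ text on the page, since the markup describes that content rather than replacing it.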
Article Schema — Add this to every blog post and guide. The critical fields are datePublished, dateModified, author.name, and author.url (pointing to a real profile — your LinkedIn or your About page). A December 2025 Moz experiment found that adding author schema with LinkedIn verification increased citation probability by 19% across a test set of 500 articles.
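A minimal sketch of the Article fields named above, generated in Python. The helper name `article_jsonld` is hypothetical, and the headline, author name, profile URL, and dates are placeholders to replace with your real values.

```python
import json

def article_jsonld(headline, author_name, author_url, date_published, date_modified):
    """Build a schema.org Article object with the author and date fields."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601 date of first publication
        "dateModified": date_modified,    # bump this on every real refresh
        "author": {
            "@type": "Person",
            "name": author_name,
            "url": author_url,            # a real profile or About page
        },
    }

# All values below are placeholders, not real data.
article = article_jsonld(
    headline="How much should a small business spend on digital marketing?",
    author_name="Jane Doe",
    author_url="https://example.com/about/jane-doe",
    date_published="2025-01-15",
    date_modified="2025-03-01",
)
print(json.dumps(article, indent=2))
```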
HowTo Schema — Add this to any page that walks through a process step by step. Each step becomes a discrete retrieval target, which is why HowTo pages perform well on process queries.
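For process pages, each step maps to a `HowToStep` entry. The following sketch builds a schema.org `HowTo` object from an ordered list of steps; the helper name `howto_jsonld` is hypothetical, and the example steps are illustrative wording based on the checklist in this article.

```python
import json

def howto_jsonld(name, steps):
    """Build a schema.org HowTo object from ordered (step_name, step_text) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "step": [
            {"@type": "HowToStep", "position": i, "name": step_name, "text": step_text}
            for i, (step_name, step_text) in enumerate(steps, start=1)
        ],
    }

# Illustrative steps only; use the steps that actually appear on your page.
howto = howto_jsonld(
    "Prepare a page for Perplexity citations",
    [
        ("Allow the crawler", "Check that robots.txt does not block PerplexityBot."),
        ("Answer first", "Move the direct answer into the first 100 words."),
        ("Add schema", "Add FAQ, Article, or HowTo markup that matches the page."),
    ],
)
print(json.dumps(howto, indent=2))
```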
What you do NOT need: Dataset Schema is powerful for statistics-heavy reference pages but irrelevant for most small business content. Product Schema helps e-commerce queries, not informational ones. Start with FAQ + Article + HowTo and move on.
Common reasons you are not cited
1. PerplexityBot is blocked. The most common silent killer. If your robots.txt has a broad Disallow: / or blocks PerplexityBot by name, you are invisible regardless of content quality. Check this first.
2. Your page answers the category, not the question. "We provide comprehensive marketing services to businesses of all sizes" is not an answer to any specific query. Perplexity is matching passages to specific questions. A page that explains everything about marketing services will lose to a 600-word page that answers "how much should a small business spend on digital marketing" specifically.
3. Thin authority. Perplexity's citations concentrate in a small number of domains per topic, and authority compounds. If your domain has no third-party mentions, no real author profiles, and no presence in industry directories, fix the entity signals before optimizing page-level copy.
4. Stale content. A guide published in 2022 and never touched will lose to a slightly worse guide updated this quarter. For any query where freshness matters (pricing, tools, regulations, market data), Perplexity heavily favors recent sources.
5. No structured data. Pages without schema markup require Perplexity's models to infer page type, authorship, and date from unstructured HTML. That inference is imperfect. Schema removes the guesswork.
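The robots.txt check from item 1 can be scripted with Python's standard-library `urllib.robotparser`. This is a minimal sketch that evaluates robots.txt text you paste in; the helper name `crawler_allowed` and the example rules and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def crawler_allowed(robots_txt: str, url: str, agent: str = "PerplexityBot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# A broad Disallow blocks every crawler, PerplexityBot included.
blocks_everyone = "User-agent: *\nDisallow: /\n"

# A rule that names PerplexityBot wins over the permissive wildcard group.
blocks_by_name = (
    "User-agent: PerplexityBot\nDisallow: /\n\n"
    "User-agent: *\nAllow: /\n"
)

# Only the admin area is off limits; content pages stay crawlable.
blocks_admin_only = "User-agent: *\nDisallow: /admin/\n"

page = "https://example.com/guides/pricing"
for label, rules in [
    ("broad disallow", blocks_everyone),
    ("blocked by name", blocks_by_name),
    ("admin only", blocks_admin_only),
]:
    print(label, crawler_allowed(rules, page))
```

To test a live site instead of pasted text, `RobotFileParser` can also load the file itself via `set_url()` followed by `read()`.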
Getting cited by Perplexity is an engineering problem, not a content volume problem. You do not need more pages — you need your existing best pages to clear every filter in the retrieval pipeline. Allow the crawler, answer first, add schema, build cross-platform authority, and refresh quarterly. That combination covers the five main failure modes and puts you in the pool of sources Perplexity actually considers.
Frequently asked questions
Does Perplexity use Google's index?
No. Perplexity maintains its own proprietary index built by PerplexityBot, its dedicated web crawler. PerplexityBot does not crawl for AI model training — it crawls to populate the retrieval index that powers Perplexity's real-time answers. This means ranking on Google is neither necessary nor sufficient to be cited by Perplexity, though pages that are crawlable and authoritative tend to do well in both.
How fast does Perplexity pick up new pages?
Because Perplexity uses real-time web retrieval on top of its index, new pages can begin appearing in answers within 2-4 weeks of publication — faster than Google for fresh, authoritative content. Freshness is also a ranking signal: pages updated within the last 30 days are retrieved significantly more often than stale content for the same query.
Do backlinks matter for Perplexity citations?
Traditional backlink count matters less than cross-platform authority signals. Perplexity's citation algorithm weighs mentions across trusted third-party platforms — industry directories, review sites, media coverage — alongside domain authority. Research from Ahrefs found that unlinked brand mentions correlate roughly 3× more with AI visibility than raw backlink count, which suggests Perplexity is reading entity recognition signals, not just link graphs.
Does structured data directly affect Perplexity citations?
Yes, and the effect sizes are notable. FAQ Schema is associated with a 42% higher citation rate for question-based queries; HowTo Schema with a 38% lift for process queries; Article Schema with a 23% lift for informational content. These figures come from third-party GEO research and have not been confirmed by Perplexity directly, but the directional signal is consistent: schema markup helps Perplexity's models parse what a page answers, which makes it easier to match to queries.
Can I block PerplexityBot if I don't want to be cited?
Yes. PerplexityBot respects robots.txt. Add the two lines "User-agent: PerplexityBot" and "Disallow: /" to block it from your entire site, or target specific directories. Some publishers have done this in response to questions about content licensing; the trade-off is that blocking PerplexityBot removes you from Perplexity citations entirely.