FAQ schema where the answers do not match visible page content. AggregateRating with no real reviews. Article schema on a thin landing page. AI engines parse the schema, compare it to the visible content, and discount or ignore the entire page when there is a mismatch. Google has been actively penalizing this since 2023; LLMs simply skip it.
Fix: every schema block must mirror the visible page content exactly. If the FAQ schema lists 'What is X?', the visible page must contain that exact question and its answer.
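A minimal sketch of the alignment, using a placeholder question and answer; the FAQPage JSON-LD quotes the visible text verbatim:

```html
<!-- Visible on the page -->
<h3>What is X?</h3>
<p>X is a placeholder answer that appears in the rendered page body.</p>

<!-- JSON-LD: same question, same answer, word for word -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is X?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "X is a placeholder answer that appears in the rendered page body."
    }
  }]
}
</script>
```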
'In today's fast-paced digital landscape, businesses are constantly looking for…' Pages that open with marketing boilerplate get truncated by the LLM before it reaches your substantive content. The retrieval layer does not understand that you were warming up; it sees 80 words of nothing and assigns the page low information density.
Fix: lead with the answer. The first 80 words must contain the substantive claim, definition, or insight. Marketing context, if any, comes after.
Hundreds of near-identical pages with only a city name or product variant swapped. AI engines aggressively dedupe these, and the surviving page is the one with the most original content, meaning the rest of the set contributed zero citation share.
Fix: either invest in genuine differentiation per page (real local content, real product specifics) or consolidate the set into one strong page.
Anonymous content, ghostwritten content with stub bylines, or AI-generated content with no editorial responsibility all underperform in AI citation. The trust signal is absent.
Fix: a real author with sameAs links to LinkedIn, structured Person schema, and a published track record.
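A sketch of what that looks like in Article JSON-LD; the author name and URLs here are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example article title",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "url": "https://example.com/authors/jane-doe",
    "sameAs": ["https://www.linkedin.com/in/janedoe"]
  }
}
</script>
```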
Pages with no visible publish date, or with a publish date five years old and no update marker, get deprioritized for any time-sensitive query. The freshness penalty is heavy in Perplexity and Bing Copilot, lighter but real in ChatGPT and Gemini.
Fix: a visible publish date and update date in the DOM, mirrored by datePublished and dateModified in the JSON-LD. Bump dateModified only when the content is genuinely refreshed (not just touched).
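A sketch with placeholder dates; datePublished and dateModified are the schema.org property names, and the visible time elements should carry the same values:

```html
<!-- Visible in the DOM -->
<p>Published <time datetime="2024-03-04">March 4, 2024</time> ·
   Updated <time datetime="2025-01-12">January 12, 2025</time></p>

<!-- Mirrored in the JSON-LD -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "datePublished": "2024-03-04",
  "dateModified": "2025-01-12"
}
</script>
```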
Robots.txt rules that block GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, or Google-Extended will reliably make you invisible to the corresponding engine. We see this most often as an accident: someone added a blanket User-agent: * Disallow rule somewhere and never audited it.
Fix: explicit allow rules for the retrieval bots you want citing you. The AI Crawlers guide has the bot-by-bot reference.
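A sketch of the explicit allow rules. These bot names match what the engines publish today, but verify each against the AI Crawlers guide before shipping; under the robots.txt standard, a named group like this overrides a blanket User-agent: * block for that bot:

```txt
# Allow AI retrieval bots explicitly, even if a blanket Disallow exists elsewhere
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /
```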
Content behind a login or an aggressive paywall cannot be retrieved by most AI bots. Cookie-wall modals that intercept page render also block crawlers from reading the actual content.
Fix: progressive disclosure with the substantive answer visible above any wall. Cloudflare-style bot allow lists for the engines you want.
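One way to structure it, sketched in plain HTML; the class names and copy are placeholders:

```html
<article>
  <!-- The substantive answer ships in the initial HTML, readable by any bot -->
  <p>Short answer: [the key claim or definition goes here, ungated].</p>

  <!-- The wall intercepts human readers below, not the answer above -->
  <div class="paywall-gate">
    <p>Subscribe to read the full analysis…</p>
  </div>
</article>
```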
Pages that render content client-side via JavaScript with no SSR fallback are partially or fully invisible to most AI bots, which do not execute JS as aggressively as Googlebot.
Fix: server-side render or static-prerender all substantive content. The page should be readable with JavaScript disabled.
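A quick check, assuming a hypothetical URL: fetch the raw HTML the way a non-JS crawler would and confirm the substantive text is already in it:

```sh
# Fetch the raw HTML, no JavaScript execution, as most AI crawlers see it
curl -s https://example.com/your-page |
  grep -c "a distinctive phrase from your opening answer"
# 0 matches means the content only exists after client-side rendering
```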
'27 ways to grow your business' where 22 of the entries are weak filler. AI engines reward the strong entries and ignore the rest. Worse, the padding dilutes the entity signal — the page reads as low-quality.
Fix: 8 strong points beats 27 mediocre ones. Cut ruthlessly.
Repetition of head terms in body copy was an old SEO tactic that now reads as low-quality to retrieval models. The semantic-matching layer in modern AI engines treats redundant phrasing as a negative signal.
Fix: write for the model the way you write for a smart human. Define once, vary phrasing, never repeat for keyword density.
Pages that summarize others' work without adding original analysis, data, or perspective compete poorly against the primary sources they cite. The engine prefers to cite the original source directly.
Fix: add original analysis, primary data, internal benchmarks, or unique perspective. Even small original contributions promote you from secondary to primary source.
Pages with no inbound internal links sit outside the topical cluster and miss the entity-strength multiplier the rest of the cluster provides. Even great content can be invisible if it is structurally orphaned.
Fix: every substantive post lives inside a hub. The Internal Linking guide has the architecture.
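A sketch of what cluster membership looks like in markup; the URLs are placeholders:

```html
<!-- On each spoke post: link up to the hub and across to sibling guides -->
<nav class="cluster-links">
  <a href="/guides/ai-search/">AI search hub</a>
  <a href="/guides/ai-search/ai-crawlers/">AI Crawlers</a>
  <a href="/guides/ai-search/internal-linking/">Internal Linking</a>
</nav>
```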
No. There is no formal manual-action mechanism for AI engines. These are negative signals that cause the retrieval layer to skip your URL or refuse to cite the page. The effect is the same as a penalty even though the mechanism is different.
Most fixes show measurable lift within 2–4 weeks. The exception is the entity-strength fixes (original data, author bios), which compound over months.
Sometimes — especially if the content is uniquely valuable. But every problem on the list reduces your citation rate; stack two or three and you become functionally invisible.
Boilerplate-heavy openings. Roughly 70% of the pages we audit lead with marketing fluff instead of the answer. The fix is simple and the impact is large.
Almost universally yes. The list is essentially a quality checklist; everything that helps AI search citation also helps Google ranking.