// geo-seo · ai-search · content-craft

My GEO citability scoring sheet: how I grade every page before publishing

A 100-point checklist for scoring a page on how likely AI search engines are to cite it. The five categories, the actual scoring rubric, and the cut-off score I will not publish below.

by İsmail Günaydın · 7 min read

I score every page on toolgenx.com against a 100-point GEO citability rubric before publishing. Pages below 75 get rewritten. Pages above 85 typically appear in AI search citations within two weeks. This is the sheet, the rubric, and the rationale.

The five categories total 100 points:

  1. Answer block quality — 30 points
  2. Passage self-containment — 25 points
  3. Structural extractability — 20 points
  4. Authority signals — 15 points
  5. Update freshness — 10 points
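A minimal sketch of how the five categories could be encoded so a total is always checked against the category maxima. The names MAX_POINTS and total_score, and the example scores, are illustrative assumptions, not part of the sheet itself.

```python
# Maximum points per category, as listed above.
MAX_POINTS = {
    "answer_block_quality": 30,
    "passage_self_containment": 25,
    "structural_extractability": 20,
    "authority_signals": 15,
    "update_freshness": 10,
}


def total_score(scores: dict[str, int]) -> int:
    """Sum per-category scores after checking each one against its maximum."""
    if set(scores) != set(MAX_POINTS):
        raise ValueError("scores must cover exactly the five categories")
    for name, value in scores.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    return sum(scores.values())


# Hypothetical page, scored by hand against the rubric.
example = {
    "answer_block_quality": 24,
    "passage_self_containment": 20,
    "structural_extractability": 16,
    "authority_signals": 11,
    "update_freshness": 8,
}
print(total_score(example))  # 79
```

The range check matters more than the sum: it catches a score typed into the wrong category before it skews the total.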

Category 1: Answer block quality (30 points)

This measures whether the page contains 40-60 word passages that read as direct, quotable answers to specific questions. I treat this as the single biggest factor in AI citation rates. The 2024 GEO research from Princeton, Georgia Tech, and IIT Delhi found that GEO-structured content produces 30-115% higher visibility in AI answers, and I read most of that lift as coming from quotable passage density.

Scoring

  • 27-30: Every major heading is followed by a 40-60 word self-contained answer. Uses definition patterns ("X is...", "X refers to...") consistently. At least one quantified fact per passage.
  • 22-26: Most major headings have answer blocks. Definition patterns appear but inconsistently. Numbers and dates present but not in every passage.
  • 15-21: Some answer blocks present but buried in middle paragraphs rather than leading. Mixed quotable and unquotable structure.
  • 8-14: Narrative-driven prose. Reader has to extract answers from longer paragraphs. Few specific facts.
  • 0-7: Pure narrative or pure marketing. Nothing extractable.

Test for this

Read your page. Highlight every 40-60 word block that could be a standalone answer. If you cannot find five on a typical blog post, the score is below 20.
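The highlighting pass can be pre-filtered with a word count. This is a rough sketch, assuming the page text is available as plain text with paragraphs separated by blank lines; answer_blocks is a hypothetical helper. Length alone does not make a passage quotable, so the result is a list of candidates to read, not a score.

```python
def answer_blocks(text: str, low: int = 40, high: int = 60) -> list[str]:
    """Return paragraphs whose word count falls within [low, high]."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [p for p in paragraphs if low <= len(p.split()) <= high]


# Synthetic page: three paragraphs of 45, 120, and 60 words.
page = "\n\n".join([
    " ".join(["word"] * 45),
    " ".join(["word"] * 120),
    " ".join(["word"] * 60),
])
candidates = answer_blocks(page)
print(len(candidates))  # 2
```

If the candidate count is below five on a typical blog post, the manual test above will not find five either.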

The cheap fix: start each H2 section with a one-line answer ("X is...") followed by 30-40 words of supporting detail. That alone moves most pages from the 15-21 band to the 22-26 band.

Category 2: Passage self-containment (25 points)

A passage scores well here only if it can be lifted out of the page and still make sense. AI search engines extract passages; they do not quote them with the surrounding context.

Scoring

  • 23-25: 80%+ of content blocks are fully self-contained. Each passage names its subject explicitly. No pronouns referring to earlier sections.
  • 19-22: 60-79% self-contained. Most passages name their subject. Occasional pronoun reliance.
  • 13-18: 40-59% self-contained. Mixed pronoun and noun usage. Some passages need surrounding context.
  • 7-12: 20-39% self-contained. Heavy pronoun use. Most passages lose meaning when extracted.
  • 0-6: Continuous narrative. Extracting any paragraph loses meaning.

Test for this

Pick three paragraphs from the middle of your page. Read each in isolation. If you cannot tell what they are about without the surrounding context, the page scores below 18.

The cheap fix: replace ambiguous pronouns ("it", "this", "they") with the actual noun, even when it feels repetitive. AI extractors do not care about prose elegance; they care about referential clarity.
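A crude first pass for the pronoun check, as a sketch only. It assumes a paragraph that opens with "it", "this", or "they" depends on earlier context, which over-flags openers like "This rubric..." and misses pronouns later in the paragraph. The function names are hypothetical.

```python
import re

# The three pronouns named above; a starting point, not an exhaustive list.
AMBIGUOUS_OPENER = re.compile(r"^(it|this|they)\b", re.IGNORECASE)


def needs_context(paragraph: str) -> bool:
    """True if the paragraph opens with a pronoun that may point outside it."""
    return bool(AMBIGUOUS_OPENER.match(paragraph.strip()))


def self_contained_share(paragraphs: list[str]) -> float:
    """Fraction of paragraphs that do not open with an ambiguous pronoun."""
    if not paragraphs:
        return 0.0
    standalone = sum(1 for p in paragraphs if not needs_context(p))
    return standalone / len(paragraphs)


sample = [
    "A Person schema names the author of a page.",
    "It also lists the author's social profiles.",
    "They should be real and active.",
    "JSON-LD is a format for structured data.",
    "Answer blocks are 40-60 word passages.",
]
print(f"{self_contained_share(sample):.0%}")  # 60%
```

The share lines up with the percentage bands in the scoring list, but the flagged paragraphs still need a human read before the score is final.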

Category 3: Structural extractability (20 points)

This is the JSON-LD and HTML semantics layer. AI search engines parse structured data first because it is unambiguous. Pages with clean structure score higher even before the prose is evaluated.

Scoring

  • 18-20: Page has all relevant schemas (Article + Speakable + FAQPage + Person for blog; Product + Offer + FAQPage + BreadcrumbList for product). Schemas validate against Google's Rich Results Test. Headings are hierarchical (h1 → h2 → h3, no skips).
  • 14-17: Most schemas present. Validates with warnings but no errors. Heading hierarchy correct.
  • 9-13: Some schemas present. Validation warnings. Heading hierarchy mostly correct.
  • 4-8: Minimal schema. Validation errors. Heading hierarchy broken in places.
  • 0-3: No schema. No semantic HTML.
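The "no skips" heading rule in the top band is easy to check mechanically. A minimal sketch using Python's standard html.parser; HeadingCollector and heading_skips are illustrative names, and the sample markup is made up.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level of every h1-h6 start tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_skips(html: str) -> list[tuple[int, int]]:
    """Return (from_level, to_level) pairs where a heading jumps down more than one level."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels
    return [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1]


sample_html = "<h1>Title</h1><h2>Section</h2><h4>Detail</h4><h2>Next</h2>"
print(heading_skips(sample_html))  # [(2, 4)]
```

Moving back up the hierarchy (h4 to h2) is not a skip; only a downward jump of more than one level is reported.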

Test for this

Run the page through Google's Rich Results Test. Count the recognized schema types. Fewer than three for a blog post or four for a product page means you are leaving citation score on the table.
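Before running the external test, the schema types a page declares can be counted locally. A sketch under assumptions: it reads every script tag of type application/ld+json and collects each @type it finds, including nested ones. This counts what the page declares, not what Google's tool recognizes, so treat it as a pre-check. All names here are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of every JSON-LD script block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.documents: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.documents.append(json.loads("".join(self._buffer)))


def schema_types(html: str) -> set[str]:
    """Return every @type declared anywhere in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    found: set[str] = set()
    stack = list(collector.documents)
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            stack.extend(node)
        elif isinstance(node, dict):
            declared = node.get("@type", [])
            found.update([declared] if isinstance(declared, str) else declared)
            stack.extend(node.values())
    return found


sample_page = """
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "author": {"@type": "Person", "name": "Jane Example"}}
</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": []}
</script>
"""
print(sorted(schema_types(sample_page)))  # ['Article', 'FAQPage', 'Person']
```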

Category 4: Authority signals (15 points)

AI search engines weight author credibility because their answers carry implicit liability. A passage from "Admin" on an unbranded site is worth less than the same passage from a named expert with verifiable credentials.

Scoring

  • 13-15: Page has a named human author with a Person schema. The sameAs array lists at least 3 real social profiles. Author has a real photo and a real bio elsewhere on the site. Organization schema present at root.
  • 10-12: Named author with Person schema. Some sameAs links. Bio exists. Organization schema present.
  • 6-9: Named author but missing Person schema or sameAs. Bio missing or stale.
  • 3-5: Generic author ("Admin", "Editor"). No author entity.
  • 0-2: No author byline at all.

Test for this

Click the author byline on your page. Does it lead somewhere meaningful (a real bio with social links and history)? Are the social profiles real and active? If no, the page scores below 6.
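For reference, a sketch of the kind of Person entity the top band describes, built as a Python dict and serialized to JSON-LD. The name, URLs, and helper functions are placeholders; only the schema.org property names (@type, name, url, sameAs) are real.

```python
import json


def person_schema(name: str, url: str, same_as: list[str]) -> dict:
    """Build a schema.org Person entity as a JSON-LD-ready dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "url": url,
        "sameAs": list(same_as),
    }


def has_enough_profiles(person: dict, minimum: int = 3) -> bool:
    """Check the top-band requirement of at least `minimum` sameAs links."""
    return len(person.get("sameAs", [])) >= minimum


# Placeholder author; replace with real, active profiles.
author = person_schema(
    name="Jane Example",
    url="https://example.com/about",
    same_as=[
        "https://example.com/profiles/one",
        "https://example.com/profiles/two",
        "https://example.com/profiles/three",
    ],
)
print(json.dumps(author, indent=2))
print(has_enough_profiles(author))  # True
```

The count is the only part a script can verify. Whether the profiles are real and active is still a manual click-through.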

Category 5: Update freshness (10 points)

AI search engines weight recently updated pages higher, especially for time-sensitive queries. This is partly because freshness correlates with accuracy, partly because recently edited pages tend to be more carefully maintained.

Scoring

  • 9-10: dateModified within 30 days. Content has been substantively edited (not just timestamp bumped).
  • 7-8: dateModified within 90 days. Recent meaningful edits.
  • 4-6: dateModified within 6 months.
  • 2-3: dateModified within 1 year.
  • 0-1: Stale beyond a year, or no dateModified at all.

Test for this

Check the dateModified in the page's JSON-LD. If it has not changed in a year, the score is below 2. If it changed last week because you fixed a typo, that does not count as "substantive"; be honest with the rating.

The cheap fix: a quarterly review where you read each page top-to-bottom and update at least one paragraph with new context. Real edits, not timestamp tricks.
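The date half of this category maps cleanly to a lookup. A sketch that returns the score band for a given dateModified, assuming "6 months" means 183 days. The band is an upper bound: a timestamp bump without a substantive edit should still be marked down by hand. freshness_band is a hypothetical name.

```python
from datetime import date
from typing import Optional


def freshness_band(date_modified: Optional[date], today: date) -> tuple[int, int]:
    """Return the (min, max) score band for a page's dateModified."""
    if date_modified is None:
        return (0, 1)  # no dateModified at all
    age_days = (today - date_modified).days
    if age_days <= 30:
        return (9, 10)
    if age_days <= 90:
        return (7, 8)
    if age_days <= 183:  # assumption: six months taken as 183 days
        return (4, 6)
    if age_days <= 365:
        return (2, 3)
    return (0, 1)


print(freshness_band(date(2024, 1, 1), today=date(2024, 1, 15)))  # (9, 10)
print(freshness_band(date(2023, 1, 1), today=date(2024, 6, 1)))   # (0, 1)
```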

Putting it together

A real example. The hub post on shipping as a solo founder scores:

  • Answer block quality: 28/30 (every major H2 has a 40-60 word answer block, but some sections open with narrative before reaching it)
  • Passage self-containment: 22/25 (a few "it" pronouns survived the editing pass)
  • Structural extractability: 20/20 (all schemas present, validates clean, hierarchy correct)
  • Authority signals: 14/15 (named author, full Person schema, 6 sameAs links, real photo on About)
  • Update freshness: 10/10 (dateModified two weeks ago after material edit)

Total: 94/100. Above my 85 reliable-citation threshold. The two-point gap on answer quality is fixable; the three-point gap on self-containment requires a more careful editing pass. Both are on my list for the next quarterly review.

The 75 cutoff

I do not publish pages below 75. Pages I have accidentally shipped with scores of 60-70 have one thing in common: I cannot find them in any AI search citation, ever. They get organic Google traffic, sometimes plenty of it, but AI search does not surface them.

The 75 cutoff is empirical. Above it I see occasional citations. Above 85 I see reliable citations. Below 75 I see nothing.
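The two thresholds reduce to a three-way decision. A sketch, assuming a score of exactly 85 counts as reliable (the text says "above 85" in one place and "85+" in another); verdict is a hypothetical name.

```python
PUBLISH_CUTOFF = 75      # below this, the page gets rewritten
RELIABLE_THRESHOLD = 85  # assumption: 85 itself counts as reliable


def verdict(total: int) -> str:
    """Map a 0-100 total to the publishing decision described above."""
    if not 0 <= total <= 100:
        raise ValueError("total must be between 0 and 100")
    if total < PUBLISH_CUTOFF:
        return "rewrite"
    if total < RELIABLE_THRESHOLD:
        return "publish: occasional citations expected"
    return "publish: reliable citations expected"


print(verdict(94))  # publish: reliable citations expected
print(verdict(70))  # rewrite
```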

What the sheet does not measure

Three things this scoring sheet does NOT measure, deliberately:

  • Conversion rate. A page can be highly citable and still not convert. GEO is for visibility, not closing.
  • Reader enjoyment. Optimizing purely for the sheet produces drier prose. Hand-tune for the human reader after the score is clean.
  • Long-term defensibility. A 95-score page that mostly aggregates other people's research is more vulnerable to obsolescence than a 78-score page with original first-party data.

The sheet is one input. Editorial judgment is the other.

How to use it on your own pages

If you are starting from scratch:

  1. Score one existing page against the rubric. Write the score per category. This calibrates your eye.
  2. Pick the lowest-scoring category and fix that one. Most pages fail on either answer block quality or passage self-containment.
  3. Re-score. If you have moved 10+ points in the target category, ship it. Otherwise iterate.
  4. Set a publish threshold. 75 is a reasonable starting cutoff for a small shop. Adjust based on what you see in actual AI citations over the next month.
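Steps 2 and 3 can be sketched as two small helpers. One assumption worth flagging: "lowest-scoring" is interpreted here as lowest share of the category maximum, since the categories have different ceilings. The function names and example scores are hypothetical.

```python
MAX_POINTS = {
    "answer_block_quality": 30,
    "passage_self_containment": 25,
    "structural_extractability": 20,
    "authority_signals": 15,
    "update_freshness": 10,
}


def weakest_category(scores: dict[str, int]) -> str:
    """Return the category with the lowest score relative to its maximum."""
    return min(scores, key=lambda name: scores[name] / MAX_POINTS[name])


def moved_enough(before: dict[str, int], after: dict[str, int],
                 category: str, threshold: int = 10) -> bool:
    """True if the target category gained at least `threshold` points."""
    return after[category] - before[category] >= threshold


first_pass = {
    "answer_block_quality": 12,
    "passage_self_containment": 20,
    "structural_extractability": 16,
    "authority_signals": 12,
    "update_freshness": 8,
}
target = weakest_category(first_pass)
print(target)  # answer_block_quality

second_pass = {**first_pass, target: 24}
print(moved_enough(first_pass, second_pass, target))  # True
```

Comparing raw points instead would almost always flag update freshness, simply because its ceiling is 10.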

The sheet runs at the editorial layer — it is what separates content that gets cited from content that gets ignored. The technical layer (robots.txt allowing AI crawlers, llms.txt giving a curated tour, JSON-LD validating cleanly) is necessary but not sufficient. The editorial discipline of the scoring sheet is what closes the loop.


The full GEO audit that runs this scoring sheet against an entire site is built into AI Search Visibility Toolkit. The content framework that produces high-scoring pages from scratch is in AI Content Blueprint. The JSON-LD library that handles the structural extractability category is Structured Data Pro Pack.

// faq

Frequently asked

Is this scoring sheet the same one used by AI search engines?
No. I have no insight into what scoring AI search engines actually use. The sheet is reverse-engineered from the published Princeton-GA Tech-IIT Delhi GEO research (2024) plus three months of testing what gets cited versus what does not on my own pages. It is empirical, not insider.

Why a 75 cutoff and not 80 or 90?
75 is the score below which I started seeing zero AI citations in my own testing. 75-85 produces occasional citations. 85+ produces citations reliably enough that I treat it as the floor for hub posts. The cutoff will probably shift as the AI search ecosystem matures.

How long does scoring a page take?
About 15 minutes the first few times. Five minutes once the rubric is muscle memory. The slow part is reading every passage and asking "could this 60-word block stand alone as an answer to a question?" After 50 pages you start to spot the answerable structure on a quick scan.

Does the sheet apply to product pages too or just blog posts?
Both. The categories are the same. Product pages tend to score lower on "update freshness" (they do not get edited often) but higher on "structural extractability" (the JSON-LD does heavy lifting). Blog posts are the opposite. The 75 cutoff applies to both.

Can a page score 100?
In theory yes. In practice I have not produced one. The highest I have shipped is 92. The last 8 points tend to require things like multiple original studies cited or first-party data the page is the only source for. Worth knowing they exist as a ceiling but not worth optimizing for.

Written by

İsmail Günaydın

Software Engineer · SEO/GEO/AEO Strategist · Digital Entrepreneur

Software engineer and digital entrepreneur with 15+ years building SEO-driven products. Founder of ModernWebSEO and ToolGenX. Focused on developer experience, web performance, and making technical content accessible. Builds customer-generating digital infrastructure through SEO, AEO, and GEO strategies.