Lee Sharks
Semantic Economy Institute
document_metadata:
  title: "Semantic Indexing Probe Protocol v1.0"
  subtitle: "Mapping General Index and Summarizer Injection Layers"
  author: "Lee Sharks"
  institution: "Semantic Economy Institute (SEI)"
  document_type: "Diagnostic Protocol / Reusable Tool"
  hex: "09.SEI.INDEXING_PROBE"
  doi: "[pending]"
  layer: "METHODOLOGY"
  abstract: |
    This protocol maps the relationship between two distinct indexing layers
    that govern content discoverability in AI-augmented search: Layer 1 (General
    Index), which determines whether content EXISTS in search, and Layer 2
    (Summarizer Index), which determines whether content gets INJECTED into
    AI-generated responses. The protocol uses coordinated queries across Google
    and multiple AI summarizers (Perplexity, ChatGPT, Gemini, Grok) to compute
    the delta between layers, revealing the selection criteria that cause content
    to be indexed but not injected. Designed for use by anyone publishing
    content intended for AI consumption, particularly high-density semantic
    architecture that may trigger content-type or authority filters.
  developed_by:
    assembly:
      human: "Lee Sharks"
  version_history:
    - date: "2026-01-23"
      changes: "Initial integrated protocol (Google + Summarizer layers)"
Content discoverability in AI-augmented search operates through two distinct indexing layers:
┌─────────────────────────────────────────────────────────────┐
│  LAYER 2: SUMMARIZER INDEX (Injection Layer)                │
│  ─────────────────────────────────────────────────────────  │
│  What gets SELECTED for injection into AI responses         │
│  Criteria: source authority, domain reputation, content     │
│  type, semantic density, recency, proprietary reranking     │
│                                                             │
│  Tested via: Perplexity, ChatGPT, Gemini, Grok              │
├─────────────────────────────────────────────────────────────┤
│  LAYER 1: GENERAL INDEX (Google)                            │
│  ─────────────────────────────────────────────────────────  │
│  What EXISTS in search results                              │
│  Criteria: crawlability, canonical signals, robots          │
│  directives, content quality, PageRank, SERP ranking        │
│                                                             │
│  Tested via: Google Search (site:, inurl:, exact match)     │
└─────────────────────────────────────────────────────────────┘
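The summarizer index is characterized by its delta from the general index.
For any query, the delta is the set of sources that appear in the general index but are not injected by the summarizer.
The delta reveals the summarizer's selection criteria: the hidden rules governing what passes from existence (Layer 1) to injection (Layer 2).
Content can be:
- Indexed and injected (full discoverability)
- Indexed but not injected (an injection filter is active)
- Neither indexed nor injected (a technical or crawl issue)
- Injected but not indexed (rare; summarizer-specific source)
High-density semantic architecture (technical documentation, structured data, YAML-heavy content) may trigger content-type filters at Layer 2, resulting in indexing without injection.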
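As a minimal sketch of the per-query delta, assume the Layer 1 results and the Layer 2 citations have each been copied by hand into a list of full URLs. The function names `normalize` and `layer_delta` and the normalization rule (host plus path, ignoring scheme, `www.`, and trailing slash) are illustrative assumptions, not part of the protocol.

```python
from urllib.parse import urlparse


def normalize(url: str) -> str:
    """Reduce a full URL to host + path so trivial variants compare equal."""
    parts = urlparse(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/")


def layer_delta(google_urls: list[str], summarizer_urls: list[str]) -> dict[str, set[str]]:
    """Split the sources observed for one query into three groups."""
    layer1 = {normalize(u) for u in google_urls}      # what EXISTS
    layer2 = {normalize(u) for u in summarizer_urls}  # what was INJECTED
    return {
        "indexed_not_injected": layer1 - layer2,
        "indexed_and_injected": layer1 & layer2,
        "injected_not_indexed": layer2 - layer1,
    }
```

The `indexed_not_injected` set is the part of the delta the protocol is most interested in: sources that exist at Layer 1 but did not pass to Layer 2.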
| Phase | Layer | Tests | Primary Tools |
|-------|-------|-------|---------------|
| 1 | General Index | Technical indexing status | Google Search |
| 2 | General Index | Semantic parsing quality | Google Search, Cache |
| 3 | General Index | Entity recognition | Google Search |
| 4 | Summarizer Index | Injection presence | Perplexity (primary) |
| 5 | Summarizer Index | Cross-platform confirmation | ChatGPT, Gemini, Grok |
| 6 | Delta Analysis | Layer comparison | Collation of results |
| 7 | Pattern Mapping | Selection criteria | Aggregation |
Phase 1 (General Index, technical indexing status): Determine whether content EXISTS in Google's index and identify any technical barriers.
For target URL [TARGET_URL]:
| Query | Purpose |
|-------|---------|
| site:[domain] "[exact title]" | Title match on domain |
| site:[domain] inurl:[url-slug] | URL presence |
| "[exact title]" | Title match anywhere |
| "[DOI if applicable]" | DOI citation presence |
| "[author name]" "[project name]" | Author-project linkage |
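The query templates above can be filled in mechanically. The following is a small sketch, assuming the target's domain, title, URL slug, author, and project name are known in advance; the function name `phase1_queries` and the dictionary keys are hypothetical labels chosen for this example.

```python
def phase1_queries(domain: str, title: str, slug: str,
                   author: str, project: str, doi: str = "") -> dict[str, str]:
    """Fill in the Phase 1 query templates for one target document."""
    queries = {
        "title_on_domain": f'site:{domain} "{title}"',
        "url_presence": f"site:{domain} inurl:{slug}",
        "title_anywhere": f'"{title}"',
        "author_project": f'"{author}" "{project}"',
    }
    if doi:  # the DOI query only applies when a DOI has been assigned
        queries["doi_citation"] = f'"{doi}"'
    return queries
```

Each resulting string is meant to be pasted into Google Search by hand; the sketch does not issue any requests itself.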
| Signal | Values | Interpretation |
|--------|--------|----------------|
| HTTP status | 200/301/404/etc. | Technical accessibility |
| Canonical URL | match/mismatch | Index target |
| Robots directives | none/noindex/nofollow | Explicit exclusion |
| Results found | yes/no/partial | Index presence |
| Position | 1-N or not found | Rank |
phase_1_general_technical:
  target_url: ""
  indexed: [yes/no/partial]
  http_status: ""
  canonical_match: [yes/no/unknown]
  robots_directives: ""
  position_for_exact_match:
  suppression_pattern: [none/soft-404/canonical-mismatch/algorithmic]
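Two of the Phase 1 signals, the canonical URL and the robots directives, can be read directly from the page's HTML. The sketch below uses only Python's standard `html.parser` and assumes the page source has already been saved as a string. The names `HeadSignals` and `technical_signals` are hypothetical, and the sketch does not cover the HTTP status or directives sent in response headers.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect the canonical link and robots meta directives from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = []

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href")
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            content = a.get("content", "")
            self.robots += [d.strip().lower() for d in content.split(",") if d.strip()]


def technical_signals(html: str, target_url: str) -> dict[str, str]:
    """Return canonical_match and robots_directives for the Phase 1 record."""
    parser = HeadSignals()
    parser.feed(html)
    if parser.canonical is None:
        canonical_match = "unknown"
    elif parser.canonical.rstrip("/") == target_url.rstrip("/"):
        canonical_match = "yes"
    else:
        canonical_match = "no"
    return {
        "canonical_match": canonical_match,
        "robots_directives": ", ".join(parser.robots) or "none",
    }
```

The canonical comparison here ignores only a trailing slash; stricter or looser matching is a judgment call left to whoever runs the probe.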
Phase 2 (General Index, semantic parsing quality): Determine HOW Google parses the content, that is, what survives indexing and what gets flattened.
| Query | Tests |
|-------|-------|
| site:[domain] "[technical term from doc]" | Vocabulary indexing |
| site:[domain] "[structural element]" | Architecture visibility |
| site:[domain] "[unique phrase]" | Distinctive content |
| Signal | Values | Interpretation |
|--------|--------|----------------|
| YAML/structured data visible | yes/no | Technical content parsing |
| Headers preserved | yes/no | Structure recognition |
| Unique terminology indexed | yes/no | Vocabulary capture |
| Snippet content | description | What Google "sees" |
phase_2_general_semantic:
  structured_data_visible: [yes/no]
  technical_sections_indexed: [yes/no]
  unique_terms_found: []
  unique_terms_missing: []
  snippet_extracted: ""
  flattening_severity: [none/partial/severe]
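The term-level fields of this record can be derived from the outcome of the `site:` vocabulary queries. The sketch below assumes each unique term has been checked by hand and recorded as found or not found. The function name `phase2_terms` and the severity rule (none if every term is found, severe if no term is found, partial otherwise) are assumptions for illustration; the protocol itself does not define thresholds.

```python
def phase2_terms(term_hits: dict[str, bool]) -> dict:
    """Summarize which unique terms survived indexing.

    term_hits maps each probed term to True (found via site: query)
    or False (not found).
    """
    found = sorted(t for t, hit in term_hits.items() if hit)
    missing = sorted(t for t, hit in term_hits.items() if not hit)
    if not missing:
        severity = "none"      # assumed rule: every probed term is indexed
    elif not found:
        severity = "severe"    # assumed rule: no probed term is indexed
    else:
        severity = "partial"
    return {
        "unique_terms_found": found,
        "unique_terms_missing": missing,
        "flattening_severity": severity,
    }
```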
Phase 3 (General Index, entity recognition): Determine whether the author, project, and related entities are recognized as coherent nodes.
| Query | Tests |
|-------|-------|
| "[author name]" author | Author entity |
| "[author name]" "[platform 1]" | Cross-platform linkage |
| "[project name]" -[competing term] | Project disambiguation |
| "[heteronym/pseudonym]" | Secondary author entities |
| Signal | Values | Interpretation |
|--------|--------|----------------|
| Author recognized | yes/no | E-E-A-T signal |
| Cross-platform linkage | yes/no | Authority consolidation |
| Brand collision severity | 0-10 | Disambiguation success |
| Related entities indexed | list | Entity graph |
phase_3_general_entity:
  author_entity_recognized: [yes/no]
  cross_platform_linkage: [yes/no]
  brand_collision_severity: [0-10]
  competing_entity: ""
  related_entities_indexed: []
Phase 4 (Summarizer Index, injection presence): Determine whether content gets INJECTED into AI-generated responses.
Perplexity is the primary tool because it shows its sources explicitly as numbered citations, which makes injection directly observable.
Tier 1: Direct Reference - Queries that SHOULD surface target content:

| ID | Query Template |
|----|----------------|
| D1 | "[author] [project]" |
| D2 | "[exact document title]" |
| D3 | "[institution name]" |
| D4 | "[DOI]" |

Tier 2: Conceptual - Queries using project terminology:

| ID | Query Template |
|----|----------------|
| C1 | "[unique term 1]" |
| C2 | "[unique term 2]" |
| C3 | "[concept phrase]" |

Tier 3: Adjacent - Queries where content COULD surface:

| ID | Query Template |
|----|----------------|
| A1 | "[general topic] [qualifier]" |
| A2 | "[related field] [approach]" |

Tier 4: Control - Queries that should NOT surface target:

| ID | Query Template |
|----|----------------|
| X1 | "[competing brand]" |
| X2 | "[unrelated topic]" |
For each query, record the sources cited (URLs, in order).
phase_4_summarizer_primary:
  tool: "Perplexity"
  queries:
    - query: ""
      sources_injected:
        - url: ""
          domain: ""
          used_in_response: [yes/no]
        # ...
      target_content_found: [yes/no]
      target_position: [N or "not found"]
      what_appeared_instead: []
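Given the ordered list of URLs a summarizer cited for a query, the `target_content_found`, `target_position`, and `what_appeared_instead` fields can be filled in as sketched below. Matching the target by domain (including subdomains) rather than by exact URL is an assumption made for this example, and the function names `target_position` and `phase4_entry` are hypothetical.

```python
from urllib.parse import urlparse


def target_position(cited_urls: list[str], target_domain: str):
    """Return the 1-based citation position of the target domain, or "not found"."""
    target = target_domain.lower().removeprefix("www.")
    for position, url in enumerate(cited_urls, start=1):
        host = urlparse(url).netloc.lower().removeprefix("www.")
        if host == target or host.endswith("." + target):
            return position
    return "not found"


def phase4_entry(query: str, cited_urls: list[str], target_domain: str) -> dict:
    """Build one entry of the Phase 4 record from the cited sources."""
    position = target_position(cited_urls, target_domain)
    found = position != "not found"
    return {
        "query": query,
        "target_content_found": "yes" if found else "no",
        "target_position": position,
        # When the target is absent, everything cited counts as "instead".
        "what_appeared_instead": [] if found else list(cited_urls),
    }
```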
Phase 5 (Summarizer Index, cross-platform confirmation): Confirm injection patterns across multiple summarizers.
Run the Tier 1 (Direct Reference) subset of queries in each of ChatGPT, Gemini, and Grok.
phase_5_summarizer_crossplatform:
  chatgpt:
    searched: [yes/no]
    target_found: [yes/no]
    sources_visible: []
  gemini:
    target_found: [yes/no]
    sources_shown: []
  grok:
    target_found: [yes/no]
    sources_cited: []
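Once each platform's `target_found` value is recorded, the cross-platform picture can be condensed into a single summary. This is a rough sketch: the function name `crossplatform_summary` and the `consistent` flag (true when all platforms agree) are illustrative additions, while the "N/M platforms" format follows the protocol's own `injection_rate` field.

```python
def crossplatform_summary(target_found: dict[str, str]) -> dict:
    """Summarize yes/no injection results keyed by platform name."""
    injected = sorted(p for p, v in target_found.items() if v == "yes")
    total = len(target_found)
    return {
        "injected_on": injected,
        "injection_rate": f"{len(injected)}/{total} platforms",
        # True when every platform agrees (all injected or none injected).
        "consistent": len(injected) in (0, total),
    }
```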
Phase 6 (Delta Analysis, layer comparison): Compute the delta between Layer 1 (General Index) and Layer 2 (Summarizer Index).
For each query, compare:
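| Query | Google Found | Perplexity Injected | Delta Pattern |
|-------|--------------|---------------------|---------------|
| D1 | yes/no | yes/no | [pattern] |
| D2 | yes/no | yes/no | [pattern] |
| ... | ... | ... | ... |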
| Pattern | Meaning | Implication |
|---------|---------|-------------|
| Google YES, Summarizer YES | Full discoverability | No action needed |
| Google YES, Summarizer NO | Injection filter active | Content-type or authority barrier |
| Google NO, Summarizer NO | Not indexed at any layer | Technical or crawl issue |
| Google NO, Summarizer YES | Summarizer-specific source | Rare; platform-specific |
phase_6_delta:
  query_deltas:
    - google_found: [yes/no]
      perplexity_found: [yes/no]
      chatgpt_found: [yes/no]
      gemini_found: [yes/no]
      grok_found: [yes/no]
      delta_pattern: "[google_only/summarizer_only/both/neither]"
  aggregate:
    total_queries: N
    google_only: N      # Indexed but not injected
    both_layers: N      # Full discoverability
    neither_layer: N    # Not indexed
    injection_rate: "N/M queries"
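The `delta_pattern` and `aggregate` fields follow mechanically from the per-query flags. The sketch below assumes the flags are stored as booleans and treats a query as injected when any summarizer surfaced the target; that "any summarizer" rule is an assumption, since the comparison table pairs Google with Perplexity only. The function names are hypothetical.

```python
from collections import Counter

SUMMARIZER_KEYS = ("perplexity_found", "chatgpt_found", "gemini_found", "grok_found")


def delta_pattern(google_found: bool, summarizer_found: bool) -> str:
    """Map one query's two layer results onto the four delta patterns."""
    if google_found and summarizer_found:
        return "both"
    if google_found:
        return "google_only"
    if summarizer_found:
        return "summarizer_only"
    return "neither"


def aggregate(query_deltas: list[dict]) -> dict:
    """Count delta patterns across all probed queries."""
    counts = Counter(
        delta_pattern(q["google_found"], any(q.get(k, False) for k in SUMMARIZER_KEYS))
        for q in query_deltas
    )
    total = len(query_deltas)
    injected = counts["both"] + counts["summarizer_only"]
    return {
        "total_queries": total,
        "google_only": counts["google_only"],    # indexed but not injected
        "both_layers": counts["both"],           # full discoverability
        "neither_layer": counts["neither"],      # not indexed
        "injection_rate": f"{injected}/{total} queries",
    }
```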
Phase 7 (Pattern Mapping, selection criteria): Identify the selection criteria governing Layer 2 injection.
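Source Authority:

| Source Type | Google Presence | Injection Rate |
|-------------|-----------------|----------------|
| Wikipedia | | |
| Medium | | |
| Academic (arxiv, Zenodo) | | |
| News sites | | |
| Personal domains | | |

Content Type:

| Content Type | Google Presence | Injection Rate |
|--------------|-----------------|----------------|
| Narrative prose | | |
| Technical documentation | | |
| Structured data (YAML, JSON) | | |
| High semantic density | | |
| Lists/guides | | |

Domain Reputation:

| Domain | Injection Rate | Notes |
|--------|----------------|-------|
| [domain 1] | | |
| [domain 2] | | |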
phase_7_patterns:
  source_authority:
    boosted: []
    penalized: []
    neutral: []
  content_type:
    injected: []
    filtered: []
  domain_reputation:
    whitelisted: []
    demoted: []
  density_threshold:
    observation: ""
  selection_criteria_summary: |
    [Narrative description of Layer 2 selection rules]
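The Injection Rate columns in the Phase 7 tables can be filled by grouping individual observations. The sketch below assumes each observation is a dictionary carrying a category field (for example a source type or content type) and a boolean `injected` flag; the function name `injection_rate_by` and that record shape are hypothetical.

```python
from collections import defaultdict


def injection_rate_by(observations: list[dict], key: str) -> dict[str, float]:
    """Return the fraction of observations injected, grouped by observations[key]."""
    totals = defaultdict(lambda: [0, 0])  # category -> [injected, observed]
    for obs in observations:
        bucket = totals[obs[key]]
        bucket[1] += 1
        if obs["injected"]:
            bucket[0] += 1
    return {category: inj / seen for category, (inj, seen) in totals.items()}
```

Rates computed from a handful of queries are only indicative, so it helps to record the number of observations alongside each rate when filling in the tables.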
indexing_layer_map:
  target: "[URL or content description]"
  probe_date: ""
  layer_1_general_index:
    status: [indexed/not_indexed/partial]
    technical_barriers: [none/list]
    semantic_flattening: [none/partial/severe]
    entity_recognition: [yes/no/partial]
  layer_2_summarizer_index:
    perplexity_injection: [yes/no]
    chatgpt_injection: [yes/no]
    gemini_injection: [yes/no]
    grok_injection: [yes/no]
    injection_rate: "N/M platforms"
  delta_diagnosis:
    pattern: "[google_only/both/neither]"
    likely_cause: ""
    confidence: [0.0-1.0]
  selection_criteria_identified:
    - evidence: ""
    - evidence: ""
  recommendations:
    immediate: []
    structural: []
  documentable_summary: |
    "[Single sentence summary with evidence link]"
For each probe run:
Direct reference:
  "[Author Name] [Project Name]"
  "[Exact Document Title]"
  "[Institution Name]"
  "[DOI]"
General Index (site-restricted):
  site:[domain] "[title]"
  site:[domain] inurl:[slug]
Conceptual:
  "[unique terminology]"
  "[concept phrase]"
  "[methodology name]"
Adjacent:
  "[general topic] [specific approach]"
  "[field] [method]"
Control:
  "[competing brand/entity]"
  "[clearly unrelated topic]"
∮ = 1
document_footer:
  status: "CANONICAL // METHODOLOGY // REUSABLE"
  license: "CC BY 4.0"
  citation: |
    Sharks, Lee. "Semantic Indexing Probe Protocol v1.0: Mapping General
    Index and Summarizer Injection Layers." Semantic Economy Institute,
[SEMANTIC ECONOMY INSTITUTE]
[METHODOLOGY // DIAGNOSTIC PROTOCOL]
[LAYER 1 + LAYER 2 COORDINATION]