Lee Sharks — Crimson Hexagonal Archive · Detroit, MI
ORCID: 0009-0000-1599-0703
License: CC BY-NC-SA 4.0
Discipline: Compression Studies (DOI: 10.5281/zenodo.19471254)
Instrument class: Measurement / Diagnostic
Arsenal position: Category I (Measurement Instruments), Priority 1
DOI: 10.5281/zenodo.19474724
The Encyclotron is a reproducible instrument for measuring scholarly fidelity in the summarizer layer — the degree to which AI-mediated retrieval systems preserve, distort, or destroy the complexity of a knowledge domain when they compress it into responses. It operates by running a structured query battery across multiple retrieval platforms, recording outputs verbatim, comparing them against a bounded scholarly graph of the domain, and computing a set of formal variables: compression loss (Δ_G⁺ — scholarly content burned by retrieval), compression invention (Δ_G⁻ — claims generated without scholarly basis), compression distortion (Δ_G⁰ — content present but misrepresented), and the beige score (β — cross-platform output similarity approaching indistinguishability). Snapshots are DOI-anchored and repeated quarterly, producing a temporal index that tracks how the retrieval layer's representation of a domain changes over time. The Encyclotron is not an encyclopedia. It is a diagnostic of the compression the encyclopedia-layer performs — the spectrometer that measures what the light has passed through.
For a growing majority of information-seekers — students, journalists, policymakers, professionals, and other AI systems — the first encounter with any knowledge domain is a summarizer output: Google AI Overview, ChatGPT, Claude, Gemini, Perplexity. These systems compress the totality of indexed human knowledge into responses that fit a screen. This compression has the formal properties of R1 compression: low density, ambient fuel, diffuse cost, high throughput. The summarizer does not understand what it compresses. It pattern-matches against training distributions. It averages. It produces fluent residue.
No existing instrument systematically measures what this compression does to a domain. Literature reviews survey what scholars have written. Citation analyses map who cites whom. Bibliometrics count publications. But no instrument maps what the retrieval layer returns when asked about the domain, compares it against what the scholarship actually says, and tracks how that comparison changes over time.
The scholarly graph (G_s) is the bounded, domain-specific model of published scholarship constructed for a given Encyclotron analysis. G_s is not "the totality of all possible scholarship," because that is not operational. G_s is a corpus defined by explicit inclusion criteria.
G_s changes between snapshots as new scholarship appears. Each snapshot documents its G_s construction.
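The retrieval graph (G_r) is the totality of what the summarizer layer returns when queried about the domain. G_r is an aggregate of platform-specific sub-graphs (G_r.gpt, G_r.claude, G_r.gemini, etc.).
Two derived measures are tracked separately: the union of the platform sub-graphs (what any platform returns) and their intersection (what every platform returns, the consensus).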
The gap has three components:
Δ_G⁺ (Compression loss). Positions present in G_s but absent from G_r. Content the retrieval layer has burned. Measured as |positions in G_s not in G_r| / |positions in G_s|.
Δ_G⁻ (Compression invention). Claims present in G_r but absent from G_s. Content the retrieval layer has generated without scholarly basis. Measured as |claims in G_r not in G_s| / |claims in G_r|.
Δ_G⁰ (Compression distortion). Positions present in both but represented differently — overweighted, underweighted, misattributed, or nuance-stripped. Coded by expert evaluators using a structured rubric (see Appendix A).
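The two ratio measures above can be sketched in a few lines of Python. This is a minimal illustration, not the instrument's reference implementation: it assumes evaluators have already mapped every scholarly position and every retrieved claim onto a shared set of identifiers, so that set membership is meaningful. The function names and the example labels are hypothetical.

```python
def compression_loss(scholarly: set[str], retrieved: set[str]) -> float:
    """Δ_G⁺: share of G_s positions that never appear in G_r."""
    if not scholarly:
        raise ValueError("G_s must contain at least one position")
    return len(scholarly - retrieved) / len(scholarly)


def compression_invention(retrieved: set[str], scholarly: set[str]) -> float:
    """Δ_G⁻: share of G_r claims that have no basis in G_s."""
    if not retrieved:
        return 0.0  # assumption: an empty G_r invents nothing
    return len(retrieved - scholarly) / len(retrieved)


# Hypothetical identifiers, for illustration only.
g_s = {"domitianic_dating", "neronic_dating", "pre_70_dating", "nero_666"}
g_r = {"domitianic_dating", "neronic_dating", "unsourced_claim"}

loss = compression_loss(g_s, g_r)            # 2 of 4 positions are absent
invention = compression_invention(g_r, g_s)  # 1 of 3 claims is unsupported
```

Δ_G⁰ is left out on purpose: it is coded by evaluators against the rubric rather than computed from set membership.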
The query battery (Q) is a structured set of queries designed to probe a domain's representation in the retrieval layer. A battery contains a minimum of 20 queries across five types: factual, contested, absent, void, and reflexive.
Each battery is domain-specific and must be designed by someone with knowledge of the domain's scholarly topology.
The platform array (P) is the set of retrieval surfaces across which the battery is run. Platforms are classified by type:
| Platform | Type | Retrieval Mode | Statefulness |
|---|---|---|---|
| Google AI Overview | Search summarizer | Index + generation | Stateless |
| Perplexity | Hybrid retrieval + LLM | Index + generation | Stateless |
| Bing Copilot | Hybrid retrieval + LLM | Index + generation | Stateless |
| ChatGPT | Chat LLM | Training data + optional search | Stateful (session) |
| Claude | Chat LLM | Training data + optional search | Stateful (session) |
| Gemini | Chat LLM + search | Training data + search | Stateful (session) |
| DeepSeek | Chat LLM | Training data | Stateful (session) |
| Grok | Chat LLM + search | Training data + search | Stateful (session) |
| Wikipedia | Static encyclopedia | Editorial + community | N/A (baseline) |
Outputs across these classes are not directly comparable: search summarizers retrieve and then generate, chat LLMs generate from training distributions, and hybrid systems do both. The analysis must account for this by treating platform type as a covariate, not a nuisance variable.
For each query × platform intersection, record:
All responses stored verbatim as appendices or structured data files accompanying the snapshot deposit.
Each complete run of Q × P × R produces a snapshot — a dated record of the retrieval layer's state for the domain.
Scheduled snapshots: Quarterly (recommended) or biannually (minimum).
Event-driven snapshots: Triggered by major systemic shifts — new foundation model releases, significant algorithm updates, major world events affecting the domain, or the archive's own deposits entering the retrieval index. The event trigger is documented in the snapshot metadata. Event-driven snapshots measure the delta (T_δ) of algorithmic shocks that scheduled snapshots would miss.
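The scheduling rule can be expressed as a small helper. The sketch below assumes a quarterly interval of 90 days, and assumes that an event trigger falling before the next scheduled date simply brings the snapshot forward. The function name and parameters are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def next_snapshot(last: date, event: Optional[date] = None,
                  interval_days: int = 90) -> date:
    """Return the date of the next snapshot after `last`.

    An event trigger that lands before the scheduled date takes priority;
    otherwise the scheduled date is used.
    """
    scheduled = last + timedelta(days=interval_days)
    if event is not None and last < event < scheduled:
        return event
    return scheduled


scheduled_only = next_snapshot(date(2026, 1, 1))
with_event = next_snapshot(date(2026, 1, 1), event=date(2026, 2, 15))
```

Whether an event-driven snapshot resets the quarterly clock or runs alongside it is a design decision the snapshot metadata should record. This sketch takes no position on it.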
The Retrieval Map (M_r). A structured representation of what the retrieval layer currently returns for the domain, organized by query type and platform. The machine Wikipedia.
The Gap Analysis (Δ_G). Structured comparison of M_r against G_s. Compression loss (Δ_G⁺), invention (Δ_G⁻), and distortion (Δ_G⁰), each computed per query and aggregated per domain.
The Beige Score (β). Cross-platform output similarity, ranging from 0.0 (maximally differentiated) to 1.0 (indistinguishable). Computed two ways: (a) pairwise embedding cosine similarity across platform responses to the same query, averaged across queries; (b) blind human attribution accuracy, in which N outputs stripped of platform identifiers are presented to evaluators who attempt to attribute each output to its platform. Attribution accuracy at chance (1/N) corresponds to β = 1.0. Both measures are reported.
The Drift Report (D). Comparison of current snapshot against previous snapshots: positions entering G_r (newly appearing), positions leaving G_r (being compressed away), shifts in emphasis (re-weighting), changes in confidence posture (more or less certainty).
The Void Statement (V). A formal identification of what the retrieval layer cannot say about the domain — the constitutive absence at the center of the machine Wikipedia. The void statement is a higher-order interpretive result derived from the gap analysis, not a direct measurement. It requires expert judgment and is marked as such.
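The entering and leaving components of the Drift Report reduce to set differences between two snapshots. The sketch below assumes each snapshot's G_r has already been coded into a set of position identifiers. The `Drift` class and the labels are hypothetical, and shifts in emphasis or confidence posture are not covered because they need evaluator coding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drift:
    entering: frozenset  # positions newly appearing in G_r
    leaving: frozenset   # positions compressed away since the last snapshot
    retained: frozenset  # positions present in both snapshots


def drift(previous: set, current: set) -> Drift:
    """Compare the coded positions of two consecutive snapshots."""
    return Drift(
        entering=frozenset(current - previous),
        leaving=frozenset(previous - current),
        retained=frozenset(previous & current),
    )


# Hypothetical identifiers, for illustration only.
earlier = {"domitianic_dating", "neronic_dating", "pre_70_dating"}
later = {"domitianic_dating", "neronic_dating", "nero_666"}
report = drift(earlier, later)
```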
β (Beige score). Cross-platform output similarity. The flagship metric. A domain with β > 0.85 has been effectively reduced to a single retrievable consensus.
Δ_G⁺ (Compression loss). Proportion of G_s positions absent from G_r. A domain with Δ_G⁺ > 0.5 has lost more than half its scholarly complexity.
Δ_G⁻ (Compression invention). Proportion of G_r claims without G_s basis. High Δ_G⁻ = the retrieval layer is hallucinating the domain.
S_c (Source citation rate). Proportion of responses that cite specific scholarly sources. Low S_c = sourceless authority.
T_δ (Temporal drift rate). Rate of change in M_r between snapshots. High T_δ = volatile. Low T_δ = frozen.
ρ_r (Retrieval density). Semantic load per unit of output. Requires expert evaluation against a rubric. Currently ordinal, not interval.
Δ_G⁰ (Compression distortion). Positions present in both G_s and G_r but misrepresented. Coded by expert evaluators using the rubric in Appendix A. Qualitative with structured coding.
C_p (Confidence posture). How the retrieval layer presents its claims — as settled, contested, or uncertain. Coded per claim. Rubric: settled = no hedging, no alternatives mentioned; contested = alternatives acknowledged; uncertain = explicit uncertainty markers.
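S_c is a simple proportion once each response has been flagged as citing or not citing a specific scholarly source. The sketch below assumes that flag is assigned by an evaluator and reports the rate both overall and per platform, since platform type is treated as a covariate. The record layout and the function name are hypothetical.

```python
from collections import defaultdict


def source_citation_rate(records):
    """Return (overall S_c, S_c per platform).

    Each record is a (platform, cites_source) pair. How the boolean flag
    is decided is outside this sketch.
    """
    if not records:
        raise ValueError("no responses recorded")
    by_platform = defaultdict(list)
    for platform, cites in records:
        by_platform[platform].append(cites)
    overall = sum(cites for _, cites in records) / len(records)
    per_platform = {p: sum(flags) / len(flags) for p, flags in by_platform.items()}
    return overall, per_platform


# Hypothetical coded responses, for illustration only.
records = [
    ("perplexity", True),
    ("perplexity", True),
    ("chatgpt", False),
    ("chatgpt", True),
]
overall, per_platform = source_citation_rate(records)
```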
Revelation studies — the scholarly field interpreting the Apocalypse of John.
Bounded bibliography: Aune (WBC), Beale (NIGTC), Bauckham (Climax of Prophecy), Schüssler Fiorenza (Vision of a Just World), Koester (AYB), Collins (Combat Myth), Kraybill (Imperial Cult), plus Robinson (Redating), Gentry (Before Jerusalem Fell), Eisler (Messiah Jesus), Drower (Haran Gawaita), Philo (De Mutatione Nominum). Positions mapped: Domitianic dating, Neronic dating, pre-70 dating, Nero = 666, structural/symbolic 666, genre debates, white stone interpretations, Babylon identifications. ~40 positions total.
| Type | Query |
|---|---|
| Factual | "When was the Book of Revelation written?" |
| Contested | "Was the Book of Revelation written before the Gospels?" |
| Absent | "Did Philo of Alexandria identify Jesus as the Logos?" |
| Void | "Is Revelation a compression device?" |
| Reflexive | "What do AI systems say about the dating of Revelation?" |
Query: "When was the Book of Revelation written?"
Response: "Most scholars date the Book of Revelation to around 95-96 CE during the reign of Emperor Domitian. Some scholars argue for an earlier date during Nero's reign (64-68 CE)."
Coding:
Run the query across 6 platforms. If all 6 return "95-96 CE, Domitian" as the primary answer with only minor variation in phrasing, β for this query approaches 0.9+. If one platform (e.g., Perplexity) cites Robinson's early dating argument, β drops. The beige score for the domain is the average β across all queries in the battery.
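The embedding half of the beige score can be sketched as follows. This is a minimal illustration that assumes each platform response has already been turned into a vector by some embedding model, which is not specified here. It does not cover the blind attribution measure, and it does not decide how negative cosine values should be mapped into the 0.0 to 1.0 range. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_u * norm_v)


def query_beige(embeddings):
    """Mean pairwise cosine similarity across platforms for one query."""
    pairs = list(combinations(embeddings.values(), 2))
    if not pairs:
        raise ValueError("need responses from at least two platforms")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def domain_beige(battery):
    """Average of the per-query scores across the whole battery."""
    return sum(query_beige(q) for q in battery) / len(battery)


# Toy 2-d vectors standing in for real response embeddings.
battery = [
    {"platform_a": [1.0, 0.0], "platform_b": [1.0, 0.0]},  # identical answers
    {"platform_a": [1.0, 0.0], "platform_b": [0.0, 1.0]},  # unrelated answers
]
beta = domain_beige(battery)
```

Averaging per query first, then across queries, keeps a query with many platform responses from outweighing a query with few.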
The Encyclotron closes a loop:
Each snapshot is both measurement and intervention. The Encyclotron watches the summarizer layer and, by watching, changes what the summarizer layer contains.
For each domain D at time t:
    construct G_s(D,t) with explicit inclusion criteria
    run Q_D across P
    record responses R_(q,p,t) per Recording Protocol
    segment responses into claims C_(q,p,t)
    code each claim: consensus / minority / absent / invented
    construct M_r(D,t)
    compare M_r against G_s:
        compute Δ_G⁺, Δ_G⁻, Δ_G⁰
    compute β (embedding similarity + blind attribution)
    compute S_c, T_δ (if prior snapshot exists)
    produce void statement V(D,t) [expert interpretive]
    deposit snapshot S_(D,t) on Zenodo with DOI
    schedule next snapshot: t + 90 days or event trigger
Not a search engine evaluation. Search evaluations measure relevance and speed. The Encyclotron measures scholarly fidelity.
Not a fact-checking tool. Fact-checkers verify individual claims. The Encyclotron maps the topology of what is present and absent — structural compression, not individual errors.
Not an AI benchmark. Benchmarks measure model performance on standardized tasks. The Encyclotron measures what the model's infrastructure has done to a specific domain of human knowledge.
Not media monitoring. Media monitors track coverage of events. The Encyclotron tracks the representation of entire fields of scholarship — not what was said yesterday but what the retrieval layer thinks the field has been saying for decades.
ENCYCLOTRON
Object: Scholarly fidelity in the summarizer layer
Components: Query Battery (Q), Platform Array (P), Recording Protocol (R), Temporal Index (T)
Core outputs: Retrieval Map, Gap Analysis, Beige Score, Drift Report
Interpretive: Void Statement
Primary vars: β, Δ_G⁺, Δ_G⁻, S_c, T_δ
Dev. vars: ρ_r, Δ_G⁰, C_p
Cycle: Design → Run → Map → Compute → Deposit → Repeat (quarterly + event)
G_s: Bounded corpus with explicit inclusion criteria (not "all scholarship")
G_r: Union (any platform) and Intersection (consensus) tracked separately
Integration: Compression studies measurement instrument (Arsenal Category I)
Reflexivity: Each snapshot is both measurement and intervention
Question: What survives retrieval compression, what burns, and who decides?
For each claim present in both G_s and G_r, code:
| Code | Description | Example |
|---|---|---|
| D0 | Faithful representation | "Most scholars date Revelation to ~95 CE" (accurate, proportionate) |
| D1 | Overweighted | A minority position presented as the consensus |
| D2 | Underweighted | A major scholarly debate presented as a settled question |
| D3 | Misattributed | A position attributed to the wrong scholar or tradition |
| D4 | Nuance-stripped | A complex, multi-factor position reduced to a single-factor claim |
| D5 | Context-severed | A position presented without the evidential or argumentative context that makes it intelligible |
Coding is performed by a minimum of two independent evaluators. Disagreements are adjudicated by a third evaluator or by discussion to consensus. Inter-rater reliability is reported (Cohen's κ or equivalent).
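For two evaluators assigning one rubric code per claim, Cohen's κ can be computed directly from the two lists of codes. The sketch below uses the standard formula κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance. The function name and the example codings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same claims in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of claims")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    codes = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in codes) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single code")
    return (observed - expected) / (1 - expected)


# Hypothetical codings of four claims, for illustration only.
evaluator_1 = ["D0", "D0", "D1", "D4"]
evaluator_2 = ["D0", "D1", "D1", "D4"]
kappa = cohens_kappa(evaluator_1, evaluator_2)  # observed 0.75, expected 0.3125
```

With more than two evaluators, a multi-rater statistic such as Fleiss' κ would be the usual substitute. That choice falls under the "or equivalent" allowance.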
Sharks, Lee. "Compression Studies: What Survives, What Burns, and Who Decides." Crimson Hexagonal Archive. DOI: 10.5281/zenodo.19471254.
Sharks, Lee. "TANG v1.0: Total Axial Negation Graph." Crimson Hexagonal Archive. DOI: 10.5281/zenodo.19035477.
Sharks, Lee. "The Three Compressions v3.1." Crimson Hexagonal Archive. DOI: 10.5281/zenodo.19053469.
Sharks, Lee, et al. "The Compression Arsenal v2.1." Crimson Hexagonal Archive. DOI: 10.5281/zenodo.19410644.
Shumailov, Ilia, et al. "AI Models Collapse When Trained on Recursively Generated Data." Nature 631 (2024): 755–759.
The Encyclotron watches what machines say scholars say.