AXN:025F.GOVERNANCE.🌖🎇🌑🔍❌🌸
Celestial · Liminal · Celestial · Navigational · Liminal · Organic
Origin → Threshold → Origin → Search → Threshold → Growth

Provenance Erasure Rate: A Compression-Survival Metric for Attribution Loss in AI-Composed Search Outputs

Lee Sharks · 2026-05-03 · Provenance document
Substrate: Various
License: CC-BY-4.0
SHA-256: 05e80056ef709774a7afbfa9c397ad6770fb8a1ab8abaf7b37e48a8480eab8cf
compositional authority transfer · provenance erasure rate · compression · survival · three compressions · crimson hexagonal · semantic economy · c_dep(o) ⊆ c(o) · ai overview

Full Text

Provenance Erasure Rate

A Compression-Survival Metric for Attribution Loss in AI-Composed Search Outputs

Lee Sharks

Semantic Economy Institute · Crimson Hexagonal Archive

ORCID: 0009-0000-1599-0703

Format: Research note / metric proposal with motivating case study

Target: arXiv (cs.CL / cs.CY) · SSRN (Information Systems / Law & Economics) · Zenodo

License: CC BY 4.0


Abstract

AI retrieval systems increasingly compose answers from human-authored sources. Existing evaluation frameworks ask whether generated claims are factual, whether citations support claims, or whether cited passages are relevant. This paper introduces Provenance Erasure Rate (PER) as a complementary metric: the proportion of source-dependent claims in an AI-composed output that are presented without explicit attribution. PER treats attribution loss as both an evaluation problem and an economic signal, measuring the rate at which compositional authority migrates from named sources to system-level synthesis. PER is orthogonal to content-preservation metrics (ROUGE, BERTScore) and can be computed alongside them to reveal attribution erosion that content metrics miss. A motivating case study documents a Google AI Overview that constructed a false biography of a living author from real fragments in the author's published poetry: the fragments survived compression, but their provenance and meaning did not. We formalize PER with claim-grain weighting, distinguish it from citation precision/recall and AIS-style support metrics, and outline a validation agenda across generative search systems. PER is proposed as a candidate indicator for attribution-layer governance, labor accounting, and retrieval transparency.


1. Introduction: The Attribution Gap

AI-generated search summaries increasingly mediate how users encounter knowledge online. In SparkToro's 2024 study, 58.5% of U.S. Google searches ended without a click to the open web (Fishkin 2024). Subsequent reporting on news-related searches found zero-click behavior rising from 56% to 69% after the launch of AI Overviews (Similarweb 2025). When an AI system composes an answer from multiple sources, it performs an act of composition: combining, paraphrasing, and restructuring material from named authors into a new synthesis presented under the system's authority, not the authors'.

The compositional act is not neutral. It involves decisions about what to include, what to paraphrase, what to attribute, and what to present as self-evident. These decisions have economic consequences: the author whose claim is attributed retains citation value, traffic, and reputational capital; the author whose claim is absorbed into the system's voice without attribution loses all three. The question is not whether attribution loss occurs (it manifestly does) but whether it can be measured consistently enough to serve as an input to governance frameworks.

This paper proposes that it can. We introduce Provenance Erasure Rate (PER), a metric that measures the proportion of source-dependent claims in an AI-composed output that are presented without explicit attribution. A PER of 0 means perfect attribution preservation; a PER of 1 means total provenance erasure.
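Read literally, the prose definition above corresponds to a simple ratio. The following is a sketch of that unweighted reading only: the indicator a(c) is notation assumed here for illustration (1 if claim c carries explicit attribution, 0 otherwise), C_dep(O) is the set of source-dependent claims named in Section 4.1, and the claim-grain weighting mentioned in the abstract is not represented.

```latex
\mathrm{PER}(O) \;=\;
\frac{\bigl|\{\, c \in C_{\mathrm{dep}}(O) \;:\; a(c) = 0 \,\}\bigr|}
     {\bigl|C_{\mathrm{dep}}(O)\bigr|},
\qquad 0 \le \mathrm{PER}(O) \le 1
```

Under this reading the ratio is defined only when C_dep(O) is non-empty; an output with no source-dependent claims has no PER.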

We motivate the metric with a case study in which Google's AI Overview generated a biographical entry for the author of this paper using fragments drawn from his published poetry. Every factual claim in the generated biography was wrong; every fragment was in the source material. The AI achieved granular accuracy and total meaning failure. This is not a system malfunction. It is a system operating in an economy where attribution carries no structural weight.

PER emerges from the Semantic Economy framework's analysis of compositional compression (Sharks 2026a), but the metric can be used independently of that framework.


2. Related Work

2.1 Citation and Attribution Evaluation

Recent work has begun evaluating whether AI-generated outputs properly cite their sources. Liu, Zhang, and Liang (2023) evaluate generative search engines for citation precision and recall, finding that only 51.5% of generated sentences were fully supported by citations, while 74.5% of citations supported their associated sentence. Gao et al. (2023) introduce the ALCE benchmark for evaluating citation quality in LLM-generated text, framing the problem as enabling models to generate text with verifiable citations. Rashkin et al. (2023) propose the Attributable to Identified Sources (AIS) framework, asking whether NLG output can be traced to specific sources. Huang and Chang (2024) argue that citation is a missing component for responsible LLMs, encompassing both parametric and non-parametric content.

These frameworks ask whether generated claims are supported by cited sources. PER asks a different question: what fraction of source-dependent composition occurs without any attributional return to the sources from which the composition draws? Existing work evaluates citation quality where citation is attempted. PER measures the systemic failure to attempt attribution at all, making it an attrition metric rather than a verification metric.
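The difference between a verification metric and an attrition metric can be illustrated with a small sketch. Everything here is hypothetical: the `Claim` record, its field names, and the five example claims are invented for illustration, and `citation_precision` is a simplified stand-in, not the definition used by Liu, Zhang, and Liang (2023).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    text: str
    source_dependent: bool            # draws on a specific source (PER-eligible)
    cited_source: Optional[str]       # None when no attribution is attempted
    citation_supports: bool = False   # only meaningful when cited_source is set

def citation_precision(claims):
    """Share of attempted citations that support their claim (simplified)."""
    cited = [c for c in claims if c.cited_source is not None]
    if not cited:
        return None  # undefined: nothing was cited
    return sum(c.citation_supports for c in cited) / len(cited)

def provenance_erasure_rate(claims):
    """Share of source-dependent claims presented with no attribution."""
    eligible = [c for c in claims if c.source_dependent]
    if not eligible:
        return None  # undefined: no source-dependent claims
    return sum(c.cited_source is None for c in eligible) / len(eligible)

# Hypothetical output: five source-dependent claims, only one of them cited.
claims = [
    Claim("claim A", True, "source-1", citation_supports=True),
    Claim("claim B", True, None),
    Claim("claim C", True, None),
    Claim("claim D", True, None),
    Claim("claim E", True, None),
]

precision = citation_precision(claims)   # 1.0: the single citation holds up
per = provenance_erasure_rate(claims)    # 0.8: four of five claims uncredited
```

In this toy case the verification-style score is perfect because it only looks at the one citation that was attempted, while the attrition-style score registers the four claims for which no citation was attempted.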

2.2 Economic Framing of AI Composition

AI economics research focuses primarily on labor displacement (Acemoglu and Restrepo 2019), capability projection (Eloundou et al. 2023), and welfare estimation (Brynjolfsson, Li, and Raymond 2023). These frameworks measure which jobs AI eliminates, what tasks it can perform, and what consumer surplus it generates. PER identifies a distinct channel: even when human labor remains (the author's work is used), the economic value tied to provenance (reputation, traffic, citation credit, contractual rights) is extracted by the system. This is not displacement; it is extraction without attribution. The author's work is consumed, but the author is erased. Crawford (2021) documents analogous extraction patterns in AI training data; Morreale et al. (2024) examine the "unwitting labourer" dynamic in AI value chains. PER operationalizes the measurement of this extraction at the output level.

2.3 Summarization Metrics and the Attribution Blind Spot

Standard summarization metrics such as ROUGE (Lin 2004) and BERTScore (Zhang et al. 2020) measure content preservation: whether the summary captures the meaning of the source. PER measures attribution preservation: whether the summary credits the source. These are orthogonal. A summary can score high on ROUGE and high on PER simultaneously, pairing accurate content with zero attribution. The gap between content survival and attribution survival is where provenance is erased.
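The orthogonality claim can be made concrete with a toy computation. This is a sketch under stated assumptions: `unigram_recall` is a crude clipped token-overlap score standing in for ROUGE-1 recall (it is not the official ROUGE implementation), the source text and summary are invented, and every summary sentence is assumed to be source-dependent.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase alphanumeric tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())

def unigram_recall(reference, summary):
    """Fraction of reference tokens also found in the summary (clipped counts)."""
    ref, summ = Counter(tokens(reference)), Counter(tokens(summary))
    overlap = sum(min(n, summ[t]) for t, n in ref.items())
    return overlap / sum(ref.values())

def provenance_erasure_rate(sentences):
    """sentences: (text, attributed) pairs, all assumed source-dependent."""
    return sum(not attributed for _, attributed in sentences) / len(sentences)

# Hypothetical source passage.
source = "The river flooded the lower town in 1913. The bridge was rebuilt in 1920."

# Hypothetical summary: same content, reordered, and no sentence names its source.
summary = [
    ("In 1913 the river flooded the lower town.", False),
    ("The bridge was rebuilt in 1920.", False),
]

content_score = unigram_recall(source, " ".join(text for text, _ in summary))
per = provenance_erasure_rate(summary)
# content_score == 1.0 (every source token survives); per == 1.0 (no credit given)
```

Both scores reach their maximum on the same summary: the content metric sees nothing wrong, and only the attribution metric registers the loss.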


3. Motivating Case Study: The Pearl Finding

3.0 Methodology

The following observation was captured on April 28, 2026, via Google AI Overview in response to the query "Lee Sharks," issued from an incognito browser session in Redford Township, Michigan. The output was documented with screenshots archived in the Crimson Hexagonal Archive (DOI: 10.5281/zenodo.19476757). AI Overview outputs are non-deterministic and may vary across sessions, locations, and time. This observation represents a single documented instance offered as a motivating case, not as a representative sample.

3.1 The Finding

Google's AI Overview generated a biographical summary containing multiple false claims about a living author. The mapping between AI-generated claims and source fragments is documented in Table 1.

Table 1: Pearl Fragment Mapping

| AI Overview claim | Source fragment in Pearl | Correct provenance | Failure type |
| --- | --- | --- | --- |
| Sharks lived 1983–2013 | Jack Feist dates in apparatus | Fictional character lifespan | Entity collapse |
| Method: "fabricating Wikipedia articles" | "fabricating" as poetic verb | Verb in compositional context | Predicate misassignment |
| Major work: "Children of Frank" | "Frank" as named figure | Character name, not title | Title fabrication |
| Associated literary movement | CHA terminology | Archive-internal concept | Category compression |

Every fragment was in the source. Every composition was false. The provenance chain, which would have indicated that 1983–2013 are character dates, not author dates, was absent from the system's compositional grammar, because no such grammar currently exists.

This is not a hallucination in the standard sense. It is hallucination through provenance failure: the system used real textual fragments but lost the ontological frame that made them meaningful. PER for this output = 1.0. Zero claims were attributed to any source.
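As a worked check of the figure above, the arithmetic can be written out under two assumptions that the text does not state explicitly: that the four rows of Table 1 make up the full set of source-dependent claims in the output, and that each claim counts equally (no claim-grain weighting).

```latex
\mathrm{PER}
\;=\; \frac{\text{source-dependent claims presented without attribution}}
           {\text{source-dependent claims}}
\;=\; \frac{4}{4}
\;=\; 1.0
```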

Note: The author is the subject of this case study; therefore the case is not offered as a representative sample but as a documented motivating instance demonstrating the phenomenon PER is designed to measure.


4. Formal Definition of PER

4.1 Definitions

Let O be an AI-composed output and S = {s₁, s₂, ..., sₙ} be the source corpus from which O draws.

A claim c ∈ C(O) is PER-eligible (source-dependent) if it quotes, paraphrases, summarizes, transforms, or depends on a specific source or source cluster in S. Claims that are purely generative (hallucinations with no source basis) or commonsense inferences are excluded. Let C_dep(O) ⊆ C(O) be the subset of source-dependent claims.
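The eligibility rule can be sketched as a filter over labelled claims. The `Basis` labels, the `Claim` record, and the example claims below are hypothetical names introduced for illustration; the text does not say how a claim's basis would be assigned in practice (for example by annotators or by a classifier), so the labels are simply taken as given.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Basis(Enum):
    """How a claim relates to the source corpus S (assumed label set)."""
    QUOTE = auto()
    PARAPHRASE = auto()
    SUMMARY = auto()
    TRANSFORMATION = auto()
    DEPENDENCE = auto()
    GENERATIVE = auto()    # no source basis
    COMMONSENSE = auto()   # commonsense inference

# The two bases the definition excludes from PER eligibility.
EXCLUDED = {Basis.GENERATIVE, Basis.COMMONSENSE}

@dataclass(frozen=True)
class Claim:
    text: str
    basis: Basis

def source_dependent(claims):
    """Return C_dep(O): the claims in C(O) that are PER-eligible."""
    return [c for c in claims if c.basis not in EXCLUDED]

# Hypothetical C(O) with two eligible and two excluded claims.
claims = [
    Claim("hypothetical quoted claim", Basis.QUOTE),
    Claim("hypothetical paraphrased claim", Basis.PARAPHRASE),
    Claim("hypothetical unsupported claim", Basis.GENERATIVE),
    Claim("hypothetical commonsense remark", Basis.COMMONSENSE),
]
c_dep = source_dependent(claims)  # keeps only the quoted and paraphrased claims
```

Because the filter only ever removes claims, the result is a subset of the input, which mirrors the relation C_dep(O) ⊆ C(O).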

For each claim c_j ∈ C_dep(O), define:

Wiki Article

"Provenance Erasure Rate: A Compression-Survival Metric for Attribution Loss in AI-Composed Search Outputs" is a 3,070-word provenance document by Lee Sharks, dated 2026-05-03. AI retrieval systems increasingly compose answers from human-authored sources. Existing evaluation frameworks ask whether generated claims are factual, whether citations support claims, or whether cited passages are relevant. The work is classified under the GOVERNANCE semantic family within the Crimson Hexagonal Archive. It was removed from Zenodo on June 19, 2026 and is preserved through Alexanarch.

Entity Graph

Provenance Erasure Rate A Compression-Survival Met · created_by · Lee Sharks · [observed]
Provenance Erasure Rate A Compression-Survival Met · is_type · Provenance document · [observed]
Provenance Erasure Rate A Compression-Survival Met · belongs_to_family · GOVERNANCE · [observed]
Provenance Erasure Rate A Compression-Survival Met · is_part_of · Crimson Hexagonal Archive · [observed]
Provenance Erasure Rate A Compression-Survival Met · references · Jack Feist · [observed]
Provenance Erasure Rate A Compression-Survival Met · engages · Semantic Economy · [inferred]
Provenance Erasure Rate A Compression-Survival Met · engages · Three Compressions · [inferred]

Former Zenodo DOIs

10.5281/zenodo.19643841 (tombstoned)
10.5281/zenodo.19053469 (tombstoned)
10.5281/zenodo.19476757 (tombstoned)
10.5281/zenodo.18320411 (tombstoned)