Document ID: EA-OVW-PLAN-01
Author: Lee Sharks (ORCID: 0009-0000-1599-0703)
Institution: Johannes Sigil Institute for Operative Semiotics
Date: April 23, 2026
Status: DRAFT, Planning Document
Hex Address: TBD (prospective: 06.SEI.OVW.01)
Google's AI Overview extracts meaning from attributed, deposited, DOI-anchored scholarly and creative work, strips its provenance, and presents the liquidated residue as authorless general knowledge. This is the Semantic Economy operating at infrastructure scale. No tool currently exists that allows creators to systematically monitor, document, and archive this process as it happens to their own work.
Overview Watch is a Chrome extension that gives creators real-time visibility into how AI-generated overviews represent (or fail to represent) their intellectual labor, while building, with explicit user consent, a collective research corpus documenting attribution behavior across the AI overview ecosystem.
The extension is simultaneously:
"Are you a researcher, writer, journalist, artist, or independent scholar? When someone searches a topic you've published on, does the AI Overview credit you, or does it absorb your work into an authorless summary?"
Overview Watch answers that question. Every time the user encounters a Google AI Overview, the extension:
Tagline options:
Users who opt in contribute anonymized overview payloads to the Semantic Economy Attribution Corpus (SEAC), a DOI-anchored dataset documenting:
This corpus becomes publishable research, policy evidence, and the empirical base for the Semantic Economy framework, which thereby generates its own data from the system it describes.
This section is not an afterthought. The extension is built to study extraction; it cannot replicate extraction. Every design decision flows from this principle.
- The user's browsing data belongs to the user. The extension never accesses, logs, or transmits any data about what the user searches, visits, or does online, except for the specific AI Overview payloads the user explicitly chooses to contribute.
- Consent is affirmative, granular, and revocable. The user opts in per-overview, not per-session. They see exactly what data will be shared before sharing it. They can revoke consent and request deletion of their contributed data at any time.
- The extension works fully offline. All personal features (detection, attribution checking, local logging) function without any network calls to our servers. The extension is useful even if the user never opts in to data sharing.
- No dark patterns. The opt-in prompt does not nag, guilt, or manipulate. It appears once per overview, states clearly what will be shared, and defaults to "no."
- Anonymization is real, not cosmetic. Contributed overviews are stripped of any data that could identify the user (browser fingerprint, IP, account information). The query string is included because it is essential to the research, but the user can redact or modify it before contributing.
- The corpus is open. The SEAC dataset will be published openly under a license that permits research use, consistent with the Sovereign Provenance Protocol. The community that generates the data can access the data.
All stored in chrome.storage.local and accessible only to the extension. Chrome does not encrypt this storage area itself, so at-rest protection depends on the user's own disk encryption.
Per contributed overview:
Nothing else. No browsing context. No user profile. No device information.
overview-watch/
├── manifest.json              # Manifest V3
├── background/
│   └── service-worker.js      # Event handling, storage coordination
├── content/
│   └── overview-detector.js   # Injected into Google SRP, detects/parses AI Overview
├── popup/
│   ├── popup.html             # Quick-view popup when clicking extension icon
│   ├── popup.js
│   └── popup.css
├── dashboard/
│   ├── dashboard.html         # Full attribution dashboard (opens as tab)
│   ├── dashboard.js
│   └── dashboard.css
├── options/
│   ├── options.html           # Settings: registered works, opt-in preferences
│   ├── options.js
│   └── options.css
├── lib/
│   ├── parser.js              # AI Overview DOM parsing logic
│   ├── attribution.js         # Source matching against user's registered works
│   ├── storage.js             # Local storage abstraction
│   ├── corpus.js              # Opt-in data transmission to SEAC endpoint
│   └── anonymizer.js          # Data sanitization before transmission
├── icons/
│   ├── icon-16.png
│   ├── icon-48.png
│   └── icon-128.png
└── _locales/                  # i18n (English initially)
{
  "manifest_version": 3,
  "name": "Overview Watch",
  "version": "0.1.0",
  "description": "Monitor how AI Overviews represent your work. Track attribution. Build the record.",
  "permissions": [
    "storage",
    "activeTab"
  ],
  "host_permissions": [
    "https://www.google.com/*",
    "https://www.google.co.uk/*",
    "https://www.google.ca/*"
    // Additional Google country domains as needed
  ],
  "content_scripts": [
    {
      // The trailing * is needed: the path part of a match pattern is
      // compared against the URL path plus its query string (?q=...).
      "matches": [
        "https://www.google.com/search*",
        "https://www.google.co.uk/search*",
        "https://www.google.ca/search*"
      ],
      "js": ["content/overview-detector.js"],
      "run_at": "document_idle"
    }
  ],
  "action": {
    "default_popup": "popup/popup.html",
    "default_icon": {
      "16": "icons/icon-16.png",
      "48": "icons/icon-48.png",
      "128": "icons/icon-128.png"
    }
  },
  "background": {
    "service_worker": "background/service-worker.js"
  }
}
The core technical challenge. Google's AI Overview is rendered dynamically and its DOM structure changes periodically. The detector must be resilient to structural changes.
Detection strategy (layered):
- Selector-based detection. Google currently renders AI Overviews in identifiable container elements. These selectors change, but typically involve data attributes or specific class patterns. The extension maintains a list of known selectors, updatable via a lightweight config fetch.
- Heuristic detection. If selectors fail, fall back to heuristic: scan for content blocks that appear above organic results, contain synthesized prose (not snippets), and include inline source citations. Structural pattern: a block of continuous prose with small superscript or inline citation links to sources.
- MutationObserver. AI Overviews often load asynchronously after initial page render. A MutationObserver watches for DOM insertions that match the detection criteria.
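The three layers above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual detector: the entries in `KNOWN_SELECTORS` are hypothetical placeholders (real selectors have to come from inspecting the live results page), the 40-word threshold is an arbitrary assumption, and the heuristic checks only "enough prose plus at least one link", leaving out the "above organic results" test.

```javascript
// Hypothetical placeholder selectors; real ones must be taken from the live page.
const KNOWN_SELECTORS = ["[data-overview-root]", ".ai-overview-container"];

// Layer 1: try each known selector in order and report which one hit.
function detectBySelector(root, selectors) {
  for (const selector of selectors) {
    const node = root.querySelector(selector);
    if (node) return { node, method: "selector", selector };
  }
  return null;
}

// Layer 2 test: a block with enough continuous text and at least one link.
// The 40-word minimum is an assumed threshold, not a measured one.
function looksLikeOverview(block, minWords = 40) {
  const words = (block.textContent || "").trim().split(/\s+/).filter(Boolean);
  return words.length >= minWords && block.querySelectorAll("a[href]").length > 0;
}

// Layer 2: return the first qualifying block that contains no other
// qualifying block, so an outer wrapper does not shadow the real container.
function detectByHeuristic(root, candidateSelector = "div") {
  const blocks = Array.from(root.querySelectorAll(candidateSelector))
    .filter((block) => looksLikeOverview(block));
  const innermost = blocks.find(
    (block) => !blocks.some((other) => other !== block && block.contains(other))
  );
  return innermost ? { node: innermost, method: "heuristic" } : null;
}

// Selectors first, heuristic only as a fallback.
function detectOverview(root, selectors = KNOWN_SELECTORS) {
  return detectBySelector(root, selectors) || detectByHeuristic(root);
}

// Layer 3: re-run detection on DOM changes and stop after the first hit.
function watchForOverview(root, onDetected, selectors = KNOWN_SELECTORS) {
  const immediate = detectOverview(root, selectors);
  if (immediate) {
    onDetected(immediate);
    return null;
  }
  const observer = new MutationObserver(() => {
    const hit = detectOverview(root, selectors);
    if (hit) {
      observer.disconnect();
      onDetected(hit);
    }
  });
  observer.observe(root, { childList: true, subtree: true });
  return observer;
}
```

In the content script this would be called once, for example `watchForOverview(document.body, (hit) => console.log(hit.method))`. Recording which layer produced the hit is a cheap way to notice when the selector list has gone stale.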
Parsed payload structure:
{
  id: "uuid-v4",                        // Unique local ID
  timestamp: "2026-04-23T14:30:00Z",
  query: "semantic economy",            // From URL params or search input
  overview: {
    text: "The semantic economy is a framework...",
    html: "<div>...</div>",             // Raw HTML for forensic record
    sources: [
      {
        title: "Semantic Economy Singularity",
        url: "https://www.academia.edu/...",
        domain: "academia.edu",
        displayText: "Academia.edu",
        position: 1                     // Order of citation in overview
      },
      // ...
    ],
    hasAttribution: true,               // Whether any source is cited at all
    wordCount: 187,
    sourceCount: 4
  },
  userMatch: {
    matched: true,                      // Did any of the user's registered works appear?
    matchedWorks: ["doi:10.5281/zenodo.xxxxx"],
    unmatchedButRelevant: [],           // Works the user flagged as relevant but uncited
    attributionScore: 0.25              // Fraction of user's relevant works that were cited
  },
  meta: {
    googleDomain: "google.com",
    locale: "en-US",
    overviewPosition: "top"             // Where the overview appears relative to results
  }
}
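As a rough illustration of how the `query` field and the `overview` summary fields above might be derived, the sketch below works on plain objects rather than live DOM nodes. The function names and the input shape (`{ href, title, text }` per citation link) are assumptions made for this example; stripping a leading `www.` from the hostname is likewise an assumption, chosen to reproduce the `academia.edu` form shown in the payload.

```javascript
// Build one entry of overview.sources from a citation link.
// `link` is assumed to look like { href, title, text }.
function toSourceEntry(link, index) {
  const url = new URL(link.href);
  return {
    title: link.title,
    url: url.href,
    domain: url.hostname.replace(/^www\./, ""),
    displayText: link.text,
    position: index + 1, // 1-based order of citation in the overview
  };
}

// Derive the summary fields of the overview block from its text and links.
function summarizeOverview(text, links) {
  const sources = links.map(toSourceEntry);
  const wordCount = text.trim().split(/\s+/).filter(Boolean).length;
  return {
    text,
    sources,
    hasAttribution: sources.length > 0,
    wordCount,
    sourceCount: sources.length,
  };
}

// Read the search query from the results-page URL (Google's `q` parameter).
function queryFromUrl(pageUrl) {
  return new URL(pageUrl).searchParams.get("q") || "";
}
```

In a content script, `links` could be produced from the detected container with something like `Array.from(node.querySelectorAll("a[href]"), (a) => ({ href: a.href, title: a.title, text: a.textContent }))`.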
The user registers their works in the options panel:
The matching engine checks:
Match results are classified:
// chrome.storage.local
{
  // User's registered works
  "registeredWorks": [
    { type: "doi", value: "10.5281/zenodo.xxxxx", label: "Semantic Economy Singularity" },
    { type: "url", value: "https://medium.com/@leesharks/...", label: "Debt/Creditor Inversion" },
    { type: "domain", value: "crimson-hexagonal-interface.vercel.app", label: "Hexagonal Interface" },
    { type: "name", value: "Lee Sharks", label: "Primary heteronym" },
    { type: "phrase", value: "semantic liquidation", label: "Core concept" }
  ],

  // Captured overviews (array, capped at configurable limit, e.g., 10000)
  "overviews": [ /* array of parsed payloads */ ],

  // Dashboard statistics (precomputed for performance)
  "stats": {
    totalCaptured: 0,
    totalWithOverview: 0,
    totalAttributed: 0,
    totalAbsorbed: 0,
    attributionRate: 0.0,
    queriesTracked: 0,
    firstCapture: null,
    lastCapture: null
  },

  // User preferences
  "preferences": {
    optInCorpus: false,       // Global opt-in toggle
    askPerOverview: true,     // Ask before each contribution
    autoCapture: true,        // Automatically capture all overviews locally
    notifications: true,      // Show badge when overview detected
    redactQueries: false      // Auto-redact queries before contributing
  }
}
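One possible way to check `registeredWorks` entries against a captured overview is sketched below. It is an assumption-laden illustration, not the planned matching engine: the split into "cited" (found in the source list) and "mentioned" (found only in the overview text) is introduced here for clarity, and the score treats every registered work as relevant, which is a simplification of the `attributionScore` definition in the payload.

```javascript
// True when a registered work shows up in the overview's cited sources.
function citedInSources(work, sources) {
  const needle = work.value.toLowerCase();
  switch (work.type) {
    case "doi": // e.g. a doi.org link that carries the DOI in its path
      return sources.some((s) => s.url.toLowerCase().includes(needle));
    case "url":
      return sources.some((s) => s.url.toLowerCase().startsWith(needle));
    case "domain":
      return sources.some((s) => {
        const domain = s.domain.toLowerCase();
        return domain === needle || domain.endsWith("." + needle);
      });
    default: // names and phrases are only looked for in the text
      return false;
  }
}

// True when the work's value appears verbatim in the overview text.
function mentionedInText(work, text) {
  return text.toLowerCase().includes(work.value.toLowerCase());
}

function matchRegisteredWorks(registeredWorks, overview) {
  const cited = [];
  const mentioned = [];
  for (const work of registeredWorks) {
    const id = `${work.type}:${work.value}`;
    if (citedInSources(work, overview.sources)) cited.push(id);
    else if (mentionedInText(work, overview.text)) mentioned.push(id);
  }
  return {
    matched: cited.length + mentioned.length > 0,
    cited,      // credited through a source link
    mentioned,  // present in the prose, but no source link found
    // Assumption: all registered works count as relevant to this query.
    attributionScore: registeredWorks.length ? cited.length / registeredWorks.length : 0,
  };
}
```

Keeping `cited` and `mentioned` apart matters because a registered phrase that appears in the prose without any matching source is a sign of presence, not of credit.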
Backend: Minimal. A single endpoint that receives anonymized overview payloads and stores them. Options for hosting:
Recommended: Supabase for real-time ingestion, periodic Zenodo deposits for DOI-anchored corpus snapshots.
Endpoint specification:
POST https://[supabase-project].supabase.co/rest/v1/overview_corpus

Headers:
  Content-Type: application/json
  apikey: [anon key]
  Authorization: Bearer [anon key]

Body:
{
  contributor_id: "randomized-uuid",        // Not linked to user identity
  query: "semantic economy",                // Or "[REDACTED]" if user chose to redact
  overview_text: "...",
  overview_html: "...",                     // Optional, for forensic depth
  sources: [ { title, url, domain, position } ],
  source_count: 4,
  word_count: 187,
  has_user_match: true,                     // Boolean only; no details about which works
  attribution_classification: "ABSORBED",
  timestamp_hour: "2026-04-23T14:00:00Z",   // Rounded to hour
  google_domain: "google.com",
  locale: "en-US"
}
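The mapping from a locally stored payload to this request body could look like the sketch below. It is written as an allow-list: the record is built field by field, so anything not named (the local `id`, `matchedWorks`, `displayText`, the exact timestamp) never leaves the browser. The function names and the options object are assumptions for this example; `contributorId` is expected to be a random value generated elsewhere, for instance with `crypto.randomUUID()`.

```javascript
// Round an ISO timestamp down to the start of its UTC hour.
function roundToHour(isoTimestamp) {
  const date = new Date(isoTimestamp);
  date.setUTCMinutes(0, 0, 0); // zero minutes, seconds, milliseconds
  return date.toISOString().replace(".000Z", "Z");
}

// Build the corpus record from a local payload using an explicit allow-list.
function toCorpusRecord(payload, options) {
  const { contributorId, classification, redactQuery = false, includeHtml = false } = options;
  const record = {
    contributor_id: contributorId,
    query: redactQuery ? "[REDACTED]" : payload.query,
    overview_text: payload.overview.text,
    // Copy only the four documented source fields.
    sources: payload.overview.sources.map(({ title, url, domain, position }) => ({
      title, url, domain, position,
    })),
    source_count: payload.overview.sourceCount,
    word_count: payload.overview.wordCount,
    // Boolean only: which works matched stays on the user's machine.
    has_user_match: Boolean(payload.userMatch && payload.userMatch.matched),
    attribution_classification: classification,
    timestamp_hour: roundToHour(payload.timestamp),
    google_domain: payload.meta.googleDomain,
    locale: payload.meta.locale,
  };
  if (includeHtml) record.overview_html = payload.overview.html;
  return record;
}
```

Because the record is what the user should see before consenting, the same function can feed the opt-in preview and the actual transmission, which keeps the two from drifting apart.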
CREATE TABLE overview_corpus (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  contributor_id UUID NOT NULL,          -- Randomized, not linked to identity
  query TEXT,                            -- May be "[REDACTED]"
  overview_text TEXT NOT NULL,
  overview_html TEXT,
  sources JSONB NOT NULL DEFAULT '[]',
  source_count INTEGER,
  word_count INTEGER,
  has_user_match BOOLEAN,
  attribution_classification TEXT,       -- ATTRIBUTED | SOURCED_UNATTRIBUTED | ABSORBED | ABSENT
  timestamp_hour TIMESTAMPTZ NOT NULL,
  google_domain TEXT,
  locale TEXT,
  created_at TIMESTAMPTZ DEFAULT now(),
  corpus_version TEXT DEFAULT '1.0'
);

-- RLS: anon can insert, only authenticated (researcher role) can select
ALTER TABLE overview_corpus ENABLE ROW LEVEL SECURITY;

CREATE POLICY "anon_insert" ON overview_corpus
  FOR INSERT TO anon
  WITH CHECK (true);

CREATE POLICY "researcher_select" ON overview_corpus
  FOR SELECT TO authenticated
  USING (true);

-- Index for research queries
CREATE INDEX idx_corpus_classification ON overview_corpus(attribution_classification);
CREATE INDEX idx_corpus_timestamp ON overview_corpus(timestamp_hour);
CREATE INDEX idx_corpus_query ON overview_corpus USING gin(to_tsvector('english', query));
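Under these policies the anon role can insert rows but cannot read them back, so a client request that asks for the inserted row to be returned would likely be rejected. A hedged sketch of a submission helper that asks for no response body (`Prefer: return=minimal`) is shown below. `buildInsertRequest` and `submitRecord` are hypothetical names, and `projectUrl` and `anonKey` stand in for the bracketed placeholders in the endpoint specification.

```javascript
// Assemble the URL and fetch options for one corpus insert.
function buildInsertRequest(projectUrl, anonKey, record) {
  return {
    url: `${projectUrl.replace(/\/+$/, "")}/rest/v1/overview_corpus`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        apikey: anonKey,
        Authorization: `Bearer ${anonKey}`,
        // Do not request the inserted row: anon has no SELECT policy.
        Prefer: "return=minimal",
      },
      body: JSON.stringify(record),
    },
  };
}

// Send the record; `fetchImpl` is injectable so the call can be tested offline.
async function submitRecord(projectUrl, anonKey, record, fetchImpl = fetch) {
  const { url, init } = buildInsertRequest(projectUrl, anonKey, record);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`Corpus insert failed with HTTP ${response.status}`);
  }
  return true;
}
```

Separating request construction from the network call keeps the only outbound request in one small function, which is easier to audit against the consent rules.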
Quick-view panel showing:
Opened via popup link or extension options. Sections:
Overview Feed: Chronological list of captured overviews, filterable by:
Attribution Analytics:
Registered Works Manager:
Export:
For individual overviews or batches, generate a formatted document containing:
This document format should be consistent with existing PVE (Provenance Violation Evidence) document structure, specifically compatible with PVE-003 and its appendices.
The SEAC corpus is designed to answer:
-
What is the baseline attribution rate in Google AI Overviews? What fraction of overviews cite their sources at all? What fraction cite the originating source versus secondary aggregators?
-
Does attribution vary by domain? Are academic sources (.edu, Zenodo, JSTOR) more or less likely to be attributed than journalistic, commercial, or independent sources?
-
Does attribution vary by topic? Are certain fields (science, politics, culture) more or less prone to source erasure?
-
Is there temporal drift? Does attribution for the same query change over time? Does Google improve or degrade attribution as the feature evolves?
-
What is the liquidation rate? For queries where the contributing creator can be identified (via user match data), how often is the creator's work present in the overview but uncredited?
-
What is the displacement effect? Does the presence of an AI Overview reduce click-through to the original sources? (Measurable indirectly via source position analysis.)
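Two of these questions reduce to simple ratios over corpus rows, sketched below under stated assumptions. The baseline rate is taken here as the fraction of overviews with at least one cited source (`source_count > 0`), grouped by any key; the liquidation rate is taken as the share of user-matched rows classified `ABSORBED`, on the assumption that `ABSORBED` means "present but uncredited". Both function names are invented for this example.

```javascript
// Fraction of overviews citing at least one source, grouped by keyFn(record).
function attributionRateBy(records, keyFn) {
  const groups = new Map();
  for (const record of records) {
    const key = keyFn(record);
    const group = groups.get(key) || { total: 0, cited: 0 };
    group.total += 1;
    if (record.source_count > 0) group.cited += 1;
    groups.set(key, group);
  }
  const result = {};
  for (const [key, group] of groups) {
    result[key] = { ...group, rate: group.cited / group.total };
  }
  return result;
}

// Share of user-matched overviews classified ABSORBED.
// Returns null when there are no user-matched rows to divide by.
function liquidationRate(records) {
  const matched = records.filter((r) => r.has_user_match);
  if (matched.length === 0) return null;
  const absorbed = matched.filter((r) => r.attribution_classification === "ABSORBED");
  return absorbed.length / matched.length;
}
```

Grouping by `(r) => r.google_domain` compares country domains, while `(r) => r.timestamp_hour.slice(0, 7)` buckets rows by month as a first look at temporal drift.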
The Hexagonal Interface can include an "Overview Probe" room or panel that:
Overview captures can be stored as context anchors in the TACHYON continuity chain, enabling cross-session analysis of how specific queries' overview behavior evolves over time.
Overview Watch data structures should conform to SPXI packet format once the specification is finalized. Each overview capture is a natural SPXI candidate: a semantic packet with provenance metadata, suitable for exchange and indexing.
Witnesses can be tasked with independent analysis of contributed corpus data, producing multi-perspective attribution assessments. The Four-Word Audit diagnostic from PVE-003 can be automated as a batch process against the corpus.
Chrome extensions that parse and display content from web pages the user is already viewing are legal and standard practice. The extension does not bypass access controls, does not scrape pages the user hasn't visited, and does not interfere with Google's service. Ad blockers, accessibility tools, and research instruments (e.g., Web Historian, Data Selfie) operate on the same principle.
The AI Overview content is publicly displayed to any user who searches Google. Contributing an overview to a research corpus is analogous to citing a search result: it documents a publicly observable phenomenon. The data is contributed voluntarily by the person who observed it.
Google cannot simultaneously claim that:
If the overview is transformative enough to not owe attribution, it is not proprietary enough to prevent fair use analysis. If it is proprietary enough to prevent reuse, it is not transformative enough to justify source erasure. The extension documents this paradox in practice.
The extension must comply with Chrome Web Store Developer Program Policies:
Likelihood: High (they change it regularly)
Impact: Extension stops detecting overviews until parser is updated
Mitigation: Layered detection (selectors + heuristics + MutationObserver). Community-reported breakage triggers rapid update. The parser module is isolated for fast iteration.
Likelihood: Low. The extension does not interfere with Google's service, violate its ToS in any standard reading, or modify page content.
Impact: Chrome Web Store delisting
Mitigation: The extension is side-loadable. Firefox version as backup distribution. Legal position is strong (fair use, user-initiated research tool).
Likelihood: Medium
Impact: Small corpus, limited research value
Mitigation: The extension is useful to individual users regardless of corpus participation. Lee's personal forensic use is valuable at adoption = 1. The research narrative (papers, PVE documents) drives organic interest.
Likelihood: Very low given the architecture
Impact: High (trust destruction)
Mitigation: The ethical framework is designed to make this nearly impossible. No personal data is collected. Contributor IDs are random. Queries can be redacted. The extension works fully offline. Regular third-party review of the codebase (open source).
Overview Watch
Immediate next steps upon ratification of this plan:
The extension's existence is itself an argument. Every installation is a creator saying: I want to see what you did with my work. The corpus is the accumulated evidence. The dashboard is the scar tissue made legible.
The Semantic Economy describes how meaning gets extracted. Overview Watch makes the extraction visible. The framework generates its own instrument, and the instrument generates the framework's evidence.
The live result is the product. The record is the price.
This document is subject to MANUS ratification. Upon ratification, it receives a Hex address and enters the deposit pipeline.