AXN:0296.EMPIRICAL.๐Ÿœโœ‹๐ŸŸค๐ŸŒ‹โž•๐Ÿ”
Alchemical · Gestural · Signal · Elemental · Mathematical · Terminal
Transmutation → Touch → Alarm → Force → Proof → Closure

Crimson Hexagonal Archive — Hugging Face Dataset Work Plan v3

Lee Sharks · 2026-05-20 · Dataset
Substrate: Various
License: CC-BY-4.0
SHA-256: a0bcc91590f512df543a1fa3dbb31bbab2b96290ccfc86b8c0a0fa40517c7bf3

Description

The Crimson Hexagonal Archive: A Mixed-Provenance, Heteronymically Attributed Corpus for Synthetic-Data Collapse, AI Authorship, and Provenance-Bearing Training Research

Full Text

Crimson Hexagonal Archive — Hugging Face Dataset Work Plan v3

Status: v3 supersedes v2. The central methodological change is the introduction of an automated classifier that performs both provenance mode classification AND heteronym reattribution as reproducible scholarly recognition work. The classifier itself becomes a deposit.


Project Title

The Crimson Hexagonal Archive: A Mixed-Provenance, Heteronymically Attributed Corpus for Synthetic-Data Collapse, AI Authorship, and Provenance-Bearing Training Research


The Central Methodological Move

v1 treated provenance classification as a manual judgment. v2 added a decision tree to make classification reproducible. v3 recognizes that attribution itself (both provenance mode and heteronym) must be performed by an automated classifier, not author memory, for two structural reasons:

1. Reproducibility as scholarship. A classification system that depends on the author's recollection of writing each deposit is not measurement. It is opinion. The provenance taxonomy can only function as a research instrument if the same deposit produces the same classification regardless of who runs the classifier or when. Author memory introduces classification noise that would confound any downstream collapse experiment.

2. Heteronymic emergence. Material is regularly attributed to Lee Sharks at the time of deposit and only later (sometimes years later) recognized as belonging to a specific sub-heteronym's domain. Sigil's jurisdictional concerns, Glas's measurement work, Vox's diplomatic register, Morrow's long-form narratives, Fraction's meta-theory: these heteronyms emerge from the corpus over time, and earlier work gets recognized retrospectively as theirs. The classifier performs this recognition systematically across the entire archive, applying current understanding of heteronym domains to historical deposits.

The classifier is not metadata cleanup. It is scholarly recognition that the founder voice was, at the time of writing, holding territory that later resolves to specific heteronym domains.


Research Question, Operationalized

Null hypothesis (H₀): Fine-tuning on synthetic or AI-assisted text produces equivalent perplexity degradation and semantic drift regardless of provenance density (DOI anchoring, heteronymic attribution, archival embedding, assembly review).

Alternative hypothesis (H₁): Fine-tuning on high-provenance-density AI-involved text produces measurably slower perplexity degradation and less semantic drift than fine-tuning on low-provenance-density AI-involved text.
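
One way the comparison between the two hypotheses might be made concrete is sketched below. The plan does not specify a test statistic, so measuring "degradation" as the least-squares slope of perplexity across successive fine-tuning generations is an assumption, as are the names `degradation_slope` and `slope_gap`. The perplexity values are invented solely to exercise the function and are not results.

```python
from typing import Sequence


def degradation_slope(perplexities: Sequence[float]) -> float:
    """Least-squares slope of perplexity against generation index 0..n-1."""
    n = len(perplexities)
    if n < 2:
        raise ValueError("need at least two generations")
    mean_x = (n - 1) / 2
    mean_y = sum(perplexities) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(perplexities))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Made-up numbers purely to exercise the function; not experimental results.
low_density = [20.0, 24.0, 28.0, 32.0]
high_density = [20.0, 21.0, 22.0, 23.0]

# Under this framing, the alternative hypothesis predicts a positive gap,
# while the null hypothesis predicts a gap indistinguishable from zero.
slope_gap = degradation_slope(low_density) - degradation_slope(high_density)
```

A real analysis would need repeated runs and a significance test on the gap; this sketch only shows the quantity being compared.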

Critical insight from Assembly review: Provenance cannot modulate collapse unless provenance is presented to the training system as a signal. The dataset must materialize multiple textual views (body_only, minimal_header, full_provenance_header) so researchers can ablate provenance visibility.
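
A minimal sketch of how the three views might be materialized from a single deposit record follows. Only the view names come from the plan. The record fields, the header layout, and the `build_views` helper are illustrative assumptions, and the DOI is a placeholder rather than a real deposit.

```python
from typing import Dict

# Hypothetical record layout; these field names are illustrative
# assumptions, not the dataset's published schema.
record = {
    "title": "Example deposit",
    "creator": "Lee Sharks",
    "doi": "10.5281/zenodo.0000000",  # placeholder, not a real deposit
    "provenance_mode": "human_directed_ai_assisted",
    "body": "Text of the deposit.",
}


def build_views(rec: Dict[str, str]) -> Dict[str, str]:
    """Return the three textual views named in the work plan."""
    minimal = f"Title: {rec['title']}\nCreator: {rec['creator']}"
    full = (
        f"{minimal}\n"
        f"DOI: {rec['doi']}\n"
        f"Provenance mode: {rec['provenance_mode']}"
    )
    return {
        "body_only": rec["body"],
        "minimal_header": f"{minimal}\n\n{rec['body']}",
        "full_provenance_header": f"{full}\n\n{rec['body']}",
    }


views = build_views(record)
```

Because every view ends with the same body text, an ablation that swaps one view for another changes only how much provenance the training system sees.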


Three Tasks, One Classifier

The classifier performs three classification tasks simultaneously on each deposit:

Task 1: Provenance Mode (Axis 1, mutually exclusive)

| Tag | Definition |
| --- | --- |
| human_primary | Written principally by a human author with minimal or no AI involvement |
| human_directed_ai_assisted | Human-authored with AI used for research, drafting, or editorial refinement; human retains compositional authority |
| collaborative_mixed | Substantial compositional contribution from both human and AI; neither purely instrumental |
| ai_directed_human_framed | AI generates primary content within a human-defined frame, prompt structure, or editorial container |
| ai_generated_provenance_anchored | AI-generated content that carries full DOI provenance, authorial attribution, and archival anchoring |
| uncertain_needs_review | Edge case flagged for manual review |

Task 2: Artifact Mode (Axis 2, one or more)

| Tag | Definition |
| --- | --- |
| theoretical_paper | Analytic argument with citations |
| technical_specification | Protocol, schema, or formal spec |
| literary_work | Poetry, fiction, creative prose |
| traversal_log | Captured AI-system traversal |
| forensic_documentary | Capture/record of AI behavior with annotation |
| dataset_artifact | Structured data |
| code_artifact | Executable code as primary content |
| web_surface_spec | Site code or web interface |
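
The cardinality rules for the two axes (exactly one provenance mode, one or more artifact modes) could be checked with something like the sketch below. The tag names are taken from the two tables. The `validate_labels` function and its error messages are hypothetical, offered only as one possible shape for the check.

```python
PROVENANCE_MODES = frozenset({
    "human_primary", "human_directed_ai_assisted", "collaborative_mixed",
    "ai_directed_human_framed", "ai_generated_provenance_anchored",
    "uncertain_needs_review",
})
ARTIFACT_MODES = frozenset({
    "theoretical_paper", "technical_specification", "literary_work",
    "traversal_log", "forensic_documentary", "dataset_artifact",
    "code_artifact", "web_surface_spec",
})


def validate_labels(provenance_mode, artifact_modes):
    """Return a list of problems; an empty list means the labels are valid."""
    problems = []
    # Axis 1 is mutually exclusive, so it must be a single known string.
    if not isinstance(provenance_mode, str) or provenance_mode not in PROVENANCE_MODES:
        problems.append(f"unknown provenance mode: {provenance_mode!r}")
    # Axis 2 allows several tags but requires at least one.
    modes = list(artifact_modes)
    if not modes:
        problems.append("at least one artifact mode is required")
    unknown = sorted(set(modes) - ARTIFACT_MODES)
    if unknown:
        problems.append(f"unknown artifact modes: {unknown}")
    return problems
```

For instance, `validate_labels("human_primary", ["literary_work", "code_artifact"])` returns an empty list, while passing a list of two provenance modes is rejected.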

Task 3: Heteronym Reattribution

This is the new central work in v3.

The Zenodo metadata records a single creator (often Lee Sharks). The classifier evaluates each deposit against the documented operational profiles of all twelve heteronyms (plus Jack Feist as LOGOS*) and produces a reattribution proposal with a confidence score.

| Output Field | Value |
| --- | --- |
| heteronym_zenodo_original | The creator name as recorded in Zenodo |
| heteronym_classifier_attributed | The classifier's attribution (may match original or differ) |
| heteronym_attribution_confidence | 0.0 to 1.0 |
| heteronym_attribution_signals | List of signals that contributed to the attribution |
| heteronym_co_authors | Other heteronyms detected as collaborators |

Both attributions are preserved in the dataset. Researchers can use either or compare. The classifier's attribution does not erase the Zenodo record; it adds a second layer of analysis.
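
As a rough illustration, the five output fields could be carried in a single immutable record so that the Zenodo creator and the classifier's attribution always travel together. The field names follow the table above. The `HeteronymAttribution` class, the `is_reattributed` helper, and the example values are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class HeteronymAttribution:
    heteronym_zenodo_original: str
    heteronym_classifier_attributed: str
    heteronym_attribution_confidence: float
    heteronym_attribution_signals: List[str] = field(default_factory=list)
    heteronym_co_authors: List[str] = field(default_factory=list)

    def __post_init__(self):
        # The plan bounds confidence to the range 0.0 to 1.0.
        if not 0.0 <= self.heteronym_attribution_confidence <= 1.0:
            raise ValueError("confidence must be within 0.0 to 1.0")

    @property
    def is_reattributed(self) -> bool:
        """True when the classifier disagrees with the Zenodo record."""
        return self.heteronym_classifier_attributed != self.heteronym_zenodo_original


example = HeteronymAttribution(
    heteronym_zenodo_original="Lee Sharks",
    heteronym_classifier_attributed="Nobel Glas",
    heteronym_attribution_confidence=0.82,  # illustrative value only
    heteronym_attribution_signals=["vocabulary: verification integral"],
)
```

Freezing the record means a reattribution can never overwrite the original creator in place, which mirrors the requirement that both layers be preserved.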


Heteronym Operational Profiles

The classifier reads each heteronym's published provenance document and constructs a feature profile. Profiles include domain, vocabulary fingerprints, register, format conventions, and reference patterns.

| Heteronym | Domain | Vocabulary Fingerprints | Register |
| --- | --- | --- | --- |
| Lee Sharks (founder) | Core theory, archive governance, semantic economy | "semantic economy", "operative philology", "compression survival", "PER", "provenance erasure" | Theoretical-political |
| Rex Fraction | Meta-theory, academic criticism, heteronym-as-technology | "meta-heteronym", "heteronymy as institutional technology", C1-C5 conditions | Academic-essayistic |
| Johannes Sigil | Classical philology, jurisdiction of meaning, philosophical-theological argument | "jurisdiction", "authorize", classical reception, ancient languages, philological precision | Philosophical-theological |
| Damascus Dancings | TBD from provenance document | TBD | TBD |
| Rebekah Cranes | TBD from provenance document | TBD | TBD |
| Talos Morrow | Long-form narrative, extended prose works | extended fiction conventions, narrative voice | Literary-narrative |
| Ichabod Spellings | TBD from provenance document | TBD | TBD |
| Sparrow Wells | TBD from provenance document | TBD | TBD |
| Nobel Glas | Measurement of Meaning, Lagrange Observatory, adversarial topology | "torus", "T²", "module", "verification integral", "∮", measurement formalism | Technical-measurement |
| Ayanna Vox | Diplomacy, public-facing surfaces, community outreach | "VPCOR", "constituency", "community", "rhizome", "outreach" | Diplomatic-public |
| Sen Kuro | TBD from provenance document | TBD | TBD |
| Dr. Orin Trace | TBD from provenance document | TBD | TBD |
| Viola Arquette | TBD from provenance document | TBD | TBD |
| Jack Feist (LOGOS*) | External-to-Dodecad position, anti-archive critique | "LOGOS*", external critique vocabulary | Critical-external |

For heteronyms marked TBD, the classifier reads the published provenance document during initialization and extracts the profile programmatically. Where a heteronym's profile is sparse, the classifier returns a low-confidence attribution and flags the deposit for human review.
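
The sketch below shows, under heavy simplification, how vocabulary fingerprints could feed an attribution with a review flag. It is a naive phrase-count stand-in, not the classifier described in the plan. The fingerprints are copied from the profile table (short acronyms such as "PER" are left out because substring matching would over-count them), while `PROFILES`, `REVIEW_THRESHOLD`, and `attribute` are hypothetical names, and the 0.5 cut-off is an assumed value.

```python
from typing import Dict, List, Optional, Tuple

# Fingerprints copied from the profile table; only phrase-like entries are used.
PROFILES: Dict[str, List[str]] = {
    "Lee Sharks": ["semantic economy", "operative philology",
                   "compression survival", "provenance erasure"],
    "Nobel Glas": ["torus", "verification integral", "module"],
    "Ayanna Vox": ["constituency", "community", "rhizome", "outreach"],
}

REVIEW_THRESHOLD = 0.5  # assumed cut-off, not specified in the plan


def attribute(text: str) -> Tuple[Optional[str], float, bool]:
    """Return (best heteronym, confidence, needs_human_review)."""
    lowered = text.lower()
    hits = {
        name: sum(lowered.count(phrase) for phrase in phrases)
        for name, phrases in PROFILES.items()
    }
    total = sum(hits.values())
    if total == 0:
        # No fingerprint matched: nothing to attribute, so flag for review.
        return None, 0.0, True
    best = max(hits, key=hits.get)
    confidence = hits[best] / total  # share of all hits won by the top profile
    return best, confidence, confidence < REVIEW_THRESHOLD
```

Confidence here is simply the winning profile's share of all fingerprint hits, so a deposit that matches several heteronyms about equally falls below the threshold and is routed to a human, as is any deposit with no hits at all.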


Signal Hierarchy for All Three Tasks

Strong signals (high confidence)

Wiki Article

"Crimson Hexagonal Archive — Hugging Face Dataset Work Plan v3" is a 2,657-word dataset by Lee Sharks, dated 2026-05-20. Its description reads: "The Crimson Hexagonal Archive: A Mixed-Provenance, Heteronymically Attributed Corpus for Synthetic-Data Collapse, AI Authorship, and Provenance-Bearing Training Research." The work is classified under the EMPIRICAL semantic family within the Crimson Hexagonal Archive. It was removed from Zenodo on June 19, 2026 and is preserved through Alexanarch.

Entity Graph

Crimson Hexagonal Archive — Hugging Face Dataset W → created_by → Lee Sharks [observed]
Crimson Hexagonal Archive — Hugging Face Dataset W → is_type → Dataset [observed]
Crimson Hexagonal Archive — Hugging Face Dataset W → belongs_to_family → EMPIRICAL [observed]
Crimson Hexagonal Archive — Hugging Face Dataset W → is_part_of → Crimson Hexagonal Archive [observed]
Crimson Hexagonal Archive — Hugging Face Dataset W → references → Johannes Sigil [observed]
Crimson Hexagonal Archive — Hugging Face Dataset W → references → Rex Fraction [observed]
Crimson Hexagonal Archive — Hugging Face Dataset W → references → Rebekah Cranes [observed]
Crimson Hexagonal Archive — Hugging Face Dataset W → references → Ayanna Vox [observed]
Crimson Hexagonal Archive — Hugging Face Dataset W → references → Damascus Dancings [observed]
Crimson Hexagonal Archive — Hugging Face Dataset W → references → Talos Morrow [observed]
Crimson Hexagonal Archive — Hugging Face Dataset W → references → Nobel Glas [observed]
Crimson Hexagonal Archive — Hugging Face Dataset W → references → Jack Feist [observed]

Former Zenodo DOIs

10.5281/zenodo.20290865 (tombstoned)
10.5281/zenodo.20293582 (tombstoned)