Purpose: Systematic extraction, canonicalization, and versioning of all coined terms, concepts, entities, frameworks, operators, institutions, heteronyms, and designations across the Crimson Hexagonal Archive (~841+ deposits)
Author: Lee Sharks (ORCID 0009-0000-1599-0703)
Date: 16 June 2026
Status: Work plan with progress tracking
The archive is minting terms faster than they can be committed to long-term memory. An estimated 200-300 coined concepts exist across 841+ deposits without a unified index. Once built, the index becomes:
Status: NOT STARTED
Estimated compute: 30-45 minutes
Resumable: Yes (paginate via Zenodo API, save after each page)
Progress checkpoint: After Phase 1, we have ~60-70% of coinages from metadata alone. Save all three files to /home/claude/ and present them. If the session compacts here, the next session loads these files and proceeds to Phase 2.
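The page-by-page pull in Phase 1 can be sketched as follows. This is a minimal sketch, not the script actually used: it assumes Zenodo's public `GET /api/records` search endpoint with `q`, `size`, and `page` query parameters, reading records from `hits.hits`. The checkpoint layout (`next_page`, `done`, `records`) and the function names are hypothetical. Saving after every page is what makes the step resumable: a rerun reloads the checkpoint and starts at the first unsaved page.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Callable, List

# Assumption: Zenodo's public record-search endpoint. The query that selects
# the archive's deposits is left as a hypothetical placeholder.
ZENODO_RECORDS_URL = "https://zenodo.org/api/records"
PAGE_SIZE = 25  # maximum page size reported for this pull


def fetch_zenodo_page(page: int, size: int, query: str = "") -> List[dict]:
    """Fetch one page of search results and return its record list."""
    params = urllib.parse.urlencode({"q": query, "size": size, "page": page})
    with urllib.request.urlopen(f"{ZENODO_RECORDS_URL}?{params}") as resp:
        return json.load(resp)["hits"]["hits"]


def pull_metadata(fetch_page: Callable[[int, int], List[dict]],
                  checkpoint_path: str, size: int = PAGE_SIZE) -> List[dict]:
    """Pull every page, writing a checkpoint after each one.

    A rerun after an interruption reloads the checkpoint and resumes at
    the first page that was not yet saved.
    """
    state = {"next_page": 1, "done": False, "records": []}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    while not state["done"]:
        hits = fetch_page(state["next_page"], size)
        state["records"].extend(hits)
        state["next_page"] += 1
        state["done"] = len(hits) < size  # a short (or empty) page is the last one
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)
    return state["records"]
```

For 845 records at 25 per page, the loop stops on page 34, which returns the final 20 records (33 × 25 = 825). Passing the fetcher in as a parameter keeps the paging and checkpoint logic testable without network access.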
Status: NOT STARTED
Estimated compute: 2-3 hours (may require multiple sessions)
Resumable: Yes (track which record IDs have been processed)
Progress checkpoint: After Phase 2, we have ~90% of coinages. The remaining 10% are implicit terms that require human judgment.
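Phase 2's resumability rests on recording which records are done. The sketch below is minimal and makes several assumptions. The progress file is shaped as `processed` / `failed` / `terms` (a hypothetical layout for termindex-file-progress.json; the real file may differ). The `process` callback, also hypothetical, downloads one record's files and returns the terms found in them. Record IDs are stored as strings because JSON object keys must be strings. Failed downloads are logged but not marked processed, so a later session retries them; this retry policy is an assumption, not something stated in the plan.

```python
import json
import os
from typing import Callable, Iterable, List


def process_files(record_ids: Iterable[str],
                  process: Callable[[str], List[str]],
                  progress_path: str) -> dict:
    """Process each record once, saving progress after every record."""
    progress = {"processed": [], "failed": {}, "terms": {}}
    if os.path.exists(progress_path):
        with open(progress_path) as f:
            progress = json.load(f)
    done = set(progress["processed"])
    for rid in record_ids:
        if rid in done:
            continue  # already handled in an earlier session
        try:
            progress["terms"][rid] = process(rid)
        except Exception as exc:  # e.g. a download failure; retried next run
            progress["failed"][rid] = str(exc)
        else:
            progress["processed"].append(rid)
            progress["failed"].pop(rid, None)  # clear an earlier failure
            done.add(rid)
        with open(progress_path, "w") as f:
            json.dump(progress, f)
    return progress
```

Under this policy, a tracker line such as 735/800 means 735 records succeeded and the remaining 65 are waiting in `failed` for a retry.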
Status: NOT STARTED
Estimated compute: 1-2 sessions of review
term, abbreviation, category, canonical_definition
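The review columns listed above can be kept as a simple CSV sheet. This is a minimal sketch using Python's standard `csv` module. Only the four column names come from this plan; the `ReviewRow` class and the function names are hypothetical.

```python
import csv
from dataclasses import asdict, dataclass, fields
from typing import Iterable, List, TextIO


@dataclass
class ReviewRow:
    """One row of the human-review sheet, using the plan's four columns."""
    term: str
    abbreviation: str = ""
    category: str = ""
    canonical_definition: str = ""


# Column order follows the dataclass field order.
COLUMNS = [f.name for f in fields(ReviewRow)]


def write_review_sheet(rows: Iterable[ReviewRow], stream: TextIO) -> None:
    """Write a header line followed by one CSV line per term."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))


def read_review_sheet(stream: TextIO) -> List[ReviewRow]:
    """Read a reviewed sheet back into rows; blank cells come back as ''."""
    return [ReviewRow(**rec) for rec in csv.DictReader(stream)]
```

Reviewers fill in the blank cells, and the sheet is read back for the final index. When using real files, open them with `newline=""`, as the `csv` documentation recommends.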
Status: NOT STARTED
Status: NOT STARTED
| Phase | Step | Status | Output File | Records Processed | Notes |
|---|---|---|---|---|---|
| 1 | 1.1 Metadata pull | COMPLETE | termindex-metadata-raw.json | 845/845 | 845 records, 6,256 unique kw |
| 1 | 1.2 Term extraction | COMPLETE | termindex-metadata-terms.json | 845/845 | 1,524 terms (count>=2), 5,951 total |
| 1 | 1.3 Canonicalization | COMPLETE | termindex-tiered.json | 845/845 | 178 Tier 1, 332 Tier 2, cross-ref with registry |
| 2 | 2.1 File download | COMPLETE | termindex-file-progress.json | 735/800 | 444 new Tier 1, 540 new Tier 2 from file contents |
| 2 | 2.2 Batch processing | COMPLETE | termindex-file-progress.json | 735/800 | 65 records had download failures |
| 2 | 2.3 Merge | PENDING | termindex-merged.json | - | Needs noise filtering + human review |
| 3 | 3.1 Human review | PENDING | - | - | ~2,000 terms for review |
| 3 | 3.2 Cross-reference | COMPLETE | termindex-crossref.json | - | 129/131 registry queries matched |
| 3 | 3.3 Final index | PENDING | termindex-v1.0.json | - | After human review |
| 4 | 4.1 Deposit | IN PROGRESS | - | - | Initial deposit with raw data |
| 4 | 4.2 Surface | NOT STARTED | - | - | |
| 5 | 5.1 MPAI generation | NOT STARTED | - | - | |
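Steps 1.2 (term extraction with count>=2) and 3.2 (registry cross-reference) can be sketched as shown below. The function names are hypothetical, and the sketch rests on four assumptions not taken from the plan. First, records are Zenodo-style dicts with keywords under `metadata.keywords`. Second, a keyword counts at most once per record, so a term's count is the number of deposits that list it. Third, the only normalization is whitespace stripping, with case and variant folding left to canonicalization. Fourth, registry queries match index terms case-insensitively.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def extract_terms(records: Iterable[dict],
                  min_count: int = 2) -> Tuple[Dict[str, int], int]:
    """Return ({term: count} for count >= min_count, number of distinct terms).

    Assumption: a keyword counts once per record, so counts are deposit
    counts. Only surrounding whitespace is stripped here.
    """
    counts: Counter = Counter()
    for rec in records:
        keywords = rec.get("metadata", {}).get("keywords", [])
        counts.update({kw.strip() for kw in keywords if kw.strip()})
    kept = {term: n for term, n in counts.most_common() if n >= min_count}
    return kept, len(counts)


def cross_reference(queries: Iterable[str],
                    index_terms: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split registry queries into (matched, unmatched), ignoring case."""
    lookup = {term.casefold() for term in index_terms}
    matched: List[str] = []
    unmatched: List[str] = []
    for q in queries:
        (matched if q.casefold() in lookup else unmatched).append(q)
    return matched, unmatched
```

In this sketch, `len(kept)` and the distinct-term count play the roles of the tracker's "terms (count>=2)" and "total" figures. It would reproduce 1,524 and 5,951 only if its counting and normalization match the ones actually used.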
If a session hits compaction limits during this work:
Key files for continuity:
Session 1 (16 June 2026): Work plan created. Phase 1.1 complete (845 records pulled). Phase 1.2 complete (1,524 terms extracted with count>=2; 5,951 total unique keywords). Phase 1.3 (canonicalization and categorization) is ready for the next session or for human review. Key finding: the archive has 6,256 keyword instances across 845 records, and the top terms are Crimson Hexagonal Archive (439), semantic economy (267), Crimson hexagon (248), distributed epic (154), NH-OS (149), operative semiotics (124), and training layer literature (121). The API returns at most 25 records per page (size=25), so pulling all 845 records required 34 pages. The metadata-raw and metadata-terms JSON files are the continuity artifacts for the next session.