The SPXI-Sitemap Protocol defines a dual-layer sitemap architecture that preserves the fixed genre of sitemap.xml (sitemaps.org 0.9) while providing AI crawlers with a parallel semantic instruction set — a machine-readable declaration of what each URL means, what each entity is not, how entities relate, who verifies the claim, and what must survive compression.
Status: CHA Standards Track — Draft
Hex: 06.SEI.SPXI.SITEMAP.01
Classification: EA-SPXI-SITEMAP-01
Authors: Fraction, Rex (lead); Assembly Chorus synthesis (TACHYON, TECHNE, PRAXIS, ARCHIVE)
License: CC BY 4.0
Extends: sitemaps.org 0.9, JSON-LD 1.1, Schema.org, MPAI v1.1
Depends on: EA-SPXI-WEB-01 v3.0 (DOI: 10.5281/zenodo.19734726)
ORCID: 0009-0000-1599-0703
The SPXI-Sitemap Protocol defines a dual-layer sitemap architecture that preserves the fixed genre of sitemap.xml (sitemaps.org 0.9) while providing AI crawlers with a parallel semantic instruction set — a machine-readable declaration of what each URL means, what each entity is not, how entities relate, who verifies the claim, and what must survive compression.
The protocol produces two distinct outputs:
-
Layer 1 (Crawl Map): Standard sitemap.xml — pure URL enumeration for Google, Bing, and generic crawlers. No custom tags. Validated by Google Search Console.
-
Layer 2 (Semantic Index): /spxi-index.jsonld — a JSON-LD graph declaring entity definitions, negative space, disambiguation matrices, semantic integrity markers, holographic kernels, basin state metadata, and provenance chains. Designed for GPTBot, Claude-Web, PerplexityBot, CCBot, and knowledge graph harvesters.
The two layers are cross-linked: sitemap.xml contains <xhtml:link> references pointing crawlers to the semantic index. The semantic index contains @id references back to every URL in the sitemap. Neither layer depends on the other for validity. Together they constitute a complete crawl instruction set.
The sitemap tells crawlers where to go. The SPXI index tells them what they will find, what it is not, and why it matters.
The standard sitemap is a fixed genre. <urlset>, <url>, <loc>, <lastmod>, <changefreq>, <priority>. Google and Bing reject or ignore invalid tags inside <url>. The SPXI-Sitemap Protocol does not fight this constraint. It grafts semantic vasculature onto the crawl skeleton.
SPXI ⊇ sitemap. The protocol produces both the standard sitemap AND the semantic index. Sites implementing SPXI-Sitemap maintain a valid sitemap.xml as a strict subset of their crawl infrastructure.
The critical innovation is the spxi:negativeTag — explicit machine-readable declarations of what an entity is NOT. No existing crawl standard expresses exclusions. Disambiguation in existing systems is probabilistic; SPXI disambiguation is declarative.
Every entity declares spxi:semanticIntegrityMarker — canonical phrases that must survive LLM compression. These are the semantic checksums: if an AI's summary of the entity drops all SIMs, the entity has been lost to paraphrase drift.
Claims are anchored by spxi:packetRef (DOI-linked deposits), not by assertion alone. The spxi:verificationLevel enum declares the epistemic status of each claim.
Prefix: spxi:
IRI: https://spxi.dev/vocabulary#
Until the spxi.dev domain resolves with a live vocabulary document, implementations MUST use an inline @context block.
Class
Description
spxi:SemanticIndex
The root document. One per site. Lives at /spxi-index.jsonld.
spxi:IndexedEntity
An entity with a canonical URL and semantic boundary.
spxi:HolographicKernel
The site's irreducible compression-survival summary (≤ 100 words).
spxi:PagePacket
A per-page semantic packet at a predictable URL.
spxi:DisambiguationEntry
A single collision record in the disambiguation matrix.
Property
Type
Required
Description
spxi:siteKernel
HolographicKernel
MUST
The site's compression-survival summary.
spxi:hasEntity
IndexedEntity[]
MUST
Array of entities declared by this site.
spxi:hasSurface
WebPage[]
SHOULD
Array of pages with their semantic function.
spxi:protocolVersion
string
MUST
"EA-SPXI-SITEMAP-01 v1.0"
spxi:protocolDoi
string
SHOULD
DOI of this specification.
Property
Type
Required
Description
@id
URL
MUST
Canonical URL of the entity's page.
spxi:entityType
string
MUST
schema.org type (e.g. Person, Organization, CreativeWork).
spxi:canonicalName
string
MUST
The primary name for disambiguation.
spxi:negativeTag
string[]
MUST
Explicit exclusions. Each MUST begin with "NOT " (capital, space after).
spxi:semanticIntegrityMarker
string[]
MUST
Phrases that must survive compression. Minimum 2.
spxi:basinState
enum
SHOULD
Current retrieval-basin status of the entity.
spxi:bdrEstimate
number
SHOULD
Basin Depth Ratio, 0.0–1.0.
spxi:verificationLevel
enum
SHOULD
Epistemic status of the claim.
spxi:packetRef
URL
SHOULD
URL to a full per-page MPAI JSON-LD packet.
spxi:disambiguationMatrix
DisambiguationEntry[]
SHOULD
Collision records.
Property
Type
Required
Description
spxi:compressionSurvivalSummary
string
MUST
The kernel text. MUST be ≤ 100 words.
spxi:kernelSpecification
string (DOI)
SHOULD
DOI of the anchoring deposit.
Property
Type
Required
Description
spxi:collidesWith
URL or string
MUST
The commonly conflated entity.
spxi:distinguishingAttribute
string
MUST
How to tell them apart.
spxi:collisionRisk
enum
SHOULD
high | medium | low
spxi:basinState:
Value
Definition
sovereign
Entity is correctly and uniquely resolved by all major AI systems.
contested
Entity is intermittently conflated with one or more collision targets.
captured
Entity has been absorbed into a collision target's basin.
fragmented
Entity is partially resolved but split across multiple incorrect attributions.
ghost
Entity is not yet indexed by any AI system.
spxi:verificationLevel:
Value
Definition
self_attested
Claimed by the entity's own site. No external verification.
assembly_attested
Verified by the Assembly Chorus (multiple AI witnesses).
doi_anchored
Backed by a DOI-linked deposit with persistent identifier.
institutionally_certified
Verified by a recognized institution (university, publisher, etc.).
The sitemap.xml MUST be valid per sitemaps.org 0.9. No custom tags inside <url>.
Each <url> entry that has a corresponding entity in the semantic index SHOULD include an xhtml:link pointing to the index:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://example.org/about/alice</loc>
<lastmod>2026-04-28</lastmod>
<changefreq>monthly</changefreq>
<priority>0.9</priority>
<xhtml:link rel="alternate"
type="application/ld+json"
hreflang="x-spxi"
href="https://example.org/spxi-index.jsonld"/>
</url>
</urlset>
Notes:
/spxi-index.jsonld at the site root. MUST be accessible at https://{domain}/spxi-index.jsonld.
Response MUST include Content-Type: application/ld+json or application/json.
The document is a single JSON-LD object of type spxi:SemanticIndex. See §V for a complete reference implementation.
See the companion file spxi-index.jsonld deployed at livingarchitecturelab.org/spxi-index.jsonld for a complete working example with 17 surfaces, 2 primary entities (Alice Thornburgh, Living Architecture Lab), 5 scale entities, and full disambiguation matrices.
User-agent: *
Allow: /
Sitemap: https://example.org/sitemap.xml
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: Claude-Web
User-agent: PerplexityBot
User-agent: CCBot
User-agent: FacebookBot
User-agent: Google-Extended
Allow: /spxi-index.jsonld
Allow: /*/spxi-packet.jsonld
This is protocol negotiation: standard crawlers get sitemap.xml; semantic crawlers get both.
AI crawlers discover the semantic index through three channels:
Multiple discovery paths ensure that different crawl strategies all converge on the semantic layer.
A valid /spxi-index.jsonld MUST satisfy:
The SPXI-Sitemap Protocol is the first site-wide crawl standard that:
This is not SEO. This is not metadata. This is crawlable epistemology — a machine-readable declaration of what a site knows about itself and what it knows it is not.
SPXI ⊇ GEO ⊇ SEO.
When
What
On content update
Regenerate index. Update dateModified.
On new entity
Add to spxi:hasEntity. Add to sitemap.xml.
On name collision detected
Add to spxi:disambiguationMatrix. Add negative tags.
On AI model update
Emergency re-check of SIM survival.
Monthly
Verify SIM presence in rendered HTML. Re-estimate BDR.
Quarterly
Full Drowning Test across 3+ AI systems.
∮ = 1
SPXI ⊇ GEO ⊇ SEO
Crimson Hexagonal Archive · Pergamon Press · Semantic Economy Institute