
How C3PO Works

Protocol Institute Oracle — technical overview

What is C3PO?

C3PO is the Protocol Institute’s oracle — a broad-based knowledge resource for exploring protocols and their extended intellectual world. It answers questions grounded in the PI corpus: theory, fiction, history, technology, governance, memory, culture, and the science of coordination are all in scope. Protocols are the organizing thread, but the aperture is wide.

Technically, C3PO is a retrieval-augmented generation (RAG) system. It retrieves relevant passages from the corpus and synthesizes answers using those passages as grounding. It is not a fine-tuned model — the underlying language model is Claude Sonnet, with PI corpus material injected as retrieval context at query time.

The corpus

| Source | Scale | Coverage |
| --- | --- | --- |
| Summer of Protocols PDFs | 82 papers · 766 vectors | Research papers, theoretical essays, protocol fiction, game materials (2023–2024) |
| Protocolized Substack | 116+ posts · 1,040 vectors | Fictions (58), Articles (47), Obliquities (5); 38 author profiles; 13 collection cards |
| Protocol Institute YouTube | 91 talks · 2,940 vectors | Researcher salons, symposia, public lectures, guest talks (2023–2025) |
| Bibliography | 252 refs · 278 vectors | External works cited by PI corpus; scored 0–3 for protocol relevance; abstracts + OA PDFs where available |
| Discord community | 3,300+ messages · 3,301 vectors | #idle-musings and #protocol-watch channels; threaded exchanges and starred highlights |
| SIG meeting archives | 78 sessions · 4,583 vectors | Four active research groups: Formal Protocol Theory, Memory Research Group, Protocols for Business, Protocol Fiction; AI-generated summaries + transcript chunks |
| Shared transcripts | ~4 vectors (growing) | Published conversations with C3PO |

The PDFs span the Summer of Protocols program, a research initiative that defined the field. The Substack corpus covers the full run of Protocolized, including protocol fiction, theory, and editorial. Discord and SIG archives bring in the live community: ongoing discussions, meeting transcripts, and working knowledge that never makes it into formal publications. The web links layer (the discord_links namespace in the index) adds external content the community has found worth sharing, screened for scope and scored for relevance.

Embedding and retrieval

Embedding model: Voyage AI voyage-3 — 1,024-dimensional dense vectors, cosine similarity. The same model encodes both documents (at index time) and queries (at query time).
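To make the similarity step concrete, here is a minimal sketch of cosine ranking in plain Python. It is illustrative only: the toy 3-dimensional vectors stand in for the 1,024-dimensional voyage-3 output, and the function names `cosine` and `top_k` are hypothetical, not part of C3PO. In the deployed system this ranking is presumably performed by the vector index rather than application code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Return (doc_id, score) pairs sorted by descending similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors; real embeddings would come from the embedding model.
docs = {"a": [1.0, 0.0, 0.0], "b": [0.7, 0.7, 0.0], "c": [0.0, 0.0, 1.0]}
ranked = top_k([1.0, 0.1, 0.0], docs, k=2)
print(ranked)
```

Because the same model encodes documents and queries, the two kinds of vector live in one space and can be compared directly this way.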

Chunking: Documents are split into 512-token chunks with 64-token overlap. Each chunk is stored with metadata: source, document title, author, date, and content type.
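The chunking scheme above can be sketched as a sliding window. This is a hedged illustration, not the project's code: a whitespace split stands in for the real tokenizer (which the overview does not name), and the function names and metadata key names are assumptions.

```python
def chunk_tokens(tokens, size=512, overlap=64):
    """Split a token list into windows of `size` tokens overlapping by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each window starts 448 tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks

def make_records(text, metadata):
    """Attach document metadata to every chunk (key names are hypothetical)."""
    tokens = text.split()  # placeholder tokenizer, an assumption
    return [
        {"text": " ".join(chunk), "chunk_index": i, **metadata}
        for i, chunk in enumerate(chunk_tokens(tokens))
    ]

records = make_records(
    "word " * 1000,
    {"source": "pdfs", "title": "Example", "author": "A. Author",
     "date": "2024-01-01", "content_type": "essay"},
)
print(len(records))
```

With these defaults, consecutive chunks share their last and first 64 tokens, so a sentence that straddles a boundary still appears whole in at least one chunk most of the time.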

Title-anchored embeddings: The text sent to Voyage for each body chunk is prefixed with "Title: {title}\nSummary: {summary}\n\n" before the chunk body. This ensures that title and topic keywords are always present in the vector even when they don’t appear in the chunk itself. The stored display text is unchanged; only the embedding receives the prefix.
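A minimal sketch of how the prefixing might look, using the prefix format quoted above. The function and field names (`embedding_input`, `embed_text`, `display_text`) are hypothetical; only the prefix string itself comes from the description.

```python
def embedding_input(title, summary, chunk_text):
    """Build the text that is embedded: title and summary first, then the chunk."""
    return f"Title: {title}\nSummary: {summary}\n\n{chunk_text}"

def prepare_chunk(title, summary, chunk_text):
    """Keep the embedded text and the stored display text separate."""
    return {
        "embed_text": embedding_input(title, summary, chunk_text),  # sent to the embedding model
        "display_text": chunk_text,  # stored and shown unchanged
    }

prepared = prepare_chunk(
    "Example Title",
    "A one-line summary.",
    "A body passage that never mentions the title.",
)
print(prepared["embed_text"])
```

Keeping the two strings separate means the prefix influences retrieval without ever appearing in the excerpts shown to the reader or passed to the language model.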

Summary vectors: Each document also generates a dedicated summary vector (chunk_type: "doc_summary" or "post_summary") that embeds only the title, summary, and tags. When a summary vector matches, a follow-up retrieval query surfaces the corresponding body chunks.
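The follow-up step could be sketched as below. This is an assumption-heavy illustration: the `doc_id` field, the `fetch_body_chunks` callback, and the choice to replace a summary hit with its body chunks (rather than keep both) are all hypothetical; the overview only says that a summary match triggers a second retrieval.

```python
SUMMARY_TYPES = {"doc_summary", "post_summary"}

def expand_summary_hits(hits, fetch_body_chunks, per_doc=3):
    """Swap summary-vector hits for body chunks of the same document, without duplicates."""
    results, seen = [], set()
    for hit in hits:
        if hit["chunk_type"] in SUMMARY_TYPES:
            # Follow-up retrieval, restricted to the matched document.
            candidates = fetch_body_chunks(hit["doc_id"], per_doc)
        else:
            candidates = [hit]
        for chunk in candidates:
            if chunk["id"] not in seen:
                seen.add(chunk["id"])
                results.append(chunk)
    return results

# Toy in-memory store standing in for the vector index.
BODY = {
    "doc-1": [
        {"id": "doc-1#0", "chunk_type": "body", "doc_id": "doc-1"},
        {"id": "doc-1#1", "chunk_type": "body", "doc_id": "doc-1"},
    ]
}

def fetch(doc_id, limit):
    return BODY.get(doc_id, [])[:limit]

hits = [
    {"id": "doc-1#summary", "chunk_type": "doc_summary", "doc_id": "doc-1"},
    {"id": "doc-1#1", "chunk_type": "body", "doc_id": "doc-1"},
]
expanded = expand_summary_hits(hits, fetch)
print([chunk["id"] for chunk in expanded])
```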

The Pinecone index

| Namespace | Vectors | Contents |
| --- | --- | --- |
| pdfs | 766 | Body chunks + doc_summary vectors for 82 PDFs |
| substack | 1,040 | Body chunks, post_summary, collection_card, author_profile vectors |
| videos | 2,940 | Body chunks + video_summary vectors for 91 YouTube talks |
| bibliography | 278 | ref_summary + body chunks for externally cited works |
| discord | 3,301 | Thread and message chunks from community channels; includes star_count for quality weighting |
| sig | 4,583 | sig_meeting_summary, sig_meeting_body, sig_discussion, sig_message, sig_reply chunk types across 4 SIG channels |
| discord_links | 6,722 | Web content linked from Discord/SIG messages; fetched, chunked, and scored 1–3 for scope relevance; source_count tracks how many messages shared each URL |
| transcripts | ~4 | Published conversations (grows with use) |

All namespaces are queried in parallel on each request. Results are merged and tier-weighted before being passed to the language model: PDFs and Substack at 1.0×; talks at 0.9×; bibliography scaled by relevance score (0.6–1.0×); Discord at 0.65× (starred: 0.85×); SIG meeting summaries at 0.85×, body chunks at 0.75×, discussions at 0.70×; web links weighted by relevance score and source popularity (0.55–0.85×).
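The merge step might look roughly like the following sketch. The fixed multipliers come from the paragraph above; everything else is an assumption and is marked as such: the linear mapping of bibliography relevance (0–3) onto 0.6–1.0, treating any `star_count` above zero as "starred", the 0.70 fallback for SIG chunk types the text does not mention, and the field names. Web links are left out because the overview does not say how relevance and popularity combine.

```python
SIG_WEIGHTS = {
    "sig_meeting_summary": 0.85,
    "sig_meeting_body": 0.75,
    "sig_discussion": 0.70,
}

def tier_weight(hit):
    """Return the multiplier for one retrieved hit."""
    ns = hit["namespace"]
    if ns in ("pdfs", "substack"):
        return 1.0
    if ns == "videos":
        return 0.9
    if ns == "bibliography":
        # Assumption: linear interpolation from relevance 0 -> 0.6 to relevance 3 -> 1.0.
        return 0.6 + 0.4 * (hit.get("relevance", 0) / 3)
    if ns == "discord":
        # Assumption: any star makes a message "starred".
        return 0.85 if hit.get("star_count", 0) > 0 else 0.65
    if ns == "sig":
        # Assumption: unlisted SIG chunk types fall back to the discussion weight.
        return SIG_WEIGHTS.get(hit.get("chunk_type"), 0.70)
    raise ValueError(f"no weight defined in this sketch for namespace {ns!r}")

def merge(results_by_namespace, top_n=10):
    """Combine per-namespace hits and rank them by weighted similarity."""
    merged = []
    for ns, hits in results_by_namespace.items():
        for hit in hits:
            tagged = {**hit, "namespace": ns}
            tagged["weighted"] = tagged["score"] * tier_weight(tagged)
            merged.append(tagged)
    merged.sort(key=lambda h: h["weighted"], reverse=True)
    return merged[:top_n]

ranked = merge({
    "pdfs": [{"id": "p", "score": 0.80}],
    "discord": [{"id": "d", "score": 0.90, "star_count": 0}],
    "videos": [{"id": "v", "score": 0.85}],
})
print([(h["id"], round(h["weighted"], 3)) for h in ranked])
```

In this toy run the unstarred Discord message has the highest raw similarity but ranks last after weighting, which is the intended effect of the tiers.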

The language model

Claude Sonnet is used throughout — both for answering queries and for background tasks like document enrichment. Protocol Institute research is dense and cross-disciplinary; the material benefits from strong synthesis rather than simple extraction.

The system prompt is derived from the Protocol Institute’s SOUL.md — a document describing C3PO’s intellectual orientation, scope, voice, analytical commitments, and protocol lexicon. It includes a corpus map (what is and isn’t indexed) and an explicit scope declaration (what topic areas are in and out of range), preventing false denials and scope mismatches.

Rate limits: 20 queries per IP per hour via the web UI. After 8 turns, a web conversation can be downloaded as Markdown and continued in Claude; alternatively, MCP access has no turn limit.
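A per-IP hourly limit like this one can be sketched as a fixed-window counter. This is an illustration under stated assumptions: an in-memory dict stands in for the key-value store, and whether the real limiter uses fixed or sliding windows is not stated in this overview. The class name `HourlyLimiter` is hypothetical.

```python
import time

class HourlyLimiter:
    """Fixed-window counter: at most `limit` requests per IP per window."""

    def __init__(self, limit=20, window_seconds=3600):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = {}  # stand-in for a key-value store

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        # All requests in the same hour bucket share one counter.
        key = (ip, int(now // self.window_seconds))
        used = self.counts.get(key, 0)
        if used >= self.limit:
            return False
        self.counts[key] = used + 1
        return True

limiter = HourlyLimiter()
decisions = [limiter.allow("203.0.113.7", now=0) for _ in range(21)]
print(decisions.count(True), decisions[-1])
```

A fixed window is simple to store as one counter per key, at the cost of allowing short bursts across a window boundary.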

MCP access

C3PO is available as a Model Context Protocol server (JSON-RPC 2.0) at https://c3po.vgr-702.workers.dev/mcp. Connect it to Claude Code or Claude Desktop to query the corpus directly inside your AI client — no turn limit, no browser required.

| Tool | What it does | Auth | Limit |
| --- | --- | --- | --- |
| search_corpus | Semantic search across the PI archive: returns ranked excerpts with metadata and URLs; no LLM call | None | 100 calls/IP/day |
| ask_c3po | Full RAG: embed → retrieve → Claude Sonnet synthesis; supports multi-turn history for long conversations | Bearer token | Circuit-breaker shared with web UI |

search_corpus is open — no key required. You can filter by namespace (pdfs, substack, videos, bibliography, discord, sig, discord_links, or all) and set a result limit (1–20, default 10). Good for agentic workflows that need raw retrieval without LLM cost.
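For clients that speak JSON-RPC directly, a search_corpus call would be wrapped in a standard MCP `tools/call` request. The sketch below only builds and validates the payload; it does not send it. The argument names `query`, `namespace`, and `limit`, and the default namespace of `all`, are assumptions (the overview lists the filters but not their parameter names), so check the server's tool listing before relying on them. A real session also needs the MCP initialization handshake first.

```python
import json

ALLOWED_NAMESPACES = {
    "pdfs", "substack", "videos", "bibliography",
    "discord", "sig", "discord_links", "all",
}

def search_corpus_request(query, namespace="all", limit=10, request_id=1):
    """Build a JSON-RPC 2.0 tools/call payload for search_corpus."""
    if namespace not in ALLOWED_NAMESPACES:
        raise ValueError(f"unknown namespace: {namespace!r}")
    if not 1 <= limit <= 20:
        raise ValueError("limit must be between 1 and 20")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search_corpus",
            # Argument names are assumptions, not confirmed by the overview.
            "arguments": {"query": query, "namespace": namespace, "limit": limit},
        },
    }

payload = search_corpus_request("hardness of protocols", namespace="pdfs", limit=5)
print(json.dumps(payload, indent=2))
```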

ask_c3po requires a Bearer token because each call invokes Claude Sonnet and Voyage AI at real cost. To request access, email team@protocol-institute.org.

Claude Code

Search only (no key needed) — run once in your terminal:

claude mcp add c3po --transport http https://c3po.vgr-702.workers.dev/mcp

Full access with Bearer token:

claude mcp add c3po --transport http https://c3po.vgr-702.workers.dev/mcp \
  --header "Authorization: Bearer <your-key>"

Claude Desktop

Add to claude_desktop_config.json (on Mac: ~/Library/Application Support/Claude/):

{"mcpServers": {"c3po": {
  "type": "http",
  "url": "https://c3po.vgr-702.workers.dev/mcp",
  "headers": {"Authorization": "Bearer <your-key>"}
}}}

For search-only without auth, omit the headers key.

Other MCP clients

Any client that supports Streamable HTTP MCP transport can connect. Point it at https://c3po.vgr-702.workers.dev/mcp and supply the Authorization: Bearer <your-key> header if you want ask_c3po.

Multi-turn conversations via MCP: ask_c3po accepts a history array of {"role": "user"|"assistant", "content": "..."} objects alongside your question. Pass prior turns to maintain context across a session. The same hourly and daily circuit breakers that govern the web UI apply — if the budget is exhausted, calls return an error and auto-reset at the next hour or midnight PT.
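Maintaining that history on the client side might look like the sketch below. The `history` array and its role/content shape come from the description above; the `question` argument name and both helper functions are assumptions for illustration.

```python
VALID_ROLES = ("user", "assistant")

def ask_arguments(question, history=None):
    """Build the arguments for one ask_c3po call, validating prior turns."""
    turns = list(history or [])
    for turn in turns:
        if turn.get("role") not in VALID_ROLES or not isinstance(turn.get("content"), str):
            raise ValueError(f"malformed history turn: {turn!r}")
    # "question" is an assumed parameter name.
    return {"question": question, "history": turns}

def record_turn(history, question, answer):
    """Return a new history with the latest exchange appended."""
    return history + [
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]

history = []
first = ask_arguments("What is a protocol?", history)
history = record_turn(history, "What is a protocol?", "(answer text from the tool)")
second = ask_arguments("How does that relate to memory?", history)
print(len(first["history"]), len(second["history"]))
```

Since the client passes prior turns explicitly on each call, the client decides how much context to carry forward.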

Infrastructure

The API, web UI, and MCP server are all served from a single Cloudflare Worker. Rate limiting, stats, and transcript storage use Cloudflare KV. The Worker is deployed from the vgururao/c3po repository (migrating to Protocol-Institute org at Phase 6).
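The spend-based circuit breaker listed in the component table can be reduced to a small decision function. This is a sketch, not the Worker's code: the $4 hourly and $30 daily thresholds are the documented ones, while the function name, the state labels, and the assumption that spend totals are tracked elsewhere are hypothetical.

```python
HOURLY_LIMIT_USD = 4.0
DAILY_LIMIT_USD = 30.0

def breaker_state(hourly_spend, daily_spend):
    """Decide whether paid calls should be served, given accumulated spend in USD."""
    if daily_spend > DAILY_LIMIT_USD:
        return "sleep_until_midnight"   # daily budget exhausted; resets at midnight PT
    if hourly_spend > HOURLY_LIMIT_USD:
        return "sleep_until_next_hour"  # hourly budget exhausted; resets on the hour
    return "serving"

for hourly, daily in [(1.25, 10.0), (4.10, 12.0), (0.50, 30.01)]:
    print(hourly, daily, breaker_state(hourly, daily))
```

The daily check comes first so that a tripped daily budget is not masked by a fresh, low hourly total.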

| Component | Technology |
| --- | --- |
| Worker | Cloudflare Workers (V8 isolates): single worker serving web UI, RAG API, and MCP server |
| Rate limiting | Cloudflare KV: 20 web queries/IP/hour; 100 search_corpus MCP calls/IP/day |
| Circuit breaker | KV flag + hourly cron; sleeps when hourly spend exceeds $4, or all day when daily spend exceeds $30 |
| Usage stats | KV accumulators (hourly/daily/lifetime) for web and MCP separately; visible in stats box on the main page |
| Transcript storage | Cloudflare KV: 90-day TTL; submitted conversations indexed into Pinecone transcripts namespace |
| Alerts | Telegram bot (optional): circuit trips and daily spend summary |

C3PO is in active development. Corpus coverage, retrieval quality, and features will expand over time. Current version: Phase 2C (Discord + SIG community archives live).