OrganizedMarket
Hard price data from TastyTrade. Prediction market odds from Polymarket and Kalshi. Sentiment from Twitter/X and financial news — all wired into a correlation engine running inside ClawBox's isolated Linux VM, orchestrated through OpenClaw. Surfaces arbitrage-adjacent signals across venues in real time.
The Core Idea
Prediction markets price real-world outcomes. Financial markets price risk. When they diverge — a Polymarket contract implying a Fed cut with probability X while /ZQ options price something different — there's signal. OrganizedMarket finds that gap continuously.
Current Build
OrganizedMarket has grown well past the base 6-agent flow. What ships today is a 13-agent pipeline with deterministic trade plans, a 2-click agentic execution layer, a shadow-journal + hindsight + policy learning loop, six live dashboards, and a Claude Code ↔ Hermes handoff skill that delegates coding tasks to a remote agent on a dedicated Mac mini over Tailscale. Everything mock-first, everything kill-switched.
- TradePlan: exact contract counts sized against TRADE_UNIT_USD × confidence, expiry month, venue deeplinks, copy-pastable ticket strings for both legs. Nothing opaque, nothing narrative; every number is deterministic.
- 2-click execution: QA-passed items surface on /actions with an Approve QA click, then a Place Trade click. Three layered kill switches (AUTO_EXEC_ENABLED + per-venue flags) keep everything paper-only by default.
- Learning loop: every signal opens a SHADOW entry whether you trade it or not. Hindsight labels outcomes at horizon; Polymarket resolution overlays real $/$-risked PnL when markets settle. The policy aggregator turns those labels into per-bucket TP rates + confidence multipliers.
- Hermes handoff: the /hermes delegate skill rsyncs the project to claws-mac-mini, creates an isolated git worktree, runs Hermes chat in a detached tmux session, and brings the diff back into ./.hermes/incoming/ for review. Your Claude Code session keeps full read/write on the local tree throughout.
Six live dashboards
All served by Wrangler Pages; deployed copy lives at organized-market-arch.pages.dev. Each polls its rolling JSON feed every 2s.
- /dashboard: Signal with tier, confidence, gap, z. Trade-plan card inline with copy-ticket + deeplinks.
- /actions
- /arb
- /journal
- /policy
- /docs
13-agent pipeline
signal ──┐ ┌── dispatcher (sinks)
odds ──┼─▶ correlator ─▶ signal ──▶ qa ──▶ action ──┤
sentiment─┘ │ ├── journal ─▶ hindsight ─▶ policy ──┐
└──▶ arb.poly_tasty ───────────┤ │
└── bridge (HTTP) ─▶ tasty·poly·kalshi
│
autoresearch ◀── tier.lift ◀── tier_correlator │
──▶ model.drift │
──▶ journal.research ─── appended to entries │
sniffer ──▶ counterparty.fingerprint │
│
policy feedback ──────────────── modifies next ────┘
Every box subscribes to one or more Pydantic-validated topics on a shared in-process asyncio bus. Full wiring + schemas are in the Agent Pipeline + Docs · Message Topics sections.
2-click execution layer
The QA agent takes any HIGH signal / profitable arb and produces a ProposedAction with six deterministic check rows. Passed items surface on /actions.
PROPOSED ──▶ QA_PASSED ──click 1──▶ APPROVED ──click 2──▶ EXECUTING ──▶ EXECUTED
└─▶ QA_REJECTED └─▶ FAILED
The bridge server (scripts/bridge.py, http://127.0.0.1:18799) receives approve / execute POSTs and fires both legs through a fail-closed executor facade:
- Options leg → TastyTrade via OAuth → defined-risk call vertical at 50%-width limit price. Gated by AUTO_EXEC_TASTY=1.
- Prediction leg · Polymarket → py-clob-client with POLY_WALLET_PRIVATE_KEY. Gated by AUTO_EXEC_POLY=1.
- Prediction leg · Kalshi → RSA-PSS signed REST headers. Gated by AUTO_EXEC_KALSHI=1.
Master switch AUTO_EXEC_ENABLED=0 overrides everything — every place_* call returns status=manual with a specific "set X=1" message. Bridge still runs, but no orders ever reach the wire.
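The layered switches above can be sketched as a small fail-closed gate. This is a minimal illustration, not the code in scripts/bridge.py: the flag names come from the text, while `check_gate`, `GateResult`, the venue keys, and the `"live"` status label are hypothetical.

```python
import os
from dataclasses import dataclass

# Per-venue flags named in the text. The short venue keys are hypothetical.
VENUE_FLAGS = {
    "tasty": "AUTO_EXEC_TASTY",
    "poly": "AUTO_EXEC_POLY",
    "kalshi": "AUTO_EXEC_KALSHI",
}


@dataclass(frozen=True)
class GateResult:
    status: str   # "manual" means blocked; "live" (assumed label) means allowed
    message: str


def check_gate(venue, env=None):
    """Fail closed: anything other than an explicit "1" blocks the order."""
    env = os.environ if env is None else env
    # The master switch is checked first and overrides every venue flag.
    if env.get("AUTO_EXEC_ENABLED", "0") != "1":
        return GateResult("manual", "set AUTO_EXEC_ENABLED=1 to enable execution")
    flag = VENUE_FLAGS.get(venue)
    if flag is None:
        return GateResult("manual", f"unknown venue {venue!r}")
    if env.get(flag, "0") != "1":
        return GateResult("manual", f"set {flag}=1 to enable {venue} orders")
    return GateResult("live", f"{venue} execution enabled")
```

Because missing or malformed variables fall through to `manual`, a misconfigured environment can only ever make the executor more conservative.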
Learning loop (Layers 1–4)
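The policy layer of this loop suggests a per-bucket confidence multiplier that is clipped to [0.3, 1.3] and shrunk toward 1.0 until a bucket has at least 20 samples. The source does not say how the shrinkage is computed, so the sketch below assumes a simple linear blend; `confidence_multiplier` and its parameter names are hypothetical.

```python
def confidence_multiplier(raw, n, lo=0.3, hi=1.3, full_weight_n=20):
    """Shrink a raw multiplier toward 1.0 for small n, then clip to [lo, hi].

    Assumption: linear shrinkage. With n == 0 the result is exactly 1.0,
    and from n >= full_weight_n the raw value is used unshrunk.
    """
    weight = min(max(n, 0) / full_weight_n, 1.0)
    shrunk = 1.0 + weight * (raw - 1.0)
    return min(max(shrunk, lo), hi)
```

Under this assumption a bucket with only a handful of labeled entries barely moves sizing, which matches the intent of keeping thin buckets close to neutral.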
- Layer 1 · Journal: each Signal or profitable ArbOpportunity opens a SHADOW JournalEntry. If you later execute, it promotes in place to OPEN: same entry_id, full snapshot history preserved.
- Layer 2 · Hindsight: after HINDSIGHT_HORIZON_SECONDS (default 1h), labels entries with realized_capture, ideal_entry_lag_seconds, ideal_exit_lag_seconds. Separately polls gamma-api.polymarket.com for resolution; when a market settles, it overwrites the proxy with the real $/$-risked return.
- Layer 3 · Policy: aggregates per pattern_key (symbol|venue|gap:bucket|z:bucket|sentiment:regime): TP rate, mean realized capture, suggested confidence multiplier (clipped [0.3, 1.3], shrunk toward 1.0 until n≥20), suggested exit rule. Read-only; visible on /policy.
- Layer 4 · Learn mode: LEARN_MODE=shadow (default) only stamps an audit. LEARN_MODE=active applies the multipliers live (size capped at ×1.2). The plan carries a PolicyAdjustment record either way so the dashboard always explains why a number moved.
Hermes handoff skill
A Claude Code skill at ~/.claude/skills/hermes/ delegates build tasks to the Hermes agent running on claws-mac-mini over Tailscale while keeping your local Claude Code session fully active on the same project. Hermes works on an isolated git worktree on the remote side. When it's done, /hermes pull rsyncs the worktree into ./.hermes/incoming/<task-id>/ — you review the diff and selectively merge.
/hermes delegate "<task>" → rsync project → tmux detached hermes chat → task-id
/hermes status [id] → tail log + list worktree changes
/hermes pull [id] → rsync back → HANDOFF_RESULT.md + diff vs local
/hermes list → recent task-ids + state
/hermes cancel <id> → tmux kill-session
Sits side-by-side with Claude Code. Both can work on the same project simultaneously — Hermes on the mini, Claude Code on your Mac — with zero mutation conflicts until you explicitly merge.
What's next
- Token router: a separate project, still in progress, that will slot into the AUTORESEARCH_MODEL_DEEP / _FAST / _HAIKU env slots that already exist. It will pick the right model per task (deep post-mortem vs routine bucket summary).
- Kalshi live client: currently mock mode. Wire the RSA-JWT auth + market streams.
- Option-chain Greeks lookup for Tasty orders: replace the 0.30 / 0.15 delta heuristic with exact strikes via /market-metrics/greeks.
- Roadmap items (autoresearch-driven attribution + named-wallet sniffer) stay on the plan below.
System Architecture
OrganizedMarket runs entirely inside ClawBox — a Tauri-wrapped macOS app that spins up an isolated Ubuntu 24.04 VM via Lima. OpenClaw runs inside that VM as the agent gateway. Your Mac stays clean; only explicitly uploaded files are shared with the VM.
ORGANIZEDMARKET — SYSTEM DIAGRAM
┌────────────────────────────────────────────────────────────────────┐
│ CLAWBOX (Tauri + React) │
│ Native macOS UI · one-click VM lifecycle │
├────────────────────────────────────────────────────────────────────┤
│ LIMA VM MANAGER │
│ Ubuntu 24.04 · isolated from host Mac │
├────────────────────────────────────────────────────────────────────┤
│ OPENCLAW GATEWAY :18789 │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ agent-signal │ │ agent-poly │ │ agent-kalshi │ │
│ │ TastyTrade REST │ │ polymarket-cli │ │ kalshi-cli │ │
│ │ DXLink stream │ │ subprocess JSON │ │ subprocess JSON │ │
│ │ options flow │ │ clob midpoints │ │ yes_bid/yes_ask │ │
│ └─────────┬────────┘ └─────────┬────────┘ └────────┬─────────┘ │
│ │ │ │ │
│ └────────────────────┼───────────────────┘ │
│ │ quote.update · odds.update │
│ ┌──────────────────────────────▼──────────────────────────────┐ │
│ │ agent-correlator · pearson + lag │ │
│ │ cross-venue divergence · z-scored rolling stats │ │
│ └──────────────────────────────┬──────────────────────────────┘ │
│ │ + sentiment + drift + lift │
│ ┌──────────────────────────────┴──────────────────────────────┐ │
│ │ agent-sentiment Twitter/X v2 · news NLP · Claude Sonnet │ │
│ └──────────────────────────────┬──────────────────────────────┘ │
│ │ signal (HIGH/MED/LOW) │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ agent-dispatcher Slack · Discord · webhook · dashboard │ │
│ │ confidence gate · HIGH-only fires · MED queues for review │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ─── research + attribution loop ────────────────────────────── │
│ │
│ ┌──────────────────┐ tier.lift ┌─────────────────────────────┐ │
│ │ agent-tier- │───────────▶│ agent-autoresearch │ │
│ │ correlator │ │ Opus 4.6 × 4.7 drift probes │ │
│ │ MED→HIGH lift │◀───────────│ model.drift → correlator │ │
│ └────────┬─────────┘ feedback └──────────────┬──────────────┘ │
│ │ │ │
│ │ signal_log mining │ drift context │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ agent-sniffer │ │
│ │ public-data venue cohorts · signal+drift fingerprinting │ │
│ └─────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────┘
│ │
▼ ▼
claws-mac-mini :11434 CF Pages dashboard
Claude Sonnet via OpenClaw organized-market-arch.pages.dev
local inference (optional)
Why ClawBox as the Container
ClawBox's Lima-based VM isolation is ideal for a financial intelligence agent: API credentials never touch your host Mac, the VM can be snapshotted before experiments, and the OpenClaw gateway handles multi-agent orchestration without needing to wire up a separate orchestration layer. Everything is already there.
| Component | Technology | Role |
|---|---|---|
| ClawBox GUI | Tauri + React | Native macOS wrapper, one-click VM start/stop |
| Lima VM | Ubuntu 24.04 ARM | Isolated execution environment, no host filesystem access |
| OpenClaw | Node.js + Python | Agent gateway :18789, multi-agent orchestration, tool registry |
| agent-signal | Python + DXLink WS | TastyTrade streaming quotes, options flow, greeks ingestion |
| agent-poly | Python + polymarket-cli | Polymarket CLOB via Rust CLI subprocess: markets get + clob midpoints, JSON stdout |
| agent-kalshi | Python + kalshi-cli | Kalshi event contracts via CLI subprocess: kalshi market <TICKER> --json, RSA-PSS auth handled by CLI |
| agent-sentiment | Python + Claude Sonnet | Twitter/X entity scoring, financial news NLP |
| agent-correlator | Python + NumPy | Cross-venue divergence, Pearson + lag correlation matrix, joins model.drift + tier.lift into tier scoring |
| agent-dispatcher | Python + webhooks | Confidence-gated alerts, Slack/Discord/ClawBox UI output |
| agent-autoresearch | Python + Anthropic SDK | Probes two frontier model versions (Opus 4.6 vs 4.7) on identical market context, emits model.drift. Subscribes to tier.lift for immediate triggered probes |
| agent-tier-correlator | Python + SQLAlchemy | Mines signal_log for MED→HIGH lift per pattern bucket, publishes tier.lift when a pattern crosses the promotion threshold |
| agent-sniffer | Python + public signal/drift joins | Emits venue-cohort fingerprints from recent signal + model.drift context and tags the likely lagging model for that cohort |
Bus topics
All inter-agent communication flows through core.bus.Bus — an in-process asyncio
pub/sub with Pydantic validation on publish. New topics from the research + attribution loop:
| Topic | Schema | Publisher → Subscribers |
|---|---|---|
| quote.update | QuoteUpdate | signal → correlator |
| odds.update | OddsUpdate | poly, kalshi → correlator |
| sentiment | Sentiment | sentiment → correlator |
| signal | Signal | correlator → dispatcher, tier-correlator (persisted to signal_log) |
| arb.poly_tasty | ArbOpportunity | correlator → dispatcher |
| model.drift | ModelDriftEvent | autoresearch → correlator (tier boost on drift × divergence overlap) |
| market.microstructure | MarketMicrostructure | poly / kalshi → sniffer (public book spread, imbalance, activity) |
| tier.lift | TierTransitionStat | tier-correlator → autoresearch (immediate triggered probe on high-lift pattern) |
| counterparty.fingerprint | CounterpartyFingerprint | sniffer → dispatcher / downstream scoring (venue cohort + likely lagging model) |
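To make the validate-on-publish pattern concrete, here is a minimal sketch of an in-process asyncio pub/sub bus. It is not the real core.bus.Bus: a stdlib dataclass stands in for the Pydantic model so the example has no third-party dependencies, and the `OddsUpdate` fields, the `Bus` method names, and the `demo` coroutine are all assumptions made for illustration.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OddsUpdate:
    # Hypothetical fields; the real schema lives in the project's Pydantic models.
    venue: str
    market: str
    yes_price: float

    def __post_init__(self):
        if not 0.0 <= self.yes_price <= 1.0:
            raise ValueError("yes_price must be within [0, 1]")


class Bus:
    def __init__(self, schemas):
        self._schemas = schemas          # topic -> required message type
        self._subs = defaultdict(list)   # topic -> list of subscriber queues

    def subscribe(self, topic):
        queue = asyncio.Queue()
        self._subs[topic].append(queue)
        return queue

    async def publish(self, topic, message):
        # Reject unknown topics and messages of the wrong type before fan-out.
        expected = self._schemas.get(topic)
        if expected is None or not isinstance(message, expected):
            raise TypeError(f"invalid message for topic {topic!r}")
        for queue in self._subs[topic]:
            await queue.put(message)


async def demo():
    bus = Bus({"odds.update": OddsUpdate})
    inbox = bus.subscribe("odds.update")
    await bus.publish("odds.update", OddsUpdate("poly", "fed-cut", 0.62))
    return await inbox.get()
```

Running `asyncio.run(demo())` returns the validated `OddsUpdate`. Each subscriber gets its own queue, so a slow consumer does not block delivery to the others.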
Research + attribution loop
The base signal pipeline produces tiered price-divergence signals. The research loop — tier-correlator → autoresearch → sniffer — converts those signals into attributed venue-cohort opportunities:
- Tier-correlator mines signal_log hourly for MED patterns that historically preceded HIGH within a window, computes lift, emits tier.lift when lift > 1.5×.
- Autoresearch wakes on tier.lift (instead of waiting its 300s cadence), probes the active frontier models on the matching pattern, emits model.drift if they disagree.
- Sniffer joins recent signal structure with model.drift and emits a venue-cohort fingerprint tagged with the likely lagging model.
- Dispatcher persists those fingerprints to SQLite and dashboard JSON so the attribution layer is inspectable and testable.
The loop is self-correcting. When a triggered probe successfully promotes a MED to HIGH and resolves in the expected direction, the (pattern, model_version) pair strengthens in the Wiki. When it fails, the lift estimate for that pattern decays faster. No manual tuning — the system finds its own edge and forgets what no longer works.
ClawBox Setup
ClawBox ships as a native macOS app. It manages the entire Lima VM lifecycle — no CLI required.
Once installed, OpenClaw runs inside the Ubuntu 24.04 VM and exposes a gateway at
localhost:18789 through a port-forwarded bridge.
Setup stages
ClawBox Architecture Internals
┌──────────────────────────────────────────┐
│ ClawBox (Tauri + React) │
│ Native macOS UI │
├──────────────────────────────────────────┤
│ Lima VM Manager │
├──────────────────────────────────────────┤
│ Ubuntu 24.04 VM │
│ ┌──────────────────────────────────┐ │
│ │ OpenClaw Agent │ │
│ │ Browser · Terminal · File Sys │ │
│ └──────────────────────────────────┘ │
└──────────────────────────────────────────┘
▲
│ Your files stay on your Mac.
│ You share only what you upload.
└──────────────────────────────────
Data Sources
Three primary data sources feed OrganizedMarket: TastyTrade for live financial market data and options flow, Polymarket for binary prediction market odds, and Kalshi for regulated event contracts. Together they form a complete picture of market-implied probabilities versus hard derivative pricing.
TastyTrade API — Financial Market Data
TastyTrade provides a full open REST API with a WebSocket DXLink streamer for real-time quotes. The SDK supports equities, ETFs, options, futures, and futures options. OrganizedMarket uses it primarily for options chain data, implied volatility surfaces, and streaming quotes on macro-sensitive instruments (SPY, QQQ, /ES, /ZQ, TLT).
Polymarket — Prediction Market Odds
Polymarket uses a Central Limit Order Book (CLOB) model. OrganizedMarket pulls live order books and trades for politically and economically sensitive markets — Fed decisions, election outcomes, GDP prints, CPI surprises — and feeds the implied probabilities directly into the correlation engine for comparison against options-implied probabilities.
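As a rough illustration of how an order book becomes an implied probability, the sketch below takes the midpoint of the best YES bid and ask and compares it with an options-implied figure. It assumes prices quoted on a 0 to 1 scale; `implied_probability` and `probability_gap` are hypothetical helpers, not functions from py-clob-client or the project.

```python
def implied_probability(best_bid, best_ask):
    """Midpoint of the best YES bid and ask, both quoted on a 0..1 scale."""
    if not (0.0 <= best_bid <= best_ask <= 1.0):
        raise ValueError("expected 0 <= bid <= ask <= 1")
    return (best_bid + best_ask) / 2.0


def probability_gap(prediction_prob, options_prob):
    """Signed gap: positive when the prediction market prices the event higher."""
    return prediction_prob - options_prob


# Example: a YES book at 0.60 / 0.64 against a 0.55 options-implied probability.
mid = implied_probability(0.60, 0.64)
gap = probability_gap(mid, 0.55)
```

The midpoint ignores depth and spread, so a wide or thin book would call for a more careful estimate before the gap is treated as signal.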
Kalshi — Regulated Event Contracts
Kalshi is CFTC-regulated, making it the cleanest source of prediction market data for US financial events. OrganizedMarket focuses on Kalshi's Fed rate, CPI, GDP, and jobs report markets — these map directly to instruments TastyTrade can stream.
Source Comparison
| Feature | TastyTrade | Polymarket | Kalshi |
|---|---|---|---|
| Data type | Options, equities, futures | Binary outcome markets | Regulated event contracts |
| Streaming | Yes — DXLink WS | Yes — CLOB WS | Polling (REST) |
| Regulation | FINRA/SEC | CFTC (prediction mkt) | CFTC regulated ✓ |
| Auth | Session token (24hr) | API key + L2 signing | RSA-PSS signed request headers |
| Free tier | Yes (account required) | Yes (read-only) | Yes (demo env) |
| Correlation use | Ground truth pricing | Implied probability | Regulated prob. signal |
The 6-Agent Pipeline
Each agent runs as an independent OpenClaw sub-process inside the ClawBox VM. They communicate via OpenClaw's internal message bus. The correlator consumes outputs from all data agents simultaneously and scores divergence signals. The dispatcher is the only agent that writes to external systems.
ORGANIZEDMARKET — AGENT DATA FLOW
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ agent-signal │ │ agent-poly │ │ agent-kalshi │
│ TastyTrade │ │ Polymarket │ │ Kalshi │
│ DXLink/REST │ │ CLOB API │ │ REST + JWT │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
│ quote_update │ odds_update │ contract_update
└─────────────────┼──────────────────┘
│
┌──────▼────────────────────────┐
│ agent-correlator │
│ cross-venue divergence calc │
│ Pearson · lag · Z-score │
└──────┬────────────────────────┘
│
┌────────────┴───────────────┐
│ agent-sentiment │
│ Twitter/X · news · Claude │
│ score: -1.0 → +1.0 │
└────────────┬───────────────┘
│ signal + context + confidence
┌──────▼────────────────────────┐
│ agent-dispatcher │
│ confidence gate (>0.75 alert) │
│ Slack · Discord · ClawBox UI │
└───────────────────────────────┘
Agent Definitions
Message Bus Schema
Full topic-by-topic wiring and Pydantic schema shapes live in the Architecture section's bus-topics table. Every quote, odds update, sentiment event, drift event, tier-lift stat, and final signal flows through one in-process asyncio bus validated against those schemas.
Intelligence Layer
The intelligence layer sits inside agent-correlator and agent-sentiment. It translates raw price feeds and social signals into structured, scored intelligence that a trader can act on.
Correlation Engine
The correlator maintains a rolling state of implied probabilities across all three venues. It computes Pearson correlation between TastyTrade options-implied probabilities and prediction market yes prices, then Z-scores the current divergence against its 30-day history.
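The two statistics named above can be sketched in a few lines. The project lists NumPy for this work; the version below uses only the standard library so it runs anywhere, and the function names and the choice of population standard deviation are assumptions.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series has no defined correlation; report 0
    return cov / math.sqrt(var_x * var_y)


def divergence_zscore(history, current):
    """Z-score of the current divergence against its rolling history."""
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0
    return (current - mean(history)) / sigma
```

Here `history` would hold the trailing 30 days of divergence values (options-implied probability minus prediction-market yes price) and `current` the latest reading.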
Twitter/X Sentiment Pipeline
The Twitter/X v2 filtered stream listens for tweets containing financial entities and watchlist ticker symbols. Claude Sonnet scores sentiment per entity and calculates velocity — the rate at which sentiment is shifting, which is often more predictive than absolute level.
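Velocity here means how fast the score is moving, not where it sits. The source does not give the exact formula, so this sketch assumes the simplest one: score change per minute between the first and last sample in a window. `sentiment_velocity` and its input shape are hypothetical.

```python
def sentiment_velocity(samples):
    """Score change per minute across a window.

    samples: list of (unix_seconds, score) pairs in chronological order,
    with each score in [-1.0, +1.0].
    """
    if len(samples) < 2:
        return 0.0
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    elapsed_minutes = (t1 - t0) / 60.0
    if elapsed_minutes <= 0:
        raise ValueError("samples must be in increasing time order")
    return (s1 - s0) / elapsed_minutes
```

For example, a window moving from -0.2 to +0.4 over two minutes yields roughly 0.3 per minute, even though the final level of 0.4 is only mildly positive.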
Signal Summary Generation
Before dispatching any alert, Claude generates a plain-English summary that includes data provenance, confidence rationale, and relevant context. The goal is a summary that a trader with no prior context could read and immediately understand the opportunity.
Intelligence Data Flow
SENTIMENT + PRICE → SIGNAL LIFECYCLE
Twitter/X stream ──┐
NewsAPI polling ───┼──→ agent-sentiment ──→ score (-1 to +1)
│ (Claude NLP) velocity calc
│
TastyTrade DXLink ─┼──→ options-implied prob
│ IV rank + delta
│
Polymarket CLOB ───┼──→ yes price (mid)
│ order flow delta
│
Kalshi REST ───────┘──→ yes price
resolution timeline
All streams ──→ agent-correlator
· Pearson correlation matrix
· Z-score vs 30-day history
· Lag analysis (sentiment → price)
· Confidence + tier scoring
│
HIGH (>0.75) ───→ immediate alert + Claude summary
MED (0.45–0.75) → 15-min confirmation window
LOW (<0.45) ────→ flywheel log only
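The tier thresholds in the lifecycle above map onto a small classifier. This is a sketch using the thresholds stated in the text (HIGH above 0.75, MED from 0.45 up to 0.75, LOW below 0.45); the function names and route labels are hypothetical.

```python
def classify_tier(confidence):
    """Map a confidence score in [0, 1] to HIGH / MED / LOW."""
    if confidence > 0.75:
        return "HIGH"
    if confidence >= 0.45:
        return "MED"
    return "LOW"


# Hypothetical route labels for the three handling paths described above.
ROUTES = {
    "HIGH": "immediate_alert",
    "MED": "confirmation_window_15m",
    "LOW": "flywheel_log_only",
}


def route_for(confidence):
    return ROUTES[classify_tier(confidence)]
```

Note the boundary handling: exactly 0.75 stays MED because HIGH requires strictly more than 0.75, while exactly 0.45 is MED rather than LOW.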
Stack & Conventions
Python-first, OpenClaw-native agent architecture. All agents are independent Python processes
registered with the OpenClaw gateway. Shared types and utilities live in
packages/. Same monorepo conventions as the broader Organized AI codebase.
Runtime
Key Dependencies
| Package | Purpose |
|---|---|
| tastytrade | Official Python SDK: sessions, option chains, DXLink streamer |
| py-clob-client | Polymarket CLOB API: order books, markets, trades |
| tweepy | Twitter/X v2 filtered stream client |
| websockets | DXLink WebSocket streaming for TastyTrade real-time quotes |
| numpy | Pearson correlation, rolling statistics, Z-score calculation |
| cryptography / pyjwt | Kalshi RSA signature auth |
| newsapi-python | Financial news sources: Reuters, Bloomberg, WSJ aggregation |
| sqlalchemy | Signal log persistence: all signals stored for flywheel replay |
| pydantic | Message bus schema validation across all agents |
| anthropic | Claude Sonnet: sentiment NLP + signal summary (via OpenClaw OAuth) |
Agent conventions
Every agent registers with the OpenClaw gateway on startup, subscribes to the bus topics it cares about, publishes only validated Pydantic messages, and persists raw data to SQLite so the flywheel can replay. Only the dispatcher writes to external systems — webhooks fire on HIGH, MED queues for human review, LOW logs only. The OpenClaw config file is the single source of truth for which agents boot, which envs each one reads, and their cadence.
Roadmap
The base signal pipeline finds price divergence between venues. The next two agents find participant divergence: who's trading, what stack are they running, and when does the frontier of available intelligence shift under them. Autoresearch, tier-correlator, and the sniffer MVP are wired in the repo today. Named-wallet attribution and a full LLM Wiki remain roadmap work.
Agent 7 · autoresearch — frontier-model drift detector
Probes two Claude versions with the same prompt built from the freshest
signal_log entries and rolling-stat context for each symbol, then diffs their
decisions. Publishes model.drift events when direction, confidence, or
rationale diverges beyond a threshold. The arbitrage thesis: counterparties still running
the older model will misprice exactly the setups where the newer model disagrees, and that
window closes the day they upgrade.
- Each probe uses signal_log summaries plus rolling-stat pairs for each symbol, served as the shared context so only the model id varies across calls.
- It emits ModelDriftEvent { symbol, model_a, model_b, decision_a, decision_b, confidence_a, confidence_b, divergence_score, rationale_delta } on topic model.drift.
- Cadence is set by AUTORESEARCH_CADENCE_SECONDS. Live mode is wired behind AGENT_AUTORESEARCH_LIVE=1 + ANTHROPIC_API_KEY.
- The correlator turns model.drift into a signal tier boost when a drift event lines up with an existing cross-venue gap.
Agent 8 · sniffer — counterparty fingerprinting
Literal counterparty hardware — model, GPU, OS — isn't observable from market data. What
is observable are behavioral fingerprints from public data that
proxy those things, and that's usually enough. The implemented MVP builds persistent
venue cohorts from recent signal structure and drift disagreement, then surfaces
counterparty.fingerprint events the correlator can join against drift and
sentiment.
- Inputs are recent signal events, matching model.drift events, and public market.microstructure observations on the same symbol. The MVP does not invent wallet data; it fingerprints venue cohorts from public pipeline outputs already in hand.
- Archetypes are api_bot, latency_chaser, mean_reverter, and discretionary. They are inferred from confidence, gap size, z-score, and drift magnitude.
- It emits CounterpartyFingerprint { cluster_key, venue, symbol, archetype, likely_model, confidence, evidence }, persisted to SQLite and dashboard/data/fingerprints.json.
- When autoresearch emits a drift event on symbol X, the sniffer tags the freshest venue cohort on X with the likely lagging model based on the weaker side of the disagreement.
Autoresearch × Sniffer — the closed loop
Autoresearch and the sniffer are two halves of the same instrument. Autoresearch maps what frontier models disagree on in a given market state; the sniffer maps which venue cohort looks exposed to that disagreement. Joining them turns drift events into inspectable attribution hints instead of anonymous divergence.
- The sniffer reads llm_wiki_signature history per model / symbol / regime, then tags the active venue cohort with the model that scores as historically weaker in that regime.
- When autoresearch emits model.drift on symbol X, the sniffer identifies which venue cohort looks most exposed to the lagging side. That gives the operator a practical lead even before wallet-level attribution exists.
- Signatures live in the llm_wiki_signature table in SQLite. Future work is to enrich that store with resolved outcomes and wallet-level clustering.
Agent 9 · tier-correlator — MED→HIGH lift mining
The signal_log already records every tier transition. The gap is a component
that mines it: for each MED pattern, compute
P(HIGH within window W | MED pattern X) / P(HIGH baseline). Patterns with high
lift become autoresearch triggers instead of autoresearch running on a
fixed 300s cadence. When a MED with proven lift appears, autoresearch immediately probes
frontier-model drift on that exact setup — agreement promotes it toward HIGH before the
window closes. This is how the pipeline manufactures more HIGH-tier arb opportunities
instead of waiting for them.
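The lift ratio above is simple arithmetic once the counts are in hand. The sketch below assumes the counts have already been pulled from signal_log; the function names are hypothetical, and `min_samples` stands in for TIER_CORRELATOR_MIN_SAMPLES, whose default the source does not state.

```python
def lift(n_med, n_followed_high, baseline_high_rate):
    """P(HIGH within W | MED pattern) divided by the baseline HIGH rate."""
    if n_med <= 0 or baseline_high_rate <= 0:
        return None  # not enough information to form a ratio
    return (n_followed_high / n_med) / baseline_high_rate


def should_emit(n_med, n_followed_high, baseline_high_rate, *,
                min_samples, threshold=1.5):
    """Emit tier.lift only for well-sampled buckets above the threshold."""
    if n_med < min_samples:
        return False
    value = lift(n_med, n_followed_high, baseline_high_rate)
    return value is not None and value > threshold
```

For instance, 12 of 40 MED signals followed by HIGH against a 10% baseline gives a lift of about 3, comfortably above the 1.5× default.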
- Reads a trailing window of signal_log (default 14d). Groups MED signals into pattern buckets keyed on (instruments, divergence_gap bin, z_score bin, sentiment regime) and checks whether a HIGH followed on the same instrument within window W.
- Emits TierTransitionStat { pattern_key, n_med, n_followed_high, lift, baseline_high_rate, window_seconds, last_med_at } on topic tier.lift. Emitted when lift crosses a promotion threshold (default 1.5×).
- Cadence is set by TIER_CORRELATOR_CADENCE_SECONDS. Runs against the DB, not the live bus; this is retrospective analysis, not real-time.
- Autoresearch subscribes to tier.lift. On each event it caches the pattern_key → expected-lift mapping. When the live correlator next emits a MED signal whose pattern matches, autoresearch fires a targeted probe immediately instead of waiting for its cadence tick.
- Triggered-probe results are recorded in llm_wiki_signature. That learned history then feeds the sniffer's next attribution pass. Outcome-weighted decay is still future work.
- Buckets below TIER_CORRELATOR_MIN_SAMPLES are skipped; early runs may emit nothing until the history window is rich enough.
Scope boundary
The sniffer only analyses public market data — on-chain wallets, public order books, self-published social metadata. It never probes remote systems, never fingerprints hardware it doesn't own, and never executes trades off its own signals. Everything it emits flows through the same human-review gate as the base pipeline.
AutoAgent
Everything up to this point is a runtime pipeline — it ingests data and emits signals. AutoAgent is a separate build-time loop that mutates the pipeline's heuristics against a measurable benchmark. The meta-agent reads a directive, rewrites a single frozen-signature Python function, scores it, and keeps the rewrite only if it beats best-so-far on both a rotating benchmark and a sealed holdout. Inspiration + pattern: AutoAgent & Autoresearch Guide ↗.
System flow
ORGANIZEDMARKET — FULL SYSTEM FLOW
┌──────────────────────── RUNTIME PIPELINE (agents/) ────────────────────────┐
│ │
│ TastyTrade ──┐ │
│ DXLink │ quote.update │
│ Polymarket ──┼──────────────▶ agent-correlator ─── signal ──▶ agent- │
│ (cli) │ odds.update (score: z, gap, (HIGH/MED/LOW) │
│ Kalshi ───┘ sentiment) │
│ (cli) ▲ │ │
│ │ sentiment ▼ │
│ Twitter/X ──▶ agent-sentiment ─────────┘ agent-dispatcher │
│ + News NLP (Claude Sonnet) (webhooks · UI) │
│ ▲ │
│ │ model.drift │
│ │ tier.lift │
│ ┌───────────┴──────────────┐ │
│ │ │ │
│ agent-autoresearch ◀───── tier.lift ── agent-tier-correlator │
│ (Opus 4.6 × 4.7) (signal_log rollup) │
│ │ │
│ │ signatures │
│ ▼ │
│ ╔═══════════════════╗ ┌────────────────────────┐ │
│ ║ LLM Wiki ║◀─────── agent-sniffer (roadmap) │ │
│ ║ model×symbol×rgm ║ counterparty fingerprint│ │
│ ╚═══════════════════╝ └────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
▲
│ manual promotion after N accepted rounds
│ (meta/agent.py → agents/correlator/scoring.py)
┌────────────────────── META LOOP (meta/) ───────────────────────────────────┐
│ │
│ meta/program.md ──▶ meta/mutate.py ──▶ (1) read meta/agent.py │
│ (directive, six-step loop (2) run meta/eval/harness.py │
│ frozen zones, driver (3) handoff to Hermes │
│ success gate) (4) fallback → Gemma on claws │
│ (5) frozen-zone diff check │
│ (6) rerun harness → keep/drop │
│ │
│ ┌─────────────────────────────────────────┐ │
│ │ HERMES (claws-mac-mini · Codex OAuth) │ │
│ primary ─────────────▶ handoff.sh → tmux worktree → pull.sh │ │
│ │ edits meta/agent.py in an isolated wt │ │
│ └─────────────────────────────────────────┘ │
│ │ │
│ │ on fail (ssh, creds, timeout) │
│ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ GEMMA (claws · ollama run gemma3:4b) │ │
│ fallback ────────────▶ ssh claws → stdin prompt → raw body │ │
│ │ driver wraps body with EDITABLE markers │ │
│ └─────────────────────────────────────────┘ │
│ │
│ meta/eval/ meta/results.tsv │
│ ├── harness.py Brier + tier-agreement append-only log │
│ ├── fixtures/ blended score (round, mutator, task_id, │
│ │ ├── rotating/ per ISO week rot_score, hold_score, │
│ │ └── holdout.jsonl sealed accepted, note) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Mutation target
First cut targets agents/correlator/scoring.py, the tier-scoring function
that decides HIGH / MED / LOW. Hardcoded thresholds (HIGH > 0.75,
MED ≥ 0.45), the sigmoid shape, and the sentiment-agreement weighting are
each a tunable knob. The meta-agent hill-climbs them against the Brier-calibration +
tier-agreement blend in meta/eval/harness.py.
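The blend and the keep/drop rule can be sketched as follows. This is not the contents of meta/eval/harness.py: it applies the stated weights, 0.6·(1−Brier) + 0.4·tier_agreement, and reads the acceptance rule as "strictly better on rotating, no worse on holdout", tracking a separate best-so-far for each set. That reading and all function names are assumptions.

```python
def brier(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def tier_agreement(predicted, labeled):
    """Fraction of rows where the predicted tier matches the fixture label."""
    return sum(a == b for a, b in zip(predicted, labeled)) / len(labeled)


def blended_score(probs, outcomes, predicted_tiers, labeled_tiers):
    return (0.6 * (1.0 - brier(probs, outcomes))
            + 0.4 * tier_agreement(predicted_tiers, labeled_tiers))


def accept(rotating, holdout, best_rotating, best_holdout):
    """Keep a mutation only if it improves rotating and does not hurt holdout."""
    return rotating > best_rotating and holdout >= best_holdout
```

Requiring both conditions means a rewrite that overfits the weekly rotating fixtures but slips on the sealed holdout is dropped.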
- Editable target: meta/agent.py mirrors the runtime scorer but is decoupled. Only the block between # ───── EDITABLE ───── and # ───── END EDITABLE ───── may be mutated. Signature score(z, sentiment_score, sentiment_velocity, divergence) → Scored is frozen.
- Benchmark: 0.6·(1−Brier) + 0.4·tier_agreement against labeled fixture rows. Rotating set rolls weekly (fixtures/rotating/2026-Wnn.jsonl); sealed holdout never rotates and acts as the promotion gate.
- Acceptance: rotating_score > best AND holdout_score ≥ best. Frozen-zone byte change → automatic reject. Harness exit non-zero → reject + restore baseline.
- Mutators: primary is Hermes on claws-mac-mini via the Claude Code hermes skill (Codex OAuth, no Anthropic SDK in this repo). Fallback: Gemma 3 local via ollama run gemma3:4b over SSH to claws when Hermes is unreachable.
- Safety: the loop refuses to run if any AGENT_*_LIVE=1 or if dispatcher webhook envs are set. Mutated code may import only math, statistics, numpy. No network, no file I/O, no subprocess from inside the EDITABLE block.
- Log: meta/results.tsv, append-only: round · timestamp · mutator · task_id · rotating_score · holdout_score · accepted · agent_sha · note. Every Hermes round also leaves a full worktree under .hermes/incoming/<task_id>/ for after-the-fact forensics.
Next steps for deployment
- Seed real fixtures. The seed JSONL rows in meta/eval/fixtures/ are placeholders. First production round should replace them with a rolling export of signal_log joined to realized outcome probabilities (options expiry, prediction-market resolution) for the trailing 14 days. Add scripts/export_meta_fixtures.py that writes the weekly 2026-Wnn.jsonl every Monday.
- Provision Gemma on claws. One-time: ssh claws "ollama pull gemma3:4b". Verify with ssh claws "echo 'ping' | ollama run gemma3:4b". Override the tag via GEMMA_MODEL env if you want a different size.
- Schedule the loop. Add a launchd plist or cron entry on the developer Mac that runs python3 -m meta.mutate --rounds 5 nightly. Stream stdout to meta/results.tsv (already append-only) and tail into the dashboard.
- Promote accepted mutations. When meta/agent.py beats the previous runtime scorer on both rotating AND holdout across three consecutive rounds, copy the EDITABLE body back into agents/correlator/scoring.py, open a PR, run the full pytest suite, deploy to ClawBox. Manual step by design: the loop discovers, the human promotes.
- Dashboard surface. Add a /meta route that reads meta/results.tsv and renders a sparkline of rotating_score + holdout_score over time, with accepted rounds highlighted. Lives alongside /signals and the planned /wiki.
- Second mutation target. Once the correlator scorer plateaus, spin up meta/program_autoresearch.md pointed at agents/autoresearch/client_live.py prompt + parser. Same loop, same Hermes/Gemma split, different frozen zone. The meta/ directory is designed to hold multiple concurrent targets.
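For the dashboard surface step, a reader for meta/results.tsv could look like the sketch below. It assumes the columns appear in the order listed for the log (round, timestamp, mutator, task_id, rotating_score, holdout_score, accepted, agent_sha, note), that a header row is optional, and that `accepted` is stored as a truthy string; none of these encoding details is confirmed by the source, and `load_results` is a hypothetical name.

```python
import csv
import io

COLUMNS = ["round", "timestamp", "mutator", "task_id", "rotating_score",
           "holdout_score", "accepted", "agent_sha", "note"]


def load_results(handle):
    """Parse tab-separated result rows into dicts with typed score fields."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank or short lines and an optional header row.
        if len(fields) < len(COLUMNS) or fields[0] == "round":
            continue
        row = dict(zip(COLUMNS, fields))
        row["rotating_score"] = float(row["rotating_score"])
        row["holdout_score"] = float(row["holdout_score"])
        row["accepted"] = row["accepted"].strip().lower() in {"1", "true", "yes"}
        rows.append(row)
    return rows


# Illustrative rows only, not real results.
SAMPLE = (
    "1\t2026-01-05T00:00:00Z\thermes\tt1\t0.71\t0.69\ttrue\tabc123\tbaseline\n"
    "2\t2026-01-06T00:00:00Z\tgemma\tt2\t0.70\t0.68\tfalse\tdef456\tregressed\n"
)
rows = load_results(io.StringIO(SAMPLE))
accepted_rounds = [r["round"] for r in rows if r["accepted"]]
```

A /meta route could serialize `rows` to JSON and plot the two score columns, highlighting the entries in `accepted_rounds`.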
Deploy & Run
Two deploy targets: the ClawBox agent stack (runs locally inside the VM) and the CF Pages dashboard (public-facing signal feed + docs, deployed via Wrangler CLI). No CI/CD required — run from your Mac.
Deployment stages
.gitignore.
*.pages.dev URL works on day one.
Architecture Options
| | A — Cloud only | B — Local only | C — ClawBox ✓ | D — ExoClaw bridge |
|---|---|---|---|---|
| Claude cost | High ($20+/mo) | Free (local model) | Low ($4–8/mo) | Free |
| Signal quality | Best | Good | Best | Good |
| Setup complexity | Low | Medium | Low | High |
| Reliability | Best | Mac-dependent | Good | Mac-dependent |
| Isolation | Full | None | Full (Lima VM) | Partial |
Operational Checklist
Quick Links
| Resource | URL |
|---|---|
| GitHub | github.com/Organized-AI/organizedmarket — agents, scripts, dashboard |
| TastyTrade API | developer.tastytrade.com — REST docs + DXLink WS |
| Polymarket Docs | docs.polymarket.com — CLOB API reference |
| Kalshi API | kalshi.com/docs/api — REST + auth guide |
| ClawBox Repo | github.com/coderkk1992/clawbox — source + releases |
| OpenClaw | github.com/openclaw/openclaw — gateway + agent registry |