Panels

Polymarket Agent

We present the polyMonitor Forecast Intelligence Graph, a structured multi-agent system for generating auditable prediction-market intelligence from Polymarket data. The system combines deterministic evidence construction with model-backed specialist agents, critique, calibration, and panel generation. Its objective is not to simulate human conversation, but to produce market-level claims that are grounded in prices, fills, related markets, oracle state, and external catalysts.

Diagram of the polyMonitor Forecast Intelligence Graph — Figure 1. The Forecast Intelligence Graph separates deterministic state construction from model-backed specialist reasoning, critique, and panel writing.

Abstract

Prediction markets expose probabilistic beliefs through prices, but market interpretation requires more than a last-traded value. A useful monitoring system must jointly reason over order-book microstructure, fill activity, sibling markets, resolution criteria, oracle state, and exogenous information. We formulate polyMonitor as a graph-structured inference system. Given a market-wide evidence packet and a dashboard lens, the graph emits a calibrated panel payload, a quant snapshot, and a replayable node-level audit log. The architecture is intentionally small: deterministic nodes build the evidence state, specialist LLM agents analyze complementary uncertainty surfaces, a skeptic agent performs critique, and a writer agent converts calibrated state into dashboard views.

Problem Formulation

Let x denote a market evidence packet containing Polymarket market metadata, prices, order-book summaries, recent fills, related-market candidates, oracle state, external context, and prior forecast memory. Let l be a panel lens in {overview, special, trend}. The system learns no private market model during inference. Instead, it computes a structured mapping:

F(x, l) -> (y, q, e)

where y is a dashboard-ready intelligence payload, q is a quant and data-quality snapshot, and e is an ordered event trace containing node outputs, input hashes, model identifiers, latency, token usage, tool traces, and errors. The central design constraint is auditability: every generated claim should be attributable to a bounded input packet or an explicit model-backed reasoning node.

System Overview

The production graph is a directed state machine rather than an open-ended group chat. Its nodes are executed in a fixed order so that later model calls observe compact, typed state instead of an unbounded transcript. The current graph consists of deterministic state builders, three specialist model agents, a rule-based calibration node, a skeptic model agent, and a final panel writer.

evidence_builder
  -> related_markets
  -> quant_forecaster
  -> reflexion_memory
  -> microstructure
  -> catalyst
  -> resolution
  -> calibration_agent
  -> skeptic
  -> panel_writer

This design follows a research-paper view of multi-agent systems: agents are introduced only when they own distinct sources of uncertainty. Microstructure, catalyst, and resolution reasoning are separated because they fail in different ways. A liquidity signal can be real while the news catalyst is stale; a catalyst can be strong while the market wording makes settlement ambiguous.

Architecture

Node	Type	Responsibility
Evidence Builder	Deterministic	Constructs the initial evidence state from market candidates, groups, prices, fills, oracle snippets, external context, and search results.
Related Markets	Deterministic code	Links sibling markets, event-level groups, deadline ladders, adjacent outcomes, and cross-market relationships that may reveal inconsistent pricing.
Quant Forecaster	Deterministic code	Computes price drift, fill-tape microstructure, spread indicators, related-market scores, and data-quality warnings.
Reflexion Memory	Deterministic memory	Loads prior forecast episodes and summary lessons so current reasoning can be compared with historical failures and successes.
Microstructure Agent	LLM specialist	Analyzes named markets through implied probability, volume, trade count, fill concentration, bid/ask quality, close probabilities, and liquidity caveats.
Catalyst Agent	LLM specialist	Identifies external triggers, related-market catalysts, event timing, and evidence that would plausibly move market-implied probability.
Resolution Agent	LLM specialist	Examines market wording, deadline buckets, official-source hierarchy, oracle signals, ambiguity, and settlement risk.
Calibration Agent	Rule-based aggregator	Anchors confidence to market-implied prices, data warnings, related-market stress, prior Brier history, and specialist confidence.
Skeptic Agent	LLM critique	Challenges weak evidence, missing price-change data, stale signals, narrative overreach, and probability miscalibration.
Panel Writer	LLM adapter	Writes the final panel payload from evidence, specialist reports, calibration state, and memory without introducing ungrounded claims.

Agent Prompts

Each model-backed node receives a system role, a compact JSON user packet, and a required output schema. The prompt does not ask the model to forecast from memory. It asks the model to inspect a bounded evidence state and return compact JSON containing findings, risks, watch items, confidence, and probability-adjustment notes. The specialist roles are intentionally asymmetric:

Microstructure: price formation, fill tape, volume, trade count, liquidity concentration, and spread quality.
Catalyst: external events, official releases, news timing, and cross-market triggers.
Resolution: market wording, oracle conditions, settlement ambiguity, and official-source hierarchy.
Skeptic: stale evidence, unsupported causal claims, missing data, hallucination risk, and overconfident probability shifts.
Panel Writer: conversion from calibrated graph state into dashboard-facing English JSON.

Inference Procedure

The runtime executes the graph as a stateful inference procedure. Deterministic nodes first compress raw inputs into a graph context; model nodes then operate over that context with optional tool traces; final outputs are normalized into panel schemas and stored with replay metadata.

Algorithm 1: Forecast Intelligence Graph Inference
Input: evidence packet x, lens l
Output: panel payload y, quant snapshot q, event trace e

1: s0 <- EvidenceBuilder(x, l)
2: r  <- RelatedMarkets(s0)
3: q  <- QuantForecaster(s0, r)
4: m  <- ReflexionMemory(s0, q, r)
5: a1 <- MicrostructureAgent(s0, r, q, m)
6: a2 <- CatalystAgent(s0, r, q, m)
7: a3 <- ResolutionAgent(s0, r, q, m)
8: c  <- CalibrationAgent(q, r, m, [a1, a2, a3])
9: k  <- SkepticAgent(s0, r, q, m, [a1, a2, a3], c)
10: y <- PanelWriter(s0, r, q, m, [a1, a2, a3], c, k)
11: e <- PersistNodeEvents()
12: return y, q, e

Panel Lenses

The dashboard views are not independent agents. They are lenses over a shared graph run. The same evidence packet and node event trace can support overview, special, and trend views, but each lens asks the final writer to emphasize a different decision surface.

Panel	Lens	What it should emphasize
Market-wide Insights	`overview`	Dominant market structure, probability interpretation, top caveat, and why the board matters now.
Special Radar	`special`	Anomalous markets, low-liquidity moves, conflicting prices, deadline ladders, and cross-market stress.
Trend Watch	`trend`	Category rotation, attention migration, catalyst clusters, and whether isolated event interest is becoming a broader trend.

Input Representation

The input packet is deliberately heterogeneous. Polymarket interpretation depends on market identity, trading activity, sibling markets, and settlement semantics. The graph therefore keeps the following surfaces separate rather than collapsing them into one prose context.

Input surface	Examples	Used by
Price and liquidity	`latestPrice`, `price24hAgo`, volume, trade count, bid/ask, LOB spread	Evidence Builder, Microstructure Agent, Quant Forecaster, Calibration Agent
Fill tape	`topMarketFillTape`, fill VWAP, recent fill drift, paired-fill ratio, price-source conflicts	Microstructure Agent, Quant Forecaster, Skeptic Agent
Resolution context	Market title, rules, end date, oracle events, official source hierarchy, settlement state	Resolution Agent, Skeptic Agent, Panel Writer
External catalysts	News, official releases, match feeds, government calendars, social/media trend signals	Catalyst Agent, Skeptic Agent, Panel Writer
Related markets	Same-event children, handicap markets, deadline ladders, adjacent category markets	Related Markets, Quant Forecaster, Calibration Agent
Historical memory	Prior forecast episodes, realized resolution, price drift after prediction, Brier score, category reliability	Reflexion Memory, Calibration Agent, Skeptic Agent

Output Schema

The system emits more than a written panel. The output is a triple: the user-facing payload, the quant evidence state, and the replayable event trace. This makes the architecture inspectable under both product and research evaluation.

{
  "runId": "fig-...",
  "agentArchitecture": "forecast-intelligence-graph-v2",
  "agentGraph": {
    "mode": "langgraph-supervisor-worker",
    "runtime": "langgraph-supervisor-stategraph",
    "nodes": [
      "evidence_builder",
      "related_markets",
      "quant_forecaster",
      "reflexion_memory",
      "microstructure",
      "catalyst",
      "resolution",
      "calibration_agent",
      "skeptic",
      "panel_writer"
    ],
    "events": []
  },
  "panelPayload": {},
  "usage": {
    "model": "gpt-5.5",
    "latencyMs": 0,
    "inputChars": 0,
    "outputChars": 0
  }
}

Evaluation Protocol

A paper-level evaluation should not score the system by prose fluency alone. The relevant question is whether the graph improves market interpretation under uncertainty while preserving calibration and auditability. We evaluate the system along four axes.

Calibration: compare probability statements or confidence bins against realized market resolutions using Brier score, expected calibration error, and category-level reliability curves.
Discrimination: measure whether high-confidence claims separate from low-confidence claims under subsequent price movement, resolution outcome, or analyst review.
Faithfulness: audit whether each generated claim is supported by the evidence packet, a specialist output, a tool trace, or an explicit uncertainty statement.
Efficiency: report latency, token usage, model calls, fallback rate, and quality changes under node ablations.

Ablations

The graph is designed to be evaluated by removing or replacing components. Useful ablations include deterministic-only output, no specialist agents, no skeptic, no reflexion memory, no related-market state, and writer-only generation. These ablations reveal whether accuracy comes from data construction, specialist reasoning, critique, or final writing.

Ablation	Question
No specialist agents	Does deterministic evidence plus a writer perform as well as agent decomposition?
No skeptic	Does critique reduce unsupported causal claims and overconfident language?
No related markets	Does cross-market context improve detection of spread, ladder, and event-level inconsistencies?
No memory	Does prior forecast history improve confidence discipline and repeated-market interpretation?
Deterministic fallback	How much value is lost when model calls are unavailable?

Limitations

The current system should be read as a monitoring and interpretation graph, not an autonomous trading or oracle-action agent. It does not submit orders, move liquidity, or initiate UMA disputes. It can surface resolution risk, but it does not replace legal or official settlement review. Its claims remain bounded by data freshness, source coverage, CLOB availability, model reliability, and the quality of the market metadata supplied to the graph.

Related Work

The design is closest to specialized financial and forecasting-agent systems that use structured roles, bounded context, and explicit evaluation. The useful lesson is not that more agents are always better, but that decomposition can improve interpretability when each agent owns a distinct evidence surface.

TradingAgents: financial workflow with analysts, bull/bear researchers, trader synthesis, risk review, and final decision control.
TradingAgents system architecture: structured reports and institutional-style coordination.
FinCon: manager-analyst hierarchy, risk control, and verbal reinforcement for financial decision workflows.
PROPHET: prediction-market modeling with price time series, event text semantics, and order-book microstructure.
ForesightFlow: coordination layer, calibration, discriminative power, and cost-quality evaluation for forecasting agents.
MASAI, SciAgents, and DrugAgent: task-specific systems that use short trajectories, explicit responsibilities, structured inputs and outputs, and domain knowledge instead of open-ended chat.

Runtime APIs

The public endpoints expose the current run artifacts needed for product display and research audit. Browser deployments access these routes through the /wm-api prefix.

Endpoint	Returns
`/runtime/agent/market-wide-insights/<lens>`	Dashboard-ready panel snapshot for `overview`, `special`, or `trend`.
`/runtime/agent/market-wide-quant/<lens>`	Quant snapshot with price drift, fill-tape microstructure, related-market scores, and data-quality warnings.
`/runtime/agent/market-wide-events/<run_id>`	Node-level audit log for replay, debugging, latency inspection, and model-output review.