Polymarket Agent
我们提出 polyMonitor Forecast Intelligence Graph:一个用于从 Polymarket 数据生成可审计预测市场情报的结构化多 Agent 系统。系统结合确定性证据构造、模型驱动的 specialist agents、critique、calibration 和 panel generation。它的目标不是模拟多人聊天,而是生成以价格、fills、相关市场、oracle state 和外部催化因素为依据的市场级判断。
摘要
预测市场通过价格暴露概率信念,但市场解读不能只依赖最近成交价。一个有用的监控系统必须同时推理订单簿微观结构、fill activity、兄弟市场、结算标准、oracle state 和外生信息。我们将 polyMonitor 形式化为图结构推理系统:给定全市场 evidence packet 和 dashboard lens,图会输出校准后的 panel payload、quant snapshot 和可回放的 node-level audit log。架构刻意保持小而清晰:确定性节点构建 evidence state,specialist LLM agents 分析互补的不确定性表面,skeptic agent 执行 critique,writer agent 将校准状态转成 dashboard views。
问题定义
令 x 表示一个 market evidence packet,其中包含 Polymarket market metadata、prices、order-book summaries、recent fills、related-market candidates、oracle state、external context 和 prior forecast memory。令 l 表示 {overview, special, trend} 中的一个 panel lens。系统在推理期间不学习私有市场模型,而是计算一个结构化映射:
F(x, l) -> (y, q, e)
其中 y 是可直接用于仪表盘的 intelligence payload,q 是 quant 与 data-quality snapshot,e 是有序 event trace,包含 node outputs、input hashes、model identifiers、latency、token usage、tool traces 和 errors。核心设计约束是可审计性:每条 generated claim 都应能归因到有界 input packet 或明确的模型推理节点。
系统概览
生产图是一个有向状态机,而不是开放式群聊。节点按固定顺序执行,使后续模型调用看到紧凑、类型化的状态,而不是无边界 transcript。当前图包含 deterministic state builders、三个 specialist model agents、一个 rule-based calibration node、一个 skeptic model agent,以及最终 panel writer。
evidence_builder
-> related_markets
-> quant_forecaster
-> reflexion_memory
-> microstructure
-> catalyst
-> resolution
-> calibration_agent
-> skeptic
-> panel_writer
这种设计遵循论文式 multi-agent system 视角:只有当某个 agent 拥有独立不确定性来源时才引入它。Microstructure、catalyst 和 resolution reasoning 被分离,因为它们的失败模式不同。流动性信号可能真实而新闻催化已经过时;催化因素可能很强,但市场措辞可能让结算变得模糊。
架构
| 节点 | 类型 | 职责 |
|---|---|---|
| Evidence Builder | Deterministic | 从 market candidates、groups、prices、fills、oracle snippets、external context 和 search results 构造初始 evidence state。 |
| Related Markets | Deterministic code | 连接 sibling markets、event-level groups、deadline ladders、adjacent outcomes 和 cross-market relationships,用来发现不一致定价。 |
| Quant Forecaster | Deterministic code | 计算 price drift、fill-tape microstructure、spread indicators、related-market scores 和 data-quality warnings。 |
| Reflexion Memory | Deterministic memory | 加载 prior forecast episodes 和 summary lessons,使当前推理可以与历史失败和成功案例比较。 |
| Microstructure Agent | LLM specialist | 通过 implied probability、volume、trade count、fill concentration、bid/ask quality、close probabilities 和 liquidity caveats 分析指定 markets。 |
| Catalyst Agent | LLM specialist | 识别 external triggers、related-market catalysts、event timing,以及可能推动 market-implied probability 的证据。 |
| Resolution Agent | LLM specialist | 检查 market wording、deadline buckets、official-source hierarchy、oracle signals、ambiguity 和 settlement risk。 |
| Calibration Agent | Rule-based aggregator | 将 confidence 锚定到 market-implied prices、data warnings、related-market stress、prior Brier history 和 specialist confidence。 |
| Skeptic Agent | LLM critique | 质疑 weak evidence、missing price-change data、stale signals、narrative overreach 和 probability miscalibration。 |
| Panel Writer | LLM adapter | 根据 evidence、specialist reports、calibration state 和 memory 写出最终 panel payload,同时避免引入无依据判断。 |
Agent Prompts
每个模型驱动节点都会接收 system role、紧凑 JSON user packet 和必需 output schema。prompt 不要求模型凭记忆预测,而是要求模型检查有界 evidence state,并返回包含 findings、risks、watch items、confidence 和 probability-adjustment notes 的紧凑 JSON。specialist roles 被刻意设计为不对称:
- Microstructure: price formation, fill tape, volume, trade count, liquidity concentration, and spread quality.
- Catalyst: external events, official releases, news timing, and cross-market triggers.
- Resolution: market wording, oracle conditions, settlement ambiguity, and official-source hierarchy.
- Skeptic:stale evidence、unsupported causal claims、missing data、hallucination risk 和 overconfident probability shifts。
- Panel Writer: conversion from calibrated graph state into dashboard-facing English JSON.
推理流程
runtime 将该图作为有状态推理流程执行。确定性节点先把 raw inputs 压缩为 graph context;模型节点随后在该 context 上运行,并可携带 tool traces;最终输出会被规范化到 panel schemas,并和 replay metadata 一起存储。
Algorithm 1: Forecast Intelligence Graph Inference
Input: evidence packet x, lens l
Output: panel payload y, quant snapshot q, event trace e
1: s0 <- EvidenceBuilder(x, l)
2: r <- RelatedMarkets(s0)
3: q <- QuantForecaster(s0, r)
4: m <- ReflexionMemory(s0, q, r)
5: a1 <- MicrostructureAgent(s0, r, q, m)
6: a2 <- CatalystAgent(s0, r, q, m)
7: a3 <- ResolutionAgent(s0, r, q, m)
8: c <- CalibrationAgent(q, r, m, [a1, a2, a3])
9: k <- SkepticAgent(s0, r, q, m, [a1, a2, a3], c)
10: y <- PanelWriter(s0, r, q, m, [a1, a2, a3], c, k)
11: e <- PersistNodeEvents()
12: return y, q, e
Panel 视角
dashboard views 不是独立 agents,而是同一次 graph run 上的不同 lenses。同一个 evidence packet 和 node event trace 可以支持 overview、special 和 trend views,但每个 lens 会要求最终 writer 强调不同决策表面。
| Panel | 视角 | 应强调什么 |
|---|---|---|
| 全市场 Insights | overview |
Dominant market structure, probability interpretation, top caveat, and why the board matters now. |
| Special Radar | special |
Anomalous markets, low-liquidity moves, conflicting prices, deadline ladders, and cross-market stress. |
| Trend Watch | trend |
category rotation、attention migration、catalyst clusters,以及孤立事件关注是否正在变成更广泛趋势。 |
输入表示
input packet 被刻意设计为异构。Polymarket 解读依赖 market identity、trading activity、sibling markets 和 settlement semantics。因此图会保持以下 surfaces 分离,而不是把它们折叠成一个 prose context。
| 输入表面 | 示例 | 使用者 |
|---|---|---|
| Price and liquidity | latestPrice, price24hAgo, volume, trade count, bid/ask, LOB spread |
Evidence Builder, Microstructure Agent, Quant Forecaster, Calibration Agent |
| Fill tape | topMarketFillTape, fill VWAP, recent fill drift, paired-fill ratio, price-source conflicts |
Microstructure Agent, Quant Forecaster, Skeptic Agent |
| Resolution context | Market title, rules, end date, oracle events, official source hierarchy, settlement state | Resolution Agent, Skeptic Agent, Panel Writer |
| External catalysts | News, official releases, match feeds, government calendars, social/media trend signals | Catalyst Agent, Skeptic Agent, Panel Writer |
| Related markets | Same-event children, handicap markets, deadline ladders, adjacent category markets | Related Markets, Quant Forecaster, Calibration Agent |
| Historical memory | Prior forecast episodes, realized resolution, price drift after prediction, Brier score, category reliability | Reflexion Memory, Calibration Agent, Skeptic Agent |
输出 Schema
系统输出的不只是 written panel,而是一个三元组:user-facing payload、quant evidence state 和可回放 event trace。这让架构在产品和研究评估中都可检查。
{
"runId": "fig-...",
"agentArchitecture": "forecast-intelligence-graph-v2",
"agentGraph": {
"mode": "langgraph-supervisor-worker",
"runtime": "langgraph-supervisor-stategraph",
"nodes": [
"evidence_builder",
"related_markets",
"quant_forecaster",
"reflexion_memory",
"microstructure",
"catalyst",
"resolution",
"calibration_agent",
"skeptic",
"panel_writer"
],
"events": []
},
"panelPayload": {},
"usage": {
"model": "gpt-5.5",
"latencyMs": 0,
"inputChars": 0,
"outputChars": 0
}
}
评估协议
论文级评估不应只按文字流畅度打分。真正问题是该图是否在保持 calibration 和 auditability 的同时,提升不确定条件下的市场解读。我们沿四个轴评估系统。
- Calibration:使用 Brier score、expected calibration error 和 category-level reliability curves,将 probability statements 或 confidence bins 与实际 market resolutions 比较。
- Discrimination:衡量 high-confidence claims 是否能在后续 price movement、resolution outcome 或 analyst review 中与 low-confidence claims 区分开。
- Faithfulness:审计每条 generated claim 是否由 evidence packet、specialist output、tool trace 或明确 uncertainty statement 支撑。
- Efficiency: report latency, token usage, model calls, fallback rate, and quality changes under node ablations.
消融实验
该图被设计为可通过移除或替换组件进行评估。有效消融包括 deterministic-only output、no specialist agents、no skeptic、no reflexion memory、no related-market state 和 writer-only generation。这些消融用于揭示准确性来自数据构造、specialist reasoning、critique 还是 final writing。
| 消融项 | 问题 |
|---|---|
| No specialist agents | Does deterministic evidence plus a writer perform as well as agent decomposition? |
| No skeptic | Does critique reduce unsupported causal claims and overconfident language? |
| No related markets | Does cross-market context improve detection of spread, ladder, and event-level inconsistencies? |
| No memory | Does prior forecast history improve confidence discipline and repeated-market interpretation? |
| Deterministic fallback | How much value is lost when model calls are unavailable? |
局限性
当前系统应被理解为 monitoring and interpretation graph,而不是 autonomous trading 或 oracle-action agent。它不会提交订单、移动流动性或发起 UMA disputes。它可以暴露 resolution risk,但不能替代法律或官方 settlement review。其判断受 data freshness、source coverage、CLOB availability、model reliability 以及输入 market metadata 质量约束。
相关工作
该设计最接近使用结构化角色、有界上下文和明确评估的专业金融/预测 agent systems。真正有用的结论不是 agent 越多越好,而是当每个 agent 拥有独立 evidence surface 时,分解可以提升可解释性。
- TradingAgents:包含 analysts、bull/bear researchers、trader synthesis、risk review 和 final decision control 的金融工作流。
- TradingAgents system architecture: structured reports and institutional-style coordination.
- FinCon: manager-analyst hierarchy, risk control, and verbal reinforcement for financial decision workflows.
- PROPHET: prediction-market modeling with price time series, event text semantics, and order-book microstructure.
- ForesightFlow:面向 forecasting agents 的 coordination layer、calibration、discriminative power 和 cost-quality evaluation。
- MASAI、SciAgents 和 DrugAgent:使用短轨迹、明确职责、结构化输入输出和领域知识的任务专用系统,而不是开放式聊天。
Runtime APIs
公开 endpoints 暴露产品展示和研究审计所需的当前 run artifacts。浏览器部署通过 /wm-api 前缀访问这些 routes。
| Endpoint | 返回内容 |
|---|---|
/runtime/agent/market-wide-insights/<lens> |
Dashboard-ready panel snapshot for overview, special, or trend. |
/runtime/agent/market-wide-quant/<lens> |
Quant snapshot with price drift, fill-tape microstructure, related-market scores, and data-quality warnings. |
/runtime/agent/market-wide-events/<run_id> |
Node-level audit log for replay, debugging, latency inspection, and model-output review. |