Panels

Polymarket Agent

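We present the polyMonitor Forecast Intelligence Graph, a structured multi-agent system for producing auditable prediction-market intelligence from Polymarket data. The system combines deterministic evidence construction, model-driven specialist agents, critique, calibration, and panel generation. Its goal is not to simulate a multi-person chat but to produce market-level judgments grounded in prices, fills, related markets, oracle state, and external catalysts.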

Figure 1. Diagram of the polyMonitor Forecast Intelligence Graph. The graph separates deterministic state construction from model-driven specialist reasoning, critique, and panel writing.

Abstract

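Prediction markets expose probabilistic beliefs through prices, but interpreting a market cannot rely on the last traded price alone. A useful monitoring system must reason jointly about order-book microstructure, fill activity, sibling markets, resolution criteria, oracle state, and exogenous information. We formalize polyMonitor as a graph-structured reasoning system: given a market-wide evidence packet and a dashboard lens, the graph outputs a calibrated panel payload, a quant snapshot, and a replayable node-level audit log. The architecture is deliberately small and legible: deterministic nodes build the evidence state, specialist LLM agents analyze complementary uncertainty surfaces, a skeptic agent performs critique, and a writer agent turns the calibrated state into dashboard views.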

Problem Definition

Let x denote a market evidence packet containing Polymarket market metadata, prices, order-book summaries, recent fills, related-market candidates, oracle state, external context, and prior forecast memory. Let l denote a panel lens in {overview, special, trend}. The system does not learn a private market model at inference time; it computes a structured mapping:

F(x, l) -> (y, q, e)

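where y is the dashboard-ready intelligence payload, q is the quant and data-quality snapshot, and e is the ordered event trace containing node outputs, input hashes, model identifiers, latency, token usage, tool traces, and errors. The core design constraint is auditability: every generated claim should be attributable either to the bounded input packet or to an explicit model reasoning node.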
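The triple and the attribution constraint can be made concrete with a small typed sketch. The class names, field names, and the `attributable` helper are illustrative assumptions; the text does not specify the system's actual types.

```python
from dataclasses import dataclass, field
from typing import Any, Literal, Optional

# The three lenses named in the problem definition.
Lens = Literal["overview", "special", "trend"]


@dataclass
class NodeEvent:
    """One entry of the ordered event trace e (field names are hypothetical)."""
    node: str
    input_hash: str
    output: dict[str, Any]
    model: Optional[str] = None  # None for deterministic nodes
    latency_ms: int = 0
    token_usage: dict[str, int] = field(default_factory=dict)
    tool_traces: list[dict[str, Any]] = field(default_factory=list)
    error: Optional[str] = None


@dataclass
class GraphResult:
    """The output of F(x, l): payload y, quant snapshot q, event trace e."""
    lens: Lens
    payload: dict[str, Any]
    quant: dict[str, Any]
    events: list[NodeEvent]


def attributable(claim_sources: list[str], packet_keys: set[str],
                 events: list[NodeEvent]) -> bool:
    """A claim passes when it cites at least one source and every cited
    source is either an input-packet key or a node that actually ran."""
    nodes = {ev.node for ev in events}
    return bool(claim_sources) and all(
        src in packet_keys or src in nodes for src in claim_sources
    )
```

Under this reading, a claim that cites nothing, or cites a source outside both the packet and the trace, fails the audit.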

System Overview

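The production graph is a directed state machine, not an open-ended group chat. Nodes execute in a fixed order so that downstream model calls see compact, typed state rather than an unbounded transcript. The current graph contains deterministic state builders, three specialist model agents, a rule-based calibration node, a skeptic model agent, and a final panel writer.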

evidence_builder
  -> related_markets
  -> quant_forecaster
  -> reflexion_memory
  -> microstructure
  -> catalyst
  -> resolution
  -> calibration_agent
  -> skeptic
  -> panel_writer

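This design follows a paper-style view of multi-agent systems: an agent is introduced only when it owns an independent source of uncertainty. Microstructure, catalyst, and resolution reasoning are separated because their failure modes differ. A liquidity signal may be real while the news catalyst is already stale; a catalyst may be strong while the market wording leaves settlement ambiguous.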

Architecture

Node	Type	Responsibility
Evidence Builder	Deterministic	Builds the initial evidence state from market candidates, groups, prices, fills, oracle snippets, external context, and search results.
Related Markets	Deterministic code	Links sibling markets, event-level groups, deadline ladders, adjacent outcomes, and cross-market relationships to surface inconsistent pricing.
Quant Forecaster	Deterministic code	Computes price drift, fill-tape microstructure, spread indicators, related-market scores, and data-quality warnings.
Reflexion Memory	Deterministic memory	Loads prior forecast episodes and summary lessons so that current reasoning can be compared against past failures and successes.
Microstructure Agent	LLM specialist	Analyzes the specified markets through implied probability, volume, trade count, fill concentration, bid/ask quality, close probabilities, and liquidity caveats.
Catalyst Agent	LLM specialist	Identifies external triggers, related-market catalysts, event timing, and evidence that could move the market-implied probability.
Resolution Agent	LLM specialist	Examines market wording, deadline buckets, official-source hierarchy, oracle signals, ambiguity, and settlement risk.
Calibration Agent	Rule-based aggregator	Anchors confidence to market-implied prices, data warnings, related-market stress, prior Brier history, and specialist confidence.
Skeptic Agent	LLM critique	Challenges weak evidence, missing price-change data, stale signals, narrative overreach, and probability miscalibration.
Panel Writer	LLM adapter	Writes the final panel payload from evidence, specialist reports, calibration state, and memory, without introducing unsupported judgments.

Agent Prompts

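Every model-driven node receives a system role, a compact JSON user packet, and a required output schema. The prompt does not ask the model to forecast from memory; it asks the model to inspect the bounded evidence state and return compact JSON containing findings, risks, watch items, confidence, and probability-adjustment notes. The specialist roles are deliberately asymmetric: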

Microstructure: price formation, fill tape, volume, trade count, liquidity concentration, and spread quality.
Catalyst: external events, official releases, news timing, and cross-market triggers.
Resolution: market wording, oracle conditions, settlement ambiguity, and official-source hierarchy.
Skeptic: stale evidence, unsupported causal claims, missing data, hallucination risk, and overconfident probability shifts.
Panel Writer: conversion from calibrated graph state into dashboard-facing English JSON.
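A required output schema is only useful if the runtime rejects replies that violate it. The sketch below shows one way to check a specialist reply before it enters graph state. The key names (`findings`, `risks`, `watchItems`, `confidence`, `probabilityAdjustmentNotes`) and the [0, 1] confidence range are assumptions chosen for illustration; the text lists the fields but not their exact spelling or types.

```python
import json

# Hypothetical field names and types for a specialist report.
REQUIRED_FIELDS = {
    "findings": list,
    "risks": list,
    "watchItems": list,
    "confidence": (int, float),
    "probabilityAdjustmentNotes": list,
}


def parse_specialist_report(raw: str) -> dict:
    """Parse a model reply and reject it if the assumed schema is violated."""
    report = json.loads(raw)
    if not isinstance(report, dict):
        raise ValueError("report must be a JSON object")
    for key, expected in REQUIRED_FIELDS.items():
        if key not in report:
            raise ValueError(f"missing field: {key}")
        value = report[key]
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {key}")
    if not 0.0 <= report["confidence"] <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return report
```

Rejecting a malformed reply at this boundary keeps downstream nodes reading typed state rather than free text.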

Inference Procedure

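The runtime executes the graph as a stateful inference procedure. Deterministic nodes first compress raw inputs into graph context; model nodes then run over that context and may carry tool traces; the final output is normalized to the panel schemas and stored together with replay metadata.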

Algorithm 1: Forecast Intelligence Graph Inference
Input: evidence packet x, lens l
Output: panel payload y, quant snapshot q, event trace e

1: s0 <- EvidenceBuilder(x, l)
2: r  <- RelatedMarkets(s0)
3: q  <- QuantForecaster(s0, r)
4: m  <- ReflexionMemory(s0, q, r)
5: a1 <- MicrostructureAgent(s0, r, q, m)
6: a2 <- CatalystAgent(s0, r, q, m)
7: a3 <- ResolutionAgent(s0, r, q, m)
8: c  <- CalibrationAgent(q, r, m, [a1, a2, a3])
9: k  <- SkepticAgent(s0, r, q, m, [a1, a2, a3], c)
10: y <- PanelWriter(s0, r, q, m, [a1, a2, a3], c, k)
11: e <- PersistNodeEvents()
12: return y, q, e
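Algorithm 1 can be sketched as a fixed-order loop that records one event per node. This is a minimal illustration, not the production runtime: it passes the whole state to every node instead of the per-node arguments listed above, and it assumes that a failing node is recorded and replaced by an empty output so the run can continue. The event keys `inputHash` and `latencyMs` are hypothetical.

```python
import hashlib
import json
import time
from typing import Any, Callable

State = dict[str, Any]
Node = Callable[[State], dict[str, Any]]


def input_hash(state: State) -> str:
    """Stable short hash of the state a node is about to read."""
    blob = json.dumps(state, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]


def run_graph(nodes: list[tuple[str, Node]], x: dict[str, Any], lens: str):
    """Run nodes in the given order and return (y, q, e)."""
    state: State = {"packet": x, "lens": lens}
    events: list[dict[str, Any]] = []
    for name, fn in nodes:
        started = time.perf_counter()
        event: dict[str, Any] = {
            "node": name, "inputHash": input_hash(state), "error": None,
        }
        try:
            state[name] = fn(state)
            event["output"] = state[name]
        except Exception as exc:  # assumption: record the error and continue
            event["error"] = repr(exc)
            state[name] = {}
        event["latencyMs"] = round((time.perf_counter() - started) * 1000)
        events.append(event)
    return state.get("panel_writer", {}), state.get("quant_forecaster", {}), events


# Usage with stub nodes standing in for the real builders and agents.
stub_nodes = [
    ("evidence_builder", lambda s: {"markets": len(s["packet"].get("markets", []))}),
    ("quant_forecaster", lambda s: {"warnings": []}),
    ("panel_writer", lambda s: {"lens": s["lens"],
                                "markets": s["evidence_builder"]["markets"]}),
]
y, q, e = run_graph(stub_nodes, {"markets": ["a", "b"]}, "overview")
```

Hashing the state before each call lets a replay confirm that a node saw the same input as in the original run.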

Panel Lenses

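The dashboard views are not independent agents; they are different lenses on the same graph run. The same evidence packet and node event trace can support the overview, special, and trend views, but each lens asks the final writer to emphasize a different decision surface.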

Panel	Lens	What to emphasize
Market-Wide Insights	`overview`	Dominant market structure, probability interpretation, top caveat, and why the board matters now.
Special Radar	`special`	Anomalous markets, low-liquidity moves, conflicting prices, deadline ladders, and cross-market stress.
Trend Watch	`trend`	Category rotation, attention migration, catalyst clusters, and whether attention on isolated events is turning into a broader trend.
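Because the lens changes only what the writer emphasizes, it can be represented as data rather than as a separate agent. The sketch below is one hypothetical encoding of the table above; the `LENS_EMPHASIS` mapping and the `writer_instruction` helper are illustrative names, and the actual prompt wording is not given in the text.

```python
# Emphasis lists transcribed from the lens table; structure is an assumption.
LENS_EMPHASIS = {
    "overview": [
        "dominant market structure",
        "probability interpretation",
        "top caveat",
        "why the board matters now",
    ],
    "special": [
        "anomalous markets",
        "low-liquidity moves",
        "conflicting prices",
        "deadline ladders",
        "cross-market stress",
    ],
    "trend": [
        "category rotation",
        "attention migration",
        "catalyst clusters",
        "isolated events turning into a broader trend",
    ],
}


def writer_instruction(lens: str) -> str:
    """Build the lens-specific emphasis line handed to the panel writer."""
    if lens not in LENS_EMPHASIS:
        raise ValueError(f"unknown lens: {lens}")
    return "Emphasize: " + "; ".join(LENS_EMPHASIS[lens]) + "."
```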

Input Representation

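The input packet is deliberately heterogeneous. Interpreting a Polymarket market depends on market identity, trading activity, sibling markets, and settlement semantics. The graph therefore keeps the following surfaces separate instead of collapsing them into a single prose context.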

Input surface	Examples	Used by
Price and liquidity	`latestPrice`, `price24hAgo`, volume, trade count, bid/ask, LOB spread	Evidence Builder, Microstructure Agent, Quant Forecaster, Calibration Agent
Fill tape	`topMarketFillTape`, fill VWAP, recent fill drift, paired-fill ratio, price-source conflicts	Microstructure Agent, Quant Forecaster, Skeptic Agent
Resolution context	Market title, rules, end date, oracle events, official source hierarchy, settlement state	Resolution Agent, Skeptic Agent, Panel Writer
External catalysts	News, official releases, match feeds, government calendars, social/media trend signals	Catalyst Agent, Skeptic Agent, Panel Writer
Related markets	Same-event children, handicap markets, deadline ladders, adjacent category markets	Related Markets, Quant Forecaster, Calibration Agent
Historical memory	Prior forecast episodes, realized resolution, price drift after prediction, Brier score, category reliability	Reflexion Memory, Calibration Agent, Skeptic Agent

Output Schema

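The system outputs more than a written panel: it returns a triple of user-facing payload, quant evidence state, and replayable event trace. This makes the architecture inspectable for both product and research evaluation.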

{
  "runId": "fig-...",
  "agentArchitecture": "forecast-intelligence-graph-v2",
  "agentGraph": {
    "mode": "langgraph-supervisor-worker",
    "runtime": "langgraph-supervisor-stategraph",
    "nodes": [
      "evidence_builder",
      "related_markets",
      "quant_forecaster",
      "reflexion_memory",
      "microstructure",
      "catalyst",
      "resolution",
      "calibration_agent",
      "skeptic",
      "panel_writer"
    ],
    "events": []
  },
  "panelPayload": {},
  "usage": {
    "model": "gpt-5.5",
    "latencyMs": 0,
    "inputChars": 0,
    "outputChars": 0
  }
}
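A stored run record in this shape can be checked for completeness before it is used in an audit. The sketch below assumes each entry in `events` carries a `node` key, which the example record does not show because its `events` array is empty; the `check_run_record` helper is hypothetical.

```python
# Node order as listed in the example record above.
EXPECTED_NODES = [
    "evidence_builder", "related_markets", "quant_forecaster",
    "reflexion_memory", "microstructure", "catalyst", "resolution",
    "calibration_agent", "skeptic", "panel_writer",
]


def check_run_record(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key in ("runId", "agentGraph", "panelPayload", "usage"):
        if key not in record:
            problems.append(f"missing top-level key: {key}")
    graph = record.get("agentGraph", {})
    nodes = graph.get("nodes", [])
    if nodes != EXPECTED_NODES:
        problems.append("node list differs from the documented order")
    # Assumption: each event names the node that produced it under "node".
    seen = {ev.get("node") for ev in graph.get("events", [])}
    missing = [n for n in nodes if n not in seen]
    if missing:
        problems.append("nodes without events: " + ", ".join(missing))
    return problems
```

Applied to the empty example record above, the check would report every node as lacking an event, which is the expected result for a template.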

Evaluation Protocol

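A paper-grade evaluation should not score prose fluency alone. The real question is whether the graph improves market interpretation under uncertainty while preserving calibration and auditability. We evaluate the system along four axes.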

Calibration: compare probability statements or confidence bins against realized market resolutions using the Brier score, expected calibration error, and category-level reliability curves.
Discrimination: measure whether high-confidence claims can be distinguished from low-confidence claims in subsequent price movement, resolution outcome, or analyst review.
Faithfulness: audit whether every generated claim is supported by the evidence packet, a specialist output, a tool trace, or an explicit uncertainty statement.
Efficiency: report latency, token usage, model calls, fallback rate, and quality changes under node ablations.
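The two calibration metrics named above have standard definitions for binary outcomes, sketched here in plain Python. The equal-width binning and the default of ten bins are assumptions; the text does not say how confidence bins are formed.

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(probs: list[float], outcomes: list[int],
                               n_bins: int = 10) -> float:
    """Weighted mean gap between average forecast and observed frequency,
    taken over equal-width probability bins."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, o))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(o for _, o in members) / len(members)
        ece += len(members) / len(probs) * abs(mean_p - freq)
    return ece


# Four resolved markets: forecast probability of YES and the realized outcome.
forecasts = [0.9, 0.1, 0.8, 0.3]
resolved = [1, 0, 1, 0]
print(brier_score(forecasts, resolved))                 # approximately 0.0375
print(expected_calibration_error(forecasts, resolved))  # approximately 0.175
```

Category-level reliability curves follow from the same binning by plotting each bin's mean forecast against its observed frequency, grouped by market category.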

Ablations

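The graph is designed to be evaluated by removing or replacing components. Useful ablations include deterministic-only output, no specialist agents, no skeptic, no reflexion memory, no related-market state, and writer-only generation. These ablations reveal whether accuracy comes from data construction, specialist reasoning, critique, or final writing.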

Ablation	Question
No specialist agents	Does deterministic evidence plus a writer perform as well as agent decomposition?
No skeptic	Does critique reduce unsupported causal claims and overconfident language?
No related markets	Does cross-market context improve detection of spread, ladder, and event-level inconsistencies?
No memory	Does prior forecast history improve confidence discipline and repeated-market interpretation?
Deterministic fallback	How much value is lost when model calls are unavailable?
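Since the graph runs in a fixed order, each ablation can be expressed as a set of nodes to drop from that order. The sketch below is one hypothetical encoding. The ablation keys are illustrative, and the `deterministic_fallback` entry assumes that every model-driven node, including the panel writer, is removed; a real fallback would presumably substitute a non-model writer, which the text does not describe.

```python
FULL_GRAPH = [
    "evidence_builder", "related_markets", "quant_forecaster",
    "reflexion_memory", "microstructure", "catalyst", "resolution",
    "calibration_agent", "skeptic", "panel_writer",
]

# Hypothetical mapping from ablation name to the nodes it removes.
ABLATIONS = {
    "no_specialists": {"microstructure", "catalyst", "resolution"},
    "no_skeptic": {"skeptic"},
    "no_related_markets": {"related_markets"},
    "no_memory": {"reflexion_memory"},
    # Assumption: all model-driven nodes are dropped.
    "deterministic_fallback": {
        "microstructure", "catalyst", "resolution", "skeptic", "panel_writer",
    },
}


def ablated_graph(name: str) -> list[str]:
    """Return the node order with the named ablation's nodes removed."""
    if name not in ABLATIONS:
        raise ValueError(f"unknown ablation: {name}")
    removed = ABLATIONS[name]
    return [node for node in FULL_GRAPH if node not in removed]
```

Downstream nodes in an ablated run must tolerate the missing state keys, for example a calibration node that receives no specialist confidence.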

Limitations

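The current system should be understood as a monitoring and interpretation graph, not an autonomous trading or oracle-action agent. It does not submit orders, move liquidity, or initiate UMA disputes. It can surface resolution risk, but it does not replace legal or official settlement review. Its judgments are bounded by data freshness, source coverage, CLOB availability, model reliability, and the quality of the input market metadata.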

Runtime APIs

The public endpoints expose the current run artifacts needed for product display and research audit. Browser deployments reach these routes through the /wm-api prefix.

Endpoint	Returns
`/runtime/agent/market-wide-insights/<lens>`	Dashboard-ready panel snapshot for `overview`, `special`, or `trend`.
`/runtime/agent/market-wide-quant/<lens>`	Quant snapshot with price drift, fill-tape microstructure, related-market scores, and data-quality warnings.
`/runtime/agent/market-wide-events/<run_id>`	Node-level audit log for replay, debugging, latency inspection, and model-output review.
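The routes above can be assembled with a few small helpers. This sketch only builds URLs and makes no network call. The base host `https://example.invalid` is a placeholder, and the helper names are hypothetical; only the route paths and the `/wm-api` prefix come from the text.

```python
from urllib.parse import quote

LENSES = ("overview", "special", "trend")


def _lens_route(base: str, kind: str, lens: str, prefix: str) -> str:
    """Join base, prefix, and a lens-scoped route after validating the lens."""
    if lens not in LENSES:
        raise ValueError(f"unknown lens: {lens}")
    return f"{base.rstrip('/')}{prefix}/runtime/agent/{kind}/{lens}"


def insights_url(base: str, lens: str, prefix: str = "/wm-api") -> str:
    """URL of the dashboard-ready panel snapshot for a lens."""
    return _lens_route(base, "market-wide-insights", lens, prefix)


def quant_url(base: str, lens: str, prefix: str = "/wm-api") -> str:
    """URL of the quant snapshot for a lens."""
    return _lens_route(base, "market-wide-quant", lens, prefix)


def events_url(base: str, run_id: str, prefix: str = "/wm-api") -> str:
    """URL of the node-level audit log for one run."""
    safe_id = quote(run_id, safe="")  # escape characters unsafe in a path segment
    return f"{base.rstrip('/')}{prefix}/runtime/agent/market-wide-events/{safe_id}"


# Placeholder host; substitute the real deployment origin.
print(insights_url("https://example.invalid/", "trend"))
```

Passing `prefix=""` yields the bare routes for deployments that do not sit behind the browser prefix.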