The War Over Prediction Markets Is Just Getting Started
Prediction markets like Kalshi and Polymarket are booming, and so is a fight among regulators, lawmakers, and advocates over their legality. The political fight in the US over the future of these platforms has escalated into a full-blown war, and battle lines aren't being drawn neatly along party lines: conservative Mormons have aligned themselves with Las Vegas bigwigs, while MAGA royalty is siding with liberal Democratic lobbyists. Former New Jersey governor Chris Christie, who currently serves as an advisor to the American Gaming Association, has criticized prediction markets. One side argues that the platforms are breaking the law by operating as shadow casinos.
- North America > United States > New Jersey (0.25)
- North America > United States > Nevada > Clark County > Las Vegas (0.24)
- North America > United States > California (0.06)
- (10 more...)
- Government > Regional Government > North America Government > United States Government (1.00)
- Banking & Finance > Trading (1.00)
Senators Urge Top Regulator to Stay Out of Prediction Market Lawsuits
As prediction market platforms like Polymarket and Kalshi battle regulators in court, Senate Democrats are urging the CFTC not to weigh in, escalating a broader fight over the burgeoning industry. A group of 23 Democratic US senators, led by Senator Adam Schiff of California, sent a letter Friday to the top federal regulator overseeing prediction markets, urging the agency to stay out of pending court cases over the legality of offerings tied to "sports, war, and other prohibited events." Prediction markets, which sell contracts tied to the outcomes of real-world developments, have exploded in popularity over the past year, attracting an increasingly mainstream fanbase eager to wager on everything from geopolitical conflicts to fashion choices to the Super Bowl. As they have expanded, the platforms have become a magnet for ethical and legal controversies.
- North America > United States > California (0.37)
- North America > United States > New York (0.05)
- North America > United States > Minnesota (0.05)
- (6 more...)
- Law > Litigation (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Banking & Finance > Trading (1.00)
Going All-In on LLM Accuracy: Fake Prediction Markets, Real Confidence Signals
Large language models are increasingly used to evaluate other models, yet these judgments typically lack any representation of confidence. This pilot study tests whether framing an evaluation task as a betting game (a fictional prediction market with its own LLM currency) improves forecasting accuracy and surfaces calibrated confidence signals. We generated 100 math and logic questions with verifiable answers. Six Baseline models (three current-generation, three prior-generation) answered all items. Three Predictor models then forecast, for each question-baseline pair, whether the baseline would answer correctly. Each predictor completed matched runs in two conditions: Control (simple correct/incorrect predictions) and Incentive (predictions plus wagers of 1-100,000 LLMCoin under even odds, starting from a 1,000,000 LLMCoin bankroll). Across 5,400 predictions per condition, Incentive runs showed modestly higher accuracy (81.5% vs. 79.1%, p = .089, d = 0.86) and significantly faster learning across rounds (a 12.0 vs. 2.9 percentage-point improvement from Round 1 to Round 4, p = .011). Most notably, stake size tracked confidence. "Whale" bets of 40,000+ coins were correct ~99% of the time, while small bets (<1,000 coins) showed only ~74% accuracy. The key finding is not that fictional money makes models smarter; accuracy gains were modest and did not reach statistical significance (p = .089) in this pilot. Rather, the betting mechanic created a legible confidence signal absent from binary yes/no outputs. This suggests that simple financial framing may help transform LLMs into risk-aware forecasters, making their internal beliefs visible and usable. The protocol offers a foundation for future work on meta-evaluation systems and what may become LLM-to-LLM prediction markets.
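The wagering protocol above lends itself to a compact sketch. The data structures and helper names here are illustrative, not the authors' code; the 40,000+ "whale" tier and the <1,000 small-bet tier come from the abstract, while the middle bucket boundary is an arbitrary assumption:

```python
from dataclasses import dataclass

@dataclass
class Bet:
    predicted_correct: bool   # predictor's forecast for the baseline model
    actually_correct: bool    # verified outcome of the baseline's answer
    stake: int                # wager in LLMCoin, 1-100,000

def settle(bankroll: int, bets: list[Bet]) -> int:
    """Even-odds settlement: win the stake if the forecast matches, lose it otherwise."""
    for b in bets:
        hit = (b.predicted_correct == b.actually_correct)
        bankroll += b.stake if hit else -b.stake
    return bankroll

def accuracy_by_stake(bets: list[Bet],
                      buckets=((1, 999), (1_000, 39_999), (40_000, 100_000))):
    """Group bets by stake size and report the hit rate per bucket (the 'whale' signal)."""
    out = {}
    for lo, hi in buckets:
        group = [b for b in bets if lo <= b.stake <= hi]
        if group:
            hits = sum(b.predicted_correct == b.actually_correct for b in group)
            out[(lo, hi)] = hits / len(group)
    return out
```

Reading accuracy off the stake buckets, rather than off the raw yes/no predictions, is what turns the wager into a confidence signal.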
Semantic Trading: Agentic AI for Clustering and Relationship Discovery in Prediction Markets
Capponi, Agostino, Gliozzo, Alfio, Zhu, Brian
Prediction markets allow users to trade on outcomes of real-world events, but are prone to fragmentation with overlapping questions, implicit equivalences, and hidden contradictions across markets. We present an agentic AI pipeline that autonomously (i) clusters markets into coherent topical groups using natural-language understanding over contract text and metadata, and (ii) identifies within-cluster market pairs whose resolved outcomes exhibit strong dependence, including "same-outcome" (correlated) and "different-outcome" (anti-correlated) relationships. Using a historical dataset of resolved markets on Polymarket, we evaluate the accuracy of the agent's relational predictions. We then synthesize the discovered relationships into a simple trading strategy to quantify how they translate into actionable strategies. Results show that agent-identified relationships have around 60-70% accuracy, and their induced trading strategies have an average return of 20% over week-long horizons, highlighting the ability of agentic AI and large language models to uncover latent semantic structure within prediction markets.
- North America > Canada (0.05)
- North America > United States > Iowa (0.04)
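As a toy stand-in for the paper's LLM-based pipeline, the pairing-and-labeling step can be sketched with a bag-of-words similarity. The paper uses agentic natural-language understanding over contract text and metadata; the tokenization, similarity measure, and threshold below are placeholder assumptions:

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def related_pairs(markets: dict[str, bool], threshold: float = 0.5):
    """markets maps question text -> resolved outcome (True/False).
    Returns (q1, q2, relation) for textually similar pairs, labeling each pair
    'same-outcome' or 'different-outcome' from the resolved data."""
    vecs = {q: Counter(q.lower().split()) for q in markets}
    pairs = []
    for q1, q2 in combinations(markets, 2):
        if cosine(vecs[q1], vecs[q2]) >= threshold:
            rel = "same-outcome" if markets[q1] == markets[q2] else "different-outcome"
            pairs.append((q1, q2, rel))
    return pairs
```

In the paper's pipeline the similarity judgment comes from an agent reasoning over the contracts, which is what lets it catch implicit equivalences that surface-level token overlap would miss.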
Outcome-based Reinforcement Learning to Predict the Future
Turtel, Benjamin, Franklin, Danny, Skotheim, Kris, Hewitt, Luke, Schoenegger, Philipp
Reinforcement Learning with Verifiable Rewards (RLVR) has been an effective approach for improving Large Language Models' reasoning in domains such as coding and mathematics. Here, we apply RLVR methods to forecasting future real-world events - a challenging task for RL due to the very noisy (and delayed) outcomes involved. Using a novel dataset of recent questions from a prediction market, and accompanying relevant news headlines, we show that a compact (14B) reasoning model can be trained to match or surpass the predictive accuracy of frontier models like o1, while greatly improving probabilistic calibration. The model's performance is also practically meaningful: in a Polymarket trading simulation, we estimate that its bets would have yielded a return on investment of over 10% across all questions in the test set. We detail and compare approaches used in training our model, including augmenting our training data with synthetic prediction questions, guardrails for learning stability, and median prediction sampling at inference time.
- Europe > Ukraine (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Denmark (0.04)
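The trading simulation mentioned above can be illustrated with a minimal betting rule: trade a binary contract only when the model's probability diverges from the market price by some edge. The rule, the `edge` parameter, and the fixed stake are assumptions for illustration, not the authors' simulation:

```python
def simulate_roi(rows, edge: float = 0.05, stake: float = 1.0):
    """rows: (model_prob, market_price, outcome) triples for YES contracts
    priced in [0, 1], where outcome is True if the event resolved YES.
    Buy YES when the model is `edge` above the price, NO when `edge` below.
    Returns total profit divided by total capital invested."""
    invested = profit = 0.0
    for p_model, price, won in rows:
        if p_model > price + edge:
            # YES contract: stake buys stake/price shares, each paying 1 on YES
            invested += stake
            profit += stake * ((1.0 - price) / price if won else -1.0)
        elif p_model < price - edge:
            # NO contract: stake buys stake/(1-price) shares, each paying 1 on NO
            invested += stake
            profit += stake * (price / (1.0 - price) if not won else -1.0)
    return profit / invested if invested else 0.0
```

Note that a rule like this only profits when the model's calibration advantage survives the market's prices, which is why the abstract emphasizes calibration alongside raw accuracy.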
Bounded-Loss Private Prediction Markets
Prior work has investigated variations of prediction markets that preserve participants' (differential) privacy, which formed the basis of useful mechanisms for purchasing data for machine learning objectives. Such markets required potentially unlimited financial subsidy, however, making them impractical.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- North America > Canada > Quebec > Montreal (0.04)
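For background on the subsidy problem the abstract raises: the subsidy is the market maker's worst-case loss, and in the classic (non-private) logarithmic market scoring rule it is bounded at b·ln(n) for n outcomes and liquidity parameter b. A minimal sketch, assuming the standard LMSR cost function rather than the private variants the paper studies:

```python
from math import exp, log

def lmsr_cost(q, b: float = 100.0) -> float:
    """LMSR cost function C(q) = b * ln(sum_i exp(q_i / b)),
    computed with the log-sum-exp trick for numerical stability."""
    m = max(q)
    return b * (m / b + log(sum(exp((qi - m) / b) for qi in q)))

def price_of_trade(q, i: int, shares: float, b: float = 100.0) -> float:
    """Cost a trader pays to buy `shares` of outcome i: C(q') - C(q)."""
    q2 = list(q)
    q2[i] += shares
    return lmsr_cost(q2, b) - lmsr_cost(q, b)

def worst_case_loss(n: int, b: float = 100.0) -> float:
    """Upper bound on the market maker's loss: b * ln(n), independent of volume."""
    return b * log(n)
```

The privacy-preserving variants in prior work perturb trader-visible state, and the bounded-loss question is whether that noise can be paid for without reopening an unbounded subsidy.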
AIA Forecaster: Technical Report
Alur, Rohan, Stadie, Bradly C., Kang, Daniel, Chen, Ryan, McManus, Matt, Rickert, Michael, Lee, Tyler, Federici, Michael, Zhu, Richard, Fogerty, Dennis, Williamson, Hayley, Lozinski, Nina, Linsky, Aaron, Sekhon, Jasjeet S.
This technical report describes the AIA Forecaster, a Large Language Model (LLM)-based system for judgmental forecasting using unstructured data. The AIA Forecaster approach combines three core elements: agentic search over high-quality news sources, a supervisor agent that reconciles disparate forecasts for the same event, and a set of statistical calibration techniques to counter behavioral biases in large language models. On the ForecastBench benchmark (Karger et al., 2024), the AIA Forecaster achieves performance equal to human superforecasters, surpassing prior LLM baselines. In addition to reporting on ForecastBench, we also introduce a more challenging forecasting benchmark sourced from liquid prediction markets. While the AIA Forecaster underperforms market consensus on this benchmark, an ensemble combining AIA Forecaster with market consensus outperforms consensus alone, demonstrating that our forecaster provides additive information. Our work establishes a new state of the art in AI forecasting and provides practical, transferable recommendations for future research. To the best of our knowledge, this is the first work that verifiably achieves expert-level forecasting at scale.
- North America > United States > New York > New York County > New York City (0.04)
- North America > Guadeloupe (0.04)
- Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
- (2 more...)
- Government (1.00)
- Banking & Finance > Trading (1.00)
- Leisure & Entertainment > Games > Chess (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
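The report above does not specify how the forecaster and market consensus are combined; one standard choice is weighted pooling in log-odds space, sketched here alongside a Brier score for checking calibration. The equal weight `w = 0.5` is an assumption:

```python
from math import exp, log

def logit(p: float) -> float:
    return log(p / (1.0 - p))

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))

def ensemble(p_model: float, p_market: float, w: float = 0.5) -> float:
    """Combine two probability forecasts by weighted averaging in log-odds space."""
    return sigmoid(w * logit(p_model) + (1.0 - w) * logit(p_market))

def brier(forecasts, outcomes) -> float:
    """Mean squared error of probabilistic forecasts against 0/1 outcomes
    (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)
```

Log-odds pooling has the convenient property that an ensemble of the market with itself returns the market price unchanged, so the forecaster can only move the combined estimate where it actually disagrees.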
LiveTradeBench: Seeking Real-World Alpha with Large Language Models
Yu, Haofei, Li, Fenghai, You, Jiaxuan
Large language models (LLMs) achieve strong performance across benchmarks--from knowledge quizzes and math reasoning to web-agent tasks--but these tests occur in static settings, lacking real dynamics and uncertainty. Consequently, they evaluate isolated reasoning or problem-solving rather than decision-making under uncertainty. To address this, we introduce LiveTradeBench, a live trading environment for evaluating LLM agents in realistic and evolving markets. LiveTradeBench follows three design principles: (i) Live data streaming of market prices and news, eliminating dependence on offline backtesting and preventing information leakage while capturing real-time uncertainty; (ii) a portfolio-management abstraction that extends control from single-asset actions to multi-asset allocation, integrating risk management and cross-asset reasoning; and (iii) multi-market evaluation across structurally distinct environments--U.S. stocks and Polymarket prediction markets--differing in volatility, liquidity, and information flow. At each step, an agent observes prices, news, and its portfolio, then outputs percentage allocations that balance risk and return. Using LiveTradeBench, we run 50-day live evaluations of 21 LLMs across families. Results show that (1) high LMArena scores do not imply superior trading outcomes; (2) models display distinct portfolio styles reflecting risk appetite and reasoning dynamics; and (3) some LLMs effectively leverage live signals to adapt decisions. These findings expose a gap between static evaluation and real-world competence, motivating benchmarks that test sequential decision making and consistency under live uncertainty.
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- (4 more...)
- Banking & Finance > Trading (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
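The per-step loop described above (the agent outputs percentage allocations, then the portfolio is marked to the next step's prices) can be sketched as follows. The function names and the clip-and-renormalize rule are illustrative assumptions, not the benchmark's implementation:

```python
def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Clip negative allocations and renormalize so weights sum to 1.
    If everything is clipped away, fall back to holding cash."""
    clipped = {k: max(0.0, v) for k, v in weights.items()}
    total = sum(clipped.values())
    if total == 0:
        return {k: (1.0 if k == "cash" else 0.0) for k in clipped}
    return {k: v / total for k, v in clipped.items()}

def step_value(value: float, weights: dict[str, float],
               returns: dict[str, float]) -> float:
    """One evaluation step: portfolio value grows by the allocation-weighted
    per-asset returns; assets absent from `returns` (e.g. cash) return 0."""
    growth = sum(w * (1.0 + returns.get(k, 0.0)) for k, w in weights.items())
    return value * growth
```

Running this step against a live stream of prices and news, rather than a fixed backtest, is what distinguishes the benchmark's setting from static evaluation.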