Americans Are Trading Billions of Dollars on Polymarket's Banned Offshore Platform
It's the first estimate of how many Americans are sneaking onto Polymarket's banned crypto-based platform. Approximately 30 percent of the trading volume on Polymarket comes from the United States, according to a new study--an eye-popping number, considering that none of those people are legally allowed to use the crypto-based platform. The study, conducted by Rutgers University statistician Harry Crane, estimated that people in the US funneled between $10.6 billion and $26.7 billion through Polymarket. To track the platform's activity, Crane looked at what appeared to be US-based trades on offshore prediction market platforms from May 2025 to the end of April 2026. He found that many of the highest-volume markets on Polymarket were US-centric, including those covering US elections and sporting events.
Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs
The development of reasoning capabilities represents a critical frontier in large language model (LLM) research, where reinforcement learning (RL) and process reward models (PRMs) have emerged as predominant methodological frameworks. Contrary to conventional wisdom, empirical evidence from DeepSeek-R1 demonstrates that pure RL training focused on mathematical problem-solving can progressively enhance reasoning abilities without PRM integration, challenging the perceived necessity of process supervision. In this study, we conduct a systematic investigation of the relationship between RL training and PRM capabilities. Our findings demonstrate that problem-solving proficiency and process supervision capabilities represent complementary dimensions of reasoning that co-evolve synergistically during pure RL training. In particular, current PRMs underperform simple baselines like majority voting when applied to state-of-the-art models such as DeepSeek-R1 and QwQ-32B. To address this limitation, we propose Self-PRM, an introspective framework in which models autonomously evaluate and rerank their generated solutions through self-reward mechanisms. Although Self-PRM consistently improves benchmark accuracy (particularly with larger sample sizes), analysis exposes persistent challenges: the approach exhibits low precision (<10\%) on difficult problems, frequently misclassifying flawed solutions as valid. These analyses underscore the need for combined training with process supervision and continued RL scaling to enhance reward alignment and introspective accuracy. We hope these findings provide actionable insights for building more reliable and self-aware complex reasoning models.
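To make the comparison in this abstract concrete, the following is a minimal sketch of the two selection rules it contrasts: majority voting over sampled answers, and reranking by a per-solution score such as a self-assigned reward. It also shows how precision of a "valid" judgment can be computed. The function names and the toy inputs are assumptions for illustration; the abstract does not describe how Self-PRM produces its scores.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer among the sampled solutions."""
    return Counter(answers).most_common(1)[0][0]


def rerank_by_score(answers, scores):
    """Return the answer whose solution received the highest score.

    `scores` stands in for any per-solution reward (hypothetical here),
    for example a score the model assigns to its own solution.
    """
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]


def precision(judged_valid, actually_correct):
    """Fraction of solutions judged valid that are actually correct."""
    accepted = [ok for judged, ok in zip(judged_valid, actually_correct) if judged]
    return sum(accepted) / len(accepted) if accepted else 0.0
```

With three samples answering "42", "41", "42", majority voting picks "42", while a reranker that scores the second sample highest picks "41". Low precision means the reranker often gives high scores to flawed solutions, which is the failure mode the abstract reports on difficult problems.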
Defining Autonomy for Wellness Robots in Senior Care
This White Paper gives engineers, researchers, and care professionals an overview of how socially assistive wellness robots can support senior wellness, and how a framework can measure their autonomy. What you will learn about: Why the senior care crisis exceeds incremental healthcare automation. Staffing shortages, rising dementia prevalence, and limited daily wellness programming all play a part. How the seven ICAA dimensions of wellness define a distinct category of socially assistive robot, separate from companion devices, medical devices, and general-purpose humanoids. How the Care Robot Autonomy Scale (CRAS), a six-level framework modeled on a driving-automation standard, measures autonomy across four wellness dimensions. What technical capabilities, clinical evidence, and a three-phase roadmap suggest about the path from current practice toward full wellness autonomy in the early 2030s.
What's Going On in Donald Trump's Head? We Don't Have Brain Scans. We Do Have This.
No one can say for sure what's going on in the president's head. His 25 greatest obsessions can get us a little closer. This is the year the first baby boomers--those born in 1946--turn 80, and that cohort includes Donald Trump. We have all recently lived through what it means to have an 80-year-old commander in chief, but at a political moment that's simultaneously more horrific, erratic, and just plain befuddling than anything this country has seen in ages, we wanted to understand the brain of an 80-year-old president. Plenty of people are trying to discern whether his recent rants and raves are due to a more serious cognitive decline--we understand the instinct; we've done it too--but we went a different (if related) route. The more we dug into Trump's many fixations, the more we realized that this man still thinks he lives in the 1980s. We also discovered--without too much surprise--that he often seems to fundamentally misunderstand the works he treasures most deeply. These items might not replace a brain map, but they do create a certain holistic view of what animates and splinters Trump's mind. Sometimes, they just help explain his worldview. Other times, they seem to have had real influence on policy and the America that Trump is trying to create. Welcome to Trump Brain, the 25 things that define who the president is--and what he wants. When millions of people took to the streets in October to protest Trump's authoritarianism, the president responded by dunking on his critics online. Specifically, he posted an A.I.-generated video of a fighter jet, piloted by himself in a literal crown, dropping human excrement onto the crowds. It was perhaps Trump's most juvenile use of A.I. slop yet--the kind of low-quality, feverish content made possible by artificial intelligence. Trump undoubtedly is the perfect president for the A.I. slop era.
In some ways, this is because he's the ideal audience for it: Like many older internet users delighted by the technology, Trump seems to enjoy mindless, cartoonish, childish content. One of the videos he shared depicted him playing soccer with Cristiano Ronaldo in the Oval Office.
Evaluating multiple models using labeled and unlabeled data
It is difficult to evaluate machine learning classifiers without large labeled datasets, which are often unavailable. In contrast, unlabeled data is plentiful, but not easily used for evaluation. Here, we introduce Semi-Supervised Model Evaluation (SSME), a method that uses both labeled and unlabeled data to evaluate machine learning classifiers. The key idea is to estimate the joint distribution of ground truth labels and classifier scores using a semi-supervised mixture model. The semi-supervised mixture model allows SSME to learn from three sources of information: unlabeled data, multiple classifiers, and probabilistic classifier scores. Once fit, the mixture model enables estimation of any metric that is a function of classifier scores and ground truth labels (e.g., accuracy or AUC). We derive theoretical bounds on the error of these estimates, showing that estimation error decreases with the number of classifiers and the amount of unlabeled data. We present experiments in four domains where obtaining large labeled datasets is often impractical: healthcare, content moderation, molecular property prediction, and text classification. Our results demonstrate that SSME estimates performance more accurately than do competing methods, reducing error by 5.1x relative to using labeled data alone and 2.4x relative to the next best method.
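As a rough illustration of the last step described in this abstract, the sketch below estimates accuracy on unlabeled data once a posterior over the true label is available for each example. It assumes a binary task and that the posteriors have already been produced by some fitted model of labels and scores; fitting the semi-supervised mixture model itself is not shown, and the function name and the 0.5 threshold are hypothetical.

```python
def expected_accuracy(posteriors, scores, threshold=0.5):
    """Estimate classifier accuracy without ground-truth labels.

    posteriors: estimated P(y = 1 | scores) for each example, assumed to
        come from an already-fitted model (not implemented here).
    scores: the classifier's probabilistic score for each example.
    threshold: cutoff that turns a score into a positive prediction.
    """
    if len(posteriors) != len(scores) or not scores:
        raise ValueError("posteriors and scores must be non-empty and equal in length")
    total = 0.0
    for p_positive, score in zip(posteriors, scores):
        predicted_positive = score >= threshold
        # Probability that this prediction matches the unknown true label.
        total += p_positive if predicted_positive else 1.0 - p_positive
    return total / len(scores)
```

The same pattern extends to other metrics that are functions of scores and labels: each example contributes its metric value weighted by the estimated probability of each possible label.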
Thoughts Are All Over the Place: On the Underthinking of Long Reasoning Models
Long reasoning models (LRMs) such as OpenAI's o1 and DeepSeek's R1 have demonstrated remarkable abilities in complex reasoning tasks by scaling test-time compute and exhibiting human-like deep thinking. However, we identify a phenomenon we term underthinking, where LRMs frequently switch between different reasoning thoughts without sufficiently exploring promising paths to reach a correct solution. This behavior leads to inadequate depth of reasoning and decreased performance, particularly on challenging mathematical problems. To systematically analyze this issue, we conduct experiments on three challenging test sets and two representative open-source LRMs, revealing that frequent thought switching correlates with incorrect responses. We introduce a novel metric to quantify underthinking by measuring token efficiency in incorrect answers. To address underthinking, we propose a decoding strategy with thought switching penalty (Tip) that discourages premature transitions between thoughts, encouraging deeper exploration of each reasoning path. Experimental results demonstrate that our approach improves accuracy across challenging datasets without requiring model fine-tuning. Our findings contribute to understanding reasoning inefficiencies in LRMs and offer a practical solution to enhance their problem-solving capabilities.
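One way to picture a thought switching penalty at decoding time is as a logit adjustment: tokens that typically open a new line of reasoning have their logits lowered while the current thought is still young. The sketch below is an assumption-laden illustration, not the paper's implementation. The parameter names `penalty` and `duration`, their default values, and the idea of tracking `steps_in_thought` are all hypothetical.

```python
def apply_switch_penalty(logits, switch_token_ids, steps_in_thought,
                         penalty=1.0, duration=10):
    """Return a copy of `logits` with thought-switch tokens discouraged.

    logits: list of floats, one per vocabulary entry.
    switch_token_ids: IDs of tokens assumed to signal a new thought.
    steps_in_thought: tokens generated since the current thought began.
    penalty, duration: arbitrary placeholder values for this sketch.
    """
    adjusted = list(logits)
    if steps_in_thought < duration:
        for token_id in switch_token_ids:
            adjusted[token_id] -= penalty
    return adjusted


def greedy_pick(logits):
    """Index of the largest logit (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])
```

Because the adjustment happens only at decoding time, the model weights are untouched, which matches the abstract's point that no fine-tuning is required.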
$\epsilon$-Seg: Sparsely Supervised Semantic Segmentation of Microscopy Data
Semantic segmentation of electron microscopy (EM) images of biological samples remains a challenge in the life sciences. EM data captures details of biological structures, sometimes with such complexity that even human observers can find it overwhelming. We introduce $\epsilon$-Seg, a method based on hierarchical variational autoencoders (HVAEs), employing center-region masking, sparse label contrastive learning (CL), a Gaussian mixture model (GMM) prior, and clustering-free label prediction. Center-region masking and the inpainting loss encourage the model to learn robust and representative embeddings to distinguish the desired classes, even if training labels are sparse ($0.05$\% of the total image data or less). For optimal performance, we employ CL and a GMM prior to shape the latent space of the HVAE such that encoded input patches tend to cluster w.r.t. the semantic classes we wish to distinguish. Finally, instead of clustering latent embeddings for semantic segmentation, we propose an MLP semantic segmentation head to directly predict class labels from latent embeddings. We show empirical results of $\epsilon$-Seg and baseline methods on $2$ dense EM datasets of biological tissues and also demonstrate the applicability of our method on fluorescence microscopy data. Our results show that $\epsilon$-Seg is capable of achieving competitive sparsely-supervised segmentation results on complex biological image data, even if only limited amounts of training labels are available.
Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference
Causal inference is essential for developing and evaluating medical interventions, yet real-world medical datasets are often difficult to access due to regulatory barriers. This makes synthetic data a potentially valuable asset that enables these medical analyses, along with the development of new inference methods themselves. Generative models can produce synthetic data that closely approximate real data distributions, yet existing methods do not consider the unique challenges that downstream causal inference tasks, and specifically those focused on treatments, pose. We establish a set of desiderata that synthetic data containing treatments should satisfy to maximise downstream utility: preservation of (i) the covariate distribution, (ii) the treatment assignment mechanism, and (iii) the outcome generation mechanism. Based on these desiderata, we propose a set of evaluation metrics to assess such synthetic data. Finally, we present STEAM: a novel method for generating Synthetic data for Treatment Effect Analysis in Medicine that mimics the data-generating process of data containing treatments and optimises for our desiderata. We empirically demonstrate that STEAM achieves state-of-the-art performance across our metrics as compared to existing generative models, particularly as the complexity of the true data-generating process increases.
Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations
Modern tokenizers employ deterministic algorithms to map text into a single ``canonical'' token sequence, yet the same string can be encoded as many non-canonical tokenizations using the language model vocabulary, including tokenizing by character. In this paper, we investigate the robustness of LMs to input encoded with non-canonical tokenizations entirely unseen during training. Surprisingly, when evaluated across 20 benchmarks, we find that instruction-tuned models retain up to 93.4\% of their original performance when given a randomly sampled tokenization, and 90.8\% with character-level tokenization. We find that overall stronger models tend to be more robust, and that robustness diminishes as the tokenization departs farther from the canonical form. Motivated by these results, we identify settings where non-canonical tokenization schemes can \textit{improve} performance, finding that character-level segmentation improves string manipulation and code understanding tasks by up to 15\%, and right-aligned digit grouping enhances large-number arithmetic by over 33\%. Finally, we investigate the source of this robustness, finding that it arises in the instruction-tuning phase. We provide evidence that both base and post-trained models grasp the semantics of non-canonical tokenizations (perceiving them as containing misspellings). However, base models try to mimic the imagined mistakes and degenerate into nonsensical output, while post-trained models are committed to fluent responses. Overall, our findings suggest that models are less committed to their tokenizer than previously believed, and highlight the promise of intervening on tokenization at inference time to boost language model performance.
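The two segmentation schemes named in this abstract can be sketched at the string level. The example below splits a digit string into chunks counted from the right, so chunk boundaries line up with place value, and splits arbitrary text into single characters. The function names and the group size of 3 are assumptions for illustration; the abstract does not specify the exact grouping, and mapping the resulting pieces to vocabulary token IDs is left out.

```python
def group_digits_right_aligned(number: str, group_size: int = 3) -> list[str]:
    """Split a digit string into chunks aligned to its right end."""
    if not number.isdigit():
        raise ValueError("expected a non-empty string of digits")
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    # The first chunk absorbs any leftover digits so every later chunk is full.
    head = len(number) % group_size
    chunks = [number[:head]] if head else []
    chunks += [number[i:i + group_size]
               for i in range(head, len(number), group_size)]
    return chunks


def split_characters(text: str) -> list[str]:
    """Character-level segmentation: one piece per character."""
    return list(text)
```

For "1234567" the right-aligned grouping yields "1", "234", "567", whereas a left-to-right split of the same size would yield "123", "456", "7" and shift every boundary whenever the number gains a digit.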