
Large Language Model


LLMs as Implicit Imputers: Uncertainty Should Scale with Missing Information

arXiv.org Machine Learning

Large language models (LLMs) are increasingly deployed in settings where the available context is incomplete or degraded. We argue that an LLM generating answers under incomplete context can be viewed as an implicit imputer, and evaluated against a criterion from the multiple imputation (MI) literature: uncertainty should scale with the amount of missing information. We assess this criterion on SQuAD, using a controlled framework in which context availability is varied across five levels. We evaluate two answer-level uncertainty measures that can be estimated from repeated sampling: sampling-based confidence (empirical mode frequency) and response entropy. Confidence fails to reflect increasing missingness: it remains high even as accuracy collapses. Entropy, by contrast, increases with context removal, consistent with the MI analogy, and explains substantially more variance in accuracy than confidence across all evidence levels (quadratic $R^2$ gap up to 0.057). We further introduce a black-box diagnostic $\rho_R(\alpha)$ that estimates the proportion of baseline uncertainty resolved by context level $\alpha$, requiring only repeated sampling with and without context. These results suggest that entropy is a more responsive black-box uncertainty measure than confidence under incomplete context.
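Both answer-level measures in the abstract can be computed from a list of sampled answers. The following is a minimal Python sketch. It assumes the answers are already normalized strings and that entropy is measured in nats. The helper names are hypothetical. The abstract does not give a formula for $\rho_R(\alpha)$, so `resolved_fraction` adopts one natural reading: the relative drop in entropy from the no-context baseline to context level $\alpha$. The paper's actual definition may differ.

```python
import math
from collections import Counter


def sampling_confidence(answers):
    """Empirical mode frequency: share of samples equal to the most common answer."""
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)


def response_entropy(answers):
    """Shannon entropy (in nats) of the empirical answer distribution."""
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in Counter(answers).values())


def resolved_fraction(h_baseline, h_context):
    """Hypothetical reading of rho_R(alpha): fraction of the no-context (baseline)
    entropy that is removed once context at level alpha is supplied."""
    if h_baseline == 0:
        return 0.0  # nothing to resolve: answers are already identical
    return (h_baseline - h_context) / h_baseline


# Hypothetical samples for one question, without and with context.
no_ctx = ["Paris", "London", "Paris", "Rome"]
with_ctx = ["Paris", "Paris", "Paris", "Paris"]
print(sampling_confidence(no_ctx))  # 0.5
print(resolved_fraction(response_entropy(no_ctx), response_entropy(with_ctx)))  # 1.0

# Same mode frequency, different spread of the remaining probability mass.
a = ["A"] * 6 + ["B"] * 4
b = ["A"] * 6 + ["B", "C", "D", "E"]
print(sampling_confidence(a), sampling_confidence(b))  # 0.6 0.6
print(response_entropy(a) < response_entropy(b))  # True
```

The last two prints show one way confidence can miss changes that entropy captures. Confidence looks only at the mode, so spreading the non-mode mass over more distinct answers raises entropy while leaving confidence unchanged.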


Learning Perturbations to Extrapolate Your LLM

arXiv.org Machine Learning

Training large language models (LLMs) such as GPT-5 and Qwen-3 (Singh et al., 2025; Yang et al., 2025) on massive text corpora aims to capture the underlying distribution of natural language. Yet it remains challenging for the trained model to extrapolate to out-of-distribution or out-of-domain settings beyond the support of its training data. The literature has developed various data perturbation techniques, such as synonym replacement, random insertion, deletion, and swap, that modify training instances into semantically similar variants in order to expose LLMs to a broader range of inputs and improve their ability to generalize beyond the training data (Feng et al., 2019, 2020; Li et al., 2024; Cen et al., 2026). However, these techniques remain grounded in discrete, word-level augmentation procedures, which may limit their adaptivity across diverse domains. While discrete perturbations are simple to use, they can be too coarse and hard to refine given the complexity of natural language (Park et al., 2022; Li et al., 2023). Meanwhile, fixed perturbations apply the same transformations to the data regardless of context, and thus fail to generalize appropriately (Ismailov and Asanova, 2025).
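The discrete perturbations named above (synonym replacement, random insertion, deletion, and swap) are simple token-level operations. The following is a minimal Python sketch in the style of these word-level augmentations. It assumes whitespace-tokenized input and uses a small hand-written synonym table in place of a real thesaurus. The function names and parameters are illustrative and are not taken from any of the cited works.

```python
import random


def synonym_replace(tokens, synonyms, rng, n=1):
    """Replace up to n tokens that appear in the (toy) synonym table."""
    out = list(tokens)
    positions = [i for i, t in enumerate(out) if t in synonyms]
    rng.shuffle(positions)
    for i in positions[:n]:
        out[i] = rng.choice(synonyms[out[i]])
    return out


def random_insertion(tokens, synonyms, rng, n=1):
    """n times: insert a synonym of a random eligible token at a random position."""
    out = list(tokens)
    for _ in range(n):
        eligible = [t for t in out if t in synonyms]
        if not eligible:
            break
        word = rng.choice(synonyms[rng.choice(eligible)])
        out.insert(rng.randint(0, len(out)), word)
    return out


def random_swap(tokens, rng, n=1):
    """n times: swap the tokens at two distinct random positions."""
    out = list(tokens)
    for _ in range(n):
        if len(out) < 2:
            break
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, rng, p=0.1):
    """Drop each token independently with probability p; never return an empty list."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


rng = random.Random(42)
sentence = "the quick brown fox jumps over the lazy dog".split()
synonyms = {"quick": ["fast", "speedy"], "jumps": ["leaps", "hops"], "lazy": ["idle"]}
for op in (lambda s: synonym_replace(s, synonyms, rng),
           lambda s: random_insertion(s, synonyms, rng),
           lambda s: random_swap(s, rng),
           lambda s: random_deletion(s, rng)):
    print(" ".join(op(sentence)))
```

Each operation chooses its edits without looking at the surrounding context. This is the limitation the paragraph attributes to fixed, discrete perturbations.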


A Hierarchical Language Model with Predictable Scaling Laws and Provable Benefits of Reasoning

arXiv.org Machine Learning

We introduce a family of synthetic languages with hierarchical structure -- generated by a broadcast process on trees -- for which the role of context length and reasoning in autoregressive generation can be analyzed precisely. At the heart of our analytic approach is an \emph{exact $k$-gram ansatz} in place of transformers with context length $k$, a substitution we then validate empirically. Using this ansatz we derive explicit asymptotic predictions for distributional statistics of the sequences produced by a trained model, instantiated in two settings. For the \emph{Ising broadcast process} (a soft-constrained language), we prove that the variance of the generated sum scales log-linearly in the context depth and its kurtosis converges to that of a Gaussian -- both deviating from the true language for any sublinear context. For the \emph{coloring broadcast process} (a hard-constrained language) in the freezing regime, bounded-context autoregression produces sequences that, with high probability, are inconsistent with \emph{any} valid coloring of the underlying tree. Together these results imply an $\Omega(n)$ lower bound on the context length required to faithfully sample length-$n$ sequences. In contrast, we prove that an autoregressive \emph{reasoning} model with only $\Theta(\log n)$ working memory can sample exactly from the true language -- an exponential improvement. We confirm both the lower-bound predictions and the reasoning-based upper bound empirically with transformers trained on the synthetic language; the trained models track our asymptotic predictions quantitatively across a wide range of context sizes.
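To make the two synthetic languages concrete, here is a minimal Python sketch of a broadcast process on a tree. It assumes a complete binary tree whose leaves, read from left to right, form a sequence of length n = 2^depth. The abstract does not specify the tree shape or parameters used in the paper, so `flip_prob`, `q`, and the binary branching are illustrative assumptions. The hypothetical checker `consistent_with_some_coloring` tests the property the abstract refers to: whether some valid (proper) coloring of the internal nodes agrees with a given leaf sequence.

```python
import random


def ising_broadcast(depth, flip_prob, rng):
    """Leaf spins of an Ising broadcast on a complete binary tree.

    The root spin is uniform on {-1, +1}; each child copies its parent's
    spin and independently flips it with probability flip_prob.
    """
    level = [rng.choice([-1, 1])]
    for _ in range(depth):
        level = [s if rng.random() >= flip_prob else -s
                 for s in level for _child in range(2)]
    return level


def coloring_broadcast(depth, q, rng):
    """Leaf colors of a q-coloring broadcast on a complete binary tree.

    The root color is uniform on {0, ..., q-1}; each child's color is drawn
    uniformly from the q-1 colors that differ from its parent's.
    """
    level = [rng.randrange(q)]
    for _ in range(depth):
        level = [rng.choice([c for c in range(q) if c != p])
                 for p in level for _child in range(2)]
    return level


def consistent_with_some_coloring(leaves, q):
    """True if some proper q-coloring of the binary tree matches these leaves.

    Bottom-up dynamic programming: each node keeps the set of colors it could
    take given its subtree; a parent color c is feasible when each child can
    take some color other than c. len(leaves) must be a power of two.
    """
    level = [{c} for c in leaves]
    while len(level) > 1:
        level = [
            {c for c in range(q)
             if any(x != c for x in left) and any(x != c for x in right)}
            for left, right in zip(level[0::2], level[1::2])
        ]
    return bool(level[0])


rng = random.Random(0)
spins = ising_broadcast(depth=4, flip_prob=0.1, rng=rng)
print(len(spins), sum(spins))  # 16, then the leaf sum (depends on the seed)
colors = coloring_broadcast(depth=4, q=3, rng=rng)
print(consistent_with_some_coloring(colors, q=3))  # True
print(consistent_with_some_coloring([0, 1], q=2))  # False: the root cannot differ from both
```

Sequences drawn from `coloring_broadcast` always pass the check, because the generating coloring is itself a witness. The abstract's claim is that, in the freezing regime, sequences produced by a bounded-context autoregressive model fail this check with high probability.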


OpenAI endorses the Kids Online Safety Act

Engadget

OpenAI, which is currently facing a raft of lawsuits over alleged safety lapses in ChatGPT, has endorsed the Kids Online Safety Act (KOSA). The company said that its endorsement was part of a broader commitment to create AI-specific rules for kids' safety. OpenAI's endorsement comes as KOSA, which passed the Senate in 2024, appears to be gaining some momentum. KOSA, which was first introduced in 2022, is one of several online safety bills that would require social media companies and other online platforms to implement stronger protections for children. The bill has been revised a number of times, but the current version includes a requirement for social media apps to allow minors to opt out of addictive features and algorithmic recommendations.


OpenAI Brings Its Ass to Court

WIRED

The company sought to show the jury a remarkable trophy as physical proof of Elon Musk's concerning behavior. Wednesday's session of the trial kicked off with a unique proposition: OpenAI wanted to bring its ass into the courtroom and lay it bare before the jury. It's a good thing Lady Justice wears that blindfold. A lawyer for Sam Altman's AI behemoth, Bradley Wilson, approached US District Judge Yvonne Gonzalez Rogers and handed her a small gold statue with a white stone base. It depicted the rear end of a donkey (two legs, a butt, and a tail) and was inscribed with the message, "Never stop being a jackass for safety."


Reports of the Workshops Held at the 2026 AAAI Conference on Artificial Intelligence

Interactive AI Magazine

The 10th International Workshop on Health Intelligence (W3PHIAI-26) celebrated a decade of bringing AI and health research together, building on a lineage that began with the AAAI-W3PHI workshops focused on population health (2014-2016), the AAAI-HIAI workshops focused on personalized health (2013-2016), and the subsequent joint W3PHIAI workshops held annually from 2017 through 2025. Over this decade, the series has produced hundreds of talks and high-impact publications that have collectively received thousands of citations, shaping the research agenda in both population health intelligence and personalized healthcare AI. This year's special theme, "Foundation Models and AI Agents," reflected the field's rapidly evolving frontier: the emergence of autonomous and semi-autonomous AI systems reshaping clinical workflows, patient management, health system operations, and public health surveillance. Day 1 of the workshop focused on medical imaging and the translation of AI for clinical ...


I tried Google's AI mouse pointer. It's not magic yet

PCWorld

PCWorld tested Google's new Magic Pointer, an AI-powered mouse feature for the upcoming Googlebooks that uses Gemini to interpret gestures for tasks like image editing and web interactions. The feature represents Google's attempt to revolutionize computer interaction through AI, potentially allowing users to edit documents or book reservations with simple mouse movements. Early testing shows that the Magic Pointer has promise but remains clunky and limited, and it needs significant improvement before it becomes truly useful for everyday computing tasks. A signature feature of Google's upcoming Googlebooks promises to put a fresh AI twist on one of the oldest computer interfaces: the mouse pointer. With the Magic Pointer, a product of Google's DeepMind lab, you'll be able to wave the pointer at an object or area on the computer screen and simply tell Gemini what you want it to do, anything from editing the image you're pointing at to adding ingredients from a recipe to a shopping list, with the AI-enabled mouse pointer acting as a shortcut for prompting. The Magic Pointer is one of the top-line features of Google's new Googlebooks, the Gemini-powered successors to Chromebooks that are due in the fall.


The Download: making drugs in orbit and NASA's nuclear-powered spacecraft

MIT Technology Review

Plus: Sam Altman claims Elon Musk tried to seize control of OpenAI. A startup called Varda Space Industries is betting that the future of pharmaceuticals lies in orbit. The company has signed a deal with United Therapeutics to test whether drugs crystallize differently in microgravity, potentially creating improved versions with new properties. The idea sounds futuristic, but falling launch costs and reusable rockets are making space-based manufacturing seem increasingly plausible. Varda says the partnership could mark an important step toward building products in orbit for use back on Earth.


Meet the Sad Wives of AI

WIRED

Are you married to a man who's obsessed with AI? If I had to listen to another minute of my husband talking about Claude Code, I might have actually died. It was 11 pm in Berkeley, California, where I was home alone with our 10-month-old daughter, and 2 am in Cambridge, Massachusetts, where he was visiting for his newish job in AI. "JUST LOOK AT THIS!" he shouted. The FaceTime camera zoomed toward a laptop sitting on a hotel bed. I still had to take the dog out. "ARE YOU LOOKING?" he shouted again. I was looking at our real baby. There are two babies in this household now: the small human one and the large language model.


SoftBank profit jumps, emboldens Son to bet more on OpenAI

The Japan Times

SoftBank Group has reported a surge in quarterly profit due to valuation gains on its OpenAI investment, boosting confidence at the Japanese company to bet even more on the ChatGPT-maker. The gains on OpenAI outweighed lackluster investment gains elsewhere in the Tokyo-based technology group's portfolio while war in the Middle East roiled markets. That points to growing reliance on the U.S. startup, which faces rising competition from Anthropic and Google and is reportedly trailing its highest internal targets. SoftBank earned a net income of ¥1.83 trillion ($11.6 billion) in its fiscal fourth quarter, compared with the average analyst estimate of ¥295.2 billion. The profit could be attributed entirely to its booking $25 billion in valuation gains on OpenAI in the quarter, according to Bloomberg Intelligence analyst Kirk Boodry.