Evaluating the Ability of Large Language Models to Reason about Cardinal Directions, Revisited

Cohn, Anthony G, Blackwell, Robert E

arXiv.org Artificial Intelligence

We investigate the abilities of 28 Large Language Models (LLMs) to reason about cardinal directions (CDs) using a benchmark generated from a set of templates, extensively testing an LLM's ability to determine the correct CD given a particular scenario. The templates allow for a number of degrees of variation, such as the means of locomotion of the agent involved and whether the scenario is set in the first, second or third person. Even the newer Large Reasoning Models are unable to reliably determine the correct CD for all questions. This paper summarises and extends earlier work presented at COSIT-24.


Blackwell's Approachability for Sequential Conformal Inference

Principato, Guillaume, Stoltz, Gilles

arXiv.org Machine Learning

Conformal inference [Vovk et al., 2005] provides a general procedure for constructing prediction sets with guaranteed coverage, under the assumption that the data are exchangeable. This assumption, however, is often too restrictive: it typically fails in sequential or time-dependent settings such as time series forecasting, where the distribution of observations may shift over time. To address this issue, Gibbs and Candès [2021] introduced Adaptive Conformal Inference (ACI), which extends Conformal Prediction (CP) to adversarial environments. ACI adapts to distribution shifts by updating prediction intervals in response to observed outcomes, ensuring that the empirical coverage converges to the desired level. While effective in maintaining coverage, ACI and its extensions generally lack efficiency guarantees; for instance, there is no control over the average length of prediction intervals in adversarial regimes. In this work, we study sequential conformal inference as a repeated two-player finite game and invoke Blackwell's theory of approachability to characterize feasible objectives. Building on this perspective, we design a calibration-based algorithm that ensures asymptotic validity while achieving asymptotic efficiency under mild assumptions. Our approach relies on the notion of opportunistic approachability [Bernstein et al., 2014], which allows the learner to exploit potential restrictions in the opponent's play. We argue that such assumptions better fit the typical use cases of ACI, such as distributional drift or regime switching, than the fully adversarial setting.
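
As a rough illustration of the ACI mechanism the abstract refers to (not the calibration-based algorithm proposed in the paper), the following sketch applies the commonly stated ACI update, alpha_{t+1} = alpha_t + gamma * (alpha - err_t), to intervals of the form pred ± q. The absolute-residual score, the plain empirical quantile, the function names, the step size, and the synthetic data with a variance shift are all assumptions made for the example.

```python
import math
import random


def empirical_quantile(scores, level):
    """Plain empirical quantile of past scores (assumed scoring choice)."""
    if level <= 0.0:
        return -math.inf  # empty interval: every outcome counts as a miss
    if level >= 1.0 or not scores:
        return math.inf   # whole real line: every outcome is covered
    ordered = sorted(scores)
    k = math.ceil(level * len(ordered)) - 1
    return ordered[max(k, 0)]


def adaptive_conformal(preds, ys, alpha=0.1, gamma=0.05):
    """Online intervals [pred - q, pred + q] with the ACI update on alpha_t."""
    alpha_t = alpha
    scores, radii, errs = [], [], []
    for pred, y in zip(preds, ys):
        q = empirical_quantile(scores, 1.0 - alpha_t)
        err = 1 if abs(y - pred) > q else 0   # 1 when the interval misses y
        alpha_t += gamma * (alpha - err)      # widen after a miss, shrink otherwise
        scores.append(abs(y - pred))
        radii.append(q)
        errs.append(err)
    return radii, errs


# Hypothetical data: a constant forecast and noise whose scale shifts halfway.
rng = random.Random(0)
ys = [rng.gauss(0, 1) for _ in range(1000)] + [rng.gauss(0, 3) for _ in range(1000)]
preds = [0.0] * len(ys)
radii, errs = adaptive_conformal(preds, ys, alpha=0.1, gamma=0.05)
miscoverage = sum(errs) / len(errs)
print(f"long-run miscoverage: {miscoverage:.3f}")
```

The long-run miscoverage stays close to the target alpha despite the shift, while nothing in the update constrains the interval radii, which is the efficiency gap the abstract points to.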




Nvidia sets fresh sales record amid fears of an AI bubble and Trump's trade wars

The Guardian

Chipmaker Nvidia set a fresh sales record in the second quarter, surpassing Wall Street expectations for its artificial intelligence chips. But shares of the chip giant still dropped 2.3% in after-hours trading, a sign that investors' worries about an AI bubble and the repercussions of Donald Trump's trade wars have not been quelled. Nvidia's financial report was the first test of investor appetite since last week's mass AI-stock selloff, when shares of several tech companies tumbled amid growing questions over whether AI-driven companies are overvalued. On Wednesday, Nvidia reported adjusted earnings per share of $1.08 on $46.74bn in revenue, surpassing Wall Street's projection of $1.01 in earnings per share on $46.05bn in revenue, according to FactSet data. But investors had high expectations for the company.