Gupta, Ritwik
Enough Coin Flips Can Make LLMs Act Bayesian
Gupta, Ritwik, Corona, Rodolfo, Ge, Jiaxin, Wang, Eric, Klein, Dan, Darrell, Trevor, Chan, David M.
Large language models (LLMs) exhibit the ability to generalize given few-shot examples in their input prompt, an emergent capability known as in-context learning (ICL). We investigate whether LLMs utilize ICL to perform structured reasoning in ways that are consistent with a Bayesian framework or rely on pattern matching. Using a controlled setting of biased coin flips, we find that: (1) LLMs often possess biased priors, causing initial divergence in zero-shot settings, (2) in-context evidence outweighs explicit bias instructions, (3) LLMs broadly follow Bayesian posterior updates, with deviations primarily due to miscalibrated priors rather than flawed updates, and (4) attention magnitude has negligible effect on Bayesian inference. With sufficient demonstrations of biased coin flips via ICL, LLMs update their priors in a Bayesian manner.
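The Bayesian posterior updating that this paper probes can be illustrated with a minimal sketch. This is not the paper's code; the conjugate Beta-Bernoulli model and the specific prior and bias values are illustrative assumptions.

```python
def update_beta(alpha, beta, flips):
    """Conjugate Beta-Bernoulli update: each heads (1) increments alpha,
    each tails (0) increments beta."""
    heads = sum(flips)
    tails = len(flips) - heads
    return alpha + heads, beta + tails

def posterior_mean(alpha, beta):
    """Posterior mean of the heads probability under Beta(alpha, beta)."""
    return alpha / (alpha + beta)

# A miscalibrated prior that believes the coin is fair: Beta(10, 10).
alpha, beta = 10.0, 10.0

# In-context evidence: 30 flips from a coin biased 80% toward heads.
flips = [1] * 24 + [0] * 6

alpha, beta = update_beta(alpha, beta, flips)
print(posterior_mean(alpha, beta))  # -> 0.68, pulled from 0.5 toward 0.8
```

With enough demonstrations, the evidence term dominates the prior, which mirrors the paper's finding that in-context evidence outweighs both miscalibrated priors and explicit bias instructions.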
Whack-a-Chip: The Futility of Hardware-Centric Export Controls
Gupta, Ritwik, Walker, Leah, Reddie, Andrew W.
U.S. export controls on semiconductors are widely known to be permeable, with the People's Republic of China (PRC) steadily creating state-of-the-art artificial intelligence (AI) models with exfiltrated chips. This paper presents the first concrete, public evidence of how leading PRC AI labs evade and circumvent U.S. export controls. We examine how Chinese companies, notably Tencent, are not only using chips that are restricted under U.S. export controls but are also finding ways to circumvent these regulations through software and modeling techniques that maximize the performance of less capable hardware. Specifically, we argue that Tencent's ability to power its Hunyuan-Large model with non-export-controlled NVIDIA H20s exemplifies broader gains in machine learning efficiency that have eroded the moat the United States initially built via its existing export controls. Finally, we examine the implications of this finding for the future of the United States' export control strategy.

Data-Centric AI Governance: Addressing the Limitations of Model-Focused Policies
Gupta, Ritwik, Walker, Leah, Corona, Rodolfo, Fu, Stephanie, Petryk, Suzanne, Napolitano, Janet, Darrell, Trevor, Reddie, Andrew W.
Current regulations on powerful AI capabilities are narrowly focused on "foundation" or "frontier" models. However, these terms are vague and inconsistently defined, leading to an unstable foundation for governance efforts. Critically, policy debates often fail to consider the data used with these models, despite the clear link between data and model performance. Even (relatively) "small" models that fall outside the typical definitions of foundation and frontier models can achieve equivalent outcomes when exposed to sufficiently specific datasets. In this work, we illustrate the importance of considering dataset size and content as essential factors in assessing the risks posed by models both today and in the future. More broadly, we emphasize the risk posed by over-regulating reactively and provide a path towards careful, quantitative evaluation of capabilities that can lead to a simplified regulatory environment.
xT: Nested Tokenization for Larger Context in Large Images
Gupta, Ritwik, Li, Shufan, Zhu, Tyler, Malik, Jitendra, Darrell, Trevor, Mangalam, Karttikeya
Modern computer vision pipelines handle large images in one of two sub-optimal ways: down-sampling or cropping. Both methods incur significant losses in the amount of information and context present in an image. There are many downstream applications in which global context matters as much as high-frequency details, such as in real-world satellite imagery; in such cases, researchers have to make the uncomfortable choice of which information to discard. We introduce xT, a simple framework for vision transformers which effectively aggregates global context with local details and can model large images end-to-end on contemporary GPUs. We select a set of benchmark datasets across classic vision tasks which accurately reflect a vision model's ability to understand truly large images and incorporate fine details over large scales, and assess our method's improvement on them. By introducing a nested tokenization scheme for large images, in conjunction with long-sequence models normally used for natural language processing, we are able to increase accuracy by up to 8.6% on challenging classification tasks and $F_1$ score by 11.6 on context-dependent segmentation of large images.
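The two-level idea behind nested tokenization can be sketched roughly as follows: split a large image into regions, tokenize each region into patches, and flatten the result into one long token sequence. This is a rough illustration of the concept under assumed sizes, not the xT implementation; the function name and parameters are made up for the sketch.

```python
import numpy as np

def nested_tokenize(image, region_size, patch_size):
    """Illustrative two-level tokenization: split the image into regions,
    then split each region into patches, yielding one long token sequence.
    (A sketch of the idea, not the actual xT code.)"""
    h, w, c = image.shape
    tokens = []
    for ry in range(0, h, region_size):
        for rx in range(0, w, region_size):
            region = image[ry:ry + region_size, rx:rx + region_size]
            for py in range(0, region_size, patch_size):
                for px in range(0, region_size, patch_size):
                    patch = region[py:py + patch_size, px:px + patch_size]
                    tokens.append(patch.reshape(-1))  # flatten patch to a token vector
    return np.stack(tokens)

# A 512x512 RGB image with 256x256 regions and 16x16 patches yields
# (512/256)^2 regions * (256/16)^2 patches = 4 * 256 = 1024 tokens,
# each of dimension 16 * 16 * 3 = 768.
img = np.zeros((512, 512, 3), dtype=np.float32)
seq = nested_tokenize(img, region_size=256, patch_size=16)
print(seq.shape)  # -> (1024, 768)
```

The resulting long sequence is what motivates pairing the scheme with long-sequence models from NLP: even a modestly sized satellite image produces far more tokens than a standard vision transformer's context window accommodates.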
ClimSim: A large multi-scale dataset for hybrid physics-ML climate emulation
Yu, Sungduk, Hannah, Walter, Peng, Liran, Lin, Jerry, Bhouri, Mohamed Aziz, Gupta, Ritwik, Lütjens, Björn, Will, Justus Christopher, Behrens, Gunnar, Busecke, Julius, Loose, Nora, Stern, Charles I, Beucler, Tom, Harrop, Bryce, Hillman, Benjamin R, Jenney, Andrea, Ferretti, Savannah, Liu, Nana, Anandkumar, Anima, Brenowitz, Noah D, Eyring, Veronika, Geneva, Nicholas, Gentine, Pierre, Mandt, Stephan, Pathak, Jaideep, Subramaniam, Akshay, Vondrick, Carl, Yu, Rose, Zanna, Laure, Zheng, Tian, Abernathey, Ryan, Ahmed, Fiaz, Bader, David C, Baldi, Pierre, Barnes, Elizabeth, Bretherton, Christopher, Caldwell, Peter, Chuang, Wayne, Han, Yilun, Huang, Yu, Iglesias-Suarez, Fernando, Jantre, Sanket, Kashinath, Karthik, Khairoutdinov, Marat, Kurth, Thorsten, Lutsko, Nicholas, Ma, Po-Lun, Mooers, Griffin, Neelin, J. David, Randall, David, Shamekh, Sara, Taylor, Mark A, Urban, Nathan, Yuval, Janni, Zhang, Guang, Pritchard, Michael
Modern climate projections lack adequate spatial and temporal resolution due to computational constraints. A consequence is inaccurate and imprecise predictions of critical processes such as storms. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher-fidelity climate simulators that can sidestep Moore's Law by outsourcing compute-hungry, short, high-resolution simulations to ML emulators. However, this hybrid ML-physics simulation approach requires domain-specific treatment and has been inaccessible to ML experts because of a lack of training data and relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations, developed by a consortium of climate scientists and ML researchers. It consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally-nested, high-resolution, high-fidelity physics on a host climate simulator's macro-scale physical state. The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and to provide reference scores.