Anthropic can now track the bizarre inner workings of a large language model
It's no secret that large language models work in mysterious ways. Few, if any, mass-market technologies have ever been so little understood. That makes figuring out what makes them tick one of the biggest open challenges in science. Shedding some light on how these models work would expose their weaknesses, revealing why they make stuff up and can be tricked into going off the rails. It would help resolve deep disputes about exactly what these models can and can't do.
Interpretable Cross-Sphere Multiscale Deep Learning Predicts ENSO Skilfully Beyond 2 Years
Hao, Rixu, Zhao, Yuxin, Zhang, Shaoqing, Wang, Guihua, Deng, Xiong
Email: zhaoyuxin@hrbeu.edu.cn (Y.Z.); szhang@ouc.edu.cn (S.Z.)
Abstract: El Niño-Southern Oscillation (ENSO) exerts global climate and societal impacts, but real-time prediction with lead times beyond one year remains challenging. Dynamical models suffer from large biases and uncertainties, while deep learning struggles with interpretability and multi-scale dynamics. Here, we introduce PTSTnet, an interpretable model that unifies dynamical processes and cross-scale spatiotemporal learning in a neural-network framework with physics-encoding learning. PTSTnet produces interpretable predictions that significantly outperform state-of-the-art benchmarks at lead times beyond 24 months, and it provides physical insights into error propagation in ocean-atmosphere interactions. PTSTnet learns physically consistent feature representations from sparse data to tackle the multi-scale and multi-physics challenges inherent in ocean-atmosphere processes, thereby enhancing long-term prediction skill. These results mark a substantial step forward in interpretable neural ocean modelling.
Introduction
The El Niño-Southern Oscillation (ENSO) is the main source of interannual variability in the global climate system, and the ability to predict large-scale climate variability and its impacts on global social and environmental systems depends heavily on the quality of ENSO predictions (1-5). With significant advances in ENSO observations and process understanding, considerable progress has been made in associated modelling and prediction in recent decades (6-10).
Most Japanese high school textbooks to include QR codes
Almost all textbooks to be used by first- and second-year high school students in Japan from fiscal 2026 will include quick response (QR) codes that link to websites with video and audio learning aid materials, sources said Tuesday. The education ministry said the same day that a total of 253 textbooks in 13 subjects have passed the second screenings under the current curriculum guidelines. In response to the rapid progress of digitalization, many of the textbooks include content on information ethics and generative artificial intelligence. The average number of pages per textbook in 11 commonly taught subjects came to 321, slightly up from the previous screenings in 2021. All geography-history and civics textbooks cover the Northern Territories, which are effectively controlled by Russia; Takeshima, the Sea of Japan islets controlled by South Korea; and the Japanese-administered Senkaku Islands, which are also claimed by China.
Waymo aims to offer paid robotaxi rides in Washington DC next year
Waymo is continuing to expand its foothold across the US, having recently started offering paid robotaxi services in more parts of the San Francisco Bay Area. Next up are Atlanta and Miami, and now the company has revealed plans to offer its driverless Waymo One service in the nation's capital in 2026. Before that can happen, though, Waymo will need to get approval from regulators. The company says it will "continue to work closely with policymakers to formalize the regulations needed to operate without a human behind the wheel in the District." DC currently requires autonomous vehicles to have a human at the wheel, ready to take control if necessary.
Bigger But Not Better: Small Neural Language Models Outperform Large Language Models in Detection of Thought Disorder
Li, Changye, Xu, Weizhe, Pakhomov, Serguei, Bradley, Ellen, Ben-Zeev, Dror, Cohen, Trevor
Disorganized thinking is a key diagnostic indicator of schizophrenia-spectrum disorders. Recently, clinical estimates of the severity of disorganized thinking have been shown to correlate with measures of how difficult speech transcripts would be for large language models (LLMs) to predict. However, LLMs' deployment challenges, including privacy concerns, computational and financial costs, and lack of transparency of training data, limit their clinical utility. We investigate whether smaller neural language models can serve as effective alternatives for detecting positive formal thought disorder, using the same sliding-window-based perplexity measurements that proved effective with larger models. Surprisingly, our results show that smaller models are more sensitive to linguistic differences associated with formal thought disorder than their larger counterparts. Detection capability declines beyond a certain model size and context length, challenging the common assumption of "bigger is better" for LLM-based applications. Our findings generalize across audio diaries and clinical interview speech samples from individuals with psychotic symptoms, suggesting a promising direction for developing efficient, cost-effective, and privacy-preserving screening tools that can be deployed in both clinical and naturalistic settings.
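The abstract mentions perplexity measured over sliding windows of a transcript. A minimal sketch of that kind of measurement follows, assuming a toy add-one-smoothed bigram model as a stand-in for a neural language model, and assuming that context resets at the start of each window. The function names, window size, and stride are hypothetical, not the authors' settings.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams in tokenized sentences (toy stand-in for a neural LM)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        padded = ["<s>"] + sentence
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    vocab_size = len(unigrams) + 1  # +1 reserves probability mass for unseen tokens
    return unigrams, bigrams, vocab_size

def bigram_logprob(model, prev, token):
    """Add-one smoothed log P(token | prev)."""
    unigrams, bigrams, vocab_size = model
    return math.log((bigrams[(prev, token)] + 1) / (unigrams[prev] + vocab_size))

def sliding_window_perplexity(tokens, model, window, stride):
    """Perplexity of each window of `window` tokens, advancing `stride` tokens at a time.

    Context is limited to the window itself: the first token of every window is
    conditioned only on the start symbol. Trailing tokens that do not fall inside
    any window start position are not scored.
    """
    if not tokens:
        return []
    scores = []
    last_start = max(len(tokens) - window, 0)
    for start in range(0, last_start + 1, stride):
        chunk = tokens[start:start + window]
        prev, total = "<s>", 0.0
        for token in chunk:
            total += bigram_logprob(model, prev, token)
            prev = token
        scores.append(math.exp(-total / len(chunk)))
    return scores

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_bigram(corpus)
coherent = sliding_window_perplexity(["the", "cat", "sat", "the", "dog", "sat"], model, window=3, stride=3)
scrambled = sliding_window_perplexity(["sat", "the", "cat", "dog", "sat", "the"], model, window=3, stride=3)
print(coherent, scrambled)
```

Higher per-window perplexity means the model finds that stretch of speech harder to predict. How the per-window scores are summarized and compared with clinical ratings is not specified here, so any aggregation (mean, maximum) would be a further assumption.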
Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models
Tian, Bowei, Lyu, Xuntao, Liu, Meng, Wang, Hongyi, Li, Ang
Representation Engineering (RepE) has emerged as a powerful paradigm for enhancing AI transparency by focusing on high-level representations rather than individual neurons or circuits. It has proven effective in improving interpretability and control, showing that representations can emerge, propagate, and shape final model outputs in large language models (LLMs). However, in Vision-Language Models (VLMs), visual input can override factual linguistic knowledge, leading to hallucinated responses that contradict reality. To address this challenge, we make the first attempt to extend RepE to VLMs, analyzing how multimodal representations are preserved and transformed. Building on our findings and drawing inspiration from successful RepE applications, we develop a theoretical framework that explains the stability of neural activity across layers using the principal eigenvector, uncovering the underlying mechanism of RepE. We empirically validate these intrinsic properties, demonstrating their broad applicability and significance. By bridging theoretical insights with empirical validation, this work transforms RepE from a descriptive tool into a structured theoretical framework, opening new directions for improving AI robustness, fairness, and transparency.
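The abstract ties cross-layer stability to the principal eigenvector but does not give the exact construction. One minimal sketch, under the assumption that the relevant matrix is the covariance of a layer's activations, extracts that eigenvector by power iteration and compares two layers' dominant directions by absolute cosine similarity. The toy activations and the similarity measure are hypothetical, not the paper's method.

```python
import math
import random

def covariance(vectors):
    """Sample covariance matrix of equal-length activation vectors (one per input)."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def principal_eigenvector(matrix, iters=200, seed=0):
    """Power iteration: repeated multiply-and-normalize converges to the dominant eigenvector."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def direction_similarity(a, b):
    """|cosine| between two directions; eigenvectors are only defined up to sign."""
    dot = sum(x * y for x, y in zip(a, b))
    return abs(dot) / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical 2-d activations for two layers; layer_b is a rescaled copy of layer_a.
layer_a = [[t, 0.5 * t + (0.1 if t % 2 else -0.1)] for t in range(-5, 6)]
layer_b = [[3.0 * x, 3.0 * y] for x, y in layer_a]
pa = principal_eigenvector(covariance(layer_a))
pb = principal_eigenvector(covariance(layer_b))
print(direction_similarity(pa, pb))  # near 1.0 when the dominant direction is preserved
```

Power iteration is used here only because it needs no external libraries; any symmetric eigensolver would give the same direction.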
Towards Long-Range ENSO Prediction with an Explainable Deep Learning Model
Chen, Qi, Cui, Yinghao, Hong, Guobin, Ashok, Karumuri, Pu, Yuchun, Zheng, Xiaogu, Zhang, Xuanze, Zhong, Wei, Zhan, Peng, Wang, Zhonglei
ENSO's evolution is governed by intricate air-sea interactions, posing significant challenges for long-term prediction. In this study, we introduce CTEFNet, a multivariate deep learning model that synergizes convolutional neural networks and transformers to enhance ENSO forecasting. By integrating multiple oceanic and atmospheric predictors, CTEFNet extends the effective forecast lead time to 20 months while mitigating the impact of the spring predictability barrier, outperforming both dynamical models and state-of-the-art deep learning approaches. Furthermore, CTEFNet offers physically meaningful and statistically significant insights through gradient-based sensitivity analysis, revealing the key precursor signals that govern ENSO dynamics, which align with well-established theories and reveal new insights about inter-basin interactions among the Pacific, Atlantic, and Indian Oceans. CTEFNet's superior predictive skill and interpretable sensitivity assessments underscore its potential for advancing climate prediction. Our findings highlight the importance of multivariate coupling in ENSO evolution and demonstrate the promise of deep learning in capturing complex climate dynamics with enhanced interpretability.
Introduction
El Niño-Southern Oscillation (ENSO) is one of the most prominent modes of inter-annual climate variability, characterized by shifts in sea surface temperatures (SST) across the tropical Pacific Ocean and the weakening of equatorial trade winds.
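Gradient-based sensitivity analysis asks how much a forecast changes per unit change in each input predictor. A network like CTEFNet would obtain these gradients by backpropagation; the sketch below swaps in central finite differences so it stays dependency-free, and applies them to a hypothetical one-layer toy forecaster with made-up weights and placeholder predictor names. It illustrates the idea only and reproduces none of the paper's results.

```python
import math

def central_diff_gradient(f, x, eps=1e-5):
    """Approximate the gradient of scalar-valued f at x, one input feature at a time."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def rank_by_sensitivity(f, x, names):
    """Rank input features by |df/dx_i|, most influential first."""
    grad = central_diff_gradient(f, x)
    return sorted(zip(names, grad), key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical toy forecaster with made-up weights (not CTEFNet).
WEIGHTS = [0.8, -0.3, 0.05]

def toy_forecast(x):
    return math.tanh(sum(w * xi for w, xi in zip(WEIGHTS, x)))

names = ["predictor_a", "predictor_b", "predictor_c"]
x0 = [0.5, -0.2, 1.0]
print(rank_by_sensitivity(toy_forecast, x0, names))
```

In a real study the sensitivities would be computed per grid point and averaged over many forecast cases before reading off precursor regions.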
MAGIC-VQA: Multimodal And Grounded Inference with Commonsense Knowledge for Visual Question Answering
Yang, Shuo, Luo, Siwen, Han, Soyeon Caren, Hovy, Eduard
Visual Question Answering (VQA) requires reasoning across visual and textual modalities, yet Large Vision-Language Models (LVLMs) often lack integrated commonsense knowledge, limiting their robustness in real-world scenarios. To address this, we introduce MAGIC-VQA, a novel framework that enhances VQA by systematically integrating commonsense knowledge with LVLMs. MAGIC-VQA employs a three-stage process: (1) Explicit Knowledge Integration from external sources, (2) By-Type Post-Processing for contextual refinement, and (3) Implicit Knowledge Augmentation using a Graph Neural Network (GNN) for structured reasoning. The GNN adds depth to structured inference, enabling relational inference beyond what LVLMs achieve on their own. MAGIC-VQA bridges a key gap by unifying commonsense knowledge with LVLM-driven reasoning, eliminating the need for extensive pre-training or complex prompt tuning. Our framework achieves state-of-the-art performance on benchmark datasets, significantly improving commonsense reasoning in VQA.
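The abstract does not specify MAGIC-VQA's GNN architecture. As a generic illustration of the kind of structured reasoning a GNN layer performs, the sketch below runs one round of mean-aggregation message passing over a tiny, hypothetical commonsense graph. It omits the learned weight matrices a real GNN layer would have, and the node names and features are invented for the example.

```python
def message_passing_step(features, edges, self_weight=0.5):
    """One round of mean-aggregation message passing on an undirected graph.

    features: dict node -> list[float]; edges: list of (u, v) pairs.
    New feature = self_weight * own + (1 - self_weight) * mean(neighbour features),
    followed by a ReLU. A real GNN layer would also apply learned weight matrices.
    """
    neighbours = {node: [] for node in features}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    updated = {}
    for node, h in features.items():
        nbrs = neighbours[node]
        if nbrs:
            agg = [sum(features[m][k] for m in nbrs) / len(nbrs) for k in range(len(h))]
        else:
            agg = list(h)  # isolated node: no incoming messages, keep its own state
        mixed = [self_weight * a + (1 - self_weight) * b for a, b in zip(h, agg)]
        updated[node] = [max(0.0, x) for x in mixed]
    return updated

# Hypothetical commonsense subgraph retrieved for a question about a rainy street.
features = {"umbrella": [1.0, 0.0], "rain": [0.0, 1.0], "wet": [0.0, 0.0]}
edges = [("umbrella", "rain"), ("rain", "wet")]
print(message_passing_step(features, edges))
```

After one step, "wet" has picked up information from "rain"; stacking steps lets information travel along longer relational paths.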
When is dataset cartography ineffective? Using training dynamics does not improve robustness against Adversarial SQuAD
In this paper, I investigate the effectiveness of dataset cartography for extractive question answering on the SQuAD dataset. I begin by analyzing annotation artifacts in SQuAD and evaluate the impact of two adversarial datasets, AddSent and AddOneSent, on an ELECTRA-small model. Using training dynamics, I partition SQuAD into easy-to-learn, ambiguous, and hard-to-learn subsets. I then compare the performance of models trained on these subsets to those trained on randomly selected samples of equal size. Results show that training on cartography-based subsets does not improve generalization to the SQuAD validation set or the AddSent adversarial set. While the hard-to-learn subset yields a slightly higher F1 score on the AddOneSent dataset, the overall gains are limited. These findings suggest that dataset cartography provides little benefit for adversarial robustness in SQuAD-style QA tasks. I conclude by comparing these results to prior findings on SNLI and discuss possible reasons for the observed differences.
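Dataset cartography summarizes each training example by its training dynamics: confidence (the mean probability the model assigns to the gold answer across epochs) and variability (the standard deviation of that probability). A minimal sketch of the partition into easy-to-learn, ambiguous, and hard-to-learn subsets follows. It assumes that, for extractive QA, the per-epoch gold probabilities have already been computed (for example, from the gold span's start and end probabilities), and it selects subsets by ranking with a hypothetical fraction of 0.33; the paper's exact cut-offs are not given here.

```python
import statistics

def cartography_stats(gold_probs_per_epoch):
    """Map example id -> (confidence, variability).

    gold_probs_per_epoch: dict id -> list of the model's probability for the
    gold answer, recorded once per training epoch.
    """
    return {
        ex: (statistics.fmean(ps), statistics.pstdev(ps))
        for ex, ps in gold_probs_per_epoch.items()
    }

def partition(stats, fraction=0.33):
    """Pick the top `fraction` of examples for each region of the data map."""
    k = max(1, int(len(stats) * fraction))
    by_confidence = sorted(stats, key=lambda ex: stats[ex][0])
    by_variability = sorted(stats, key=lambda ex: stats[ex][1], reverse=True)
    return {
        "easy_to_learn": by_confidence[-k:],   # consistently high gold probability
        "hard_to_learn": by_confidence[:k],    # consistently low gold probability
        "ambiguous": by_variability[:k],       # gold probability fluctuates across epochs
    }

# Hypothetical per-epoch gold-answer probabilities for three questions.
probs = {
    "q1": [0.9, 0.95, 0.97],
    "q2": [0.1, 0.05, 0.08],
    "q3": [0.2, 0.8, 0.5],
}
print(partition(cartography_stats(probs)))
```

A random subset of the same size would then serve as the baseline against which each cartography subset is compared.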
Embedding spatial context in urban traffic forecasting with contrastive pre-training
Low, Matthew, Prabowo, Arian, Xue, Hao, Salim, Flora
Urban traffic forecasting is a commonly encountered problem, with wide-ranging applications in fields such as urban planning, civil engineering and transport. In this paper, we study the enhancement of traffic forecasting with pre-training, focusing on spatio-temporal graph methods. While various machine learning methods to solve traffic forecasting problems have been explored and extensively studied, a more contextual approach remains underexplored: studying how relevant non-traffic data can improve prediction performance on traffic forecasting problems. We call this data spatial context. We introduce a novel method of combining road and traffic information through the notion of a traffic quotient graph, a quotient graph formed from road geometry and traffic sensors. We also define a way to encode this relationship in the form of a geometric encoder, pre-trained using contrastive learning methods and enhanced with OpenStreetMap data. We introduce and discuss ways to integrate this geometric encoder with existing graph neural network (GNN)-based traffic forecasting models, using a contrastive pre-training paradigm. We demonstrate the potential for this hybrid model to improve generalisation and performance with zero additional traffic data. Code for this paper is available at https://github.com/mattchrlw/forecasting-on-new-roads.
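A quotient graph collapses each block of a node partition into a single node and connects two blocks whenever at least one original edge crosses between them. The abstract does not say how road nodes are grouped, so the sketch below assumes each road node has already been assigned to one traffic sensor (for instance, its nearest sensor); the road and sensor identifiers are hypothetical, and the linked repository may construct the graph differently.

```python
def quotient_graph(road_edges, block_of):
    """Collapse a road graph into its quotient graph under the partition `block_of`.

    road_edges: iterable of (u, v) adjacencies between road nodes.
    block_of: dict road node -> block label (assumed here: the sensor it maps to).
    Returns (nodes, edges): one node per block, and an undirected edge between two
    distinct blocks whenever some road edge joins them.
    """
    nodes = set(block_of.values())
    edges = set()
    for u, v in road_edges:
        a, b = block_of[u], block_of[v]
        if a != b:  # edges inside a block disappear in the quotient
            edges.add(tuple(sorted((a, b))))
    return nodes, edges

# Hypothetical road chain r1-r2-r3-r4-r5 plus a shortcut r1-r5, grouped under three sensors.
road_edges = [("r1", "r2"), ("r2", "r3"), ("r3", "r4"), ("r4", "r5"), ("r1", "r5")]
block_of = {"r1": "s1", "r2": "s1", "r3": "s2", "r4": "s3", "r5": "s3"}
print(quotient_graph(road_edges, block_of))
```

The resulting sensor-level graph has the same node set a GNN traffic model already uses, which is one reason a quotient construction is a natural bridge between road geometry and sensor data.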