A critique of pure stupidity: understanding Trump 2.0

The Guardian

President Donald Trump holds charts as he speaks about the economy in the Oval Office, August 2025. If the first term of Donald Trump provoked anxiety over the fate of objective knowledge, the second has led to claims we live in a world-historical age of stupid, accelerated by big tech. But might there be a way out? The first and second Trump administrations have provoked markedly different critical reactions. The shock of 2016 and its aftermath saw a wave of liberal anxiety about the fate of objective knowledge, not only in the US but also in Britain, where the Brexit referendum that year had been won by a campaign that misrepresented key facts and figures.


I: Multi-modal Models Membership Inference Pingyi Hu

Neural Information Processing Systems

Those scores are then averaged over the whole corpus to reach an overall quality. The MSCOCO dataset is one of the most representative large-scale labeled image datasets available to the public. It is also the most authoritative and important benchmark in the current target recognition, detection and other fields. Its image data source is Yahoo's photo album website, Flickr. Most of the images in the dataset display a human being involved in an activity.


Auto Learning Attention: Supplementary Material

Neural Information Processing Systems

The initial learning rate is 0.1, and the weight decay is set to 0.0005. The batch size is 256. The results are summarised in Table 3 of the paper. The learning rate starts from 0.1. We replace it with ResNet50 to evaluate the performance of different attention modules. The conv5_x, average pooling, fc, and softmax layers are removed from the original classification model.


Musk becomes world's first half-trillionaire

BBC News

Tesla boss Elon Musk has become the first person ever to achieve a net worth of more than $500bn (£370.9bn). The tech magnate's net worth briefly reached $500.1bn on Wednesday afternoon New York time, before dipping slightly to just over $499bn later in the day, the Forbes billionaires index reported. Alongside Tesla, valuations of his other ventures, including the artificial intelligence start-up xAI and rocket company SpaceX, have also reportedly climbed in recent months. According to Forbes' billionaires index, Oracle founder Larry Ellison is the world's second richest person, with a fortune of about $350.7bn. Mr Ellison briefly overtook Musk last month after shares in Oracle soared by more than 40%, boosted by the firm's surprisingly rosy outlook for its cloud infrastructure business and artificial intelligence (AI) deals.


Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning

arXiv.org Artificial Intelligence

Reinforcement Learning, particularly through policy gradient methods, has played a central role in enabling reasoning capabilities of Large Language Models. However, the optimization stability of policy gradients in this setting remains understudied. As a result, existing implementations often resort to conservative hyperparameter choices to ensure stability, which requires more training samples and increases computational costs. Hence, developing models for reliably tracking the underlying optimization dynamics and leveraging them into training enables more sample-efficient regimes and further unleashes scalable post-training. We address this gap by formalizing the stochastic optimization problem of policy gradients with explicit consideration of second-order geometry. We propose a tractable computational framework that tracks and leverages curvature information during policy updates. We further employ this framework to design interventions in the optimization process through data selection. The resultant algorithm, Curvature-Aware Policy Optimization (CAPO), identifies samples that contribute to unstable updates and masks them out. Theoretically, we establish monotonic improvement guarantees under realistic assumptions. On standard math reasoning benchmarks, we empirically show that CAPO ensures stable updates under aggressive learning regimes where baselines catastrophically fail. With minimal intervention (rejecting fewer than 8% of tokens), CAPO achieves up to 30x improvement in sample efficiency over standard GRPO for LLM reasoning.
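The masking step described in the abstract can be illustrated with a minimal sketch. It is not the CAPO algorithm: the abstract does not give the curvature criterion, so `instability_scores` and `threshold` below are hypothetical stand-ins for whatever per-token signal flags an unstable update. The sketch only shows how rejected tokens would drop out of an averaged loss and how the rejected fraction could be reported.

```python
from typing import Sequence, Tuple


def masked_token_loss(
    token_losses: Sequence[float],
    instability_scores: Sequence[float],  # hypothetical per-token signal, not CAPO's criterion
    threshold: float,                     # hypothetical cut-off for rejecting a token
) -> Tuple[float, float]:
    """Average the loss over kept tokens and report the rejected fraction."""
    if len(token_losses) != len(instability_scores):
        raise ValueError("token_losses and instability_scores must have equal length")
    if not token_losses:
        return 0.0, 0.0

    # Keep only tokens whose score does not exceed the threshold.
    kept = [
        loss
        for loss, score in zip(token_losses, instability_scores)
        if score <= threshold
    ]
    rejected_fraction = 1.0 - len(kept) / len(token_losses)
    # If every token is rejected, the update contributes nothing.
    mean_loss = sum(kept) / len(kept) if kept else 0.0
    return mean_loss, rejected_fraction


if __name__ == "__main__":
    losses = [1.0, 2.0, 3.0, 10.0]
    scores = [0.1, 0.2, 0.1, 5.0]
    print(masked_token_loss(losses, scores, threshold=1.0))
```

In this toy input the last token is the only one rejected, so the average is taken over the remaining three.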


Approximately Unimodal Likelihood Models for Ordinal Regression

arXiv.org Machine Learning

Ordinal regression (OR, also called ordinal classification) is classification of ordinal data, in which the underlying target variable is categorical and considered to have a natural ordinal relation for the underlying explanatory variable. A key to successful OR models is to identify the structure behind this "natural ordinal relation" that is common to many ordinal data and to reflect that structure in the design of those models. A recent OR study found that, in many real-world ordinal data, the conditional probability distribution (CPD) of the target variable given a value of the explanatory variable tends to be unimodal. Several previous studies thus developed unimodal likelihood models, in which a predicted CPD is guaranteed to be unimodal. However, it was also observed experimentally that many real-world ordinal data have some values of the explanatory variable at which the underlying CPD is non-unimodal, and hence unimodal likelihood models may suffer from a bias for such a CPD. Therefore, motivated to mitigate such a bias, we propose approximately unimodal likelihood models, which can represent both a unimodal CPD and a CPD that is close to being unimodal. We also verify experimentally that a proposed model can be effective for statistical modeling of ordinal data and OR tasks.
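As a small illustration of the unimodality property discussed above, the following sketch checks whether a probability vector over ordered categories rises (weakly) to a peak and then falls (weakly). The function name `is_unimodal` and the tolerance `tol` are assumptions made for this example; the sketch is not the likelihood model proposed in the paper.

```python
from typing import Sequence


def is_unimodal(probs: Sequence[float], tol: float = 1e-12) -> bool:
    """Return True if probs is weakly increasing, then weakly decreasing."""
    falling = False
    for prev, curr in zip(probs, probs[1:]):
        if curr < prev - tol:
            # The sequence has started to decrease.
            falling = True
        elif curr > prev + tol and falling:
            # A rise after a fall means a second peak.
            return False
    return True


if __name__ == "__main__":
    print(is_unimodal([0.1, 0.3, 0.4, 0.2]))  # single peak at the third category
    print(is_unimodal([0.4, 0.1, 0.3, 0.2]))  # peaks at the first and third categories
```

Flat and monotone vectors count as unimodal under this weak definition.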


Are All Marine Species Created Equal? Performance Disparities in Underwater Object Detection

arXiv.org Artificial Intelligence

Underwater object detection is critical for monitoring marine ecosystems but poses unique challenges, including degraded image quality, imbalanced class distribution, and distinct visual characteristics. Not every species is detected equally well, yet underlying causes remain unclear. We address two key research questions: 1) What factors beyond data quantity drive class-specific performance disparities? 2) How can we systematically improve detection of under-performing marine species? We manipulate the DUO and RUOD datasets to separate the object detection task into localization and classification and investigate the under-performance of the scallop class. Localization analysis using YOLO11 and TIDE finds that foreground-background discrimination is the most problematic stage regardless of data quantity. Classification experiments reveal persistent precision gaps even with balanced data, indicating intrinsic feature-based challenges beyond data scarcity and inter-class dependencies. We recommend imbalanced distributions when prioritizing precision, and balanced distributions when prioritizing recall. Improving under-performing classes should focus on algorithmic advances, especially within localization modules. We publicly release our code and datasets.


REAL: Reading Out Transformer Activations for Precise Localization in Language Model Steering

arXiv.org Artificial Intelligence

Inference-time steering aims to alter a large language model's (LLM's) responses without changing its parameters, but a central challenge is identifying the internal modules that most strongly govern the target behavior. Existing approaches often rely on simplistic cues or ad hoc heuristics, leading to suboptimal or unintended effects. We introduce REAL, a framework for identifying behavior-relevant modules (attention heads or layers) in Transformer models. For each module, REAL trains a vector-quantized autoencoder (VQ-AE) on its hidden activations and uses a shared, learnable codebook to partition the latent space into behavior-relevant and behavior-irrelevant subspaces. REAL quantifies a module's behavioral relevance by how well its VQ-AE encodings discriminate behavior-aligned from behavior-violating responses via a binary classification metric; this score guides both module selection and steering strength. We evaluate REAL across eight LLMs from the Llama and Qwen families and nine datasets spanning truthfulness enhancement, open-domain QA under knowledge conflicts, and general alignment tasks. REAL enables more effective inference-time interventions, achieving an average relative improvement of 20% (up to 81.5%) over the ITI method on truthfulness steering. In addition, the modules selected by REAL exhibit strong zero-shot generalization in cross-domain truthfulness-steering scenarios.