EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT

Neural Information Processing Systems

Egocentric video reasoning centers on an unobservable agent behind the camera who dynamically shapes the environment, requiring inference of hidden intentions and recognition of fine-grained interactions. This core challenge limits current multimodal large language models (MLLMs), which excel at visible event reasoning but lack embodied, first-person understanding. To bridge this gap, we introduce EgoThinker, a novel framework that endows MLLMs with robust egocentric reasoning capabilities through spatio-temporal chain-of-thought (CoT) supervision and a two-stage learning curriculum. First, we introduce EgoRe-5M, a large-scale egocentric QA dataset constructed from 13M diverse egocentric video clips. This dataset features multi-minute segments annotated with detailed CoT rationales and dense hand-object grounding. Second, we employ supervised fine-tuning (SFT) on EgoRe-5M to instill reasoning skills, followed by reinforcement fine-tuning (RFT) to further enhance spatio-temporal localization. Experimental results show that EgoThinker outperforms existing methods across multiple egocentric benchmarks, while achieving substantial improvements in fine-grained spatio-temporal localization tasks.


How Prince George will follow in his father's footsteps at Eton College

BBC News

Prince George will attend Eton College in Berkshire from September, Kensington Palace has announced. His father, Prince William, also attended the elite boarding school for boys, where fees are about £63,000 a year. He is the Prince and Princess of Wales' oldest child and the second in line of succession to the throne. The BBC's senior royal correspondent Daniela Relph explains the Royal Family's connection to Eton College.


C2Prompt: Class-aware Client Knowledge Interaction for Federated Continual Learning

Neural Information Processing Systems

Federated continual learning (FCL) tackles scenarios of learning from continuously emerging task data across distributed clients, where the key challenge lies in addressing both temporal forgetting over time and spatial forgetting simultaneously. Recently, prompt-based FCL methods have shown advanced performance through task-wise prompt communication. In this study, we underscore that the existing prompt-based FCL methods are prone to class-wise knowledge coherence between prompts across clients. The class-wise knowledge coherence includes two aspects: (1) intra-class distribution gap across clients, which degrades the learned semantics across prompts, (2) inter-prompt class-wise relevance, which highlights cross-class knowledge confusion. During prompt communication, insufficient class-wise coherence exacerbates knowledge conflicts among new prompts and induces interference with old prompts, intensifying both spatial and temporal forgetting. To address these issues, we propose a novel Class-aware Client Knowledge Interaction (C2Prompt) method that explicitly enhances class-wise knowledge coherence during prompt communication. Specifically, a local class distribution compensation mechanism (LCDC) is introduced to reduce intra-class distribution disparities across clients, thereby reinforcing intra-class knowledge consistency. Additionally, a class-aware prompt aggregation scheme (CPA) is designed to alleviate inter-class knowledge confusion by selectively strengthening class-relevant knowledge aggregation. Extensive experiments on multiple FCL benchmarks demonstrate that C2Prompt achieves state-of-the-art performance.


Efficient Algorithms for Robust and Partial Semi-Discrete Optimal Transport

Neural Information Processing Systems

The sensitivity of optimal transport (OT) to noise has motivated the study of robust variants. In this paper, we study two such formulations of semi-discrete OT in R^d: (i) the α-optimal partial transport, which minimizes the cost of transporting a mass of α; and (ii) the λ-robust optimal transport, which regularizes the OT problem using the total variation (TV) distance. First, we provide a novel characterization of the optimal solutions in these settings, showing they can be represented as a restricted Laguerre diagram. Second, we exploit this characterization to establish a strong algorithmic connection between the two problems, showing that any solver for one can be adapted to solve the other with comparable precision. Third, we overcome key challenges posed in extending the cost-scaling paradigm to compute these variants of OT and present an algorithm that computes the exact solution up to log(1/ε) bits of precision in n^{O(d)} log(1/ε) time, where n is the support size of the discrete distribution.
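For readers unfamiliar with Laguerre diagrams, the sketch below may help. It shows only the standard (unrestricted) Laguerre, or power, diagram: a point x belongs to the cell of the site y_j that minimizes ||x - y_j||^2 - w_j. The function name `laguerre_cell` and the example sites and weights are illustrative assumptions; the abstract does not say how the restriction is defined or how the weights are computed, so neither is modeled here.

```python
def laguerre_cell(x, sites, weights):
    """Return the index j minimizing ||x - sites[j]||^2 - weights[j].

    This is the textbook Laguerre (power) cell rule, not the paper's
    restricted variant, whose definition is not given in the abstract.
    """
    best_j, best_val = -1, float("inf")
    for j, (y, w) in enumerate(zip(sites, weights)):
        val = sum((xi - yi) ** 2 for xi, yi in zip(x, y)) - w
        if val < best_val:
            best_j, best_val = j, val
    return best_j


# Two hypothetical sites on the x-axis.
sites = [(0.0, 0.0), (2.0, 0.0)]

# Equal weights give the ordinary Voronoi diagram (boundary at x = 1).
print(laguerre_cell((0.9, 0.0), sites, [0.0, 0.0]))  # 0

# Raising the weight of site 1 grows its cell (boundary moves to x = 0.5).
print(laguerre_cell((0.9, 0.0), sites, [0.0, 2.0]))  # 1
```

With all weights equal the rule reduces to nearest-site assignment, which is why the weights are the natural unknowns for a semi-discrete OT solver to adjust.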


Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning

Neural Information Processing Systems

In cooperative Multi-Agent Reinforcement Learning (MARL), it is a common practice to tune hyperparameters in ideal simulated environments to maximize cooperative performance. However, policies tuned for cooperation often fail to maintain robustness and resilience under real-world uncertainties. Building trustworthy MARL systems requires a deep understanding of robustness, which ensures stability under uncertainties, and resilience, the ability to recover from disruptions--a concept extensively studied in control systems but largely overlooked in MARL. In this paper, we present a large-scale empirical study comprising over 82,620 experiments to evaluate cooperation, robustness, and resilience in MARL across 4 real-world environments, 13 uncertainty types, and 15 hyperparameters. Our key findings are: (1) Under mild uncertainty, optimizing cooperation improves robustness and resilience, but this link weakens as perturbations intensify. Robustness and resilience also vary by algorithm and uncertainty type.


Safety Pretraining: Toward the Next Generation of Safe AI

Neural Information Processing Systems

As large language models (LLMs) are increasingly deployed in high-stakes settings, the risk of generating harmful or toxic content remains a central challenge. Post-hoc alignment methods are brittle: once unsafe patterns are learned during pretraining, they are hard to remove. In this work, we present a data-centric pretraining framework that builds safety into the model from the start. Our framework consists of four key steps: (i) Safety Filtering: we build a safety classifier that sorts web data into safe and unsafe categories; (ii) Safety Rephrasing: we recontextualize unsafe web data into safer narratives; (iii) Native Refusal: we synthetically generate pretraining datasets that actively teach models to refuse unsafe content and convey the moral reasoning behind the refusal; and (iv) Harmfulness-Tag annotated pretraining: we flag unsafe content during pretraining using a special token, and use it to steer models away from unsafe generations at inference time. Our safety-pretrained models reduce attack success rates from 38.8% to 8.4% on standard LLM safety benchmarks with no performance degradation on general tasks.


Enhancing Sample Selection Against Label Noise by Cutting Mislabeled Easy Examples

Neural Information Processing Systems

Sample selection is a prevalent approach in learning with noisy labels, aiming to identify confident samples for training. Although existing sample selection methods have achieved decent results by reducing the noise rate of the selected subset, they often overlook that not all mislabeled examples harm the model's performance equally. In this paper, we demonstrate that mislabeled examples correctly predicted by the model early in the training process are particularly harmful to model performance. We refer to these examples as Mislabeled Easy Examples (MEEs). To address this, we propose Early Cutting, which introduces a recalibration step that employs the model's later training state to re-select the confident subset identified early in training, thereby avoiding misleading confidence from early learning and effectively filtering out MEEs. Experiments on the CIFAR, WebVision, and full ImageNet-1k datasets demonstrate that our method effectively improves sample selection and model performance by reducing MEEs.
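The two-step idea described above (select confident examples early, then re-select them using a later model state) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' procedure: the function name `early_cutting_sketch`, the use of "early prediction agrees with the given label" as the confidence test, the use of later-stage loss as the recalibration signal, and the `cut_fraction` parameter are all hypothetical choices made for the example.

```python
def early_cutting_sketch(labels, early_preds, late_losses, cut_fraction):
    """Toy two-step sample selection (all criteria are assumptions).

    Step 1: keep examples whose early-training prediction matches the
            given (possibly noisy) label.
    Step 2: among those, drop the cut_fraction with the highest loss
            under a later training state.
    Returns the sorted indices of the retained examples.
    """
    confident = [i for i, (y, p) in enumerate(zip(labels, early_preds)) if y == p]
    n_cut = int(len(confident) * cut_fraction)
    ranked = sorted(confident, key=lambda i: late_losses[i])  # low loss first
    kept = ranked[: len(ranked) - n_cut]
    return sorted(kept)


# Hypothetical data: example 3 looked confident early but has a high
# loss later, so the recalibration step removes it.
labels = [0, 1, 1, 0, 1]
early_preds = [0, 1, 0, 0, 1]
late_losses = [0.1, 0.2, 0.9, 2.5, 0.3]
print(early_cutting_sketch(labels, early_preds, late_losses, cut_fraction=0.25))  # [0, 1, 4]
```

Example 2 is excluded in step 1 because its early prediction disagrees with its label; example 3 passes step 1 and is removed only by the later-stage check, which is the role the abstract assigns to recalibration.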


Qualcomm unveils its Snapdragon Reality Elite chip for next-gen AR headsets

Engadget

The company also debuted a new platform for brands wanting to build their own AI glasses. High-end augmented reality and mixed reality devices are set to get a boost thanks to Qualcomm's latest XR chip. During a keynote at Augmented World Expo (AWE), the company unveiled its Snapdragon Reality Elite processor, which it says will allow the next generation of AR and mixed reality headsets to be smaller and more efficient. In terms of specs, the Snapdragon Reality Elite can support up to 4.4K resolution in each eye at 90 fps, a modest upgrade from the XR2+ Gen 2, but one that Qualcomm says will enable better image quality and lower latency. It also delivers significant improvements in terms of efficiency, with up to a 20 percent boost in battery life while running up to 12 degrees Celsius (about 22 degrees Fahrenheit) cooler, compared with the XR2+ Gen 2. Performance-wise, Reality Elite comes with notable gains over the previous generation as well.


One Climate Change Innovation: Just Look Up

WIRED

To build one family's dream house on a flood-prone Mississippi bayou, AD100 architect Tom Kundig decided the sky's the limit. Tom Kundig absorbed lessons in resilience before he even knew the word. As a child, he saw many of the industrial and agricultural buildings of the rural Pacific Northwest abandoned but still standing, the harsh winter conditions no match for their steel columns. That background came in handy when he was asked to design a house for a young family on a coastal Mississippi site susceptible to severe flooding. The clients, Joel and Jill Kavanaugh, had fallen in love with a plot bordering the Gulf Islands National Seashore in Ocean Springs, Mississippi.


GRASS: Scalable Data Attribution with Gradient Sparsification and Sparse Projection

Neural Information Processing Systems

Gradient-based data attribution methods, such as influence functions, are critical for understanding the impact of individual training samples without requiring repeated model retraining. However, their scalability is often limited by the high computational and memory costs associated with per-sample gradient computation. In this work, we propose GRASS, a novel gradient compression algorithm, and its variant FACTGRASS for linear layers specifically, which explicitly leverage the inherent sparsity of per-sample gradients to achieve sub-linear space and time complexity. Extensive experiments demonstrate the effectiveness of our approach, achieving substantial speedups while preserving data influence fidelity. In particular, FACTGRASS achieves up to 165% faster throughput on billion-scale models compared to the previous state-of-the-art baselines.
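To make the combination of gradient sparsification and sparse projection concrete, here is a minimal sketch of one generic way to chain the two. It is not the GRASS or FACTGRASS construction, which the abstract does not specify: the random coordinate mask, the count-sketch-style signed bucketing, and every name (`make_compressor`, `k_mask`, `k_out`) are assumptions chosen for illustration.

```python
import random


def make_compressor(dim, k_mask, k_out, seed=0):
    """Build a fixed linear compressor from R^dim to R^k_out.

    Assumed two-stage design (illustrative only):
      1. sparsification: keep k_mask randomly chosen coordinates;
      2. sparse projection: add each kept coordinate, with a random
         sign, into one of k_out buckets (count-sketch style).
    """
    rng = random.Random(seed)
    kept = sorted(rng.sample(range(dim), k_mask))
    bucket = [rng.randrange(k_out) for _ in kept]
    sign = [rng.choice((-1.0, 1.0)) for _ in kept]

    def compress(grad):
        out = [0.0] * k_out
        # Work is proportional to k_mask, not to dim.
        for pos, idx in enumerate(kept):
            out[bucket[pos]] += sign[pos] * grad[idx]
        return out

    return compress


# The same compressor must be reused for every per-sample gradient so
# that the compressed vectors remain comparable.
compress = make_compressor(dim=10, k_mask=5, k_out=3, seed=1)
grad = [float(i) for i in range(10)]
print(len(compress(grad)))  # 3
```

Because each compressed gradient touches only `k_mask` coordinates and stores `k_out` numbers, the per-sample cost in this sketch does not grow with the full gradient dimension, which is the kind of saving the abstract refers to as sub-linear.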