Technology
Exclusive eBook: How AI is becoming the next military advisor
Access a subscriber-only eBook of a collection of stories about how militaries are using Al models to make decisions. This ebook is available only for subscribers. A collection of stories about how militaries are using AI models to make decisions. Stories written by James O'Donnel by James O'Donnell A new US phone network for Christians aims to block porn and gender-related content James O'Donnell Musk v. Altman week 1: Elon Musk says he was duped, warns AI could kill us all, and admits that xAI distills OpenAI's models Michelle Kim Launching next week on T-Mobile's network, the cell plan takes a nuclear approach to online safety. Musk v. Altman week 1: Elon Musk says he was duped, warns AI could kill us all, and admits that xAI distills OpenAI's models Musk kept his cool, and OpenAI's lawyer bulldozed him with piercing questions about his motivations for suing the company. China has approved the world's first invasive brain-computer chip--here's what's next The country wants to become a global leader in brain implants.
EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data
Egocentric human experience data presents a vast resource for scaling up endto-end imitation learning for robotic manipulation. However, significant domain gaps in visual appearance, sensor modalities, and kinematics between human and robot impede knowledge transfer. This paper presents EgoBridge, a unified cotraining framework that explicitly aligns the policy latent spaces between human and robot data using domain adaptation. Through a measure of discrepancy on the joint policy latent features and actions based on Optimal Transport (OT), we learn observation representations that not only align between the human and robot domain but also preserve the action-relevant information critical for policy learning. EgoBridge achieves a significant absolute policy success rate improvement by 44% over human-augmented cross-embodiment baselines in three real-world single-arm and bimanual manipulation tasks. EgoBridge also generalizes to new objects, scenes, and tasks seen only in human data, where baselines fail entirely. Videos and additional information can be found at https://ego-bridge.github.io/
Towards Multi Turn Referential Grounded Video Chat with Large Language Models
Achieving fine-grained spatio-temporal understanding in videos remains a major challenge for current Video Large Multimodal Models (Video LMMs). Addressing this challenge requires mastering two core capabilities: video referring understanding, which captures the semantics of video regions, and video grounding, which segments object regions based on natural language descriptions. However, most existing approaches tackle these tasks in isolation, limiting progress toward unified, referentially grounded video interaction. We identify a key bottleneck in the lack of high-quality, unified video instruction data and a comprehensive benchmark for evaluating referentially grounded video chat. To address these challenges, we contribute in three core aspects: dataset, model, and benchmark. First, we introduce SAMA-239K, a large-scale dataset comprising 15K videos specifically curated to enable joint learning of video referring understanding, grounding, and multi-turn video chat. Second, we propose the SAMA model, which incorporates a versatile spatio-temporal context aggregator and a Segment Anything Model to jointly enhance fine-grained video comprehension and precise grounding capabilities. Finally, we establish SAMA-Bench, a meticulously designed benchmark consisting of 5,067 questions from 522 videos, to comprehensively evaluate the integrated capabilities of Video LMMs in multi-turn, spatio-temporal referring understanding and grounded dialogue. Extensive experiments and benchmarking results show that SAMA not only achieves strong performance on SAMA-Bench but also sets a new state-of-the-art on general grounding benchmarks, while maintaining highly competitive performance on standard visual understanding benchmarks.
Long-Tailed Recognition via Information-Preservable Two-Stage Learning
The imbalance (or long-tail) is the nature of many real-world data distributions, which often induces the undesirable bias of deep classification models toward frequent classes, resulting in poor performance for tail classes. In this paper, we propose a novel two-stage learning approach to mitigate such a majority-biased tendency while preserving valuable information within datasets. Specifically, the first stage proposes a new representation learning technique from the information theory perspective. This approach is theoretically equivalent to minimizing intraclass distance, yielding an effective and well-separated feature space. The second stage develops a novel sampling strategy that selects mathematically informative instances, able to rectify majority-biased decision boundaries without compromising a model's overall performance. As a result, our approach achieves state-of-the-art performance across various long-tailed benchmark datasets.
Leak Exposes Members of Peter Thiel's Secretive 'Dialog' Society
More than 200 of the world's elites registered for a retreat whose agenda runs from panels on cult-building and sex to prepping for World War III. An associated app offers matchmaking. A trove of internal records from a secret society for powerful figures in US politics, finance, and tech was left exposed online, WIRED has confirmed, naming participants in its events and revealing sensitive personal details they were assured would stay private. The group, called Dialog, is a private, invitation-only organization cofounded in 2006 by the billionaire tech investor Peter Thiel . It convenes US officials, foreign government figures, and Silicon Valley executives at off-the-record annual retreats. Dialog has spent two decades declining to disclose its members.
Hi3DEval: Advancing 3DGeneration Evaluation with Hierarchical Validity
Despite rapid advances in 3D content generation, quality assessment for the generated 3D assets remains challenging. Existing methods mainly rely on image-based metrics and operate solely at the object level, limiting their ability to capture spatial coherence, material authenticity, and high-fidelity local details. 1) To address these challenges, we introduce Hi3DEval, a hierarchical evaluation framework tailored for 3D generative content. It combines both object-level and part-level evaluation, enabling holistic assessments across multiple dimensions as well as fine-grained quality analysis. Additionally, we extend texture evaluation beyond aesthetic appearance by explicitly assessing material realism, focusing on attributes such as albedo, saturation, and metallicness.
DGS-LRM: Real-Time Deformable 3DGaussian Reconstruction From Monocular Videos
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGSLRM), the first feed-forward method predicting deformable 3DGaussian splats from a monocular posed video of any dynamic scene. Feed-forward scene reconstruction has gained significant attention for its ability to rapidly create digital replicas of real-world environments. However, most existing models are limited to static scenes and fail to reconstruct the motion of moving objects. Developing a feed-forward model for dynamic scene reconstruction poses significant challenges, including the scarcity of training data and the need for appropriate 3D representations and training paradigms.
Improved Regret and Contextual Linear Extension for Pandora's Box and Prophet Inequality
We study the Pandora's Box problem in an online learning setting with semi-bandit feedback. In each round, the learner sequentially pays to open up to nboxes with unknown reward distributions, observes rewards upon opening, and decides when to stop. The utility of the learner is the maximum observed reward minus the cumulative cost of opened boxes, and the goal is to minimize regret defined as the gap between the cumulative expected utility and that of the optimal policy. We propose a new algorithm that achieves eO( nT)regret after T rounds, which improves the eO(n T) bound of Agarwal et al. [2024] and matches the known lower bound up to logarithmic factors. To better capture real-life applications, we then extend our results to a natural but challenging contextual linear setting, where each box's expected reward is linear in some known but time-varying ddimensional context and the noise distribution is fixed over time. We design an algorithm that learns both the linear function and the noise distributions, achieving eO(nd T) regret. Finally, we show that our techniques also apply to the online Prophet Inequality problem, where the learner must decide immediately whether or not to accept a revealed reward. In both non-contextual and contextual settings, our approach achieves similar improvements and regret bounds.
Deep Tree Tensor Networks
Originating in quantum physics, tensor networks (TNs) have been widely adopted as exponential machines and parametric decomposers for recognition tasks. Typical TN models, such as Matrix Product States (MPS), have not yet achieved successful application in natural image recognition. When employed, they primarily serve to compress parameters within pre-existing networks, thereby losing their distinctive capability to capture exponential-order feature interactions. This paper introduces a novel architecture named Deep Tree Tensor Network (DTTN), which captures 2L-order multiplicative interactions across features through multilinear operations, while essentially unfolding into a tree-like TN topology with the parameter-sharing property. DTTN is stacked with multiple antisymmetric interaction modules (AIMs), and this design facilitates efficient implementation. Furthermore, our theoretical analysis demonstrates the equivalence between quantum-inspired TN models and polynomial/multilinear networks under specific conditions.
ShortListing Model: AStreamlined Simplex Diffusion for Discrete Variable Generation
Generative modeling of discrete variables is challenging yet crucial for applications in natural language processing and biological sequence design. We introduce the Shortlisting Model (SLM), a novel simplex-based diffusion model inspired by progressive candidate pruning. SLM operates on simplex centroids, reducing generation complexity and enhancing scalability. Additionally, SLM incorporates a flexible implementation of classifier-free guidance, enhancing unconditional generation performance. Extensive experiments on DNA promoter and enhancer design, protein design, character-level and large-vocabulary language modeling demonstrate the competitive performance and strong potential of SLM.