Operation


Building tech in the world's secret R&D hub

MIT Technology Review

Zurich has created a technology ecosystem nearing the density of Silicon Valley. Few places outside Silicon Valley can claim R&D hubs from all of these companies. Fewer still are concentrated in a city of just over 400,000 people--roughly half the size of San Francisco. Over the past two decades, however, many of the world's most influential technology companies have established R&D operations in and around Zurich, Switzerland. What began with Google's decision to build its largest R&D hub outside the United States has evolved into one of the world's most concentrated centers for AI research, talent, and commercialization, in certain areas at a higher density than Silicon Valley. The question is why so many technology leaders keep choosing the same place to build and scale.


Truckloads of Tesla Batteries Keep Getting Stolen Before They Even Leave the Factory

WIRED

Nine major suspected cargo thefts happened at Tesla's Nevada battery factory in January alone, according to sheriff's records obtained by WIRED. Trailers containing millions of dollars worth of Tesla car and home batteries have allegedly been stolen straight from loading docks at the company's Nevada facilities at least 11 times since last December, according to sheriff's records obtained by WIRED. "It's an epidemic right now," says Storey County Sheriff's Detective Sam Hatley, who has been investigating the Tesla cases. Three men suspected of carrying out one of the heists were arrested in January and charged with felony possession of stolen property. But the broader spate of cargo thefts plaguing Elon Musk's car company is still under investigation and has not been previously reported.


HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning

Neural Information Processing Systems

For robotic manipulation, existing robotics datasets and simulation benchmarks predominantly cater to robot-arm platforms. However, for humanoid robots equipped with dual arms and dexterous hands, simulation tasks and high-quality demonstrations are notably lacking. Bimanual dexterous manipulation is inherently more complex, as it requires coordinated arm movements and hand operations, making autonomous data collection challenging. This paper presents HumanoidGen, an automated task creation and demonstration collection framework that leverages atomic dexterous operations and LLM reasoning to generate relational constraints. Specifically, we provide spatial annotations for both assets and dexterous hands based on the atomic operations, and use an LLM planner to generate a chain of actionable spatial constraints for arm movements based on object affordances and scenes. To further improve planning ability, we employ a variant of Monte Carlo tree search to enhance LLM reasoning for long-horizon tasks and insufficient annotation. In experiments, we create a novel benchmark with augmented scenarios to evaluate the quality of the collected data. The results show that the performance of the 2D and 3D diffusion policies can scale with the generated dataset.


Training-Free Test-Time Adaptation via Shape and Style Guidance for Vision-Language Models

Neural Information Processing Systems

Test-time adaptation with pre-trained vision-language models shows impressive zero-shot classification abilities, and training-free methods further improve the performance without any optimization burden. However, existing training-free test-time adaptation methods typically rely on entropy criteria to select the visual features and update the visual caches, while ignoring the generalizable factors, such as shape-sensitive and style-insensitive factors. In this paper, we propose a novel shape and style guidance method (SSG) for training-free test-time adaptation in vision-language models, aiming to highlight the shape-sensitive (SHS) and style-insensitive (STI) factors. Specifically, SSG perturbs the raw test image with shape and style corruption operations, and measures the prediction difference between the raw and corrupted ones as perturbed prediction difference (PPD). Based on the PPD measurement, SSG reweights the high-confidence visual features and corresponding predictions, aiming to highlight the effect of SHS and STI factors during the test-time procedure. Furthermore, SSG takes both PPD and entropy into consideration to update the visual cache, aiming to maintain the stored sample with high entropy and generalizable factors. Extensive experimental results on out-of-distribution and cross-domain benchmark datasets demonstrate that our proposed SSG consistently outperforms previous state-of-the-art methods while also exhibiting promising computational efficiency.
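The two quantities this abstract relies on, prediction entropy and a difference between raw and corrupted predictions, can be illustrated with a minimal sketch. The abstract does not state how PPD is computed, so the L1 distance used here is purely an assumption, and the function names `softmax`, `entropy`, and `prediction_difference` are hypothetical rather than taken from the paper.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def prediction_difference(p_raw, p_corrupted):
    """ASSUMPTION: L1 distance between the two prediction vectors.
    The paper's actual PPD definition may differ."""
    return sum(abs(a - b) for a, b in zip(p_raw, p_corrupted))

# Example: predictions for a raw image and a corrupted copy of it.
p_raw = softmax([2.0, 0.5, 0.1])
p_corrupted = softmax([0.4, 0.6, 0.3])
score = prediction_difference(p_raw, p_corrupted)
```

Under this assumed metric, a large value means the corruption changed the prediction substantially, and a value of zero means it had no effect.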


GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents

Neural Information Processing Systems

Developing high-performance software is a complex task that requires specialized expertise. We introduce GSO, a benchmark for evaluating language models' capabilities in developing high-performance software. We develop an automated pipeline that generates and executes performance tests to analyze repository commit histories to identify 102 challenging optimization tasks across 10 codebases, spanning diverse domains and programming languages. An agent is provided with a codebase and performance test as a precise specification, and tasked to improve the runtime efficiency, which is measured against the expert developer optimization. Our quantitative evaluation reveals that leading SWE-Agents struggle significantly, achieving less than 5% success rate, with limited improvements even with inference-time scaling. Our qualitative analysis identifies key failure modes, including difficulties with low-level languages, practicing lazy optimization strategies, and challenges in accurately localizing bottlenecks. We release the code and artifacts of our benchmark along with agent trajectories to enable future research.


Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing

Neural Information Processing Systems

We study a sequential decision-making problem on an n-node graph G where each node has an unknown label from a finite set Ω, drawn from a joint distribution P that is Markov with respect to G. At each step, selecting a node reveals its label and yields a label-dependent reward. The goal is to adaptively choose nodes to maximize expected accumulated discounted rewards. We impose a frontier exploration constraint, where actions are limited to neighbors of previously selected nodes, reflecting practical constraints in settings such as contact tracing and robotic exploration. We design a Gittins index-based policy that applies to general graphs and is provably optimal when G is a forest.
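The frontier constraint and the discounted objective described above can be sketched as follows. This is not the Gittins index policy from the paper; it only shows which nodes are selectable at a given step and how a reward sequence is discounted. The adjacency-dict representation, the discount symbol `beta`, and the function names are assumptions made for illustration.

```python
def frontier(adj, selected):
    """Nodes that may be chosen next: unselected neighbors of selected nodes.

    adj      -- dict mapping each node to an iterable of its neighbors
    selected -- nodes already chosen (assumed non-empty, i.e. a start
                node has been given; the abstract does not say how the
                first node is picked)
    """
    chosen = set(selected)
    neighbors = {v for u in chosen for v in adj[u]}
    return neighbors - chosen

def discounted_return(rewards, beta):
    """Sum of beta**t * r_t over the sequence of observed rewards."""
    return sum((beta ** t) * r for t, r in enumerate(rewards))

# Example: a path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
next_options = frontier(path, [0, 1])           # only node 2 is reachable
value = discounted_return([1.0, 0.0, 1.0], 0.5) # 1 + 0 + 0.25
```

On this path graph, node 3 only becomes selectable after node 2 has been chosen, which is the kind of restriction that arises in contact tracing.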


Reinforcement learning for one-shot DAG scheduling with comparability identification and dense reward

Neural Information Processing Systems

In recent years, many studies have proposed generating solutions for the Directed Acyclic Graph (DAG) scheduling problem in one shot by combining reinforcement learning with list scheduling heuristics. However, these existing methods suffer from biased estimation of sampling probabilities and inefficient guidance in training, due to redundant comparisons among node priorities and the sparse reward challenge. To address these issues, we analyze the limitations of these existing methods, and propose a novel one-shot DAG scheduling method with comparability identification and dense reward signal, based on the policy gradient framework. In our method, a comparable antichain identification mechanism is proposed to eliminate the problem of redundant nodewise priority comparison. We also propose a dense reward signal for node-level decision-making optimization in training, effectively addressing the sparse reward challenge. The experimental results show that the proposed method can yield superior results of scheduling objectives compared to other learning-based DAG scheduling methods.


Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing

Neural Information Processing Systems

As textual reasoning with large language models (LLMs) has advanced significantly, there has been growing interest in enhancing the multimodal reasoning capabilities of large vision-language models (LVLMs). However, existing methods primarily approach multimodal reasoning in a straightforward, text-centric manner, where both reasoning and answer derivation are conducted purely through text, with the only difference being the presence of multimodal input. As a result, these methods often encounter fundamental limitations in spatial reasoning tasks that demand precise geometric understanding and continuous spatial tracking--capabilities that humans achieve through mental visualization and manipulation. To address these limitations, we propose drawing to reason in space, a novel paradigm that enables LVLMs to reason through elementary drawing operations in the visual space. By equipping models with basic drawing operations, including annotating bounding boxes and drawing auxiliary lines, we empower them to express and analyze spatial relationships through direct visual manipulation, while avoiding the performance ceiling imposed by specialized perception tools in previous tool-integrated reasoning approaches. To cultivate this capability, we develop a three-stage training framework: cold-start training with synthetic data to establish basic drawing abilities, reflective rejection sampling to enhance self-reflection behaviors, and reinforcement learning to directly optimize for target rewards. Extensive experiments demonstrate that our model, named VILASR, consistently outperforms existing methods across diverse spatial reasoning benchmarks, involving maze navigation, static spatial reasoning, video-based reasoning, and multi-view-based reasoning tasks, with an average improvement of 18.4%. Ablation studies reveal the critical role of each training stage, where reflective rejection sampling strengthens the model's self-correction capabilities, and reinforcement learning effectively unlocks its reasoning potential.


Enhancing Graph Classification Robustness with Singular Pooling

Neural Information Processing Systems

Graph Neural Networks (GNNs) have achieved strong performance across a range of graph representation learning tasks, yet their adversarial robustness in graph classification remains underexplored compared to node classification. While most existing defenses focus on the message-passing component, this work investigates the overlooked role of pooling operations in shaping robustness. We present a theoretical analysis of standard flat pooling methods (sum, average and max), deriving upper bounds on their adversarial risk and identifying their vulnerabilities under different attack scenarios and graph structures. Motivated by these insights, we propose Robust Singular Pooling (RS-Pool), a novel pooling strategy that leverages the dominant singular vector of the node embedding matrix to construct a robust graph-level representation. We theoretically investigate the robustness of RS-Pool and interpret the resulting bound leading to improved understanding of our proposed pooling operator. While our analysis centers on Graph Convolutional Networks (GCNs), RS-Pool is model-agnostic and can be implemented efficiently via power iteration. Empirical results on real-world benchmarks show that RS-Pool provides better robustness than the considered pooling methods when subject to state-of-the-art adversarial attacks while maintaining competitive clean accuracy. Our code is publicly available at: https://github.com/king/rs-pool.
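The abstract mentions that the dominant singular vector of the node embedding matrix can be obtained efficiently via power iteration. A minimal pure-Python sketch of that step is below. It assumes the embedding matrix `X` has one row per node and that the right singular vector (of length equal to the embedding dimension) is the one of interest; the abstract does not specify this, nor how the vector is scaled into the final graph representation. The function name and defaults are hypothetical and unrelated to the authors' released code.

```python
import math

def dominant_right_singular_vector(X, iters=200, tol=1e-12):
    """Power iteration on X^T X for a matrix X given as a list of rows.

    Returns (sigma, v): the largest singular value and a unit-norm right
    singular vector. Starts from a uniform vector, so it can fail to
    converge to the dominant direction if that direction is exactly
    orthogonal to the all-ones vector.
    """
    n, d = len(X), len(X[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        # u = X v, one entry per node
        u = [sum(row[j] * v[j] for j in range(d)) for row in X]
        # w = X^T u, one entry per embedding dimension
        w = [sum(X[i][j] * u[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v  # X v vanished; nothing to iterate on
        new_v = [x / norm for x in w]
        change = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if change < tol:
            break
    # sigma = ||X v|| for the converged unit vector v
    u = [sum(row[j] * v[j] for j in range(d)) for row in X]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, v

# Example: three nodes with 2-dimensional embeddings.
embeddings = [[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
sigma, v = dominant_right_singular_vector(embeddings)
```

Each iteration costs two matrix-vector products, so no full singular value decomposition is needed.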


Just One Layer Norm Guarantees Stable Extrapolation

Neural Information Processing Systems

In spite of their prevalence, the behaviour of Neural Networks when extrapolating far from the training distribution remains poorly understood, with existing results limited to specific cases. In this work, we prove general results--the first of their kind--by applying Neural Tangent Kernel (NTK) theory to analyse infinitely wide neural networks trained until convergence and prove that the inclusion of just one Layer Norm (LN) fundamentally alters the induced NTK, transforming it into a bounded-variance kernel. As a result, the output of an infinitely wide network with at least one LN remains bounded, even on inputs far from the training data. In contrast, we show that a broad class of networks without LN can produce pathologically large outputs for certain inputs. We support these theoretical findings with empirical experiments on finite-width networks, demonstrating that while standard NNs often exhibit uncontrolled growth outside the training domain, a single LN layer effectively mitigates this instability. Finally, we explore real-world implications of this extrapolatory stability, including applications to predicting residue sizes in proteins larger than those seen during training and estimating age from facial images of underrepresented ethnicities absent from the training set.
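One elementary property behind this intuition can be checked directly: a Layer Norm without learned scale and shift maps any input to a vector whose Euclidean norm is at most the square root of its dimension, however large the input is. The sketch below illustrates only that fact, not the paper's NTK analysis; the function name and the `eps` default are assumptions.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (almost) unit variance.

    No learned affine parameters are applied; eps guards against
    division by zero for constant inputs.
    """
    d = len(x)
    mean = sum(x) / d
    var = sum((xi - mean) ** 2 for xi in x) / d
    scale = math.sqrt(var + eps)
    return [(xi - mean) / scale for xi in x]

def l2_norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

# The same direction at two very different magnitudes.
near = layer_norm([1.0, 2.0, 3.0])
far = layer_norm([1.0e6, 2.0e6, 3.0e6])
norm_near, norm_far = l2_norm(near), l2_norm(far)
```

Both outputs have norm close to the square root of 3 even though the second input is a million times larger, so whatever follows the normalization sees inputs of bounded size.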