Transportation
Driverless taxis are beginning to react like humans on San Francisco streets… and the results could be terrifying
Driverless cars are beginning to display human-like behaviors like impatience on the roads, in a sign of increased intelligence in the robotaxis. The chilling development was identified by University of San Francisco engineering Professor William Riggs, who has been studying Waymo cars since their inception. On a journey with a reporter from the San Francisco Chronicle, the pair noticed the Waymo they were traveling in crept to a rolling start at a pedestrian crossing before the person had reached the other footpath. The subtle movement was reminiscent of the way humans act behind the wheel, but a strange occurrence for the robotic Waymo, which prides itself on being safer than a driver because it errs on the side of caution and leaves no room for human error. The action of letting the foot gently off the break moments before they should to allow the car to begin creeping forward at a rolling pace displays a sense of impatience - a human reaction not previously seen in the robotic cars.
Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction
Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition but faces challenges due to individual differences and complex neural signal representations. Traditional methods often require customized models and extensive trials, lacking interpretability in visual reconstruction tasks.
Ukraine, Russia meet for peace talks in Istanbul after explosive weekend
Former U.S. ambassador to Ukraine John Herbst explains the impact of the drone strike on Russian air bases. Russian and Ukrainian delegations have begun talks in Istanbul, Turkey, on Monday, less than 24 hours after a massive Ukrainian drone attack struck Russian airfields. The two delegations entered Ciragan Palace in Istanbul alongside a group of senior Turkish officials. It is the second round of peace talks to take place in the three years since Russia invaded Ukraine. Images from the event show many of the Ukrainian delegation wearing military uniforms, while the Russian group exclusively wore suits.
Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation
We develop a neural network architecture which, trained in an unsupervised manner as a denoising diffusion model, simultaneously learns to both generate and segment images. Learning is driven entirely by the denoising diffusion objective, without any annotation or prior knowledge about regions during training. A computational bottleneck, built into the neural architecture, encourages the denoising network to partition an input into regions, denoise them in parallel, and combine the results. Our trained model generates both synthetic images and, by simple examination of its internal predicted partitions, semantic segmentations of those images. Without fine-tuning, we directly apply our unsupervised model to the downstream task of segmenting real images via noising and subsequently denoising them. Experiments demonstrate that our model achieves accurate unsupervised image segmentation and high-quality synthetic image generation across multiple datasets.
No " Zero-Shot " Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Web-crawled datasets underlie the impressive "zero-shot" performance of multimodal models, such as CLIP for classification and Stable-Diffusion for image generation. However, it is unclear how meaningful the notion of "zero-shot" generalization is for such models because the extent to which their pretraining datasets encompass downstream concepts used in "zero-shot" evaluation is unknown. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets? We comprehensively investigate this question across 34 models and 5 standard pretraining datasets, generating over 300GB of data artifacts. We consistently find that, far from exhibiting "zero-shot" generalization, multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance, following a sample inefficient log-linear scaling trend. This trend persists even when controlling for sample-level similarity between pretraining and evaluation datasets [81], and testing on purely synthetic data distributions [52]. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly. We contribute this long-tail test dataset as the Let it Wag! benchmark to further research in this direction. Taken together, our study reveals an exponential need for training data which implies that the key to "zero-shot" generalization capabilities under large-scale training data and compute paradigms remains to be found.
Dynamic Service Fee Pricing under Strategic Behavior: Actions as Instruments and Phase Transition Rui Ai
We study a dynamic pricing problem for third-party platform service fees under strategic, far-sighted customers. In each time period, the platform sets a service fee based on historical data, observes the resulting transaction quantities, and collects revenue. The platform also monitors equilibrium prices influenced by both demand and supply. The objective is to maximize total revenue over a time horizon T. Our problem incorporates three practical challenges: (a) initially, the platform lacks knowledge of the demand side beforehand, necessitating a balance between exploring (learning the demand curve) and exploiting (maximizing revenue) simultaneously; (b) since only equilibrium prices and quantities are observable, traditional Ordinary Least Squares (OLS) estimators would be biased and inconsistent; (c) buyers are rational and strategic, seeking to maximize their consumer surplus and potentially misrepresenting their preferences. To address these challenges, we propose novel algorithmic solutions. Our approach involves: (i) a carefully designed active randomness injection to balance exploration and exploitation effectively; (ii) using non-i.i.d.
Increasing the Scope as You Learn: Adaptive Bayesian Optimization in Nested Subspaces
Recent advances have extended the scope of Bayesian optimization (BO) to expensive-to-evaluate black-box functions with dozens of dimensions, aspiring to unlock impactful applications, for example, in the life sciences, neural architecture search, and robotics. However, a closer examination reveals that the state-of-the-art methods for high-dimensional Bayesian optimization (HDBO) suffer from degrading performance as the number of dimensions increases or even risk failure if certain unverifiable assumptions are not met.
ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution Haoran Ye
The omnipresence of NP-hard combinatorial optimization problems (COPs) compels domain experts to engage in trial-and-error heuristic design. The long-standing endeavor of design automation has gained new momentum with the rise of large language models (LLMs). This paper introduces Language Hyper-Heuristics (LHHs), an emerging variant of Hyper-Heuristics that leverages LLMs for heuristic generation, featuring minimal manual intervention and open-ended heuristic spaces. To empower LHHs, we present Reflective Evolution (ReEvo), a novel integration of evolutionary search for efficiently exploring the heuristic space, and LLM reflections to provide verbal gradients within the space.
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
Although Large Language Models (LLMs) have demonstrated significant capabilities in executing complex tasks in a zero-shot manner, they are susceptible to jailbreak attacks and can be manipulated to produce harmful outputs. Recently, a growing body of research has categorized jailbreak attacks into token-level and prompt-level attacks. However, previous work primarily overlooks the diverse key factors of jailbreak attacks, with most studies concentrating on LLM vulnerabilities and lacking exploration of defense-enhanced LLMs. To address these issues, we introduced JailTrickBench to evaluate the impact of various attack settings on LLM performance and provide a baseline for jailbreak attacks, encouraging the adoption of a standardized evaluation framework. Specifically, we evaluate the eight key factors of implementing jailbreak attacks on LLMs from both target-level and attack-level perspectives. We further conduct seven representative jailbreak attacks on six defense methods across two widely used datasets, encompassing approximately 354 experiments with about 55,000 GPU hours on A800-80G. Our experimental results highlight the need for standardized benchmarking to evaluate these attacks on defense-enhanced LLMs.
Sym-NCO: Leveraging Symmetricity for Neural Combinatorial Optimization
Deep reinforcement learning (DRL)-based combinatorial optimization (CO) methods (i.e., DRL-NCO) have shown significant merit over the conventional CO solvers as DRL-NCO is capable of learning CO solvers less relying on problem-specific expert domain knowledge (heuristic method) and supervised labeled data (supervised learning method). This paper presents a novel training scheme, Sym-NCO, which is a regularizer-based training scheme that leverages universal symmetricities in various CO problems and solutions. Leveraging symmetricities such as rotational and reflectional invariance can greatly improve the generalization capability of DRL-NCO because it allows the learned solver to exploit the commonly shared symmetricities in the same CO problem class. Our experimental results verify that our Sym-NCO greatly improves the performance of DRL-NCO methods in four CO tasks, including the traveling salesman problem (TSP), capacitated vehicle routing problem (CVRP), prize collecting TSP (PCTSP), and orienteering problem (OP), without utilizing problem-specific expert domain knowledge. Remarkably, Sym-NCO outperformed not only the existing DRL-NCO methods but also a competitive conventional solver, the iterative local search (ILS), in PCTSP at 240 faster speed.