Transportation


There's a Very Simple Pattern to Elon Musk's Broken Promises

WIRED

My predictions about achieving full self-driving have been optimistic in the past,


Reinforcement Learning for Solving the Vehicle Routing Problem

Neural Information Processing Systems

We present an end-to-end framework for solving the Vehicle Routing Problem (VRP) using reinforcement learning. In this approach, we train a single policy model that finds near-optimal solutions for a broad range of problem instances of similar size, only by observing the reward signals and following feasibility rules. We consider a parameterized stochastic policy, and by applying a policy gradient algorithm to optimize its parameters, the trained model produces the solution as a sequence of consecutive actions in real time, without the need to re-train for every new problem instance. On capacitated VRP, our approach outperforms classical heuristics and Google's OR-Tools on medium-sized instances in solution quality with comparable computation time (after training). We demonstrate how our approach can handle problems with split delivery and explore the effect of such deliveries on the solution quality. Our proposed framework can be applied to other variants of the VRP such as the stochastic VRP, and has the potential to be applied more generally to combinatorial optimization problems.
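As a rough illustration of producing a solution "as a sequence of consecutive actions" under feasibility rules, the sketch below samples one capacitated-VRP route from a stochastic policy with a feasibility mask. It is not the authors' model: the per-node `scores` are a hypothetical stand-in for the logits of a trained policy, the function name `rollout` is invented here, and the policy-gradient update and split deliveries are omitted.

```python
import math
import random


def rollout(coords, demands, capacity, scores, rng):
    """Sample one capacitated-VRP solution; node 0 is the depot.

    scores[i] is an unnormalised preference for visiting node i
    (hypothetical stand-in for a learned policy's logits).
    Returns the visited node sequence and the reward (negative length).
    """
    if any(d > capacity for d in demands[1:]):
        raise ValueError("a customer demand exceeds the vehicle capacity")
    remaining = list(demands)
    load, current = capacity, 0
    route, length = [0], 0.0
    while any(d > 0 for d in remaining[1:]):
        # Feasibility mask: unserved customers whose demand fits the load.
        feasible = [i for i in range(1, len(coords)) if 0 < remaining[i] <= load]
        if feasible:
            # Softmax-style sampling restricted to the feasible nodes.
            weights = [math.exp(scores[i]) for i in feasible]
            nxt = rng.choices(feasible, weights=weights, k=1)[0]
            load -= remaining[nxt]
            remaining[nxt] = 0
        else:
            # Nothing fits: return to the depot and refill the vehicle.
            nxt, load = 0, capacity
        length += math.dist(coords[current], coords[nxt])
        route.append(nxt)
        current = nxt
    length += math.dist(coords[current], coords[0])
    route.append(0)
    return route, -length
```

A policy-gradient method would use the returned reward to adjust the scores (in practice, the parameters of the network that produces them) so that shorter routes become more likely.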


The Finale of "The Rehearsal" Is Outlandish and Sublime

The New Yorker

Nathan Fielder, like Andy Kaufman before him, makes performance-art comedy that does not only poke fun at the world but experimentally perturbs it, and he plies this trade in the buffer zone between reality and artifice. He presents himself as something of a Kaspar Hauser figure for the age of artificial intelligence, a foundling raised not by wolves but by an advanced and affectless race of extraterrestrial anthropologists. His object is to isolate and mimic the rudiments of human sociability. Fielder's intuition is that many putatively normal people share his own bewildered dread of everyday interactions, which are at once governed by established, if opaque, social norms and subject to unnerving unpredictability. Children learn to tame uncertainty through repetition: they replay interactions in an effort to interpret and control the varied challenges of their environment.



Dataset Distillation using Neural Feature Regression

Neural Information Processing Systems

Dataset distillation aims to learn a small synthetic dataset that preserves most of the information from the original dataset. Dataset distillation can be formulated as a bi-level meta-learning problem where the outer loop optimizes the meta-dataset and the inner loop trains a model on the distilled data. Meta-gradient computation is one of the key challenges in this formulation, as differentiating through the inner loop learning procedure introduces significant computation and memory costs. In this paper, we address these challenges using neural Feature Regression with Pooling (FRePo), achieving state-of-the-art performance with an order of magnitude lower memory requirement and two orders of magnitude faster training than previous methods. The proposed algorithm is analogous to truncated backpropagation through time with a pool of models to alleviate various types of overfitting in dataset distillation. FRePo significantly outperforms the previous methods on CIFAR100, Tiny ImageNet, and ImageNet-1K. Furthermore, we show that high-quality distilled data can greatly improve various downstream applications, such as continual learning and membership inference defense. Please check out our webpage at https://sites.google.com/view/frepo.


CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

Neural Information Processing Systems

Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks. However, these scaling approaches are computationally expensive and overlook the significance of efficiently improving model capabilities from the vision side. Inspired by the successful applications of Mixture-of-Experts (MoE) in LLMs, which improves model scalability during training while keeping inference costs similar to those of smaller models, we propose CuMo, which incorporates Co-upcycled Top-K sparsely-gated Mixture-of-experts blocks into both the vision encoder and the MLP connector, thereby enhancing the multimodal LLMs with negligible additional activated parameters during inference. CuMo first pre-trains the MLP blocks and then initializes each expert in the MoE block from the pre-trained MLP block during the visual instruction tuning stage, with auxiliary losses to ensure a balanced loading of experts. CuMo outperforms state-of-the-art multimodal LLMs across various VQA and visual-instruction-following benchmarks within each model size group, all while training exclusively on open-sourced datasets.
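The two mechanisms named in the abstract, upcycling (every expert starts as a copy of a pre-trained MLP) and Top-K sparse gating, can be sketched in a few lines. This is a minimal illustration, not CuMo's implementation: a single linear layer stands in for the MLP, the gate renormalises with a softmax over the selected logits (one common choice, assumed here), the auxiliary load-balancing losses are omitted, and all function names are hypothetical.

```python
import copy
import math


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def upcycle(pretrained_weights, num_experts):
    """Initialise every expert as an independent copy of the pre-trained layer."""
    return [copy.deepcopy(pretrained_weights) for _ in range(num_experts)]


def top_k_gate(logits, k):
    """Map the k highest-scoring expert indices to softmax weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}


def moe_forward(x, expert_weights, router_weights, k=2):
    """Route x to its top-k experts and mix their outputs by gate weight."""
    gate = top_k_gate(linear(router_weights, x), k)
    out = [0.0] * len(expert_weights[0])
    for idx, weight in gate.items():
        y = linear(expert_weights[idx], x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, gate
```

Because the gate weights sum to one, the block reproduces the pre-trained layer's output immediately after upcycling; the experts diverge only once training updates them separately, and only k experts are evaluated per input.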


SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection

Neural Information Processing Systems

Synthetic Aperture Radar (SAR) object detection has gained significant attention recently due to its irreplaceable all-weather imaging capabilities. However, this research field suffers from both limited public datasets (mostly comprising <2K images with only mono-category objects) and inaccessible source code. To tackle these challenges, we establish a new benchmark dataset and an open-source method for large-scale SAR object detection. Our dataset, SARDet-100K, is a result of intense surveying, collecting, and standardizing 10 existing SAR detection datasets, providing a large-scale and diverse dataset for research purposes. To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created. With this high-quality dataset, we conducted comprehensive experiments and uncovered a crucial challenge in SAR object detection: the substantial disparities between the pretraining on RGB datasets and finetuning on SAR datasets in terms of both data domain and model structure. To bridge these gaps, we propose a novel Multi-Stage with Filter Augmentation (MSFA) pretraining framework that tackles the problems from the perspective of data input, domain transition, and model migration. The proposed MSFA method significantly enhances the performance of SAR object detection models while demonstrating excellent generalizability and flexibility across diverse models. This work aims to pave the way for further advancements in SAR object detection.


Pedestrian Trajectory Prediction with Missing Data: Datasets, Imputation, and Benchmarking Pravendra Singh

Neural Information Processing Systems

Pedestrian trajectory prediction is crucial for several applications such as robotics and self-driving vehicles. Significant progress has been made in the past decade thanks to the availability of pedestrian trajectory datasets, which enable trajectory prediction methods to learn from pedestrians' past movements and predict future trajectories. However, these datasets and methods typically assume that the observed trajectory sequence is complete, ignoring real-world issues such as sensor failure, occlusion, and limited fields of view that can result in missing values in observed trajectories. To address this challenge, we present TrajImpute, a pedestrian trajectory prediction dataset that simulates missing coordinates in the observed trajectory, enhancing real-world applicability. TrajImpute maintains a uniform distribution of missing data within the observed trajectories. In this work, we comprehensively examine several imputation methods to reconstruct the missing coordinates and benchmark them for imputing pedestrian trajectories. Furthermore, we provide a thorough analysis of recent trajectory prediction methods and evaluate the performance of these models on the imputed trajectories. Our experimental evaluation of the imputation and trajectory prediction methods offers several valuable insights. Our dataset provides a foundational resource for future research on imputation-aware pedestrian trajectory prediction, potentially accelerating the deployment of these methods in real-world applications.
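To make the setup concrete, the sketch below hides a random subset of observed coordinates and then reconstructs them by linear interpolation. Both steps are assumptions for illustration: the abstract does not state how TrajImpute draws its missing positions, and linear interpolation is shown only as a simple baseline, not as one of the benchmarked methods. The function names are hypothetical.

```python
import random


def mask_trajectory(traj, missing_ratio, rng):
    """Replace a uniformly sampled subset of (x, y) points with None."""
    n_missing = int(round(missing_ratio * len(traj)))
    missing = set(rng.sample(range(len(traj)), n_missing))
    return [None if t in missing else p for t, p in enumerate(traj)]


def linear_impute(traj):
    """Fill None entries by interpolating between the nearest observed points.

    Gaps at either end copy the nearest observed point.
    """
    known = [t for t, p in enumerate(traj) if p is not None]
    if not known:
        raise ValueError("trajectory has no observed points")
    filled = list(traj)
    for t, p in enumerate(traj):
        if p is not None:
            continue
        prev = max((k for k in known if k < t), default=None)
        nxt = min((k for k in known if k > t), default=None)
        if prev is None:
            filled[t] = traj[nxt]
        elif nxt is None:
            filled[t] = traj[prev]
        else:
            alpha = (t - prev) / (nxt - prev)
            (x0, y0), (x1, y1) = traj[prev], traj[nxt]
            filled[t] = (x0 + alpha * (x1 - x0), y0 + alpha * (y1 - y0))
    return filled
```

A prediction model trained on complete sequences can then be evaluated on `linear_impute(mask_trajectory(...))` outputs to see how much the imputation error propagates into the forecast.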


A Appendix

Neural Information Processing Systems

A.1 Creation of the Multimodal Web Document Dataset

A.1.1 Collecting of a Large Number of HTML Files

Our data collection process begins by considering the 25 most recent Common Crawl snapshots. Together they contain webpages spanning from February 2020 to January/February 2023. This process yields a total of 41.2 billion documents.

Selection of English content. To identify non-English content, we apply the FastText classifier (Joulin et al., 2017) to the extracted text, effectively filtering out 63.6% of the documents.

Early text deduplication. Often, a set of URLs is crawled repeatedly across different Common Crawl snapshots. However, the content of these websites may vary as web administrators make changes over time. Hence, at this stage, we refrain from deduplicating documents based on their URLs. Instead, we perform MinHash (Broder, 1997) deduplication with 16 hashes calculated over 5-grams. To further refine the data, we eliminate documents containing substantial proportions of repeated paragraphs and n-grams, employing the methodology described in MassiveText (Rae et al., 2022).
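The MinHash step can be sketched as follows. The 16 hashes and 5-grams come from the text; everything else is an assumption made for illustration: the 5-grams are taken at word level, the hash family is salted BLAKE2b, the similarity threshold of 0.8 is hypothetical, and the pairwise scan stands in for whatever scalable bucketing the authors used.

```python
import hashlib

NUM_HASHES = 16  # "16 hashes" in the text
NGRAM_SIZE = 5   # "5-grams" in the text (word-level is an assumption)


def word_ngrams(text, n=NGRAM_SIZE):
    """Return the set of word-level n-grams of a document."""
    words = text.lower().split()
    if len(words) < n:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def minhash_signature(text, num_hashes=NUM_HASHES):
    """Keep, for each seeded hash function, the minimum value over all n-grams."""
    grams = word_ngrams(text)
    if not grams:
        return None
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for g in grams
        ))
    return tuple(signature)


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, an estimate of n-gram set overlap."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def deduplicate(docs, threshold=0.8):
    """Keep the first document of each near-duplicate group (quadratic scan)."""
    kept, kept_sigs = [], []
    for doc in docs:
        sig = minhash_signature(doc)
        if sig is not None and any(
            estimated_jaccard(sig, s) >= threshold for s in kept_sigs
        ):
            continue
        kept.append(doc)
        if sig is not None:
            kept_sigs.append(sig)
    return kept
```

Comparing content signatures rather than URLs is what lets two crawls of the same page survive when the page changed between snapshots, while unchanged copies collapse to one.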


Finding Safe Zones of Markov Decision Processes Policies Lee Cohen Yishay Mansour Michal Moshkovitz TTI-Chicago Tel-Aviv University Bosch Center for AI Google Research

Neural Information Processing Systems

One notable exception to that is Safe RL, which addresses the concept of safety. Traditional Safe RL focuses on finding the best policy that meets safety requirements, typically by either adjusting the objective to include the safety requirements and then optimizing for it, or incorporating additional safety constraints into the exploration. In both of these cases, the safety requirements should be pre-specified. Anomaly Detection is the problem of identifying patterns in data that are unexpected, i.e., anomalies (see, e.g., Chandola et al. (2009) for a survey).