
Bandits with Preference Feedback: A Stackelberg Game Perspective. Barna Pásztor, ETH Zurich

Neural Information Processing Systems

Bandits with preference feedback are a powerful tool for optimizing an unknown target function when only pairwise comparisons are allowed instead of direct value queries. This model allows human feedback to be incorporated into online inference and optimization, and has been employed in systems for fine-tuning large language models. The problem is well understood in simplified settings with linear target functions or over small finite domains, which limits its practical interest. Taking the next step, we consider infinite domains and nonlinear (kernelized) rewards. In this setting, selecting a pair of actions is quite challenging and requires balancing exploration and exploitation at two levels: within the pair, and along the iterations of the algorithm.
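The pairwise-comparison feedback model described above can be sketched in a few lines. This is a toy illustration, not the paper's Stackelberg-game algorithm: it assumes a Bradley-Terry (logistic) preference model over a latent reward, and the names `duel` and `f` are hypothetical.

```python
import math
import random

def duel(f, x, y, rng):
    """Pairwise preference feedback: return 1 iff x wins the duel.

    The winner is sampled from a Bradley-Terry (logistic) model on the
    latent reward gap f(x) - f(y); the learner never observes f itself,
    only duel outcomes.
    """
    p = 1.0 / (1.0 + math.exp(-(f(x) - f(y))))
    return 1 if rng.random() < p else 0

# Toy usage: with f(v) = v, action 1.0 should win most duels against 0.0.
rng = random.Random(0)
win_rate = sum(duel(lambda v: v, 1.0, 0.0, rng) for _ in range(1000)) / 1000
```

An algorithm in this setting must choose both members of every pair, trading off what the duel outcome teaches it (within-pair exploration) against converging to the optimum over iterations.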


Identifying Selections for Unsupervised Subtask Discovery

Neural Information Processing Systems

When solving long-horizon tasks, it is appealing to decompose the high-level task into subtasks. Decomposing experiences into reusable subtasks can improve data efficiency, accelerate policy generalization, and, in general, provide promising solutions to multi-task reinforcement learning and imitation learning problems. However, the concept of subtasks is not yet sufficiently understood or modeled, and existing works often overlook the true structure of the data-generation process: subtasks are the results of a selection mechanism on actions, rather than possible underlying confounders or intermediates. Motivated by this observation, we provide a theory to identify, and experiments to verify, the existence of selection variables in such data. These selections serve as subgoals that indicate subtasks and guide the policy. In light of this idea, we develop a sequential non-negative matrix factorization (seq-NMF) method to learn these subgoals and extract meaningful behavior patterns as subtasks. Our empirical results on a challenging Kitchen environment demonstrate that the learned subtasks effectively enhance generalization to new tasks in multi-task imitation learning scenarios. The code is provided at this link.
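As a minimal sketch of the matrix-factorization machinery involved, the following implements plain multiplicative-update NMF. The paper's seq-NMF additionally imposes temporal structure, which is omitted here; the function name `nmf` is illustrative.

```python
import numpy as np

def nmf(X, k, iters=300, eps=1e-9):
    """Plain NMF by multiplicative updates: X ~ W @ H with W, H >= 0.

    Nonnegativity makes the factors read as additive "parts"; in the
    subtask setting, columns of W would play the role of behavior
    patterns and rows of H their activations over time.
    """
    rng = np.random.default_rng(0)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update activations
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update parts
    return W, H
```

On exactly low-rank nonnegative data these updates drive the reconstruction error close to zero; seq-NMF adds a convolutional/temporal penalty on top of this loop.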


Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exiting

Neural Information Processing Systems

Speculative decoding has demonstrated its effectiveness in accelerating the inference of large language models (LLMs) while maintaining an identical sampling distribution. However, the conventional approach of training a separate draft model to achieve a satisfactory token acceptance rate can be costly and impractical. In this paper, we propose Kangaroo, a novel self-speculative decoding framework with a double early-exiting strategy, which leverages the shallow sub-network and the LM head of the well-trained target LLM to construct a self-drafting model. The self-verification stage then only requires computing the remaining layers over the early-exited hidden states in parallel. To bridge the representation gap between the sub-network and the full model, we train a lightweight and efficient adapter module on top of the sub-network.
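In Kangaroo the drafter is an early exit of the target model itself; the sketch below shows only the generic draft-then-verify loop in its greedy form, with hypothetical next-token callables `draft_next` and `target_next` standing in for the two models.

```python
def speculative_step(draft_next, target_next, prefix, k):
    """One greedy speculative-decoding step.

    The draft proposes k tokens; the target verifies them (in practice
    in a single parallel pass); we accept the longest prefix on which
    both agree, then append the target's own corrected token.
    """
    # Drafting phase: roll the cheap model forward k tokens.
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)
    # Verification phase: accept tokens while the target agrees.
    accepted, ctx = [], list(prefix)
    for t in proposal:
        if target_next(ctx) == t:
            accepted.append(t)
            ctx.append(t)
        else:
            break
    accepted.append(target_next(ctx))  # the target's token at the mismatch
    return accepted
```

When the draft agrees often, each target pass yields several tokens instead of one, which is where the speedup comes from; the output sequence is identical to greedy decoding with the target alone.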


Get effortlessly clean pools this summer with Aiper's Scuba X1 Pro Max

PCWorld

The weather is warming up and we're all dreaming about days lounging by the pool. For some, however, the thought of having to keep cleaning out the pool turns this dream into a nightmare. Enter the Aiper Scuba X1 Pro Max, a robotic pool cleaner that can take this chore off your hands without even breaking a sweat. The new Scuba X1 Pro Max is the most advanced and user-friendly robotic pool cleaner you can get your hands on, and it will ensure your pool water remains crystal clear throughout the swimming season. The Aiper robotic cleaner can navigate your pool on its own, map out the area, and deploy 8,500 GPH suction to ensure everything is clean.


Interpretable Generalized Additive Models for Datasets with Missing Values. Jon Donnelly, Department of Computer Science, Duke University

Neural Information Processing Systems

Many important datasets contain samples that are missing one or more feature values. Maintaining the interpretability of machine learning models in the presence of such missing data is challenging. Singly or multiply imputing missing values complicates the model's mapping from features to labels. On the other hand, reasoning on indicator variables that represent missingness introduces a potentially large number of additional terms, sacrificing sparsity.
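To make the sparsity cost concrete, here is a toy sketch of the indicator-variable expansion the abstract refers to; it illustrates the general approach being critiqued, not the paper's method, and `expand_with_indicators` is a hypothetical name.

```python
def expand_with_indicators(row):
    """Replace each possibly-missing feature x_j (None = missing) with
    the pair (observed value or 0, missingness flag m_j).

    Every feature now contributes two terms to an additive model, so
    the number of shape functions doubles, which is the sparsity cost
    the abstract points out.
    """
    out = []
    for x in row:
        missing = x is None
        out.append(0.0 if missing else x)   # value term
        out.append(1.0 if missing else 0.0)  # indicator term m_j
    return out
```

Imputation avoids this blow-up but entangles the model's feature-to-label mapping with the imputer, which is the other horn of the dilemma the abstract describes.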


The sex robot you control REMOTELY: Creepy £1,400 doll can be managed via an app - with options to adjust squeezing, thrusting, and moaning

Daily Mail - Science & tech

From Austin Powers to Subservience, sex robots have been staple features of blockbusters for decades. But the unusual devices are slowly but surely becoming more mainstream, with human-robot sex even predicted to become more common than human-human by 2050. Now, a Chinese company has unveiled its latest model - and it's one of the strangest we've seen yet. Ridmii, a company based in Dongguan City, has created a range of sex robots that can be controlled remotely. The doll syncs up to an app via Bluetooth, where the person in control can manage everything from squeezing to thrusting, and even moaning.


GarmentLab: A Unified Simulation and Benchmark for Garment Manipulation. Ruihai Wu

Neural Information Processing Systems

Manipulating garments and fabrics has long been a critical endeavor in the development of home-assistant robots. However, due to complex dynamics and topological structures, garment manipulations pose significant challenges. Recent successes in reinforcement learning and vision-based methods offer promising avenues for learning garment manipulation. Nevertheless, these approaches are severely constrained by current benchmarks, which offer limited diversity of tasks and unrealistic simulation behavior. Therefore, we present GarmentLab, a content-rich benchmark and realistic simulation designed for deformable object and garment manipulation.


A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation. Gwanghyun Kim, Alonso Martinez, Brendan Jou

Neural Information Processing Systems

Training diffusion models for audiovisual sequences allows for a range of generation tasks by learning conditional distributions of various input-output combinations of the two modalities. Nevertheless, this strategy often requires training a separate model for each task, which is expensive. Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the audiovisual space. Our key contribution lies in how we parameterize the diffusion timestep in the forward diffusion process. Instead of the standard fixed diffusion timestep, we propose applying variable diffusion timesteps across the temporal dimension and across modalities of the inputs. This formulation offers flexibility to introduce variable noise levels for various portions of the input, hence the term mixture of noise levels. We propose a transformer-based audiovisual latent diffusion model and show that it can be trained in a task-agnostic fashion using our approach to enable a variety of audiovisual generation tasks at inference time. Experiments demonstrate the versatility of our method in tackling cross-modal and multimodal interpolation tasks in the audiovisual space. Notably, our proposed approach surpasses baselines in generating temporally and perceptually consistent samples conditioned on the input.
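The key idea, a per-slice rather than scalar diffusion timestep, can be sketched as follows. This is a minimal illustration of the forward noising process only, assuming a precomputed cumulative-product schedule `alphas_bar`; the authors' transformer model and training objective are not shown, and the name `noisy_mixture` is illustrative.

```python
import numpy as np

def noisy_mixture(x, t, alphas_bar, rng):
    """Forward diffusion with an element-wise timestep array t.

    Standard diffusion uses one scalar t for the whole input; here each
    temporal position / modality slice of x carries its own timestep,
    so different portions of the input sit at different noise levels
    (the "mixture of noise levels").
    """
    ab = alphas_bar[t]                      # per-element alpha-bar, shape of x
    eps = rng.standard_normal(x.shape)      # fresh Gaussian noise
    return np.sqrt(ab) * x + np.sqrt(1.0 - ab) * eps
```

Setting `t` constant recovers the ordinary forward process; masking one modality's slices to t = 0 while noising the other is what lets a single model learn arbitrary conditionals such as audio-to-video or video-to-audio.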


HyperPrism: An Adaptive Non-linear Aggregation Framework for Distributed Machine Learning over non-IID Data and Time-varying Communication Links

Neural Information Processing Systems

While Distributed Machine Learning (DML) has been widely used to achieve decent performance, it remains challenging to take full advantage of data and devices distributed across multiple vantage points. This is because the prevailing linear aggregation paradigm cannot resolve inter-model divergence caused by (1) heterogeneous training data at different devices (i.e., non-IID data) and (2) time-varying communication links, which limit devices' ability to reconcile model divergence. In this paper, we present HyperPrism, a non-linear aggregation framework that leverages Kolmogorov Means to conduct distributed mirror descent, with averaging performed in the mirror-descent dual space; each round, HyperPrism selects the degree of a Weighted Power Mean (WPM), a subclass of the Kolmogorov Means. Moreover, HyperPrism can adaptively choose different mappings for different layers of the local model using a dedicated hypernetwork per device, achieving automatic optimization of DML in high-divergence settings. We perform rigorous analysis and experimental evaluations to demonstrate the effectiveness of adaptive, mirror-mapping DML. In particular, we extend the generalizability of existing related works and position them as special cases within HyperPrism. For practitioners, the strength of HyperPrism lies in making distributed asynchronous training feasible with minimal communication. Our experimental results show that HyperPrism improves convergence speed by up to 98.63% and scales well to more devices compared with the state of the art, all with little additional computational overhead relative to traditional linear aggregation.
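The Weighted Power Mean at the heart of this aggregation can be sketched directly. This toy version assumes strictly positive parameter vectors (needed for the p <= 0 cases) and omits the per-device hypernetwork that selects the degree p; `weighted_power_mean` is an illustrative name, not the paper's API.

```python
import numpy as np

def weighted_power_mean(models, weights, p):
    """Weighted Power Mean of parameter vectors.

    M_p = (sum_i w_i * x_i**p) ** (1/p).  p = 1 recovers the usual
    linear (FedAvg-style) average; the limit p -> 0 is the weighted
    geometric mean.  Varying p changes how strongly divergent models
    pull the aggregate, which is the knob HyperPrism tunes each round.
    """
    xs = np.stack(models)                 # shape: (num_devices, dim)
    w = np.asarray(weights, dtype=float)[:, None]
    if p == 0:
        return np.exp((w * np.log(xs)).sum(axis=0))
    return ((w * xs ** p).sum(axis=0)) ** (1.0 / p)
```

Viewed through mirror descent, x -> x**p is the mirror map: averaging happens in the dual space and the result is mapped back, which is why a non-linear mean can dampen divergence that a plain average cannot.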


Road Network Representation Learning with the Third Law of Geography. Yile Chen

Neural Information Processing Systems

Road network representation learning aims to learn compressed and effective vectorized representations for road segments that are applicable to numerous tasks. In this paper, we identify the limitations of existing methods, particularly their overemphasis on the distance effect as outlined in the First Law of Geography. In response, we propose to endow road network representation with the principles of the recent Third Law of Geography. To this end, we propose a novel graph contrastive learning framework that employs geographic configuration-aware graph augmentation and spectral negative sampling, ensuring that road segments with similar geographic configurations yield similar representations, and vice versa, aligning with the principles stated in the Third Law. The framework further fuses the Third Law with the First Law through a dual contrastive learning objective to effectively balance the implications of both laws. We evaluate our framework on two real-world datasets across three downstream tasks. The results show that the integration of the Third Law significantly improves the performance of road segment representations in downstream tasks. Our code is available at https://github.com/Haicang/Garner.
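As a minimal sketch of the contrastive objective such a framework optimizes, here is a plain InfoNCE loss over two embedding views: matched rows are positives, all other rows serve as negatives. The paper's geographic-configuration-aware augmentation and spectral negative sampling are not modeled here, and `info_nce` is an illustrative name.

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """InfoNCE contrastive loss between two views of the same nodes.

    Row i of z1 and row i of z2 are embeddings of the same road
    segment under two augmentations (a positive pair); every other
    row of z2 acts as a negative.  Lower loss means matched segments
    are closer than mismatched ones.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # cosine sim
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau
    sim -= sim.max(axis=1, keepdims=True)                  # stabilise softmax
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(logp))
```

The dual objective described in the abstract would combine two such terms, one with distance-based (First Law) positives and one with configuration-based (Third Law) positives.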