Debiasing Conditional Stochastic Optimization

Neural Information Processing Systems

In this paper, we study the conditional stochastic optimization (CSO) problem, which covers a variety of applications including portfolio selection, reinforcement learning, robust learning, and causal inference. The sample-averaged gradient of the CSO objective is biased due to its nested structure, so naive methods require a high sample complexity to converge. We introduce a general stochastic extrapolation technique that effectively reduces the bias. We show that for nonconvex smooth objectives, combining this extrapolation with variance reduction techniques can achieve a significantly better sample complexity than the existing bounds. Additionally, we develop new algorithms for the finite-sum variant of the CSO problem that also significantly improve upon existing results.
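To see where the nested-structure bias comes from and how extrapolation can cancel its leading term, consider a toy nested objective f(E[eta]) with a nonlinear f. The sketch below is purely illustrative (it is not the paper's estimator): the plug-in estimator that applies f to an inner sample average of size m has bias of order 1/m, and the Richardson-style combination 2*F_{2m} - F_m removes that leading term.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nested objective: f(E[eta]) with f(x) = x**2 and eta ~ N(0, 1),
# so the true value is f(0) = 0.  Because f is nonlinear, the plug-in
# estimator f(mean of m samples) has bias exactly 1/m here.

def plugin(m, reps):
    """F_m: plug-in estimates using an inner sample average of size m."""
    eta = rng.standard_normal((reps, m))
    return eta.mean(axis=1) ** 2

def extrapolated(m, reps):
    """2*F_{2m} - F_m: cancels the leading O(1/m) bias term."""
    return 2 * plugin(2 * m, reps) - plugin(m, reps)

m, reps = 5, 200_000
bias_plug = plugin(m, reps).mean()        # close to 1/m = 0.2
bias_extr = extrapolated(m, reps).mean()  # close to 0
```

The same cancellation idea underlies debiasing schemes for nested expectations more generally; the paper's extrapolation technique is stated for the CSO gradient rather than this scalar toy problem.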


Accelerating Motion Planning via Optimal Transport

Neural Information Processing Systems

Motion planning is still an open problem in many disciplines, e.g., robotics and autonomous driving, due to the need for high computational resources, which hinders real-time, efficient decision-making. A class of methods striving to provide smooth solutions is gradient-based trajectory optimization. However, those methods usually suffer from bad local minima, while for many settings they may be inapplicable due to the absence of easy-to-access gradients of the optimization objectives. In response to these issues, we introduce Motion Planning via Optimal Transport (MPOT)---a \textit{gradient-free} method that optimizes a batch of smooth trajectories over highly nonlinear costs, even for high-dimensional tasks, while imposing smoothness through a Gaussian Process dynamics prior via the planning-as-inference perspective. To facilitate batch trajectory optimization, we introduce an original zero-order and highly parallelizable update rule---the Sinkhorn Step, which uses the regular polytope family for its search directions.
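As background for the Sinkhorn Step, the underlying primitive is entropic-regularized optimal transport solved by Sinkhorn-Knopp iterations. The following is a generic sketch of that standard solver, not MPOT's implementation; the interpretation of rows as trajectory candidates and columns as polytope search directions is an assumption for illustration.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, iters=200):
    """Entropic-regularized OT plan between histograms a and b (Sinkhorn-Knopp).

    Alternately rescales rows and columns of the Gibbs kernel so that the
    plan's marginals match a and b.  In an MPOT-style planner, rows could
    index candidate trajectories and columns the polytope search directions.
    """
    K = np.exp(-cost / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                # match column marginals
        u = a / (K @ v)                  # match row marginals
    return u[:, None] * K * v[None, :]   # transport plan

rng = np.random.default_rng(0)
cost = rng.random((3, 4))                # toy cost matrix
a = np.full(3, 1 / 3)
b = np.full(4, 1 / 4)
plan = sinkhorn(cost, a, b)
```

Each iteration is a pair of matrix-vector products, which is what makes the update highly parallelizable over a batch of trajectories.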


Contextual Stochastic Bilevel Optimization

Neural Information Processing Systems

We introduce contextual stochastic bilevel optimization (CSBO) -- a stochastic bilevel optimization framework with the lower-level problem minimizing an expectation conditioned on some contextual information and the upper-level decision variable. This framework extends classical stochastic bilevel optimization when the lower-level decision maker responds optimally not only to the decision of the upper-level decision maker but also to some side information and when there are multiple or even infinitely many followers. It captures important applications such as meta-learning, personalized federated learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI). Due to the presence of contextual information, existing single-loop methods for classical stochastic bilevel optimization are unable to converge. To overcome this challenge, we introduce an efficient double-loop gradient method based on the Multilevel Monte-Carlo (MLMC) technique and establish its sample and computational complexities. When specialized to stochastic nonconvex optimization, our method matches existing lower bounds.
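As background on the MLMC idea (a generic sketch for a nested expectation, not the paper's bilevel estimator): write the finest-level quantity as a telescoping sum of level differences, so most samples are drawn at cheap coarse levels while expensive fine levels contribute only small corrections. The coupling of fine and coarse estimators by splitting the fine sample in half is one standard construction.

```python
import numpy as np

rng = np.random.default_rng(1)

def level_diff(l, reps, f):
    """D_l = f(mean of 2^l samples) - f(mean of 2^(l-1) samples),
    coupling fine and coarse levels by splitting the fine sample in half."""
    n = 2 ** l
    x = rng.standard_normal((reps, n))
    fine = f(x.mean(axis=1))
    if l == 0:
        return fine
    coarse = 0.5 * (f(x[:, : n // 2].mean(axis=1))
                    + f(x[:, n // 2 :].mean(axis=1)))
    return fine - coarse

def mlmc(L, reps, f):
    """Telescoping estimator of E[f(mean of 2^L samples)]:
    sum over levels of the expected level differences."""
    return sum(level_diff(l, reps, f).mean() for l in range(L + 1))

# With f(x) = x**2 and standard-normal samples, the level-L plug-in
# value is 1/2^L, which the telescoping sum recovers.
est = mlmc(6, 50_000, lambda x: x ** 2)
```

Because the level differences shrink as l grows, an MLMC schedule can allocate fewer replications to fine levels, which is what yields the improved sample and computational complexities.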


MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks

Neural Information Processing Systems

Human commonsense understanding of the physical and social world is organized around intuitive theories. These theories support making causal and moral judgments. When something bad happens, we naturally ask: who did what, and why? A rich literature in cognitive science has studied people's causal and moral intuitions. This work has revealed a number of factors that systematically influence people's judgments, such as the violation of norms and whether the harm is avoidable or inevitable.


Are entangled qubits following a quantum Moore's law?

New Scientist

The number of qubits that have been entangled in quantum computers has nearly doubled within the past year – the increase is happening so fast, it seems to be following a "quantum Moore's law". First proposed by Gordon Moore at Intel in 1965, Moore's law states that the power we can get out of a single traditional computer chip doubles at regular intervals; every year at first, then every two years as manufacturing encountered…


Fed-GraB: Federated Long-tailed Learning with Self-Adjusting Gradient Balancer

Neural Information Processing Systems

Data privacy and long-tailed distribution are the norms rather than the exception in many real-world tasks. This paper investigates a federated long-tailed learning (Fed-LT) task in which each client holds a locally heterogeneous dataset; if the datasets can be globally aggregated, they jointly exhibit a long-tailed distribution. Under such a setting, existing federated optimization and/or centralized long-tailed learning methods hardly apply due to challenges in (a) characterizing the global long-tailed distribution under privacy constraints and (b) adjusting the local learning strategy to cope with the head-tail imbalance. In response, we propose a method termed \texttt{Fed-GraB}, comprising a Self-adjusting Gradient Balancer (SGB) module that re-weights clients' gradients in a closed-loop manner, based on feedback about the global long-tailed distribution estimated by a Direct Prior Analyzer (DPA) module. Using \texttt{Fed-GraB}, clients can effectively alleviate the distribution drift caused by data heterogeneity during the model training process and obtain a global model with better performance on the minority classes while maintaining the performance of the majority classes.


8-year-old kid with a metal detector stumbles upon a 19th century shipwreck

Popular Science

A Canadian kid is proof that major scientific discoveries don't always have to come from grizzled researchers with fancy equipment. Two years ago, then-8-year-old Lucas Atchison went on a family trip to Point Farms Provincial Park in Ontario. Armed with a metal detector he had just received as a birthday present, Atchison dutifully scanned the area, hoping to hear that coveted "beep." Eagerly digging into the site, Lucas uncovered a metal spike, which his father initially dismissed as something used to tie up boats.


Differentially Private Statistical Inference through \beta-Divergence One Posterior Sampling

Neural Information Processing Systems

Differential privacy guarantees allow the results of a statistical analysis involving sensitive data to be released without compromising the privacy of any individual taking part. Achieving such guarantees generally requires the injection of noise, either directly into parameter estimates or into the estimation process. Instead of artificially introducing perturbations, sampling from Bayesian posterior distributions has been shown to be a special case of the exponential mechanism, producing consistent and efficient private estimates without altering the data generative process. The application of current approaches has, however, been limited by their strong bounding assumptions which do not hold for basic models, such as simple linear regressors. To ameliorate this, we propose \beta D-Bayes, a posterior sampling scheme from a generalised posterior targeting the minimisation of the \beta-divergence between the model and the data generating process. This provides private estimation that is generally applicable without requiring changes to the underlying model and consistently learns the data generating parameter.


ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition

Neural Information Processing Systems

Sign languages are used as a primary language by approximately 70 million D/deaf people worldwide. However, most communication technologies operate in spoken and written languages, creating inequities in access. To help tackle this problem, we release ASL Citizen, the first crowdsourced Isolated Sign Language Recognition (ISLR) dataset, collected with consent and containing 83,399 videos for 2,731 distinct signs filmed by 52 signers in a variety of environments. We propose that this dataset be used for sign language dictionary retrieval for American Sign Language (ASL), where a user demonstrates a sign to their webcam to retrieve matching signs from a dictionary. We show that training supervised machine learning classifiers with our dataset advances the state-of-the-art on metrics relevant for dictionary retrieval, achieving 63\% accuracy and a recall-at-10 of 91\%, evaluated entirely on videos of users who are not present in the training or validation sets.


The Day Grok Told Everyone About 'White Genocide'

The Atlantic - Technology

Yesterday, a user on X saw a viral post of Timothée Chalamet celebrating courtside at a Knicks game and had a simple question: Who was sitting next to him? The user tapped in Grok, X's proprietary chatbot, as people often do when they want help answering questions on the platform--the software functions like ChatGPT, except it can be summoned via reply to a post. And for the most part, Grok has performed reasonably well at providing responses. Chalamet was sitting with Kylie and Kendall Jenner, but here is how the chatbot replied: "I believe you're referring to a photo with Timothée Chalamet, but the context you mention doesn't seem to align with this image. The post discusses South African politics, which doesn't relate to Timothée or the people around him."