Country
Local Nonparametric Meta-Learning
A central goal of meta-learning is to find a learning rule that enables fast adaptation across a set of tasks, by learning the appropriate inductive bias for that set. Most meta-learning algorithms try to find a \textit{global} learning rule that encodes this inductive bias. However, a global learning rule represented by a fixed-size representation is prone to meta-underfitting or -overfitting since the right representational power for a task set is difficult to choose a priori. Even when chosen correctly, we show that global, fixed-size representations often fail when confronted with certain types of out-of-distribution tasks, even when the same inductive bias is appropriate. To address these problems, we propose a novel nonparametric meta-learning algorithm that utilizes a meta-trained local learning rule, building on recent ideas in attention-based and functional gradient-based meta-learning. In several meta-regression problems, we show improved meta-generalization results using our local, nonparametric approach and achieve state-of-the-art results in the robotics benchmark, Omnipush.
Privacy-Preserving Image Classification in the Local Setting
Image data has been greatly produced by individuals and commercial vendors in the daily life, and it has been used across various domains, like advertising, medical and traffic analysis. Recently, image data also appears to be greatly important in social utility, like emergency response. However, the privacy concern becomes the biggest obstacle that prevents further exploration of image data, due to that the image could reveal sensitive information, like the personal identity and locations. The recent developed Local Differential Privacy (LDP) brings us a promising solution, which allows the data owners to randomly perturb their input to provide the plausible deniability of the data before releasing. In this paper, we consider a two-party image classification problem, in which data owners hold the image and the untrustworthy data user would like to fit a machine learning model with these images as input. To protect the image privacy, we propose to locally perturb the image representation before revealing to the data user. Subsequently, we analyze how the perturbation satisfies {\epsilon}-LDP and affect the data utility regarding count-based and distance-based machine learning algorithm, and propose a supervised image feature extractor, DCAConv, which produces an image representation with scalable domain size. Our experiments show that DCAConv could maintain a high data utility while preserving the privacy regarding multiple image benchmark datasets.
Composing Molecules with Multiple Property Constraints
Jin, Wengong, Barzilay, Regina, Jaakkola, Tommi
Drug discovery aims to find novel compounds with specified chemical property profiles. In terms of generative modeling, the goal is to learn to sample molecules in the intersection of multiple property constraints. This task becomes increasingly challenging when there are many property constraints. We propose to offset this complexity by composing molecules from a vocabulary of substructures that we call molecular rationales. These rationales are identified from molecules as substructures that are likely responsible for each property of interest. We then learn to expand rationales into a full molecule using graph generative models. Our final generative model composes molecules as mixtures of multiple rationale completions, and this mixture is fine-tuned to preserve the properties of interest. We evaluate our model on various drug design tasks and demonstrate significant improvements over state-of-the-art baselines in terms of accuracy, diversity, and novelty of generated compounds.
Multi-task Reinforcement Learning with a Planning Quasi-Metric
Micheli, Vincent, Sinnathamby, Karthigan, Fleuret, Franรงois
We introduce a new reinforcement learning approach combining a planning quasi-metric (PQM) that estimates the number of actions required to go from a state to another, with task-specific planners that compute a target state to reach a given goal. The main advantage of this decomposition is to allow the sharing across tasks of a task-agnostic model of the quasi-metric that captures the environment's dynamics and can be learned in a dense and unsupervised manner. We demonstrate the usefulness of this approach on the standard bit-flip problem and in the MuJoCo robotic arm simulator.
Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness
Kumar, Aounon, Levine, Alexander, Goldstein, Tom, Feizi, Soheil
Randomized smoothing, using just a simple isotropic Gaussian distribution, has been shown to produce good robustness guarantees against $\ell_2$-norm bounded adversaries. In this work, we show that extending the smoothing technique to defend against other attack models can be challenging, especially in the high-dimensional regime. In particular, for a vast class of i.i.d. smoothing distributions, we prove that the largest $\ell_p$-radius that can be certified decreases as $O(1/d^{\frac{1}{2} - \frac{1}{p}})$ with dimension $d$ for $p > 2$. Notably, for $p \geq 2$, this dependence on $d$ is no better than that of the $\ell_p$-radius that can be certified using isotropic Gaussian smoothing, essentially putting a matching lower bound on the robustness radius. When restricted to generalized Gaussian smoothing, these two bounds can be shown to be within a constant factor of each other in an asymptotic sense, establishing that Gaussian smoothing provides the best possible results, up to a constant factor, when $p \geq 2$. We present experimental results on CIFAR to validate our theory. For other smoothing distributions, such as, a uniform distribution within an $\ell_1$ or an $\ell_\infty$-norm ball, we show upper bounds of the form $O(1 / d)$ and $O(1 / d^{1 - \frac{1}{p}})$ respectively, which have an even worse dependence on $d$.
Learning CHARME models with (deep) neural networks
Gรณmez-Garcรญa, Josรฉ G., Fadili, Jalal, Chesneau, Christophe
In this paper, we consider a model called CHARME (Conditional Heteroscedastic Autoregressive Mixture of Experts), a class of generalized mixture of nonlinear nonparametric AR-ARCH time series. Under certain Lipschitz-type conditions on the autoregressive and volatility functions, we prove that this model is stationary, ergodic and $\tau$-weakly dependent. These conditions are much weaker than those presented in the literature that treats this model. Moreover, this result forms the theoretical basis for deriving an asymptotic theory of the underlying (non)parametric estimation, which we present for this model. As an application, from the universal approximation property of neural networks (NN), possibly with deep architectures, we develop a learning theory for the NN-based autoregressive functions of the model, where the strong consistency and asymptotic normality of the considered estimator of the NN weights and biases are guaranteed under weak conditions.
Hierarchical Generation of Molecular Graphs using Structural Motifs
Jin, Wengong, Barzilay, Regina, Jaakkola, Tommi
Graph generation techniques are increasingly being adopted for drug discovery. Previous graph generation approaches have utilized relatively small molecular building blocks such as atoms or simple cycles, limiting their effectiveness to smaller molecules. Indeed, as we demonstrate, their performance degrades significantly for larger molecules. In this paper, we propose a new hierarchical graph encoder-decoder that employs significantly larger and more flexible graph motifs as basic building blocks. Our encoder produces a multi-resolution representation for each molecule in a fine-to-coarse fashion, from atoms to connected motifs. Each level integrates the encoding of constituents below with the graph at that level. Our autoregressive coarse-to-fine decoder adds one motif at a time, interleaving the decision of selecting a new motif with the process of resolving its attachments to the emerging molecule. We evaluate our model on multiple molecule generation tasks, including polymers, and show that our model significantly outperforms previous state-of-the-art baselines.
Supervised Quantile Normalization for Low-rank Matrix Approximation
Cuturi, Marco, Teboul, Olivier, Niles-Weed, Jonathan, Vert, Jean-Philippe
Low rank matrix factorization is a fundamental building block in machine learning, used for instance to summarize gene expression profile data or word-document counts. To be robust to outliers and differences in scale across features, a matrix factorization step is usually preceded by ad-hoc feature normalization steps, such as \texttt{tf-idf} scaling or data whitening. We propose in this work to learn these normalization operators jointly with the factorization itself. More precisely, given a $d\times n$ matrix $X$ of $d$ features measured on $n$ individuals, we propose to learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself. This optimization is facilitated by the introduction of a differentiable quantile normalization operator built using optimal transport, providing new results on top of existing work by Cuturi et al. (2019). We demonstrate the applicability of these techniques on synthetic and genomics datasets.
Conjoined Dirichlet Process
Ngo, Michelle N., Pluta, Dustin S., Ngo, Alexander N., Shahbaba, Babak
Biclustering is a class of techniques that simultaneously clusters the rows and columns of a matrix to sort heterogeneous data into homogeneous blocks. Although many algorithms have been proposed to find biclusters, existing methods suffer from the pre-specification of the number of biclusters or place constraints on the model structure. To address these issues, we develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns. The proposed method utilizes dual Dirichlet process mixture models to learn row and column clusters, with the number of resulting clusters determined by the data rather than pre-specified. Probabilistic biclusters are identified by modeling the mutual dependence between the row and column clusters. We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches.
SUOD: Toward Scalable Unsupervised Outlier Detection
Zhao, Yue, Ding, Xueying, Yang, Jianing, Bai, Haoping
Outlier detection is a key field of machine learning for identifying abnormal data objects. Due to the high expense of acquiring ground truth, unsupervised models are often chosen in practice. To compensate for the unstable nature of unsupervised algorithms, practitioners from high-stakes fields like finance, health, and security, prefer to build a large number of models for further combination and analysis. However, this poses scalability challenges in high-dimensional large datasets. In this study, we propose a three-module acceleration framework called SUOD to expedite the training and prediction with a large number of unsupervised detection models. SUOD's Random Projection module can generate lower subspaces for high-dimensional datasets while reserving their distance relationship. Balanced Parallel Scheduling module can forecast the training and prediction cost of models with high confidence---so the task scheduler could assign nearly equal amount of taskload among workers for efficient parallelization. SUOD also comes with a Pseudo-supervised Approximation module, which can approximate fitted unsupervised models by lower time complexity supervised regressors for fast prediction on unseen data. It may be considered as an unsupervised model knowledge distillation process. Notably, all three modules are independent with great flexibility to "mix and match"; a combination of modules can be chosen based on use cases. Extensive experiments on more than 30 benchmark datasets have shown the efficacy of SUOD, and a comprehensive future development plan is also presented.