
Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions

Ricci, Fabiola, Bardone, Lorenzo, Goldt, Sebastian

arXiv.org Machine Learning

Deep neural networks learn structured features from complex, non-Gaussian inputs, but the mechanisms behind this process remain poorly understood. Our work is motivated by the observation that the first-layer filters learnt by deep convolutional neural networks from natural images resemble those learnt by independent component analysis (ICA), a simple unsupervised method that seeks the most non-Gaussian projections of its inputs. This similarity suggests that ICA provides a simple, yet principled model for studying feature learning. Here, we leverage this connection to investigate the interplay between data structure and optimisation in feature learning for the most popular ICA algorithm, FastICA, and stochastic gradient descent (SGD), which is used to train deep networks. We rigorously establish that FastICA requires at least $n\gtrsim d^4$ samples to recover a single non-Gaussian direction from $d$-dimensional inputs on a simple synthetic data model. We show that vanilla online SGD outperforms FastICA, and prove that the optimal sample complexity $n \gtrsim d^2$ can be reached by smoothing the loss, albeit in a data-dependent way. We finally demonstrate the existence of a search phase for FastICA on ImageNet, and discuss how the strong non-Gaussianity of said images compensates for the poor sample complexity of FastICA.
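To build intuition for the algorithm the abstract analyses, here is a minimal sketch of the FastICA fixed-point iteration for a single direction, run on synthetic data with one planted non-Gaussian (Laplace) component among Gaussian ones. The tanh contrast and the whitened-input assumption are standard FastICA choices for illustration, not specifics of the paper's analysis.

```python
import numpy as np

# Synthetic inputs: one unit-variance Laplace coordinate hidden among
# Gaussian coordinates, so the data are centred and approximately white.
rng = np.random.default_rng(0)
d, n = 8, 20_000
X = rng.standard_normal((n, d))
X[:, 0] = rng.laplace(size=n) / np.sqrt(2.0)   # planted non-Gaussian direction
X -= X.mean(axis=0)

w = rng.standard_normal(d)
w /= np.linalg.norm(w)
for _ in range(300):
    wx = X @ w
    g = np.tanh(wx)                             # contrast nonlinearity
    g_prime = 1.0 - g ** 2
    # FastICA fixed-point update: w <- E[x g(w.x)] - E[g'(w.x)] w, renormalised.
    w_new = (X * g[:, None]).mean(axis=0) - g_prime.mean() * w
    w_new /= np.linalg.norm(w_new)
    converged = abs(w_new @ w) > 1.0 - 1e-10    # convergence up to sign flip
    w = w_new
    if converged:
        break

overlap = abs(w[0])   # alignment with the planted non-Gaussian direction
```

With this many samples the iteration aligns closely with the planted direction; the sample-complexity question studied in the paper is how large `n` must be, relative to `d`, for such recovery to succeed at all.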


Fractional Order Distributed Optimization

Lixandru, Andrei, van Gerven, Marcel, Pequito, Sergio

arXiv.org Artificial Intelligence

Distributed optimization is fundamental to modern machine learning applications like federated learning, but existing methods often struggle with ill-conditioned problems and face stability-versus-speed tradeoffs. We introduce fractional order distributed optimization (FrODO), a theoretically grounded framework that incorporates fractional-order memory terms to enhance convergence properties in challenging optimization landscapes. Our approach achieves provable linear convergence for any strongly connected network. Through empirical validation, our results suggest that FrODO achieves up to 4 times faster convergence than baselines on ill-conditioned problems and a 2-3 times speedup in federated neural network training, while maintaining stability and theoretical guarantees.
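The abstract does not spell out the FrODO update, so the following sketch only illustrates the generic idea of a fractional-order memory term: past gradients are combined with truncated Grünwald-Letnikov weights before taking a step, here on an ill-conditioned quadratic. The step size, memory length, and fractional order are arbitrary choices for the illustration.

```python
import numpy as np

def gl_weights(alpha, K):
    """First K Grunwald-Letnikov coefficients (-1)^k C(alpha, k) for order alpha."""
    w = np.empty(K)
    w[0] = 1.0
    for k in range(1, K):
        w[k] = w[k - 1] * (k - 1 - alpha) / k
    return w

A = np.diag([1.0, 100.0])              # quadratic with condition number 100
x = np.array([1.0, 1.0])
eta, alpha, K = 5e-3, 0.9, 10
coef = gl_weights(alpha, K)

grads = []                              # gradient history (the "memory")
for _ in range(2000):
    grads.append(A @ x)
    recent = grads[-K:][::-1]           # newest gradient first
    # Step along a fractional difference of the gradient history.
    step = sum(c * g for c, g in zip(coef, recent))
    x = x - eta * step

final_loss = 0.5 * x @ A @ x            # started at 50.5
```

The memory weights damp the contribution of the current gradient against recent history, which is one mechanism by which fractional-order methods can trade raw speed for stability on badly conditioned problems.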


Recursive Learning of Asymptotic Variational Objectives

Mastrototaro, Alessandro, Müller, Mathias, Olsson, Jimmy

arXiv.org Machine Learning

General state-space models (SSMs) are widely used in statistical machine learning and are among the most classical generative models for sequential time-series data. SSMs, comprising latent Markovian states, can be subjected to variational inference (VI), but standard VI methods like the importance-weighted autoencoder (IWAE) lack functionality for streaming data. To enable online VI in SSMs when the observations are received in real time, we propose maximising an IWAE-type variational lower bound on the asymptotic contrast function, rather than the standard IWAE ELBO, using stochastic approximation. Unlike the recursive maximum likelihood method, which directly maximises the asymptotic contrast, our approach, called online sequential IWAE (OSIWAE), allows for online learning of both model parameters and a Markovian recognition model for inferring latent states. By approximating filter state posteriors and their derivatives using sequential Monte Carlo (SMC) methods, we create a particle-based framework for online VI in SSMs. This approach is more theoretically well-founded than recently proposed online variational SMC methods. We provide rigorous theoretical results on the learning objective and a numerical study demonstrating the method's efficiency in learning model parameters and particle proposal kernels.
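For readers unfamiliar with the IWAE-type bound the method builds on, here is the bound in the simplest possible setting: a conjugate Gaussian model where the log marginal likelihood is known in closed form. The deliberately imperfect recognition model is an assumption for illustration; the paper's objective is the asymptotic, sequential version of this construction.

```python
import numpy as np

def log_normal(x, mu, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

# Model: z ~ N(0,1), x|z ~ N(z,1), so the exact marginal is p(x) = N(0,2).
rng = np.random.default_rng(1)
x = 1.3
log_px = log_normal(x, 0.0, 2.0)        # exact log marginal likelihood

# A deliberately imperfect recognition model q(z|x) = N(0.4 x, 0.6);
# the exact posterior would be N(x/2, 1/2).
mu_q, var_q = 0.4 * x, 0.6
K = 20_000
z = mu_q + np.sqrt(var_q) * rng.standard_normal(K)
log_w = (log_normal(z, 0.0, 1.0)        # prior
         + log_normal(x, z, 1.0)        # likelihood
         - log_normal(z, mu_q, var_q))  # proposal

elbo = log_w.mean()                               # standard (K = 1) ELBO estimate
iwae = np.logaddexp.reduce(log_w) - np.log(K)     # K-sample IWAE bound estimate
```

In expectation, ELBO <= IWAE <= log p(x), with the IWAE bound tightening as K grows or as q approaches the posterior; OSIWAE targets the analogous bound on the asymptotic contrast as observations stream in.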


Differentially Private M-Estimators

Neural Information Processing Systems

This paper studies privacy-preserving M-estimators using perturbed histograms. The proposed approach allows the release of a wide class of M-estimators with both differential privacy and statistical utility without knowing a priori the particular inference procedure. The performance of the proposed method is demonstrated through a careful study of the convergence rates. A practical algorithm is given and applied to a real-world data set containing both continuous and categorical variables.
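A hedged sketch of the perturbed-histogram idea: release Laplace-noised bin counts (differentially private, since one record changes one count by one), then compute an estimator from the noisy histogram rather than the raw data. The particular estimator below (a histogram-based mean on a bounded domain) is illustrative only, not the paper's estimator class.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=2.0, scale=1.0, size=100_000)
data = np.clip(data, -5.0, 5.0)         # bounded domain, needed for finite sensitivity

bins = np.linspace(-5.0, 5.0, 101)      # 100 bins of width 0.1
counts, _ = np.histogram(data, bins=bins)

epsilon = 1.0
# Adding or removing one record changes a single bin count by 1, so
# Laplace(1/epsilon) noise on each count yields epsilon-differential privacy
# under the add/remove neighbouring relation.
noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
noisy = np.clip(noisy, 0.0, None)       # counts cannot be negative

centers = 0.5 * (bins[:-1] + bins[1:])
private_mean = (noisy * centers).sum() / noisy.sum()
```

Because only the noisy histogram is released, any downstream estimator computed from it inherits the privacy guarantee; the paper's contribution is quantifying the statistical price of this perturbation for general M-estimators.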


Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons

Shi, Chengchun, Luo, Shikai, Le, Yuan, Zhu, Hongtu, Song, Rui

arXiv.org Artificial Intelligence

Reinforcement learning (RL; see Sutton and Barto, 2018, for an overview) is concerned with how intelligent agents learn and take actions in an unknown environment in order to maximize the cumulative reward they receive. It has arguably been one of the most vibrant research frontiers in machine learning over the last few years. According to Google Scholar, over 40K scientific articles containing the phrase "reinforcement learning" were published in 2020. Over 100 papers on RL were accepted for presentation at ICML 2021, a premier machine learning conference, accounting for more than 10% of all accepted papers. RL algorithms have been applied in a wide variety of real applications, including games (Silver et al., 2016), robotics (Kormushev et al., 2013), healthcare (Komorowski et al., 2018), bidding (Jin et al., 2018), ridesharing (Xu et al., 2018) and automated driving (de Haan et al., 2019), to name a few. This paper is partly motivated by the need to develop statistical learning methodologies for offline RL domains such as mobile health (mHealth).


Adaptive Semi-Supervised Inference for Optimal Treatment Decisions with Electronic Medical Record Data

Gunn, Kevin, Lu, Wenbin, Song, Rui

arXiv.org Machine Learning

A treatment regime is a rule that assigns a treatment to patients based on their covariate information. Recently, estimation of the optimal treatment regime that yields the greatest overall expected clinical outcome of interest has attracted a lot of attention. In this work, we consider estimation of the optimal treatment regime with electronic medical record data under a semi-supervised setting. Here, data consist of two parts: a set of `labeled' patients for whom we have the covariate, treatment and outcome information, and a much larger set of `unlabeled' patients for whom we only have the covariate information. We propose an imputation-based semi-supervised method, utilizing `unlabeled' individuals to obtain a more efficient estimator of the optimal treatment regime. The asymptotic properties of the proposed estimators and their associated inference procedure are provided. Simulation studies are conducted to assess the empirical performance of the proposed method and to compare it with a fully supervised method using only the labeled data. An application to an electronic medical record data set on the treatment of hypotensive episodes during intensive care unit (ICU) stays is also given for further illustration.
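A hedged sketch of the imputation idea: fit outcome models per treatment arm on the small labelled set, impute the treatment contrast for the large unlabelled set, and read off the regime from the sign of the imputed contrast. The linear working models and simulated data below are assumptions for illustration, not the paper's estimator or inference procedure.

```python
import numpy as np

rng = np.random.default_rng(3)

# Labelled set: covariates X, randomised treatment A, observed outcome Y.
# True contrast is 2*x0, so treatment helps exactly when x0 > 0.
n_lab = 200
Xl = rng.uniform(-1, 1, size=(n_lab, 2))
A = rng.integers(0, 2, size=n_lab)
Y = Xl[:, 0] + A * (2.0 * Xl[:, 0]) + 0.3 * rng.standard_normal(n_lab)

# Much larger unlabelled set: covariates only.
Xu = rng.uniform(-1, 1, size=(20_000, 2))

def fit_arm(a):
    """Least-squares outcome model for arm a on the labelled data."""
    mask = A == a
    design = np.column_stack([np.ones(mask.sum()), Xl[mask]])
    beta, *_ = np.linalg.lstsq(design, Y[mask], rcond=None)
    return beta

b0, b1 = fit_arm(0), fit_arm(1)
design_u = np.column_stack([np.ones(len(Xu)), Xu])
contrast = design_u @ b1 - design_u @ b0   # imputed treatment effect per patient
rule = (contrast > 0).astype(int)           # estimated optimal regime

accuracy = (rule == (Xu[:, 0] > 0)).mean()  # agreement with the true optimal rule
```

The unlabelled covariates cost nothing to classify once the arm models are fitted, which is the source of the efficiency gain the semi-supervised estimator formalises.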


Boosting Independent Component Analysis

Li, Yunpeng, Ye, ZhaoHui

arXiv.org Machine Learning

Independent component analysis aims to recover unknown source components that are as independent as possible from their linear mixtures. The technique has been widely used in many fields, such as data analysis, signal processing, and machine learning. In this paper, we present a novel boosting-based algorithm for independent component analysis. Our algorithm fills a gap in nonparametric independent component analysis by introducing boosting into maximum likelihood estimation. A variety of experiments validate its performance against many of the presently known algorithms.


CAPITAL: Optimal Subgroup Identification via Constrained Policy Tree Search

Cai, Hengrui, Lu, Wenbin, West, Rachel Marceau, Mehrotra, Devan V., Huang, Lingkang

arXiv.org Machine Learning

Personalized medicine, a paradigm of medicine tailored to a patient's characteristics, is an increasingly attractive field in health care. An important goal of personalized medicine is to identify a subgroup of patients, based on baseline covariates, that benefits more from the targeted treatment than other comparative treatments. Most current subgroup identification methods focus only on obtaining a subgroup with an enhanced treatment effect without paying attention to subgroup size. Yet a clinically meaningful subgroup learning approach should identify the maximum number of patients who can benefit from the better treatment. In this paper, we present an optimal subgroup selection rule (SSR) that maximizes the number of selected patients and, at the same time, achieves a pre-specified clinically meaningful mean outcome, such as the average treatment effect. We derive two equivalent theoretical forms of the optimal SSR based on the contrast function that describes the treatment-covariates interaction in the outcome. We further propose a ConstrAined PolIcy Tree seArch aLgorithm (CAPITAL) to find the optimal SSR within the interpretable decision tree class. The proposed method is flexible enough to handle multiple constraints that penalize the inclusion of patients with negative treatment effects, and to address time-to-event data using the restricted mean survival time as the clinically interesting mean outcome. Extensive simulations, comparison studies, and real data applications are conducted to demonstrate the validity and utility of our method.
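To make the constrained selection problem concrete, here is a toy version restricted to depth-1 "trees" (single-covariate thresholds): among all threshold rules, pick the one that selects the most patients while keeping the selected subgroup's average contrast above a clinically meaningful level delta. CAPITAL searches a richer policy tree class; this brute-force scan, with simulated per-patient contrasts, is for intuition only.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
x = rng.uniform(-1, 1, size=n)
# Estimated per-patient treatment effect: truth is x, observed with noise.
contrast = x + 0.2 * rng.standard_normal(n)

delta = 0.3                    # required mean benefit within the subgroup
best_size, best_cut = 0, None
for cut in np.linspace(-1, 1, 201):
    selected = x > cut
    if selected.sum() == 0:
        continue
    # Feasible rules satisfy the mean-outcome constraint; among those,
    # maximise the number of selected patients.
    if contrast[selected].mean() >= delta and selected.sum() > best_size:
        best_size, best_cut = int(selected.sum()), cut

subgroup_fraction = best_size / n
```

Since the true conditional effect given x > c averages (1 + c)/2 here, the constraint binds near c = -0.4 and the largest feasible subgroup covers roughly 70% of patients, illustrating the size-versus-benefit tradeoff the optimal SSR formalises.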