 Regression


Off-the-grid learning of sparse mixtures from a continuous dictionary

arXiv.org Machine Learning

We consider a general non-linear model where the signal is a finite mixture of an unknown, possibly increasing, number of features issued from a continuous dictionary parameterized by a real non-linear parameter. The signal is observed with Gaussian (possibly correlated) noise in either a continuous or a discrete setup. We propose an off-the-grid optimization method, that is, a method which does not use any discretization scheme on the parameter space, to estimate both the non-linear parameters of the features and the linear parameters of the mixture. We use recent results on the geometry of off-the-grid methods to give a minimal separation condition on the true underlying non-linear parameters such that interpolating certificate functions can be constructed. Using tail bounds for suprema of Gaussian processes, we also bound the prediction error with high probability. Assuming that the certificate functions can be constructed, our prediction error bound is, up to log factors, similar to the rates attained by the Lasso predictor in the linear regression model. We also establish convergence rates that quantify with high probability the quality of estimation for both the linear and the non-linear parameters.
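
To make the setup in this abstract concrete, the following is one common way to write a sparse mixture over a continuous dictionary, together with a generic $\ell_1$-penalized off-the-grid objective. The notation ($y$, $\varphi$, $\beta_k^\star$, $\theta_k^\star$, $s$, $K$, $\kappa$, $\Theta$) is assumed here for illustration and is not taken from the abstract; the paper's exact estimator and normalization may differ.

```latex
% Observation model: a mixture of s features from a continuous dictionary,
% observed with Gaussian noise w (notation assumed for illustration).
y \;=\; \sum_{k=1}^{s} \beta_k^\star \, \varphi(\theta_k^\star) \;+\; w ,
\qquad \beta_k^\star \in \mathbb{R}, \quad \theta_k^\star \in \Theta \subset \mathbb{R}.

% A generic off-the-grid, Lasso-type objective: the non-linear parameters
% \theta_k range over the whole set \Theta rather than over a fixed grid.
(\hat\beta, \hat\theta) \;\in\; \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{K},\; \theta \in \Theta^{K}}
\;\frac{1}{2} \Bigl\| \, y - \sum_{k=1}^{K} \beta_k \, \varphi(\theta_k) \Bigr\|^{2}
\;+\; \kappa \, \|\beta\|_{1} .
```

Here $K$ is an upper bound on the number of active features and $\kappa > 0$ is a tuning parameter; the $\ell_1$ penalty is what makes the comparison with Lasso rates natural.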


Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data

arXiv.org Artificial Intelligence

Human speakers encode information into raw speech which is then decoded by the listeners. This complex relationship between encoding (production) and decoding (perception) is often modeled separately. Here, we test how encoding and decoding of lexical semantic information can emerge automatically from raw speech in unsupervised generative deep convolutional networks that combine the production and perception principles of speech. We introduce, to our knowledge, the most challenging objective in unsupervised lexical learning: a network that must learn unique representations for lexical items with no direct access to training data. We train several models (ciwGAN and fiwGAN arXiv:2006.02951) and test how the networks classify acoustic lexical items in unobserved test data. Strong evidence in favor of lexical learning and a causal relationship between latent codes and meaningful sublexical units emerge. The architecture that combines the production and perception principles is thus able to learn to decode unique information from raw acoustic data without accessing real training data directly. We propose a technique to explore lexical (holistic) and sublexical (featural) learned representations in the classifier network. The results bear implications for unsupervised speech technology, as well as for unsupervised semantic modeling as language models increasingly bypass text and operate from raw acoustics.


How to Build an Online Machine Learning App With Python

#artificialintelligence

Machine learning is rapidly becoming as ubiquitous as data itself. Quite literally wherever there is an abundance of data, machine learning is somehow intertwined. After all, what utility would data have if we were not able to use it to predict something about the future? Luckily there is a plethora of toolkits and frameworks that have made it rather simple to deploy ML in Python. Specifically, Sklearn has done a terrifically effective job at making ML accessible to developers.


Causal discovery under a confounder blanket

arXiv.org Machine Learning

Inferring causal relationships from observational data is rarely straightforward, but the problem is especially difficult in high dimensions. For these applications, causal discovery algorithms typically require parametric restrictions or extreme sparsity constraints. We relax these assumptions and focus on an important but more specialized problem, namely recovering the causal order among a subgraph of variables known to descend from some (possibly large) set of confounding covariates, i.e. a $\textit{confounder blanket}$. This is useful in many settings, for example when studying a dynamic biomolecular subsystem with genetic data providing background information. Under a structural assumption called the $\textit{confounder blanket principle}$, which we argue is essential for tractable causal discovery in high dimensions, our method accommodates graphs of low or high sparsity while maintaining polynomial time complexity. We present a structure learning algorithm that is provably sound and complete with respect to a so-called $\textit{lazy oracle}$. We design inference procedures with finite sample error control for linear and nonlinear systems, and demonstrate our approach on a range of simulated and real-world datasets. An accompanying $\texttt{R}$ package, $\texttt{cbl}$, is available from $\texttt{CRAN}$.


Benign overfitting and adaptive nonparametric regression

arXiv.org Machine Learning

Benign overfitting has attracted a great deal of attention in recent years. It was initially motivated by the fact that deep neural networks have good predictive properties even when perfectly interpolating the training data [Belkin et al., 2019a], [Belkin et al., 2018b], [Zhang et al., 2021], [Belkin, 2021]. Such behavior stands in strong contrast with the classical point of view that perfectly fitting the data points is not compatible with predicting well. With the aim of understanding this new phenomenon, a series of recent papers studied benign overfitting in the linear regression setting, see [Bartlett et al., 2020], [Tsigler and Bartlett, 2020], [Chinot and Lerasle, 2020], [Muthukumar et al., 2020], [Bartlett and Long, 2021], [Lecué and Shang, 2022] and the references therein. The main conclusion for the linear model is that an unbalanced spectrum of the design matrix and over-parametrization, which in a sense brings the model closer to the nonparametric setting, are essential for benign overfitting to occur in linear regression. Extensions to kernel ridgeless regression were considered in [Liang and Rakhlin, 2020] when the sample size $n$ and the dimension $d$ were assumed to satisfy $n \asymp d$, and in [Liang et al., 2020] for a more general case $d \asymp n$


Nonparametric, Nonasymptotic Confidence Bands with Paley-Wiener Kernels for Band-Limited Functions

arXiv.org Artificial Intelligence

The paper introduces a method to construct confidence bands for bounded, band-limited functions based on a finite sample of input-output pairs. The approach is distribution-free w.r.t. the observation noises and only the knowledge of the input distribution is assumed. It is nonparametric, that is, it does not require a parametric model of the regression function and the regions have non-asymptotic guarantees. The algorithm is based on the theory of Paley-Wiener reproducing kernel Hilbert spaces. The paper first studies the fully observable variant, when there are no noises on the observations and only the inputs are random; then it generalizes the ideas to the noisy case using gradient-perturbation methods. Finally, numerical experiments demonstrating both cases are presented.
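
As a small illustration of the kernel family named in this abstract, the sketch below evaluates the standard reproducing kernel of a Paley-Wiener space on the real line, $k(x, y) = \sin(\eta (x - y)) / (\pi (x - y))$, with $k(x, x) = \eta / \pi$. The function name `paley_wiener_kernel` and the band-limit parameter name `eta` are assumptions made for this example, and the confidence-band construction itself is not reproduced here.

```python
import math


def paley_wiener_kernel(x: float, y: float, eta: float) -> float:
    """Sinc-type reproducing kernel for functions band-limited to [-eta, eta].

    Illustrative helper (hypothetical name): returns
    sin(eta * (x - y)) / (pi * (x - y)), with the limit eta / pi at x == y.
    """
    d = x - y
    if d == 0.0:
        # Limit of sin(eta * d) / (pi * d) as d -> 0.
        return eta / math.pi
    return math.sin(eta * d) / (math.pi * d)


def gram_matrix(points, eta):
    """Kernel (Gram) matrix of the kernel over a list of scalar inputs."""
    return [[paley_wiener_kernel(a, b, eta) for b in points] for a in points]
```

With `eta = math.pi` the kernel reduces to the normalized sinc function, so it equals 1 on the diagonal and vanishes (up to rounding) at nonzero integer offsets.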


Fast ABC-Boost: A Unified Framework for Selecting the Base Class in Multi-Class Classification

arXiv.org Machine Learning

The work in ICML'09 showed that the derivatives of the classical multi-class logistic regression loss function could be re-written in terms of a pre-chosen "base class" and applied the new derivatives in the popular boosting framework. In order to make use of the new derivatives, one must have a strategy to identify/choose the base class at each boosting iteration. The idea of "adaptive base class boost" (ABC-Boost) in ICML'09 adopted a computationally expensive "exhaustive search" strategy for the base class at each iteration. It has been well demonstrated that ABC-Boost, when integrated with trees, can achieve substantial improvements in many multi-class classification tasks. Furthermore, the work in UAI'10 derived the explicit second-order tree split gain formula, which typically improved the classification accuracy considerably, compared with using only the first-order information for tree-splitting, for both multi-class and binary-class classification tasks. In this paper, we develop a unified framework for effectively selecting the base class by introducing a series of ideas to improve the computational efficiency of ABC-Boost. Our framework has parameters $(s,g,w)$. At each boosting iteration, we only search for the "$s$-worst classes" (instead of all classes) to determine the base class. We also allow a "gap" $g$ when conducting the search. That is, we only search for the base class at every $g+1$ iterations. We furthermore allow a "warm up" stage by only starting the search after $w$ boosting iterations. The parameters $s$, $g$, $w$ can be viewed as tunable parameters, and certain combinations of $(s,g,w)$ may even lead to better test accuracy than the "exhaustive search" strategy. Overall, our proposed framework provides a robust and reliable scheme for implementing ABC-Boost in practice.
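
The $(s,g,w)$ schedule described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: iterations are counted from 0, the first search is assumed to happen at iteration $w$, and "worst" is assumed to mean the classes with the largest current training loss. The function names are hypothetical.

```python
from typing import List, Sequence


def should_search(iteration: int, g: int, w: int) -> bool:
    """Decide whether to search for a new base class at this iteration.

    Assumptions for this sketch: iterations are 0-based, no search happens
    during the first w warm-up iterations, and afterwards a search runs
    once every g + 1 iterations.
    """
    if iteration < w:
        return False
    return (iteration - w) % (g + 1) == 0


def s_worst_classes(per_class_loss: Sequence[float], s: int) -> List[int]:
    """Return the indices of the s classes with the largest loss.

    "Worst" is interpreted here as "largest per-class training loss",
    which is an assumption of this example.
    """
    order = sorted(range(len(per_class_loss)),
                   key=lambda k: per_class_loss[k],
                   reverse=True)
    return order[:s]


def search_iterations(n_iterations: int, g: int, w: int) -> List[int]:
    """List the iterations at which a base-class search would run."""
    return [t for t in range(n_iterations) if should_search(t, g, w)]
```

For example, `search_iterations(10, g=2, w=3)` selects iterations 3, 6 and 9, and at each of those only the candidates returned by `s_worst_classes` would be evaluated as possible base classes. Between searches, the previously chosen base class would simply be reused.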


On boundary conditions parametrized by analytic functions

arXiv.org Machine Learning

Computer algebra can answer various questions about partial differential equations using symbolic algorithms. However, the inclusion of data into equations is rare in computer algebra. Therefore, recently, computer algebra models have been combined with Gaussian processes, a regression model in machine learning, to describe the behavior of certain differential equations under data. While it was possible to describe polynomial boundary conditions in this context, we extend these models to analytic boundary conditions. Additionally, we describe the necessary algorithms for Gröbner and Janet bases of Weyl algebras with certain analytic coefficients. Using these algorithms, we provide examples of divergence-free flow in domains bounded by analytic functions and adapted to observations. Keywords: Gaussian processes, boundary conditions, Gröbner bases, partial differential equations.


Machine Learning Algorithms Cheat Sheet

#artificialintelligence

Machine learning is a subfield of artificial intelligence (AI) and computer science that focuses on using data and algorithms to mimic the way people learn, progressively improving its accuracy. It is one of the most interesting methods in computer science these days, and it is applied behind the scenes in products and services we use in everyday life. If you want to know which machine learning algorithms are used in different applications, or if you are a developer looking for a method to solve a problem, keep reading and use these steps as a guide. Machine learning can be divided into three types of learning: unsupervised learning, supervised learning, and semi-supervised learning. Unsupervised learning uses data that is not labeled, so the machine must work without guidance, relying on patterns, similarities, and differences. Supervised learning, on the other hand, involves a "teacher" who trains the machine by labeling the data it works with. The machine then receives some examples that allow it to produce a correct outcome.


An Investigation on Non-Invasive Brain-Computer Interfaces: Emotiv Epoc+ Neuroheadset and Its Effectiveness

arXiv.org Artificial Intelligence

In this study, we survey the progress of BCI research and present many recently unveiled approaches. First, we explore a natural speech decoding approach, introduced by Facebook Reality Lab and the University of California San Francisco, that is designed to decode human speech directly from the brain onto a digital screen. Then, we study a recently presented visionary project to control the human brain using a Brain-Machine Interface (BMI) approach. We also investigate the well-known electroencephalography (EEG) based Emotiv Epoc+ Neuroheadset to identify six emotional parameters (engagement, excitement, focus, stress, relaxation, and interest) from brain signals. We test the neuroheadset on three human subjects and use two supervised learning classifiers, Naive Bayes and Linear Regression, to show the accuracy and competency of the Epoc+ device and its associated applications in neurotechnological research. Our experiments indicate 69% and 62% improved accuracy for these classifiers, respectively, in reading the performance metrics of the participants. We envision that non-invasive, insertable, and low-cost BCI approaches will become the focal point, not only as an alternative for patients with physical paralysis but also for understanding the brain, which could pave the way to accessing and controlling memories and the brain in the very near future.