Goto

Collaborating Authors

 Regression


On Uniform Convergence and Low-Norm Interpolation Learning

Neural Information Processing Systems

We consider an underdetermined noisy linear regression model where the minimum-norm interpolating predictor is known to be consistent, and ask: can uniform convergence in a norm ball, or at least (following Nagarajan and Kolter) the subset of a norm ball that the algorithm selects on a typical input set, explain this success? We show that uniformly bounding the difference between empirical and population errors cannot show any learning in the norm ball, and cannot show consistency for any set, even one depending on the exact algorithm and distribution. But we argue we can explain the consistency of the minimal-norm interpolator with a slightly weaker, yet standard, notion: uniform convergence of zero-error predictors in a norm ball. We use this to bound the generalization error of low- (but not minimal-)norm interpolating predictors.


Statistical Inference with M-Estimators on Adaptively Collected Data

Neural Information Processing Systems

Bandit algorithms are increasingly used in real-world sequential decision-making problems. Associated with this is an increased desire to be able to use the resulting datasets to answer scientific questions like: Did one type of ad lead to more purchases? In which contexts is a mobile health intervention effective? However, classical statistical approaches fail to provide valid confidence intervals when used with data collected with bandit algorithms. Alternative methods have recently been developed for simple models (e.g., comparison of means). Yet there is a lack of general methods for conducting statistical inference using more complex models on data collected with (contextual) bandit algorithms; for example, current methods cannot be used for valid inference on parameters in a logistic regression model for a binary reward.


GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification

Neural Information Processing Systems

One of the most efficient methods to solve L2 -regularized primal problems, such as logistic regression and linear support vector machine (SVM) classification, is the widely used trust region Newton algorithm, TRON. While TRON has recently been shown to enjoy substantial speedups on shared-memory multi-core systems, exploiting graphical processing units (GPUs) to speed up the method is significantly more difficult, owing to the highly complex and heavily sequential nature of the algorithm. In this work, we show that using judicious GPU-optimization principles, TRON training time for different losses and feature representations may be drastically reduced. For sparse feature sets, we show that using GPUs to train logistic regression classifiers in LIBLINEAR is up to an order-of-magnitude faster than solely using multithreading. For dense feature sets–which impose far more stringent memory constraints–we show that GPUs substantially reduce the lengthy SVM learning times required for state-of-the-art proteomics analysis, leading to dramatic improvements over recently proposed speedups.


Kolmogorov-Arnold Neural Networks for High-Entropy Alloys Design

arXiv.org Artificial Intelligence

A wide range of deep learning-based machine learning techniques are extensively applied to the design of high-entropy alloys (HEAs), yielding numerous valuable insights. Kolmogorov-Arnold Networks (KAN) is a recently developed architecture that aims to improve both the accuracy and interpretability of input features. In this work, we explore three different datasets for HEA design and demonstrate the application of KAN for both classification and regression models. In the first example, we use a KAN classification model to predict the probability of single-phase formation in high-entropy carbide ceramics based on various properties such as mixing enthalpy and valence electron concentration. In the second example, we employ a KAN regression model to predict the yield strength and ultimate tensile strength of HEAs based on their chemical composition and process conditions including annealing time, cold rolling percentage, and homogenization temperature. The third example involves a KAN classification model to determine whether a certain composition is an HEA or non-HEA, followed by a KAN regressor model to predict the bulk modulus of the identified HEA, aiming to identify HEAs with high bulk modulus. In all three examples, KAN either outperform or match the performance in terms of accuracy such as F1 score for classification and Mean Square Error (MSE), and coefficient of determination (R2) for regression of the multilayer perceptron (MLP) by demonstrating the efficacy of KAN in handling both classification and regression tasks. We provide a promising direction for future research to explore advanced machine learning techniques, which lead to more accurate predictions and better interpretability of complex materials, ultimately accelerating the discovery and optimization of HEAs with desirable properties.


Nonlinear second-order dynamics describe labial constriction trajectories across languages and contexts

arXiv.org Artificial Intelligence

We investigate the dynamics of labial constriction trajectories during the production of /b/ and /m/ in English and Mandarin. We find that, across languages and contexts, the ratio of instantaneous displacement to instantaneous velocity generally follows an exponential decay curve from movement onset to movement offset. We formalize this empirical discovery in a differential equation and, in combination with an assumption of point attractor dynamics, derive a nonlinear second-order dynamical system describing labial constriction trajectories. The equation has only two parameters, T and r. T corresponds to the target state and r corresponds to movement rapidity. Thus, each of the parameters corresponds to a phonetically relevant dimension of control. Nonlinear regression demonstrates that the model provides excellent fits to individual movement trajectories. Moreover, trajectories simulated from the model qualitatively match empirical trajectories, and capture key kinematic variables like duration, peak velocity, and time to achieve peak velocity. The model constitutes a proposal for the dynamics of individual articulatory movements, and thus offers a novel foundation from which to understand additional influences on articulatory kinematics like prosody, inter-movement coordination, and stochastic noise.


Machine Learning for Missing Value Imputation

arXiv.org Artificial Intelligence

In recent times, a considerable number of research studies have been carried out to address the issue of Missing Value Imputation (MVI). MVI aims to provide a primary solution for datasets that have one or more missing attribute values. The advancements in Artificial Intelligence (AI) drive the development of new and improved machine learning (ML) algorithms and methods. The advancements in ML have opened up significant opportunities for effectively imputing these missing values. The main objective of this article is to conduct a comprehensive and rigorous review, as well as analysis, of the state-of-the-art ML applications in MVI methods. This analysis seeks to enhance researchers' understanding of the subject and facilitate the development of robust and impactful interventions in data preprocessing for Data Analytics. The review is performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) technique. More than 100 articles published between 2014 and 2023 are critically reviewed, considering the methods and findings. Furthermore, the latest literature is examined to scrutinize the trends in MVI methods and their evaluation. The accomplishments and limitations of the existing literature are discussed in detail. The survey concludes by identifying the current gaps in research and providing suggestions for future research directions and emerging trends in related fields of interest.


Robustness Auditing for Linear Regression: To Singularity and Beyond

arXiv.org Artificial Intelligence

It has recently been discovered that the conclusions of many highly influential econometrics studies can be overturned by removing a very small fraction of their samples (often less than $0.5\%$). These conclusions are typically based on the results of one or more Ordinary Least Squares (OLS) regressions, raising the question: given a dataset, can we certify the robustness of an OLS fit on this dataset to the removal of a given number of samples? Brute-force techniques quickly break down even on small datasets. Existing approaches which go beyond brute force either can only find candidate small subsets to remove (but cannot certify their non-existence) [BGM20, KZC21], are computationally intractable beyond low dimensional settings [MR22], or require very strong assumptions on the data distribution and too many samples to give reasonable bounds in practice [BP21, FH23]. We present an efficient algorithm for certifying the robustness of linear regressions to removals of samples. We implement our algorithm and run it on several landmark econometrics datasets with hundreds of dimensions and tens of thousands of samples, giving the first non-trivial certificates of robustness to sample removal for datasets of dimension $4$ or greater. We prove that under distributional assumptions on a dataset, the bounds produced by our algorithm are tight up to a $1 + o(1)$ multiplicative factor.


Ising Model Selection Using \ell_{1} -Regularized Linear Regression: A Statistical Mechanics Analysis

Neural Information Processing Systems

We theoretically analyze the typical learning performance of \ell_{1} -regularized linear regression ( \ell_1 -LinR) for Ising model selection using the replica method from statistical mechanics. For typical random regular graphs in the paramagnetic phase, an accurate estimate of the typical sample complexity of \ell_1 -LinR is obtained. Remarkably, despite the model misspecification, \ell_1 -LinR is model selection consistent with the same order of sample complexity as \ell_{1} -regularized logistic regression ( \ell_1 -LogR), i.e., M \mathcal{O}\left(\log N\right), where N is the number of variables of the Ising model. Moreover, we provide an efficient method to accurately predict the non-asymptotic behavior of \ell_1 -LinR for moderate M, N, such as precision and recall. Simulations show a fairly good agreement between theoretical predictions and experimental results, even for graphs with many loops, which supports our findings.


Fast, Provably convergent IRLS Algorithm for p-norm Linear Regression

Neural Information Processing Systems

Linear regression in Lp-norm is a canonical optimization problem that arises in several applications, including sparse recovery, semi-supervised learning, and signal processing. Generic convex optimization algorithms for solving Lp-regression are slow in practice. Iteratively Reweighted Least Squares (IRLS) is an easy to implement family of algorithms for solving these problems that has been studied for over 50 years. However, these algorithms often diverge for p 3, and since the work of Osborne (1985), it has been an open problem whether there is an IRLS algorithm that converges for p 3. We propose p-IRLS, the first IRLS algorithm that provably converges geometrically for any p \in [2,\infty). Our algorithm is simple to implement and is guaranteed to find a high accuracy solution in a sub-linear number of iterations.


Robust Meta-learning for Mixed Linear Regression with Small Batches

Neural Information Processing Systems

A common challenge faced in practical supervised learning, such as medical image processing and robotic interactions, is that there are plenty of tasks but each task cannot afford to collect enough labeled examples to be learned in isolation. However, by exploiting the similarities across those tasks, one can hope to overcome such data scarcity. Under a canonical scenario where each task is drawn from a mixture of k linear regressions, we study a fundamental question: can abundant small-data tasks compensate for the lack of big-data tasks? Existing second moment based approaches of \cite{2020arXiv200208936K} show that such a trade-off is efficiently achievable, with the help of medium-sized tasks with \Omega(k {1/2}) examples each. However, this algorithm is brittle in two important scenarios.