Regression
Blind Polynomial Regression
Fitting a polynomial to observed data is a ubiquitous task in signal processing and machine learning applications such as interpolation and prediction. In that context, input and output pairs are available and the goal is to find the coefficients of the polynomial. However, in many applications, the input may be partially known or not known at all, rendering conventional regression approaches inapplicable. In this paper, we formally state the (potentially partial) blind regression problem, illustrate some of its theoretical properties, and propose algorithmic approaches to solve it. As a case study, we apply our methods to a jitter-correction problem and corroborate their performance.
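For orientation, the conventional (non-blind) setting described above, where every input and output pair is known, can be sketched in a few lines. This is only an illustrative least-squares fit through the normal equations, not the blind method the paper proposes; the function name `fit_polynomial` and the sample data are assumptions made for the example.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit with fully known inputs.

    Returns coefficients [c0, c1, ..., c_degree] of c0 + c1*x + ... .
    Hypothetical helper for illustration; not taken from the paper.
    """
    m = degree + 1
    if len(xs) != len(ys) or len(xs) < m:
        raise ValueError("need at least degree + 1 matching (x, y) pairs")

    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[float(sum(x ** (i + j) for x in xs)) for j in range(m)] for i in range(m)]
    b = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(m)]

    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


# Samples of y = 1 + 2x + 3x^2 at known inputs.
xs = [-2, -1, 0, 1, 2]
ys = [1 + 2 * x + 3 * x * x for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
```

When the inputs `xs` are unknown or only partially known, this system cannot even be assembled, which is the gap the blind formulation addresses.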
Applications of Logistic Regression, Part 3 (Machine Learning)
Abstract: We consider vertical logistic regression (VLR) trained with mini-batch gradient descent -- a setting which has attracted growing interest among industries and proven to be useful in a wide range of applications including finance and medical research. We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks, where the protocols might differ from one another, yet a procedure for obtaining local gradients is implicitly shared. We first consider the honest-but-curious threat model, in which the detailed implementation of the protocol is neglected and only the shared procedure is assumed, which we abstract as an oracle. We find that even under this general setting, a single-dimension feature and label can still be recovered from the other party under suitable constraints on batch size, thus demonstrating the potential vulnerability of all frameworks following the same philosophy. Then we look into a popular instantiation of the protocol based on Homomorphic Encryption (HE).
Top 10 Machine Learning Algorithms for Beginners to Dive Into
Each machine learning algorithm addresses a specific kind of problem, so beginners can work through them one at a time. Here is a compilation of the top machine learning algorithms that are frequently used across machine learning fields. Modeling the relationship between two variables is often the starting point of a model, and linear regression does exactly that. The relationship between the dependent and independent variables is established by fitting a regression line to them.
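As a concrete illustration of fitting a regression line, here is a minimal sketch of simple linear regression with one independent variable, using the closed-form least-squares solution. The function name `fit_line` and the sample data are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two matching (x, y) pairs")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
prediction = slope * 4 + intercept  # predicted y at x = 4
```

The fitted line passes through the point of means, so the intercept follows directly once the slope is known.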
Global Big Data Conference
As we inch closer to Black Friday and the start of the holiday buying extravaganza, retailers are putting the final touches on the demand forecasts they're using to predict the mix of goods they'll carry this winter. There are a lot of variables to juggle, including COVID, the economy, and the weather. It seems like a perfect use case for the increasingly sophisticated machine learning models that are in vogue in the industry. But can they trust their predictions? Over the past decade, retailers and other companies in the consumer goods supply chain have started upgrading their demand forecasting systems in hopes of gaining ground in this super competitive industry. Forward-looking retailers, in particular, are replacing the largely deterministic approaches that were favored in the past (which used simple linear regression models based on historical data with relatively static assumptions about the state of the world) with probabilistic approaches that bring more data into the equation and rely on more sophisticated machine learning algorithms, like neural nets and XGBoost, to generate more detailed forecast ranges.
Fast Instrument Learning with Faster Rates
Wang, Ziyu, Zhou, Yuhao, Zhu, Jun
We investigate nonlinear instrumental variable (IV) regression given high-dimensional instruments. We propose a simple algorithm which combines kernelized IV methods and an arbitrary, adaptive regression algorithm, accessed as a black box. Our algorithm enjoys faster-rate convergence and adapts to the dimensionality of informative latent features, while avoiding an expensive minimax optimization procedure, which has been necessary to establish similar guarantees. It further brings the benefit of flexible machine learning models to quasi-Bayesian uncertainty quantification, likelihood-based model selection, and model averaging. Simulation studies demonstrate the competitive performance of our method.
Batch Bayesian optimisation via density-ratio estimation with guarantees
Oliveira, Rafael, Tiao, Louis, Ramos, Fabio
Bayesian optimisation (BO) algorithms have shown remarkable success in applications involving expensive black-box functions. Traditionally BO has been set as a sequential decision-making process which estimates the utility of query points via an acquisition function and a prior over functions, such as a Gaussian process. Recently, however, a reformulation of BO via density-ratio estimation (BORE) allowed reinterpreting the acquisition function as a probabilistic binary classifier, removing the need for an explicit prior over functions and increasing scalability. In this paper, we present a theoretical analysis of BORE's regret and an extension of the algorithm with improved uncertainty estimates. We also show that BORE can be naturally extended to a batch optimisation setting by recasting the problem as approximate Bayesian inference. The resulting algorithms come equipped with theoretical performance guarantees and are assessed against other batch and sequential BO baselines in a series of experiments.
Adaptive Data Fusion for Multi-task Non-smooth Optimization
Lam, Henry, Wang, Kaizheng, Wu, Yuhang, Zhang, Yichen
In most machine-learning contexts, algorithm developers and theorists are concerned with solving a single task or optimizing a single metric at a time. Nonetheless, even in the big data era, the datasets are expensive and oftentimes collected for a large number of tasks, and models based on a single task likely hit the performance ceiling due to the limited sample size without fully exploiting the dataset featuring multiple tasks. For instance, in inventory management, the hype cycle of technology is getting shortened. It is increasingly critical for retailers to recognize the consumption patterns of customers as early as possible, so as to minimize the cost caused by backordering and holding. Since the selling data is limited at the early stage of the operations, decision making can generally be challenging.
Calibration tests beyond classification
Widmann, David, Lindsten, Fredrik, Zachariah, Dave
Most supervised machine learning tasks are subject to irreducible prediction errors. Probabilistic predictive models address this limitation by providing probability distributions that represent a belief over plausible targets, rather than point estimates. Such models can be a valuable tool in decision-making under uncertainty, provided that the model output is meaningful and interpretable. Calibrated models guarantee that the probabilistic predictions are neither over- nor under-confident. In the machine learning literature, different measures and statistical tests have been proposed and studied for evaluating the calibration of classification models. For regression problems, however, research has been focused on a weaker condition of calibration based on predicted quantiles for real-valued targets. In this paper, we propose the first framework that unifies calibration evaluation and tests for general probabilistic predictive models. It applies to any such model, including classification and regression models of arbitrary dimension. Furthermore, the framework generalizes existing measures and provides a more intuitive reformulation of a recently proposed framework for calibration in multi-class classification. In particular, we reformulate and generalize the kernel calibration error, its estimators, and hypothesis tests using scalar-valued kernels, and evaluate the calibration of real-valued regression problems.
A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized Linear Models
Zhou, Lijia, Koehler, Frederic, Sur, Pragya, Sutherland, Danica J., Srebro, Nathan
We prove a new generalization bound that shows for any class of linear predictors in Gaussian space, the Rademacher complexity of the class and the training error under any continuous loss $\ell$ can control the test error under all Moreau envelopes of the loss $\ell$. We use our finite-sample bound to directly recover the "optimistic rate" of Zhou et al. (2021) for linear regression with the square loss, which is known to be tight for minimal $\ell_2$-norm interpolation, but we also handle more general settings where the label is generated by a potentially misspecified multi-index model. The same argument can analyze noisy interpolation of max-margin classifiers through the squared hinge loss, and establishes consistency results in spiked-covariance settings. More generally, when the loss is only assumed to be Lipschitz, our bound effectively improves Talagrand's well-known contraction lemma by a factor of two, and we prove uniform convergence of interpolators (Koehler et al. 2021) for all smooth, non-negative losses. Finally, we show that application of our generalization bound using localized Gaussian width will generally be sharp for empirical risk minimizers, establishing a non-asymptotic Moreau envelope theory for generalization that applies outside of proportional scaling regimes, handles model misspecification, and complements existing asymptotic Moreau envelope theories for M-estimation.
Sequential Gradient Descent and Quasi-Newton's Method for Change-Point Analysis
One common approach to detecting change-points is minimizing a cost function over possible numbers and locations of change-points. The framework includes several well-established procedures, such as the penalized likelihood and minimum description length. Such an approach requires finding the cost value repeatedly over different segments of the data set, which can be time-consuming when (i) the data sequence is long and (ii) obtaining the cost value involves solving a non-trivial optimization problem. This paper introduces a new sequential method (SE) that can be coupled with gradient descent (SeGD) and quasi-Newton's method (SeN) to find the cost value effectively. The core idea is to update the cost value using the information from previous steps without re-optimizing the objective function. The new method is applied to change-point detection in generalized linear models and penalized regression. Numerical studies show that the new approach can be orders of magnitude faster than the Pruned Exact Linear Time (PELT) method without sacrificing estimation accuracy.
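The cost-minimization framework described above can be sketched with a plain dynamic program over all segmentations. This is an exhaustive quadratic-time search, not the paper's sequential method and not PELT; the squared-error segment cost, the function names, and the penalty value are all assumptions chosen to keep the example small.

```python
def segment_cost(prefix, prefix_sq, start, end):
    """Sum of squared deviations from the mean of data[start:end]."""
    n = end - start
    total = prefix[end] - prefix[start]
    total_sq = prefix_sq[end] - prefix_sq[start]
    return total_sq - total * total / n


def detect_change_points(data, penalty):
    """Minimize (sum of segment costs) + penalty * (number of change-points).

    Returns the sorted indices at which a new segment starts.
    """
    n = len(data)
    # Prefix sums make every segment cost an O(1) lookup.
    prefix, prefix_sq = [0.0], [0.0]
    for value in data:
        prefix.append(prefix[-1] + value)
        prefix_sq.append(prefix_sq[-1] + value * value)

    # best[t]: minimal penalized cost of data[:t]; last[t]: start of its final segment.
    best = [0.0] * (n + 1)
    best[0] = -penalty  # so that the first segment carries no penalty
    last = [0] * (n + 1)
    for end in range(1, n + 1):
        candidates = (
            (best[start] + segment_cost(prefix, prefix_sq, start, end) + penalty, start)
            for start in range(end)
        )
        best[end], last[end] = min(candidates)

    # Walk back through the stored segment starts.
    change_points = []
    end = n
    while end > 0:
        start = last[end]
        if start > 0:
            change_points.append(start)
        end = start
    return sorted(change_points)


# A mean shift from 0 to 10 halfway through the sequence.
data = [0.0] * 5 + [10.0] * 5
change_points = detect_change_points(data, penalty=1.0)
```

Here each segment cost has a closed form; when it instead requires solving an optimization problem, as in generalized linear models, evaluating it for every candidate segment becomes the bottleneck that sequential updates are meant to avoid.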