AITopics

Specifying a Bayesian prior is notoriously difficult for complex models such as neural networks. Reasoning about parameters is made challenging by the high-dimensionality and over-parameterization of the space. Priors that seem benign and uninformative can have unintuitive and detrimental effects on a model's predictions. For this reason, we propose predictive complexity priors: a functional prior that is defined by comparing the model's predictions to those of a reference model. Although originally defined on the model outputs, we transfer the prior to the model parameters via a change of variables. The traditional Bayesian workflow can then proceed as usual. We apply our predictive complexity prior to high-dimensional regression, reasoning over neural network depth, and sharing of statistical strength for few-shot learning.

artificial intelligence, machine learning, predcp, (15 more...)

2006.10801

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States (0.14)
North America > Canada > Ontario > Toronto (0.14)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Differentially Private (Gradient) Expectation Maximization Algorithm with Statistical Guarantees

Wang, Di, Ding, Jiahao, Xie, Zejun, Pan, Miao, Xu, Jinhui

(Gradient) Expectation Maximization (EM) is a widely used algorithm for estimating the maximum likelihood of mixture models or incomplete data problems. A major challenge facing this popular technique is how to effectively preserve the privacy of sensitive data. Previous research on this problem has already lead to the discovery of some Differentially Private (DP) algorithms for (Gradient) EM. However, unlike in the non-private case, existing techniques are not yet able to provide finite sample statistical guarantees. To address this issue, we propose in this paper the first DP version of (Gradient) EM algorithm with statistical guarantees. Moreover, we apply our general framework to three canonical models: Gaussian Mixture Model (GMM), Mixture of Regressions Model (MRM) and Linear Regression with Missing Covariates (RMC). Specifically, for GMM in the DP model, our estimation error is near optimal in some cases. For the other two models, we provide the first finite sample statistical guarantees. Our theory is supported by thorough numerical experiments.

algorithm, artificial intelligence, machine learning, (16 more...)

2010.1352

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (0.46)
Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Transfer Learning in Large-scale Gaussian Graphical Models with False Discovery Rate Control

Li, Sai, Cai, T. Tony, Li, Hongzhe

Gaussian graphical models (GGMs), which represent the dependence structure among a set of random variables, have been widely used to model the conditional dependence relationships in many applications, including gene regulatory networks and brain connectivity maps (Drton and Maathuis, 2017; Varoquaux et al., 2010; Zhao et al., 2014; Glymour et al., 2019). In the classical setting with data from a single study, the estimation of high-dimensional GGMs has been well studied in a series of papers, including penalized likelihood methods (Yuan and Lin, 2007; Lam and Fan, 2009; Friedman et al., 2008; Rothman et al., 2008) and convex optimization based methods (Cai et al., 2011, 2016; Liu and Wang, 2017). The minimax optimal rates are studied in Cai et al. (2016) and a review can be found in Cai (2017). Liu (2013) considers the inference in GGMs based on a node-wise regression approach and Ren et al. (2015) studies the estimation optimality and inference for individual entries. Methods for estimating a single GGM have also been extended to simultaneously estimating multiple graphs when data from multiple studies are available. For example, Guo et al. (2011); Danaher et al. (2014); Cai et al. (2016) consider jointly estimating multiple GGMs with some penalties for inducing common structures among different graphs.

artificial intelligence, machine learning, trans-clime, (18 more...)

2010.11037

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Guo, Yiping, Bondell, Howard D.

Conditional Density Estimation via Weighted Logistic Regressions

Compared to the conditional mean as a simple point estimator, the conditional density function is more informative to describe the distributions with multi-modality, asymmetry or heteroskedasticity. In this paper, we propose a novel parametric conditional density estimation method by showing the connection between the general density and the likelihood function of inhomogeneous Poisson process models. The maximum likelihood estimates can be obtained via weighted logistic regressions, and the computation can be significantly relaxed by combining a block-wise alternating maximization scheme and local case-control sampling. We also provide simulation studies for illustration.

artificial intelligence, bayesian inference, machine learning, (14 more...)

2010.10896

Country: North America > Montserrat (0.04)

Genre: Research Report > Experimental Study (0.38)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.75)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

JSRT: James-Stein Regression Tree

Xiang, Xingchun, Tang, Qingtao, Zhang, Huaixuan, Dai, Tao, Li, Jiawei, Xia, Shu-Tao

Regression tree (RT) has been widely used in machine learning and data mining community. Given a target data for prediction, a regression tree is first constructed based on a training dataset before making prediction for each leaf node. In practice, the performance of RT relies heavily on the local mean of samples from an individual node during the tree construction/prediction stage, while neglecting the global information from different nodes, which also plays an important role. To address this issue, we propose a novel regression tree, named James-Stein Regression Tree (JSRT) by considering global information from different nodes. Specifically, we incorporate the global mean information based on James-Stein estimator from different nodes during the construction/predicton stage. Besides, we analyze the generalization error of our method under the mean square error (MSE) metric. Extensive experiments on public benchmark datasets verify the effectiveness and efficiency of our method, and demonstrate the superiority of our method over other RT prediction methods.

artificial intelligence, estimator, machine learning, (19 more...)

2010.09022

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Oceania > Australia (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(4 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Gaussian Gated Linear Networks

Budden, David, Marblestone, Adam, Sezener, Eren, Lattimore, Tor, Wayne, Greg, Veness, Joel

We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks. Instead of using backpropagation to learn features, GLNs have a distributed and local credit assignment mechanism based on optimizing a convex objective. This gives rise to many desirable properties including universality, data-efficient online learning, trivial interpretability and robustness to catastrophic forgetting. We extend the GLN framework from classification to multiple regression and density modelling by generalizing geometric mixing to a product of Gaussian densities. The G-GLN achieves competitive or state-of-the-art performance on several univariate and multivariate regression benchmarks, and we demonstrate its applicability to practical tasks including online contextual bandits and density estimation via denoising.

artificial intelligence, machine learning, neuron, (15 more...)

2006.05964

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Zhang, Qiong, Chen, Jiahua

Distributed Learning of Finite Gaussian Mixtures

arXiv.org Machine LearningOct-20-2020

Advances in information technology have led to extremely large datasets that are often kept in different storage centers. Existing statistical methods must be adapted to overcome the resulting computational obstacles while retaining statistical validity and efficiency. Split-and-conquer approaches have been applied in many areas, including quantile processes, regression analysis, principal eigenspaces, and exponential families. We study split-and-conquer approaches for the distributed learning of finite Gaussian mixtures. We recommend a reduction strategy and develop an effective MM algorithm. The new estimator is shown to be consistent and retains root-n consistency under some general conditions. Experiments based on simulated and real-world data show that the proposed split-and-conquer approach has comparable statistical performance with the global estimator based on the full dataset, if the latter is feasible. It can even slightly outperform the global estimator if the model assumption does not match the real-world data. It also has better statistical and computational performance than some existing methods.

artificial intelligence, data mining, machine learning, (19 more...)

2010.10412

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

arXiv.org Machine LearningOct-20-2020

On the Adversarial Robustness of LASSO Based Feature Selection

Li, Fuwei, Lai, Lifeng, Cui, Shuguang

In this paper, we investigate the adversarial robustness of feature selection based on the $\ell_1$ regularized linear regression model, namely LASSO. In the considered model, there is a malicious adversary who can observe the whole dataset, and then will carefully modify the response values or the feature matrix in order to manipulate the selected features. We formulate the modification strategy of the adversary as a bi-level optimization problem. Due to the difficulty of the non-differentiability of the $\ell_1$ norm at the zero point, we reformulate the $\ell_1$ norm regularizer as linear inequality constraints. We employ the interior-point method to solve this reformulated LASSO problem and obtain the gradient information. Then we use the projected gradient descent method to design the modification strategy. In addition, We demonstrate that this method can be extended to other $\ell_1$ based feature selection methods, such as group LASSO and sparse group LASSO. Numerical examples with synthetic and real data illustrate that our method is efficient and effective.

artificial intelligence, machine learning, regression coefficient, (17 more...)

2010.10045

Country:

North America > United States > California > Yolo County > Davis (0.14)
South America > Brazil (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.46)
Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.73)

Tan, Chang Wei, Bergmeir, Christoph, Petitjean, Francois, Webb, Geoffrey I.

Time Series Extrinsic Regression

arXiv.org Machine LearningOct-20-2020

This paper studies Time Series Extrinsic Regression (TSER): a regression task of which the aim is to learn the relationship between a time series and a continuous scalar variable; a task closely related to time series classification (TSC), which aims to learn the relationship between a time series and a categorical class label. This task generalizes time series forecasting (TSF), relaxing the requirement that the value predicted be a future value of the input series or primarily depend on more recent values. In this paper, we motivate and study this task, and benchmark existing solutions and adaptations of TSC algorithms on a novel archive of 19 TSER datasets which we have assembled. Our results show that the state-of-the-art TSC algorithm Rocket, when adapted for regression, achieves the highest overall accuracy compared to adaptations of other TSC algorithms and state-of-the-art machine learning (ML) algorithms such as XGBoost, Random Forest and Support Vector Regression. More importantly, we show that much research is needed in this field to improve the accuracy of ML models. We also find evidence that further research has excellent prospects of improving upon these straightforward baselines.

algorithm, artificial intelligence, machine learning, (14 more...)

2006.12672

Country:

Europe > Italy (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Wisconsin (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Health Care Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

#artificialintelligenceOct-19-2020, 16:35:28 GMT

How to Explain Key Machine Learning Algorithms at an Interview - KDnuggets

Linear Regression involves finding a'line of best fit' that represents a dataset using the least squares method. The least squares method involves finding a linear equation that minimizes the sum of squared residuals. A residual is equal to the actual minus predicted value. To give an example, the red line is a better line of best fit than the green line because it is closer to the points, and thus, the residuals are smaller. Ridge regression, also known as L2 Regularization, is a regression technique that introduces a small amount of bias to reduce overfitting.

artificial intelligence, decision tree, machine learning, (16 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)