AITopics

2003.10113

Genre: Research Report > Experimental Study (0.34)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Spyromitros-Xioufis, Eleftherios, Sechidis, Konstantinos, Vlahavas, Ioannis

Multi-target regression via output space quantization

arXiv.org Machine LearningMar-22-2020

Multi-target regression is concerned with the prediction of multiple continuous target variables using a shared set of predictors. Two key challenges in multi-target regression are: (a) modelling target dependencies and (b) scalability to large output spaces. In this paper, a new multi-target regression method is proposed that tries to jointly address these challenges via a novel problem transformation approach. The proposed method, called MRQ, is based on the idea of quantizing the output space in order to transform the multiple continuous targets into one or more discrete ones. Learning on the transformed output space naturally enables modeling of target dependencies while the quantization strategy can be flexibly parameterized to control the trade-off between prediction accuracy and computational efficiency. Experiments on a large collection of benchmark datasets show that MRQ is both highly scalable and also competitive with the state-of-the-art in terms of accuracy. In particular, an ensemble version of MRQ obtains the best overall accuracy, while being an order of magnitude faster than the runner up method.

dataset, mrq, regression, (14 more...)

2003.09896

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
(11 more...)

Genre: Research Report (1.00)

Industry: Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

arXiv.org Machine LearningMar-21-2020

BoostTree and BoostForest for Ensemble Learning

Zhao, Changming, Wu, Dongrui, Huang, Jian, Yuan, Ye, Zhang, Hai-Tao

Bootstrap aggregation (Bagging) and boosting are two popular ensemble learning approaches, which combine multiple base learners to generate a composite learner. This article proposes BoostForest, which is an ensemble learning approach using BoostTree as base learners and can be used for both classification and regression. BoostTree constructs a tree by gradient boosting, which trains a linear or nonlinear model at each node. When a new sample comes in, BoostTree first sorts it down to a leaf, then computes the final prediction by summing up the outputs of all models along the path from the root node to that leaf. BoostTree achieves high randomness (diversity) by sampling its parameters randomly from a parameter pool, and selecting a subset of features randomly at node splitting. BoostForest further increases the randomness by bootstrapping the training data in constructing different BoostTrees. BoostForest is compared with four classical ensemble learning approaches on 30 classification and regression datasets, demonstrating that it can generate more accurate and more robust composite learners.

boosttree, dataset, node, (13 more...)

2003.09737

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Oceania > Australia > Tasmania (0.04)
(10 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.53)

arXiv.org Artificial IntelligenceMar-21-2020

NeuCrowd: Neural Sampling Network for Representation Learning with Crowdsourced Labels

Hao, Yang, Ding, Wenbiao, Liu, Zitao

Representation learning approaches require a massive amount of discriminative training data, which is unavailable in many scenarios, such as healthcare, small city, education, etc. In practice, people refer to crowdsourcing to get annotated labels. However, due to issues like data privacy, budget limitation, shortage of domain-specific annotators, the number of crowdsourced labels are still very limited. Moreover, because of annotators' diverse expertises, crowdsourced labels are often inconsistent. Thus, directly applying existing representation learning algorithms may easily get the overfitting problem and yield suboptimal solutions. In this paper, we propose \emph{NeuCrowd}, a unified framework for representation learning from crowdsourced labels. The proposed framework (1) creates a sufficient number of high-quality \emph{n}-tuplet training samples by utilizing safety-aware sampling and robust anchor generation; and (2) automatically learns a neural sampling network that adaptively learns to select effective samples for representation learning network. The proposed framework is evaluated on both synthetic and real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC\footnote{To encourage the reproducible results, we make our code public on a github repository, i.e., \url{https://github.com/crowd-data-mining/NeuCrowd}}.

anchor, crowdsourced label, representation, (15 more...)

arXiv.org Artificial Intelligence

2003.0966

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > Japan > Shikoku > Kagawa Prefecture > Takamatsu (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Education (0.94)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

#artificialintelligenceMar-20-2020, 23:31:56 GMT

Beginning with Machine Learning & Data Science in Python

Link: best udemy course Beginning with Machine Learning & Data Science in Python Fundamentals of Data Science: Exploratory Data Analysis (EDA), Regression (Linear & logistic), Visualization, Basic ML by UNP United Network of Professionals What you'll learn You will be able to apply data science algorithms for solving industry problems You will have a clear understanding of industry standards and best practices for predictive model building You will be able to derive key insights from data using exploratory data analysis techniques You will be able to efficiently handle data in a structured way using Pandas You will have a strong foundation of linear regression, multiple regression and logistic regression You will be able to use python scikit-learn for building different types of regression models You will be able to use cross validation techniques for comparing models, select parameters You will know about common pitfalls in modeling like over-fitting, bias-variance trade off etc.. You will be able to regularize models for reliable predictions Description 85% of data science problems are solved using exploratory data analysis (EDA), visualization, regression (linear & logistic). Naturally, 85% of the interview questions comes from these topics as well. This is a concise course created by UNP to focus on what matter most. This course will help you create a solid foundation of the essential topics of data science.

foundation, machine learning & data science, regression, (10 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education (0.59)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Jézéquel, Rémi, Gaillard, Pierre, Rudi, Alessandro

Efficient improper learning for online logistic regression

arXiv.org Machine LearningMar-20-2020

We consider the setting of online logistic regression and consider the regret with respect to the l 2 -ball of radius B. It is known (see [Hazan et al., 2014]) that any proper algorithm which has logarithmic regret in the number of samples (denoted n) necessarily suffers an exponential multiplicative constant in B. In this work, we design an efficient improper algorithm that avoids this exponential constant while preserving a logarithmic regret. Indeed, [Foster et al., 2018] showed that the lower bound does not apply to improper algorithms and proposed a strategy based on exponential weights with prohibitive computational complexity. Our new algorithm based on regularized empirical risk minimization with surrogate losses satisfies a regret scaling as O(B log(Bn)) with a per-round time-complexity of order O(d 2).

algorithm, online logistic regression, regularization, (13 more...)

2003.08109

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report > New Finding (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.72)

Adil, Deeksha, Peng, Richard, Sachdeva, Sushant

Fast, Provably convergent IRLS Algorithm for p-norm Linear Regression

Neural Information Processing SystemsMar-19-2020, 02:31:29 GMT

Linear regression in L_p-norm is a canonical optimization problem that arises in several applications, including sparse recovery, semi-supervised learning, and signal processing. Generic convex optimization algorithms for solving L_p-regression are slow in practice. Iteratively Reweighted Least Squares (IRLS) is an easy to implement family of algorithms for solving these problems that has been studied for over 50 years. However, these algorithms often diverge for p 3, and since the work of Osborne (1985), it has been an open problem whether there is an IRLS algorithm that converges for p 3. We propose p-IRLS, the first IRLS algorithm that provably converges geometrically for any p \in [2,\infty). Our algorithm is simple to implement and is guaranteed to find a high accuracy solution in a sub-linear number of iterations. Our experiments demonstrate that it performs even better than our theoretical bounds, beats the standard Matlab/CVX implementation for solving these problems by 10–50x, and is the fastest among available implementations in the high-accuracy regime.

algorithm, p-norm linear regression, provably convergent irl algorithm, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.64)

Neural Information Processing SystemsMar-19-2020, 01:48:16 GMT

Causal Regularization

Janzing, Dominik

We argue that regularizing terms in standard regression methods not only help against overfitting finite data, but sometimes also help in getting better causal models. We first consider a multi-dimensional variable linearly influencing a target variable with some multi-dimensional unobserved common cause, where the confounding effect can be decreased by keeping the penalizing term in Ridge and Lasso regression even in the population limit. The reason is a close analogy between overfitting and confounding observed for our toy model. In the case of overfitting, we can choose regularization constants via cross validation, but here we choose the regularization constant by first estimating the strength of confounding, which yielded reasonable results for simulated and real data. Further, we show a'causal generalization bound' which states (subject to our particular model of confounding) that the error made by interpreting any non-linear regression as causal model can be bounded from above whenever functions are taken from a not too rich class.

causal model, causal regularization, regularization constant

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Russac, Yoan, Vernade, Claire, Cappé, Olivier

Weighted Linear Bandits for Non-Stationary Environments

Neural Information Processing SystemsMar-19-2020, 01:32:52 GMT

We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary in time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past. This involves studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments.

d-linucb, non-stationary environment, weighted linear bandit

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)

Salehi, Fariborz, Abbasi, Ehsan, Hassibi, Babak

The Impact of Regularization on High-dimensional Logistic Regression

Neural Information Processing SystemsMar-19-2020, 01:32:40 GMT

Logistic regression is commonly used for modeling dichotomous outcomes. In the classical setting, where the number of observations is much larger than the number of parameters, properties of the maximum likelihood estimator in logistic regression are well understood. Recently, Sur and Candes \cite{sur2018modern} have studied logistic regression in the high-dimensional regime, where the number of observations and parameters are comparable, and show, among other things, that the maximum likelihood estimator is biased. In the high-dimensional regime the underlying parameter vector is often structured (sparse, block-sparse, finite-alphabet, etc.) and so in this paper we study regularized logistic regression (RLR), where a convex regularizer that encourages the desired structure is added to the negative of the log-likelihood function. An advantage of RLR is that it allows parameter recovery even for instances where the (unconstrained) maximum likelihood estimate does not exist.

high-dimensional logistic regression, maximum likelihood estimator, regularization, (3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)