AITopics | Regression

Collaborating Authors

Regression

News Overviews Instructional Materials AI-Alerts Classics

Logistic Regression and Maximum Entropy explained with examples and code

#artificialintelligenceMay-12-2016, 04:45:33 GMT

Logistic Regression is one of the most powerful classification methods within machine learning and can be used for a wide variety of tasks. Think of pre-policing or predictive analytics in health; it can be used to aid tuberculosis patients, aid breast cancer diagnosis, etc. Think of modeling urban growth, analysing mortgage pre-payments and defaults, forecasting the direction and strength of stock market movement, and even predicting sport outcomes. Reading all of this, the theory[1] of Maximum Entropy Classification might look difficult. In my experience, the average Developer does not believe they can design a proper Maximum Entropy / Logistic Regression Classifier from scratch. I strongly disagree: not only is the mathematics behind is relatively simple, it can also be implemented with a few lines of code.

artificial intelligence, logistic regression and maximum entropy, machine learning, (2 more...)

#artificialintelligence

Genre:

Research Report > New Finding (0.88)
Research Report > Experimental Study (0.88)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)

Add feedback

Ordinal regression - Wikipedia, the free encyclopedia

#artificialintelligenceMay-10-2016, 21:40:21 GMT

In statistics, ordinal regression (also called "ordinal classification") is a type of regression analysis used for predicting an ordinal variable, i.e. a variable whose value exists on an arbitrary scale where only the relative ordering between different values is significant. It can be considered an intermediate problem in between (metric) regression and classification.[1] Ordinal regression turns up often in the social sciences, for example in the modeling of human levels of preference (on a scale from, say, 1–5 for "very poor" through "excellent"), as well as in information retrieval. In machine learning, ordinal regression may also be called ranking learning.[2][a] Ordinal regression can be performed using a generalized linear model (GLM) that fits both a coefficient vector and a set of thresholds to a dataset.

artificial intelligence, machine learning, regression, (9 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.37)

Add feedback

A Complete Tutorial to learn Data Science in R from Scratch

#artificialintelligenceMay-10-2016, 19:56:37 GMT

Adjusted R² measures the goodness of fit of a regression model. Higher the R², better is the model.

artificial intelligence, machine learning, vector, (19 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.68)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

Scaling_synthesized_data

#artificialintelligenceMay-10-2016, 18:21:33 GMT

In particular, I checked out the k-Nearest Neighbors (k-NN) and logistic regression algorithms and saw how scaling numerical data strongly influenced the performance of the former but not that of the latter, as measured, for example, by accuracy (see Glossary below or previous articles for definitions of scaling, k-NN and other relevant terms). The real take home message here was that preprocessing doesn't occur in a vacuum, that is, you can prepocess the heck out of your data but the proof is in the pudding: how well does your model then perform? Scaling numerical data (that is, multiplying all instances of a variable by a constant in order to change that variable's range) has two related purposes: i) if your measurements are in meters and mine are in miles, then, if we both scale our data, they end up being the same & ii) if two variables have vastly different ranges, the one with the larger range may dominate your predictive model, even though it may be less important to your target variable than the variable with the smaller range. What we saw is that this problem identified in ii) occurs with k-NN, which explicitly looks at how close data are to one another but not in logistic regression which, when being trained, will shrink the relevant coefficient to account for the lack of scaling. As the data we used in the previous articles was real-world data, all we could see was how the models performed before and after scaling.

artificial intelligence, machine learning, target variable, (1 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.58)

Add feedback

Bayesian Optimization for Hyperparameter Tuning - Arimo

#artificialintelligenceMay-9-2016, 02:25:26 GMT

Bayesian Optimization helped us find a hyperparameter configuration that is better than the one found by Random Search for a neural network on the San Francisco Crimes dataset. People who are familiar with Machine Learning might want to fast forward to Section 3 for details. The code to reproduce the experiments can be found here. Hyperparameter tuning may be one of the most tricky, yet interesting, topics in Machine Learning. For most Machine Learning practitioners, mastering the art of tuning hyperparameters requires not only a solid background in Machine Learning algorithms, but also extensive experience working with real-world datasets.

artificial intelligence, machine learning, optimization problem, (15 more...)

#artificialintelligence

Country: North America > United States > California > San Francisco County > San Francisco (0.26)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.31)

Add feedback

Learning the kernel matrix via predictive low-rank approximations

Stražar, Martin, Curk, Tomaž

arXiv.org Machine LearningMay-9-2016

Efficient and accurate low-rank approximations of multiple data sources are essential in the era of big data. The scaling of kernel-based learning algorithms to large datasets is limited by the O(n^2) computation and storage complexity of the full kernel matrix, which is required by most of the recent kernel learning algorithms. We present the Mklaren algorithm to approximate multiple kernel matrices learn a regression model, which is entirely based on geometrical concepts. The algorithm does not require access to full kernel matrices yet it accounts for the correlations between all kernels. It uses Incomplete Cholesky decomposition, where pivot selection is based on least-angle regression in the combined, low-dimensional feature space. The algorithm has linear complexity in the number of data points and kernels. When explicit feature space induced by the kernel can be constructed, a mapping from the dual to the primal Ridge regression weights is used for model interpretation. The Mklaren algorithm was tested on eight standard regression datasets. It outperforms contemporary kernel matrix approximation approaches when learning with multiple kernels. It identifies relevant kernels, achieving highest explained variance than other multiple kernel learning methods for the same number of iterations. Test accuracy, equivalent to the one using full kernel matrices, was achieved with at significantly lower approximation ranks. A difference in run times of two orders of magnitude was observed when either the number of samples or kernels exceeds 3000.

approximation, kernel, kernel matrix, (16 more...)

arXiv.org Machine Learning

doi: 10.1016/j.neucom.2019.02.030

1601.04366

Country:

Asia > Middle East > Jordan (0.04)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Identification of refugee influx patterns in Greece via model-theoretic analysis of daily arrivals

Georgiou, Harris V.

arXiv.org Machine LearningMay-9-2016

The refugee crisis is perhaps the single most challenging problem for Europe today. Hundreds of thousands of people have already traveled across dangerous sea passages from Turkish shores to Greek islands, resulting in thousands of dead and missing, despite the best rescue efforts from both sides. One of the main reasons is the total lack of any early warning-alerting system, which could provide some preparation time for the prompt and effective deployment of resources at the hot zones. This work is such an attempt for a systemic analysis of the refugee influx in Greece, aiming at (a) the statistical and signal-level characterization of the smuggling networks and (b) the formulation and preliminary assessment of such models for predictive purposes, i.e., as the basis of such an early warning-alerting protocol. To our knowledge, this is the first-ever attempt to design such a system, since this refugee crisis itself and its geographical properties are unique (intense event handling, little or no warning). The analysis employs a wide range of statistical, signal-based and matrix factorization (decomposition) techniques, including linear & linear-cosine regression, spectral analysis, ARMA, SVD, Probabilistic PCA, ICA, K-SVD for Dictionary Learning, as well as fractal dimension analysis. It is established that the behavioral patterns of the smuggling networks closely match (as expected) the regular burst and pause periods of store-and-forward networks in digital communications. There are also major periodic trends in the range of 6.2-6.5 days and strong correlations in lags of four or more days, with distinct preference in the Sunday-Monday 48-hour time frame. These results show that such models can be used successfully for short-term forecasting of the influx intensity, producing an invaluable operational asset for planners, decision-makers and first-responders.

approximation, daily arrival, data series, (16 more...)

arXiv.org Machine Learning

1605.02784

Country:

Asia > Middle East > Republic of Türkiye (0.05)
Atlantic Ocean > Mediterranean Sea > Aegean Sea (0.05)
Africa > Middle East > Libya (0.04)
(15 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government (1.00)
Government > Immigration & Customs (1.00)

Technology:

Information Technology > Software (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
(3 more...)

Add feedback

Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality

Braverman, Mark, Garg, Ankit, Ma, Tengyu, Nguyen, Huy L., Woodruff, David P.

arXiv.org Machine LearningMay-9-2016

We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the $m$ machines receives $n$ data points from a $d$-dimensional Gaussian distribution with unknown mean $\theta$ which is promised to be $k$-sparse. The machines communicate by message passing and aim to estimate the mean $\theta$. We provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines. This directly leads to a lower bound for the distributed \textit{sparse linear regression} problem: to achieve the statistical minimax error, the total communication is at least $\Omega(\min\{n,d\}m)$, where $n$ is the number of observations that each machine receives and $d$ is the ambient dimension. These lower results improve upon [Sha14,SD'14] by allowing multi-round iterative communication model. We also give the first optimal simultaneous protocol in the dense case for mean estimation. As our main technique, we prove a \textit{distributed data processing inequality}, as a generalization of usual data processing inequalities, which might be of independent interest and useful for other problems.

artificial intelligence, inequality, machine learning, (17 more...)

arXiv.org Machine Learning

1506.07216

Country:

North America > United States (0.45)
North America > Canada (0.28)

Genre: Research Report (0.50)

Industry: Information Technology > Software (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

Regression, Logistic Regression and Maximum Entropy part 2 (code examples) – Ahmet Taspinar

#artificialintelligenceMay-8-2016, 10:35:41 GMT

In the previous blog we have seen the theory and mathematics behind the Maximum Entropy and Logistic Regression Classifiers. Logistic Regression is one of the most powerful classification methods within machine learning and can be used for a wide variety of tasks. Think of pre-policing or predictive analytics in health; it can be used to aid tuberculosis patients, aid breast cancer diagnosis, etc. Think of modeling urban growth, analysing mortgage pre-payments and defaults, forecasting the direction and strength of stock market movement, and even sports. Reading all of this, the theory[1] of Maximum Entropy Classification might look difficult. In my experience, the average Developer does not believe they can design a proper Maximum Entropy / Logistic Regression Classifier from scratch.

artificial intelligence, logistic regression, machine learning, (14 more...)

#artificialintelligence

Country: North America > United States > Massachusetts (0.05)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.31)

Add feedback

Tweet Acts: A Speech Act Classifier for Twitter

Vosoughi, Soroush (Massachusetts Institute of Technology) | Roy, Deb (Massachusetts Institute of Technology)

AAAI ConferencesMay-8-2016

Speech acts are a way to conceptualize speech as action. This holds true for communication on any platform, including social media platforms such as Twitter. In this paper, we explored speech act recognition on Twitter by treating it as a multi-class classification problem. We created a taxonomy of six speech acts for Twitter and proposed a set of semantic and syntactic features. We trained and tested a logistic regression classifier using a data set of manually labelled tweets. Our method achieved a state-of-the-art performance with an average F1 score of more than 0.70. We also explored classifiers with three different granularities (Twitter-wide, type-specific and topic-specific) in order to find the right balance between generalization and overfitting for our task.

machine learning, social media, speech act classifier, (3 more...)

AAAI Conferences

Tenth International AAAI Conference on Web and Social Media

Industry: Information Technology > Services (0.60)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.87)

Add feedback