AITopics | Regression

Identifying the location of a disturbance and its magnitude is an important component for stable operation of power systems. We study the problem of localizing and estimating a disturbance in the interconnected power system. We take a model-free approach to this problem by using frequency data from generators. Specifically, we develop a logistic regression based method for localization and a linear regression based method for estimation of the magnitude of disturbance. Our model-free approach does not require the knowledge of system parameters such as inertia constants and topology, and is shown to achieve highly accurate localization and estimation performance even in the presence of measurement noise and missing data.

artificial intelligence, disturbance, machine learning, (16 more...)

arXiv.org Machine Learning

1806.01318

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > New Finding (0.50)

Industry:

Energy > Power Industry (1.00)
Machinery > Industrial Machinery (0.85)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.72)

Similarity encoding for learning with dirty categorical variables

Cerda, Patricio, Varoquaux, Gaël, Kégl, Balázs

arXiv.org Machine Learning | Jun-4-2018

For statistical learning, categorical variables in a table are usually considered as discrete entities and encoded separately to feature vectors, e.g., with one-hot encoding. "Dirty" non-curated data gives rise to categorical variables with a very high cardinality but redundancy: several categories reflect the same entity. In databases, this issue is typically solved with a deduplication step. We show that a simple approach that exposes the redundancy to the learning algorithm brings significant gains. We study a generalization of one-hot encoding, similarity encoding, that builds feature vectors from similarities across categories. We perform a thorough empirical validation on non-curated tables, a problem seldom studied in machine learning. Results on seven real-world datasets show that similarity encoding brings significant gains in prediction in comparison with known encoding methods for categories or strings, notably one-hot encoding and bag of character n-grams. We draw practical recommendations for encoding dirty categories: 3-gram similarity appears to be a good choice to capture morphological resemblance. For very high cardinality, dimensionality reduction significantly reduces the computational cost with little loss in performance: random projections or choosing a subset of prototype categories still outperform classic encoding approaches.
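As a rough illustration of the idea in this abstract, the sketch below encodes each category string by its similarity to a small set of prototype categories. It assumes that "3-gram similarity" means the Jaccard overlap between sets of character 3-grams, which is one common definition and may differ in detail from the paper's. The function names and the example strings are hypothetical.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a string, padded with spaces."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap between the n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        # Both strings are too short to produce any n-gram.
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(union)


def similarity_encode(values, prototypes, n=3):
    """Encode each value as its vector of similarities to the prototypes."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


# Hypothetical dirty category values, made up for illustration.
prototypes = ["police officer", "firefighter", "teacher"]
encoded = similarity_encode(
    ["police officer", "police oficer", "teachr"], prototypes
)
```

An exact match gets a 1.0 in its own column, so clean categories behave much like one-hot encoding, while a misspelled value such as "police oficer" still lands close to its intended category instead of becoming an unrelated new column.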

artificial intelligence, category, machine learning, (17 more...)

arXiv.org Machine Learning

1806.00979

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (0.93)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Linear Regression

#artificialintelligence | Jun-3-2018, 16:51:36 GMT

Predictive models are extremely useful for forecasting future outcomes and estimating metrics that are impractical to measure. For example, data scientists could use predictive models to forecast crop yields based on rainfall and temperature, or to determine whether patients with certain traits are more likely to react badly to a new medication. Before we talk about linear regression specifically, let's remind ourselves what a typical data science workflow might look like. A lot of the time, we'll start with a question we want to answer and then work through a series of steps. Linear regression is one of the simplest and most common supervised machine learning algorithms that data scientists use for predictive modeling. In this post, we'll use linear regression to build a model that predicts cherry tree volume from metrics that are much easier for folks who study trees to measure. This post is part of our focus on nature data this month.
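To make the idea concrete, here is a minimal sketch of ordinary least squares with a single predictor, written in plain Python rather than the R used by the post. The girth and volume numbers are made up (generated exactly from volume = -30 + 5 * girth) and are not the post's data; the function names are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted response for a new predictor value."""
    return intercept + slope * x


# Made-up measurements for illustration only.
girths = [8.0, 10.0, 12.0, 14.0, 16.0]
volumes = [10.0, 20.0, 30.0, 40.0, 50.0]

intercept, slope = fit_simple_linear_regression(girths, volumes)
estimated_volume = predict(intercept, slope, 11.0)
```

Because the made-up data lie exactly on a line, the fit recovers that line; with real measurements the same formulas give the line that minimizes the sum of squared residuals.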

artificial intelligence, machine learning, predictor variable, (16 more...)

#artificialintelligence

Genre: Workflow (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Causal Inference with Noisy and Missing Covariates via Matrix Factorization

Kallus, Nathan, Mao, Xiaojie, Udell, Madeleine

arXiv.org Machine Learning | Jun-3-2018

Valid causal inference in observational studies often requires controlling for confounders. However, in practice measurements of confounders may be noisy, and can lead to biased estimates of causal effects. We show that we can reduce the bias caused by measurement noise using a large number of noisy measurements of the underlying confounders. We propose the use of matrix factorization to infer the confounders from noisy covariates, a flexible and principled framework that adapts to missing values, accommodates a wide variety of data types, and can augment many causal inference methods. We bound the error for the induced average treatment effect estimator and show it is consistent in a linear regression setting, using Exponential Family Matrix Completion preprocessing. We demonstrate the effectiveness of the proposed procedure in numerical experiments with both synthetic data and real clinical data.

confounder, covariate, matrix factorization, (12 more...)

arXiv.org Machine Learning

1806.00811

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.48)

Industry:

Health & Medicine (1.00)
Education (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)

Structural Learning of Multivariate Regression Chain Graphs via Decomposition

Javidian, Mohammad Ali, Valtorta, Marco

arXiv.org Artificial Intelligence | Jun-3-2018

We extend the decomposition approach for learning Bayesian networks (BNs) proposed by Xie et al. (2006) to learning multivariate regression chain graphs (MVR CGs), which include BNs as a special case. The same advantages of this decomposition approach hold in the more general setting: reduced complexity and increased power of computational independence tests. Moreover, latent (hidden) variables can be represented in MVR CGs by using bidirected edges, and our algorithm correctly recovers any independence structure that is faithful to an MVR CG, thus greatly extending the range of applications of decomposition-based model selection techniques. While our new algorithm has the same complexity as the one in Xie et al. (2006) for BNs, it requires larger components for general MVR CGs, to ensure that sufficient data is present to estimate parameters.

artificial intelligence, graph, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1806.00882

Country: North America > United States > South Carolina (0.28)

Genre: Research Report (0.40)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.61)

Analysis of regularized Nyström subsampling for regression functions of low smoothness

Lu, Shuai, Mathé, Peter, Pereverzyev, Sergiy Jr

arXiv.org Machine Learning | Jun-3-2018

This paper studies a Nyström type subsampling approach to large kernel learning methods in the misspecified case, where the target function is not assumed to belong to the reproducing kernel Hilbert space generated by the underlying kernel. This case is less understood, in spite of its practical importance. To model such a case, the smoothness of target functions is described in terms of general source conditions. It is surprising that almost for the whole range of the source conditions, describing the misspecified case, the corresponding learning rate bounds can be achieved with just one value of the regularization parameter. This observation allows a formulation of mild conditions under which the plain Nyström subsampling can be realized with subquadratic cost maintaining the guaranteed learning rates.

artificial intelligence, machine learning, source condition, (18 more...)

arXiv.org Machine Learning

1806.00826

Country:

Europe > Austria (0.14)
Asia > China (0.14)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.51)

The Logistic Regression Algorithm

#artificialintelligence | Jun-2-2018, 05:01:34 GMT

Logistic Regression is one of the most widely used Machine Learning algorithms for binary classification. It is a simple algorithm that you can use as a performance baseline; it is easy to implement and does well enough in many tasks. Therefore, every Machine Learning engineer should be familiar with its concepts. The building blocks of Logistic Regression can also be helpful in deep learning when building neural networks. In this post, you will learn what Logistic Regression is, how it works, what its advantages and disadvantages are, and much more.
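As an illustration of why logistic regression is easy to implement, the sketch below fits a one-feature binary classifier by batch gradient descent on the mean log loss. It is a minimal example in plain Python, not the post's code; the data, the learning rate, and the iteration count are arbitrary choices made for this example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(xs, ys, learning_rate=0.1, n_iters=2000):
    """Fit weight and bias by batch gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iters):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log loss w.r.t. the linear score is (p - y).
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def predict_label(w, b, x, threshold=0.5):
    """Classify as 1 when the predicted probability reaches the threshold."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Made-up one-dimensional data: negative values labeled 0, positive labeled 1.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic_regression(xs, ys)
```

The sigmoid unit and the gradient step used here are the same building blocks that appear in a single neuron of a neural network, which is the connection to deep learning mentioned above.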

artificial intelligence, logistic regression, machine learning, (14 more...)

#artificialintelligence

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.98)

Locally Interpretable Models and Effects based on Supervised Partitioning (LIME-SUP)

Hu, Linwei, Chen, Jie, Nair, Vijayan N., Sudjianto, Agus

arXiv.org Machine Learning | Jun-2-2018

Supervised Machine Learning (SML) algorithms such as Gradient Boosting, Random Forest, and Neural Networks have become popular in recent years due to their increased predictive performance over traditional statistical methods. This is especially true with large data sets (millions or more observations and hundreds to thousands of predictors). However, the complexity of the SML models makes them opaque and hard to interpret without additional tools. There has been a lot of interest recently in developing global and local diagnostics for interpreting and explaining SML models. In this paper, we propose locally interpretable models and effects based on supervised partitioning (trees) referred to as LIME-SUP. This is in contrast with the KLIME approach that is based on clustering the predictor space. We describe LIME-SUP based on fitting trees to the fitted response (LIME-SUP-R) as well as the derivatives of the fitted response (LIME-SUP-D). We compare the results with KLIME and describe its advantages using simulation and real data.

artificial intelligence, machine learning, partition, (17 more...)

arXiv.org Machine Learning

1806.00663

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

Signal and Noise Statistics Oblivious Orthogonal Matching Pursuit

Kallummil, Sreejith, Kalyani, Sheetal

arXiv.org Machine Learning | Jun-2-2018

Orthogonal matching pursuit (OMP) is a widely used algorithm for recovering sparse high dimensional vectors in linear regression models. The optimal performance of OMP requires a priori knowledge of either the sparsity of the regression vector or noise statistics. Both these statistics are rarely known a priori and are very difficult to estimate. In this paper, we present a novel technique called residual ratio thresholding (RRT) to operate OMP without any a priori knowledge of sparsity and noise statistics and establish finite sample and large sample support recovery guarantees for the same. Both analytical results and numerical simulations in real and synthetic data sets indicate that RRT has a performance comparable to OMP with a priori knowledge of sparsity and noise statistics.
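For readers unfamiliar with OMP, the sketch below shows the plain greedy algorithm with a user-supplied number of atoms (`max_atoms`), which is exactly the a priori sparsity knowledge the abstract refers to. It also returns the ratios of successive residual norms, the quantity that the name "residual ratio thresholding" suggests is thresholded; the thresholding rule itself is not given in the abstract and is not reproduced here. Columns are assumed to have unit norm, and all names and the toy data are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def omp_support(columns, y, max_atoms):
    """Greedy OMP support selection; returns (support, residual_norms)."""
    residual = list(y)
    basis = []    # orthonormal basis for the span of the selected columns
    support = []  # indices of selected columns, in selection order
    residual_norms = [math.sqrt(dot(residual, residual))]
    for _ in range(max_atoms):
        candidates = [j for j in range(len(columns)) if j not in support]
        if not candidates:
            break
        # Select the unused column most correlated with the residual.
        best = max(candidates, key=lambda j: abs(dot(columns[j], residual)))
        # Gram-Schmidt: remove the part already explained by the basis.
        q = list(columns[best])
        for b in basis:
            coeff = dot(b, q)
            q = [qi - coeff * bi for qi, bi in zip(q, b)]
        norm_q = math.sqrt(dot(q, q))
        if norm_q < 1e-12:
            break  # column lies numerically in the current span
        q = [qi / norm_q for qi in q]
        basis.append(q)
        support.append(best)
        # Project the residual onto the complement of the new direction.
        coeff = dot(q, residual)
        residual = [ri - coeff * qi for ri, qi in zip(residual, q)]
        residual_norms.append(math.sqrt(dot(residual, residual)))
    return support, residual_norms


def residual_ratios(residual_norms):
    """Ratios of each residual norm to the previous one."""
    pairs = zip(residual_norms, residual_norms[1:])
    return [cur / prev for prev, cur in pairs if prev > 0]


# Toy noiseless problem: y = 3 * column 0 - 2 * column 2.
columns = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.6, 0.0, 0.0, 0.8],
]
y = [3.0, 0.0, -2.0, 0.0]

support, norms = omp_support(columns, y, max_atoms=2)
ratios = residual_ratios(norms)
```

In this noiseless toy case the residual vanishes once the true support is found, so the last ratio drops to zero; with noise the drop is less sharp, which is why a principled threshold is needed.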

artificial intelligence, machine learning, rrt, (16 more...)

arXiv.org Machine Learning

1806.0065

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.86)

Using Linear Regression for Predictive Modeling in R

@machinelearnbot | Jun-1-2018, 14:50:19 GMT

Predictive models are extremely useful for forecasting future outcomes and estimating metrics that are impractical to measure. For example, data scientists could use predictive models to forecast crop yields based on rainfall and temperature, or to determine whether patients with certain traits are more likely to react badly to a new medication. Before we talk about linear regression specifically, let's remind ourselves what a typical data science workflow might look like. A lot of the time, we'll start with a question we want to answer and then work through a series of steps. Linear regression is one of the simplest and most common supervised machine learning algorithms that data scientists use for predictive modeling. In this post, we'll use linear regression to build a model that predicts cherry tree volume from metrics that are much easier for folks who study trees to measure.

artificial intelligence, hypothesis, machine learning, (12 more...)

@machinelearnbot

Genre: Workflow (0.52)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
