AITopics

2009.00606

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New Mexico (0.04)
North America > United States > Iowa (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Das, Sarkar Snigdha Sarathi, Ali, Mohammed Eunus, Li, Yuan-Fang, Kang, Yong-Bin, Sellis, Timos

Boosting House Price Predictions using Geo-Spatial Network Embedding

arXiv.org Machine Learning Sep-1-2020

Real estate contributes significantly to all major economies around the world. In particular, house prices have a direct impact on stakeholders, ranging from house buyers to financing companies. Thus, a plethora of techniques have been developed for real estate price prediction. Most of the existing techniques rely on different house features to build a variety of prediction models to predict house prices. Perceiving the effect of spatial dependence on house prices, some later works focused on introducing spatial regression models for improving prediction performance. However, they fail to take into account the geo-spatial context of the neighborhood amenities such as how close a house is to a train station, or a highly-ranked school, or a shopping center. Such contextual information may play a vital role in users' interests in a house and thereby has a direct influence on its price. In this paper, we propose to leverage the concept of graph neural networks to capture the geo-spatial context of the neighborhood of a house. In particular, we present a novel method, the Geo-Spatial Network Embedding (GSNE), that learns the embeddings of houses and various types of Points of Interest (POIs) in the form of multipartite networks, where the houses and the POIs are represented as attributed nodes and the relationships between them as edges. Extensive experiments with a large number of regression techniques show that the embeddings produced by our proposed GSNE technique consistently and significantly improve the performance of the house price prediction task regardless of the downstream regression model.

artificial intelligence, machine learning, price prediction, (15 more...)

2009.00254

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Poland (0.04)
Oceania > New Zealand (0.04)
(3 more...)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry:

Education (1.00)
Banking & Finance > Real Estate (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

#artificialintelligence Aug-31-2020, 13:20:19 GMT

Linear Regression Coefficients Are Probably Lying to You

Interpreting linear regression coefficients is common because it is so easy. Training a model can be done in a few lines of code, and the results yield statistics that can be stated matter-of-factly: "each additional point on the SAT increases your chances of admission by 0.002%". Whenever you train a linear regression (or logistic regression) model with this intent, be wary: you are treading in dangerous waters. What is linear regression even doing? It multiplies each of the inputs by a value and adds them up; as an additional degree of freedom, an 'intercept' can be added.
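As a minimal sketch of the "multiply each input by a value and add them up" description, the function below computes a linear regression prediction in plain Python. The function name `predict` and all numbers are hypothetical, chosen only for illustration; they do not come from the article.

```python
def predict(inputs, coefficients, intercept=0.0):
    """Return intercept + sum(coefficient_i * input_i)."""
    if len(inputs) != len(coefficients):
        raise ValueError("inputs and coefficients must have the same length")
    # Each input is scaled by its coefficient, then everything is summed.
    return intercept + sum(c * x for c, x in zip(coefficients, inputs))


# Hypothetical inputs and coefficients, for illustration only.
print(predict([2.0, 3.0], [0.5, -1.0], intercept=4.0))
```

Each coefficient is the change in the prediction when its input rises by one unit while the other inputs are held fixed. That reading is only as trustworthy as the assumption that the inputs can actually vary independently.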

artificial intelligence, coefficient, machine learning, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

#artificialintelligence Aug-31-2020, 08:06:14 GMT

Predicting Car Price: EDA, Regression, Hypothesis Testing

I am predicting the selling price of a car based on various features of the car, including its present price. I will be using Multiple Linear Regression to build the model. Let's dive in to understand the variables and use the correlation matrix to make the process easy. Now let's check whether we have outliers in our data. Rather than removing the outlier values, we would like to take the log of them.
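The log transform mentioned above can be sketched as follows. This is an illustration under assumptions, not the article's code: it uses `math.log1p` (that is, log(1 + x)) so that zero values stay defined, and the sample prices are made up.

```python
import math


def log_transform(values):
    """Apply log(1 + x) to each value to compress large outliers."""
    if any(v < 0 for v in values):
        raise ValueError("log1p transform here assumes non-negative values")
    return [math.log1p(v) for v in values]


# Hypothetical prices with one large outlier.
prices = [3.5, 4.0, 5.2, 92.6]
transformed = log_transform(prices)

# The spread between the largest and smallest value shrinks after the transform.
print(max(prices) / min(prices))
print(max(transformed) / min(transformed))
```

The transform keeps the ordering of the values and keeps every row in the dataset, which is the motivation for preferring it over dropping outliers.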

artificial intelligence, hypothesis, machine learning, (13 more...)

Genre: Research Report > Experimental Study (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.43)

#artificialintelligence Aug-31-2020, 08:05:34 GMT

Common Loss functions in machine learning for a Regression model

Machine learning is a pioneering subset of Artificial Intelligence, where machines learn by themselves using the available dataset. To optimize any machine learning model, an acceptable loss function must be selected. A loss function characterizes how well the model performs over the training dataset. Loss functions express the discrepancy between the predictions of the model being trained and the actual problem instances. If the deviation between the predicted and actual results is too large, the loss function will have a very high value.
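To make the idea of a loss as a measure of discrepancy concrete, here is a minimal sketch of two widely used regression losses, mean squared error and mean absolute error. The excerpt does not name specific losses, so the choice of these two, the function names, and the sample values are assumptions for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; penalizes large deviations heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; grows linearly with the deviation."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and two sets of predictions.
actual = [1.0, 2.0, 3.0]
close_predictions = [1.0, 2.0, 3.5]
far_predictions = [1.0, 2.0, 5.0]

# The loss rises as predictions move away from the actual values.
print(mean_squared_error(actual, close_predictions))
print(mean_squared_error(actual, far_predictions))
print(mean_absolute_error(actual, far_predictions))
```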

artificial intelligence, loss function, machine learning, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.41)

Feng, Dai, Baumgartner, Richard

Random Forest (RF) Kernel for Regression, Classification and Survival

arXiv.org Machine Learning Aug-31-2020

Breiman's random forest (RF) can be interpreted as an implicit kernel generator, where the ensuing proximity matrix represents the data-driven RF kernel. Kernel perspective on the RF has been used to develop a principled framework for theoretical investigation of its statistical properties. However, practical utility of the links between kernels and the RF has not been widely explored and systematically evaluated. Focus of our work is investigation of the interplay between kernel methods and the RF. We elucidate the performance and properties of the data driven RF kernels used by regularized linear models in a comprehensive simulation study comprising of continuous, binary and survival targets. We show that for continuous and survival targets, the RF kernels are competitive to RF in higher dimensional scenarios with larger number of noisy features. For the binary target, the RF kernel and RF exhibit comparable performance. As the RF kernel asymptotically converges to the Laplace kernel, we included it in our evaluation. For most simulation setups, the RF and RF kernel outperformed the Laplace kernel. Nevertheless, in some cases the Laplace kernel was competitive, showing its potential value for applications. We also provide the results from real life data sets for the regression, classification and survival to illustrate how these insights may be leveraged in practice. Finally, we discuss further extensions of the RF kernels in the context of interpretable prototype and landmarking classification, regression and survival. We outline future line of research for kernels furnished by Bayesian counterparts of the RF.
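The proximity matrix behind the RF kernel is commonly defined as the fraction of trees in which two samples fall into the same leaf. The sketch below computes it from per-tree leaf assignments. The `leaf_ids` table is a hypothetical input that would normally come from a fitted forest; this is not the paper's implementation.

```python
def rf_proximity(leaf_ids):
    """Return the proximity matrix for leaf assignments.

    leaf_ids[t][i] is the leaf index that tree t assigns to sample i.
    Entry [i][j] of the result is the share of trees placing i and j
    in the same leaf.
    """
    n_trees = len(leaf_ids)
    n_samples = len(leaf_ids[0])
    counts = [[0] * n_samples for _ in range(n_samples)]
    for tree in leaf_ids:
        for i in range(n_samples):
            for j in range(n_samples):
                if tree[i] == tree[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


# Hypothetical leaf assignments: 2 trees, 3 samples.
leaves = [
    [0, 0, 1],  # tree 0 groups samples 0 and 1
    [0, 1, 1],  # tree 1 groups samples 1 and 2
]
for row in rf_proximity(leaves):
    print(row)
```

The resulting matrix is symmetric with ones on the diagonal, which is what lets it be passed to kernel methods as a precomputed similarity.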

artificial intelligence, decision tree learning, machine learning, (19 more...)

2009.00089

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California (0.05)
North America > United States > Iowa (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.62)

arXiv.org Machine Learning Aug-31-2020

Causal Inference in Possibly Nonlinear Factor Models

Feng, Yingjie

This paper develops a general causal inference method for treatment effects models under selection on unobservables. A large set of covariates that admits an unknown, possibly nonlinear factor structure is exploited to control for the latent confounders. The key building block is a local principal subspace approximation procedure that combines $K$-nearest neighbors matching and principal component analysis. Estimators of many causal parameters, including average treatment effects and counterfactual distributions, are constructed based on doubly-robust score functions. Large-sample properties of these estimators are established, which only require relatively mild conditions on the principal subspace approximation. The results are illustrated with an empirical application studying the effect of political connections on stock returns of financial firms, and a Monte Carlo experiment. The main technical and methodological results regarding the general local principal subspace approximation method may be of independent interest.

artificial intelligence, machine learning, sa-2, (18 more...)

2008.13651

Country:

North America > United States > New York (0.04)
North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.67)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

#artificialintelligence Aug-28-2020, 16:01:01 GMT

Heart Disease predictions using Logistic Regression – Sushrut Tendulkar

The main purpose of this post is to explore the different ways in which Logistic Regression can be applied to the dataset and thereby understand how the model actually works. The idea is not to solve the problem itself. This post doesn't focus on getting the best score using different models; instead, it assumes that there's only one model available for use. This is part of a series of posts to learn and share the details of Logistic Regression. If you're new to this, kindly refer to my earlier posts on the same topic. The data set has different features, such as demographics, behavioural factors (current smoker, cigarettes per day) and medical history, and our task is to predict whether the person has a 10-year risk of coronary heart disease.
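For readers who want to see how a logistic regression model turns features into a risk estimate, here is a minimal sketch of the prediction step only. The weights, bias, and feature ordering are hypothetical and are not fitted to the heart disease dataset.

```python
import math


def predict_probability(features, weights, bias=0.0):
    """Return sigmoid(bias + sum(w_i * x_i)) as a probability in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid: avoids overflow for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical features: [current smoker (0/1), cigarettes per day].
# Hypothetical weights and bias, chosen only for illustration.
weights = [0.3, 0.02]
bias = -2.0
print(predict_probability([0, 0], weights, bias))
print(predict_probability([1, 20], weights, bias))
```

A threshold such as 0.5 on the returned probability turns this into a yes/no prediction. The choice of threshold is a separate decision from fitting the model.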

artificial intelligence, dataset, machine learning, (4 more...)

Genre:

Research Report > New Finding (0.89)
Research Report > Experimental Study (0.89)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Coulombe, Philippe Goulet, Leroux, Maxime, Stevanovic, Dalibor, Surprenant, Stéphane

How is Machine Learning Useful for Macroeconomic Forecasting?

arXiv.org Machine Learning Aug-28-2020

We move beyond "Is Machine Learning Useful for Macroeconomic Forecasting?" by adding the "how". The current forecasting literature has focused on matching specific variables and horizons with a particularly successful algorithm. In contrast, we study the usefulness of the underlying features driving ML gains over standard macroeconometric methods. We distinguish four so-called features (nonlinearities, regularization, cross-validation and alternative loss function) and study their behavior in both the data-rich and data-poor environments. To do so, we design experiments that allow to identify the "treatment" effects of interest. We conclude that (i) nonlinearity is the true game changer for macroeconomic prediction, (ii) the standard factor model remains the best regularization, (iii) K-fold cross-validation is the best practice and (iv) the $L_2$ is preferred to the $\bar \epsilon$-insensitive in-sample loss. The forecasting gains of nonlinear techniques are associated with high macroeconomic uncertainty, financial stress and housing bubble bursts. This suggests that Machine Learning is useful for macroeconomic forecasting by mostly capturing important nonlinearities that arise in the context of uncertainty and financial frictions.
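As a rough illustration of the K-fold cross-validation referred to in point (iii), the sketch below splits sample indices into K contiguous folds, each serving once as the held-out set. The helper name `k_fold_indices` and the unshuffled, contiguous layout are assumptions; the abstract does not specify how the paper constructs its folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical example: 10 observations, 3 folds.
for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

A hyperparameter is then chosen by fitting on each training part, scoring on the matching held-out part, and averaging the scores across folds.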

artificial intelligence, k-fold 0, machine learning, (15 more...)

2008.12477

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Pennsylvania (0.04)
North America > Canada > Quebec > Montreal (0.04)
(5 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Banking & Finance > Economy (1.00)
Government (0.92)
Banking & Finance > Real Estate (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

#artificialintelligence Aug-27-2020, 17:21:15 GMT

Linear Regression Algorithm -- Under The Hood Math For Non-Mathematicians

Step 1: We will use the Python package NumPy for working with a sample dataset and Matplotlib to plot various graphs for visualisation. Step 2: Let us consider a simple scenario where a single input/independent variable controls the value of the outcome/dependent variable. In the code below, we have declared two NumPy arrays to hold the values of the independent and dependent variables. Step 3: Let us quickly draw a scatter plot to understand the data points. Our goal is to formulate a linear equation that can predict the dependent variable's value with minimum error for a given independent/input variable.
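The goal stated above, a line that predicts the dependent variable with minimum squared error, has a closed-form solution for a single input. The sketch below is an illustration under assumptions rather than the article's code: it uses plain Python lists instead of NumPy arrays to stay dependency-free, and the data points are made up.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept follows from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical independent and dependent variable values.
x_values = [1.0, 2.0, 3.0, 4.0]
y_values = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(x_values, y_values)
print(slope, intercept)
```

With real data the points will not lie exactly on a line, and the same formula returns the line with the smallest sum of squared vertical distances to the points.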

artificial intelligence, equation, machine learning, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.90)