 Regression


Comparing Model Evaluation Techniques Part 3: Regression Models

#artificialintelligence

In this post, I'll take a look at how you can compare regression models. Comparing regression models is perhaps one of the trickiest tasks in the "comparing models" arena. The reason is that there are literally dozens of statistics you can calculate to compare regression models, and many other tools, tests and plots are at your disposal as well. Rather than discuss the statistics in detail, I chose to focus this post on comparing a few of the most popular regression model evaluation techniques and discussing when you might want to use them (or when you might not want to). These techniques tend to be on the "easier to use and understand" end of the spectrum, so if you're new to model comparison, they're a good place to start. The first question you should be asking is: How well do I know my data?
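As a rough illustration of what comparing regression models can look like in practice, the sketch below scores two sets of predictions against the same observed values. The excerpt does not say which statistics the post covers, so the three metrics here (mean absolute error, root mean squared error, and R²) are assumed as common choices, and the data and the names `model_a` and `model_b` are invented for the example.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: like MAE, but penalizes large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 minus residual variation over total variation."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed values and predictions from two candidate models.
y_true = [3.0, 5.0, 7.0, 9.0]
model_a = [2.5, 5.5, 7.0, 9.0]   # two small misses
model_b = [3.0, 5.0, 7.0, 11.0]  # one large miss

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    print(name, mae(y_true, preds), rmse(y_true, preds), r_squared(y_true, preds))
```

On this toy data all three metrics favor `model_a`, but the gap is wider under RMSE than under MAE because squaring magnifies the single large miss of `model_b`. That sensitivity to outliers is one reason knowing your data matters when picking a metric.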


Airbnb Price Prediction Using Machine Learning and Sentiment Analysis

arXiv.org Machine Learning

Pricing a rental property on Airbnb is a challenging task for the owner as it determines the number of customers for the place. On the other hand, customers have to evaluate an offered price with minimal knowledge of an optimal value for the property. This paper aims to develop a reliable price prediction model using machine learning, deep learning, and natural language processing techniques to aid both the property owners and the customers with price evaluation given minimal available information about the property. Features of the rentals, owner characteristics, and the customer reviews will comprise the predictors, and a range of methods from linear regression to tree-based models, support-vector regression (SVR), K-means Clustering (KMC), and neural networks (NNs) will be used for creating the prediction model.


Tackling Multiple Ordinal Regression Problems: Sparse and Deep Multi-Task Learning Approaches

arXiv.org Machine Learning

Many real-world datasets are labeled with natural orders, i.e., ordinal labels. Ordinal regression is a method to predict ordinal labels that finds a wide range of applications in data-rich science domains, such as medical, social and economic sciences. Most existing approaches work well for a single ordinal regression task. However, they ignore the task relatedness when there are multiple related tasks. Multi-task learning (MTL) provides a framework to encode task relatedness, to bridge data from all tasks, and to simultaneously learn multiple related tasks to improve the generalization performance. Even though MTL methods have been extensively studied, there is barely any existing work investigating MTL for data with ordinal labels. We tackle multiple ordinal regression problems via sparse and deep multi-task approaches, i.e., two regularized multi-task ordinal regression (RMTOR) models for small datasets and two deep-neural-network-based multi-task ordinal regression (DMTOR) models for large-scale datasets. The performance of the proposed multi-task ordinal regression (MTOR) models is demonstrated on three real-world medical datasets for multi-stage disease diagnosis. Our experimental results indicate that our proposed MTOR models markedly improve the prediction performance compared with single-task learning (STL) ordinal regression models.


Linear Regression - Introduction to Machine Learning using Python and Scikit Learn Chapter 6.1

#artificialintelligence

Welcome to the video series on Introduction to Machine Learning with Scikit-Learn. This video contains Chapter 6.1. In this chapter, I've explained our first machine learning algorithm, linear regression, using just five data points for easy understanding. The video describes what linear regression is and how we can use it with scikit-learn. In the context of this algorithm, I've also explained the unified machine learning algorithm and how a generic interface can be used for almost all ML algorithms in scikit-learn. Feel free to connect with me on YouTube: https://www.youtube.com/CodesBay
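To make the idea of fitting a line to five data points concrete, here is a minimal sketch using the closed-form least-squares solution in plain Python. The five points are made up (the excerpt does not give the video's data), and the class `SimpleLinearRegression` is a hypothetical stand-in that only imitates the `fit`/`predict` pattern of scikit-learn estimators such as `LinearRegression`; it is not scikit-learn code.

```python
class SimpleLinearRegression:
    """One-feature ordinary least squares with a fit/predict interface."""

    def fit(self, x, y):
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        # Slope = covariance(x, y) / variance(x); intercept follows from the means.
        s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        s_xx = sum((xi - mean_x) ** 2 for xi in x)
        self.slope_ = s_xy / s_xx
        self.intercept_ = mean_y - self.slope_ * mean_x
        return self

    def predict(self, x):
        return [self.intercept_ + self.slope_ * xi for xi in x]


# Five illustrative data points (assumed, not taken from the video).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

model = SimpleLinearRegression().fit(x, y)
print(model.slope_, model.intercept_)  # approximately 0.6 and 2.2
print(model.predict([6.0]))            # approximately [5.8]
```

The appeal of a shared `fit`/`predict` interface is that the calling code stays the same when the estimator is swapped for a different algorithm.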


A Step Towards Machine Learning Algorithms: Univariate Linear Regression

#artificialintelligence

These days the concept of Machine Learning is evolving rapidly. The field is so vast and open that everyone has their own independent thoughts about it. Here are mine. This blog describes my experience with learning algorithms. In it, we will get to know the basic differences between Artificial Intelligence, Machine Learning, and Deep Learning.


Towards meta-learning for multi-target regression problems

arXiv.org Machine Learning

Several multi-target regression methods were developed in the last years aiming at improving predictive performance by exploring inter-target correlation within the problem. However, none of these methods outperforms the others for all problems. This motivates the development of automatic approaches to recommend the most suitable multi-target regression method. In this paper, we propose a meta-learning system to recommend the best predictive method for a given multi-target regression problem. We performed experiments with a meta-dataset generated by a total of 648 synthetic datasets. These datasets were created to explore distinct inter-targets characteristics toward recommending the most promising method. In experiments, we evaluated four different algorithms with different biases as meta-learners. Our meta-dataset is composed of 58 meta-features, based on: statistical information, correlation characteristics, linear landmarking, from the distribution and smoothness of the data, and has four different meta-labels. Results showed that induced meta-models were able to recommend the best method for different base level datasets with a balanced accuracy superior to 70% using a Random Forest meta-model, which statistically outperformed the meta-learning baselines.


Crime Rate Prediction with Region Risk and Movement Patterns

arXiv.org Machine Learning

The location-based social network, FourSquare, helps us to understand a city's mass human mobility. It provides data that characterises the volume of movements across regions and Places of Interests (POIs) to explore the crime dynamics of a city. To fully exploit human movement into crime analysis, we propose the region risk factor which combines monthly aggregated crime and human movement of a region across different time intervals. We then derive a number of features using the region risk factor and conduct extensive experiments with real world data in multiple cities that verify the effectiveness of these features. One of the basic demands for every person in society is a safe and secure living space.


Decentralized Stochastic First-Order Methods for Large-scale Machine Learning

arXiv.org Machine Learning

Decentralized consensus-based optimization is a general computational framework where a network of nodes cooperatively minimizes a sum of locally available cost functions via only local computation and communication. In this article, we survey recent advances on this topic, particularly focusing on decentralized, consensus-based, first-order gradient methods for large-scale stochastic optimization. The class of consensus-based stochastic optimization algorithms is communication-efficient, able to exploit data parallelism, robust in random and adversarial environments, and simple to implement, thus providing scalable solutions to a wide range of large-scale machine learning problems. We review different state-of-the-art decentralized stochastic optimization formulations, different variants of consensus-based procedures, and demonstrate how to obtain decentralized counterparts of centralized stochastic first-order methods. We provide several intuitive illustrations of the main technical ideas as well as applications of the algorithms in the context of decentralized training of machine learning models.
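The consensus-plus-local-gradient pattern described in this abstract can be sketched in a few lines. The example below is a generic decentralized stochastic gradient iteration, not an algorithm taken from the survey itself. The ring topology, the equal mixing weights of 1/3, the quadratic local costs f_i(x) = 0.5 (x - a_i)^2, the Gaussian gradient noise, and the 1/(k+1) step size are all assumptions chosen to keep the sketch small.

```python
import random

def decentralized_sgd(targets, steps=2000, noise_std=0.1, seed=0):
    """Each node i holds a private cost 0.5 * (x - targets[i])**2.

    Nodes sit on a ring (at least three nodes assumed) and only talk to
    their two neighbours. The sum of the costs is minimized at the mean
    of the targets.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n  # every node starts from its own local estimate
    for k in range(steps):
        alpha = 1.0 / (k + 1)  # diminishing step size
        # Consensus step: average own estimate with the two ring neighbours.
        mixed = [(x[(i - 1) % n] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]
        # Local step: noisy gradient of the node's own cost at its own estimate.
        grads = [x[i] - targets[i] + rng.gauss(0.0, noise_std) for i in range(n)]
        x = [mixed[i] - alpha * grads[i] for i in range(n)]
    return x

# Five nodes whose private targets average to 3.0.
estimates = decentralized_sgd([1.0, 2.0, 3.0, 4.0, 5.0])
print(estimates)  # each entry should be close to 3.0
```

No node ever sees another node's target. Agreement on the global minimizer emerges only from repeated neighbour averaging combined with local gradient steps.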


Spatial sensitivity analysis for urban land use prediction with physics-constrained conditional generative adversarial networks

arXiv.org Machine Learning

Accurately forecasting urban development and its environmental and climate impacts critically depends on realistic models of the spatial structure of the built environment, and of its dependence on key factors such as population and economic development. Scenario simulation and sensitivity analysis, i.e., predicting how changes in underlying factors at a given location affect urbanization outcomes at other locations, is currently not achievable at a large scale with traditional urban growth models, which are either too simplistic, or depend on detailed locally-collected socioeconomic data that is not available in most places. Here we develop a framework to estimate, purely from globally-available remote-sensing data and without parametric assumptions, the spatial sensitivity of the (static) rate of change of urban sprawl to key macroeconomic development indicators. We formulate this spatial regression problem as an image-to-image translation task using conditional generative adversarial networks (GANs), where the gradients necessary for comparative static analysis are provided by the backpropagation algorithm used to train the model. This framework allows physical constraints, e.g., the inability to build over water bodies, to be incorporated naturally. To validate the spatial structure of model-generated built environment distributions, we use spatial statistics commonly used in urban form analysis. We apply our method to a novel dataset comprising layers on the built environment, nightlights measurements (a proxy for economic development and energy use), and population density for the world's most populous 15,000 cities.


Reservoir Computing Models for Patient-Adaptable ECG Monitoring in Wearable Devices

arXiv.org Machine Learning

The reservoir computing paradigm is employed to classify heartbeat anomalies online based on electrocardiogram signals. Inspired by the principles of information processing in the brain, reservoir computing provides a framework to design, train, and analyze recurrent neural networks (RNNs) for processing time-dependent information. Due to its computational efficiency and the fact that training amounts to a simple linear regression, this supervised learning algorithm has been variously considered as a strategy to implement useful computations not only on digital computers but also on emerging unconventional hardware platforms such as neuromorphic microchips. Here, this biologically inspired learning framework is exploited to devise an accurate patient-adaptive model that has the potential to be integrated into wearable cardiac events monitoring devices. The proposed patient-customized model was trained and tested on ECG recordings selected from the MIT-BIH arrhythmia database. Restrictive inclusion criteria were used to conduct the study only on ECGs including at least two classes of heartbeats with highly unequal numbers of instances. The results of extensive simulations showed that this model not only provides an accurate, cheap and fast patient-customized heartbeat classifier but also circumvents the problem of "imbalanced classes" when the readout weights are trained using weighted ridge regression.