 Regression


The decoupled extended Kalman filter for dynamic exponential-family factorization models

arXiv.org Machine Learning

We specialize the decoupled extended Kalman filter (DEKF) for online parameter learning in factorization models, including factorization machines, matrix and tensor factorization, and illustrate the effectiveness of the approach through simulations. Learning model parameters through the DEKF makes factorization models more broadly useful by allowing for more flexible observations through the entire exponential family, modeling parameter drift, and producing parameter uncertainty estimates that can enable explore/exploit and other applications. We use a more general dynamics of the parameters than the standard DEKF, allowing parameter drift while encouraging reasonable values. We also present an alternate derivation of the regular extended Kalman filter and DEKF that connects these methods to natural gradient methods, and suggests a similarly decoupled version of the iterated extended Kalman filter.
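
To make the kind of update described above more concrete, here is a minimal sketch of one extended Kalman filter step for a Bernoulli observation with a logit link (one member of the exponential family). It is not the paper's algorithm: the function name, the `decay` and `process_var` drift parameters, and the use of a single full covariance block are all assumptions made for illustration. A decoupled filter would keep one such block per parameter group rather than one covariance matrix for everything.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def ekf_bernoulli_step(mean, cov, x, y, decay=1.0, process_var=0.0):
    """One predict/update step for y ~ Bernoulli(sigmoid(x . w)).

    mean, cov: current Gaussian belief over the weights w.
    decay, process_var: hypothetical drift model w_t = decay * w_{t-1} + noise.
    """
    n = len(mean)
    # Predict: shrink toward zero and inflate uncertainty (assumed dynamics).
    mean = [decay * m for m in mean]
    cov = [[decay * decay * cov[i][j] + (process_var if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    # Linearize the observation mean around the predicted weights.
    p = sigmoid(sum(xi * mi for xi, mi in zip(x, mean)))
    var_y = max(p * (1.0 - p), 1e-12)       # observation variance, floored
    h = [var_y * xi for xi in x]            # Jacobian of the mean w.r.t. w
    cov_h = [sum(cov[i][j] * h[j] for j in range(n)) for i in range(n)]
    s = sum(h[i] * cov_h[i] for i in range(n)) + var_y   # innovation variance
    gain = [c / s for c in cov_h]           # Kalman gain
    new_mean = [mean[i] + gain[i] * (y - p) for i in range(n)]
    new_cov = [[cov[i][j] - gain[i] * cov_h[j] for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov

# Usage: a single weight with a standard normal prior sees one positive example.
posterior_mean, posterior_cov = ekf_bernoulli_step([0.0], [[1.0]], [1.0], 1)
```

The posterior covariance returned here is the parameter uncertainty estimate that the abstract mentions as useful for explore/exploit decisions.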


Conditional Sparse $\ell_p$-norm Regression With Optimal Probability

arXiv.org Machine Learning

We consider the following conditional linear regression problem: the task is to identify both (i) a $k$-DNF condition $c$ and (ii) a linear rule $f$ such that the probability of $c$ is (approximately) at least some given bound $\mu$, and $f$ minimizes the $\ell_p$ loss of predicting the target $z$ in the distribution of examples conditioned on $c$. Thus, the task is to identify a portion of the distribution on which a linear rule can provide a good fit. Algorithms for this task are useful in cases where simple, learnable rules only accurately model portions of the distribution. The prior state-of-the-art for such algorithms could only guarantee finding a condition of probability $\Omega(\mu/n^k)$ when a condition of probability $\mu$ exists, and achieved an $O(n^k)$-approximation to the target loss, where $n$ is the number of Boolean attributes. Here, we give efficient algorithms for solving this task with a condition $c$ that nearly matches the probability of the ideal condition, while also improving the approximation to the target loss. We also give an algorithm for finding a $k$-DNF reference class for prediction at a given query point, that obtains a sparse regression fit that has loss within $O(n^k)$ of optimal among all sparse regression parameters and sufficiently large $k$-DNF reference classes containing the query point.


The Science of Where in a Warming Planet: Spatial vs Non-Spatial Machine Learning

#artificialintelligence

The intersection of machine learning and GIS is getting broader as we ask increasingly pragmatic questions related to complex spatial phenomena. Whether it is predicting traffic patterns in L.A. or the probability of being hit by the next big storm, we need answers to critical questions to make impactful decisions. In this blog, we'll explore an essential component needed to answer one such question: what will the future climate be in the U.S.? This question requires calibrating a global climate model with spatially limited local temperature measurements. On a planet that is constantly warming, calibrating global climate models is vital to answering questions ranging from what the average temperature will be in Redlands in November 2050 to which Canadian cities will be wine country in the future.


Mimic and Classify: A meta-algorithm for Conditional Independence Testing

arXiv.org Machine Learning

Given independent samples generated from the joint distribution $p(\mathbf{x},\mathbf{y},\mathbf{z})$, we study the problem of Conditional Independence Testing (CI Testing), i.e., whether the joint equals the CI distribution $p^{CI}(\mathbf{x},\mathbf{y},\mathbf{z})= p(\mathbf{z}) p(\mathbf{y}|\mathbf{z})p(\mathbf{x}|\mathbf{z})$ or not. We cast this problem under the purview of the proposed, provable meta-algorithm, "Mimic and Classify", which is realized in two steps: (a) Mimic the CI distribution closely enough to recover the support, and (b) Classify to distinguish the joint and the CI distribution. Thus, as long as we have a good generative model and a good classifier, we potentially have a sound CI tester. With this modular paradigm, CI Testing becomes amenable to state-of-the-art generative and classification methods from modern advances in deep learning, which in general can handle issues related to the curse of dimensionality and operation in the small-sample regime. We show intensive numerical experiments on synthetic and real datasets where new mimic methods, such as conditional GANs and regression with neural nets, outperform the current best CI Testing performance in the literature. Our theoretical results provide an analysis of the estimation of the null distribution and allow for general measures, i.e., when some of the random variables are discrete and some are continuous, or when one or more of them are discrete-continuous mixtures.


Multivariate Regression with Neural Networks: Unique, Exact and Generic Models

#artificialintelligence

Michael Nielsen provides a visual demonstration in his web book Neural Networks and Deep Learning that a 1-layer deep neural network can match any function. It is just a matter of the number of neurons needed to get a prediction that is arbitrarily close: the more neurons, the better the approximation. The Universal Approximation Theorem supplies a rigorous proof of the same. But the known issues with overfitting remain, and the obtained network model is only good for the range of the training data. That is, if the training data consisted only of inputs within a particular range, there would be no reason to expect the obtained network model to work outside of that range. This series of posts is about obtaining network models that are unique, generic and exact.
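
The approximation claim can be sketched in a few lines. The code below is an illustration in the spirit of that visual argument, not code from the post or the book: it builds a network with one hidden layer by hand, pairing steep sigmoid neurons into "bumps" whose heights follow the target function on [0, 1]. The function names, the `steepness` parameter and the choice of target are assumptions.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def make_bump_network(f, hidden_pairs, steepness=1000.0):
    """Hand-built one-hidden-layer sigmoid network approximating f on [0, 1].

    Each pair of hidden neurons forms a near-rectangular bump over one
    sub-interval; the output weight of the pair is f at the midpoint.
    """
    w = steepness * hidden_pairs
    edges = [i / hidden_pairs for i in range(hidden_pairs + 1)]
    heights = [f((edges[i] + edges[i + 1]) / 2.0) for i in range(hidden_pairs)]

    def network(x):
        total = 0.0
        for i, height in enumerate(heights):
            rise = sigmoid(w * (x - edges[i]))        # step up at left edge
            fall = sigmoid(w * (x - edges[i + 1]))    # step down at right edge
            total += height * (rise - fall)
        return total

    return network

def max_error(f, network, samples=1000):
    # Largest absolute error over interior sample points of [0, 1].
    points = [(j + 0.5) / samples for j in range(samples)]
    return max(abs(f(x) - network(x)) for x in points)

def target(x):
    return x * x

coarse_error = max_error(target, make_bump_network(target, 5))
fine_error = max_error(target, make_bump_network(target, 50))
outside_range = make_bump_network(target, 50)(2.0)
```

With more hidden pairs the maximum error on [0, 1] shrinks, while the value at an input of 2.0 stays close to zero instead of near 4. That is the range limitation the paragraph describes.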


A Recursive PLS (Partial Least Squares) based Approach for Enterprise Threat Management

arXiv.org Artificial Intelligence

Most of the existing solutions to enterprise threat management are preventive approaches prescribing means to prevent policy violations with varying degrees of success. In this paper we consider the complementary scenario where a number of security violations have already occurred, or security threats or vulnerabilities have been reported, and a security administrator needs to generate an optimal response to these security events. We present a principled approach to study and model the human expertise in responding to the emergent threats owing to these security events. A recursive Partial Least Squares based adaptive learning model is defined using a factorial analysis of the security events together with a method for estimating the effect of global context-dependent semantic information used by the security administrators. The presented model is theoretically optimal and operationally recursive in nature to deal with the set of security events being generated continuously. We discuss the underlying challenges and ways in which the model could be operationalized in centralized versus decentralized, and real-time versus batch processing modes.


Predicting the next Fibonacci number with Linear Regression in TensorFlow.js

#artificialintelligence

Welcome to the first (or 0th) part of the series! Together we will explore the limits of what is possible (and probably impossible) with the current state of using JavaScript for Machine Learning in the browser! The complete source code can be found on GitHub if you want to follow along. Additionally, I've included a gist showing the complete JavaScript code at the end of the post. Here is a link to a Live Demo; you must open your browser console to see the results.
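
For readers who want the idea behind the title before opening the demo, here is a small sketch in Python rather than the post's TensorFlow.js. It assumes the regression predicts each number from the two before it, which may not be the feature choice the post makes, and it solves the least-squares problem in closed form instead of training by gradient descent. All function names are hypothetical.

```python
from fractions import Fraction

def fibonacci(count):
    # First `count` Fibonacci numbers, starting 1, 1.
    seq = [1, 1]
    while len(seq) < count:
        seq.append(seq[-1] + seq[-2])
    return seq[:count]

def fit_two_term_regression(seq):
    """Least-squares fit of seq[n+1] ~ a * seq[n-1] + b * seq[n], no intercept.

    Solves the 2x2 normal equations exactly with rational arithmetic.
    """
    rows = [(Fraction(seq[i - 1]), Fraction(seq[i]), Fraction(seq[i + 1]))
            for i in range(1, len(seq) - 1)]
    s11 = sum(u * u for u, v, t in rows)
    s12 = sum(u * v for u, v, t in rows)
    s22 = sum(v * v for u, v, t in rows)
    r1 = sum(u * t for u, v, t in rows)
    r2 = sum(v * t for u, v, t in rows)
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - r2 * s12) / det   # Cramer's rule
    b = (s11 * r2 - s12 * r1) / det
    return a, b

seq = fibonacci(12)                   # ends ..., 89, 144
a, b = fit_two_term_regression(seq)
prediction = a * seq[-2] + b * seq[-1]
print(a, b, prediction)               # prints: 1 1 233
```

Because the sequence obeys an exact linear recurrence, the fit recovers both coefficients as 1. A gradient-descent model in the browser would only approach these values.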


2018 World Cup Predictions using decision trees

#artificialintelligence

In this study, we predict the outcome of the football matches in the FIFA World Cup 2018 to be held in Russia this summer. We do this using classification models over a dataset of historic football results that includes attributes of the playing teams, rating them in attack, midfield, defence, aggression, pressure, chance creation and building ability. This training data was the result of merging international match results with AE games ratings of the teams, taking into account the timeline of the matches with their respective statistics. Final predictions show the four countries with the best chances of reaching the semifinals as France, Brazil, Spain and Germany, with Spain as the winner. The objective of this study is to build a predictive model that will allow us to make good predictions for the coming World Cup 2018, so we looked for a dataset with historic match results. For this purpose we chose a dataset from Kaggle with data on almost 40,000 international matches played between 1872 and 2018.


Data Science Predicting The Future

#artificialintelligence

Predictive analytics in data science rest on the shoulders of explanatory data analysis, which is precisely what we were discussing in our previous article, The What, Where and How of Data for Data Science. We talked about data in data science, and how business intelligence (BI) analysts use it to explain the past. In fact, everything is connected. Once the BI reports and dashboards have been prepared and insights extracted from them, this information becomes the basis for predicting future values. And the accuracy of these predictions lies in the methods used.


Non-Parametric Calibration of Probabilistic Regression

arXiv.org Machine Learning

The task of calibration is to retrospectively adjust the outputs from a machine learning model to provide better probability estimates on the target variable. While calibration has been investigated thoroughly in classification, it has not yet been well-established for regression tasks. This paper considers the problem of calibrating a probabilistic regression model to improve the estimated probability densities over the real-valued targets. We propose to calibrate a regression model through the cumulative probability density, which can be derived from calibrating a multi-class classifier. We provide three non-parametric approaches to solve the problem, two of which provide empirical estimates and the third of which provides smooth density estimates. The proposed approaches are experimentally evaluated to show their ability to improve the performance of regression models on the predictive likelihood.
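
To illustrate what calibrating through cumulative probabilities can look like, here is a minimal sketch. It is a generic empirical recalibration of predicted cumulative probabilities, not any of the paper's three approaches. The Gaussian predictive model, the synthetic data and every function name are assumptions made for the example.

```python
import bisect
import math
import random

def normal_cdf(y, mean, std):
    return 0.5 * (1.0 + math.erf((y - mean) / (std * math.sqrt(2.0))))

def pit_values(targets, means, stds):
    # Predicted cumulative probability evaluated at each observed target.
    return [normal_cdf(y, m, s) for y, m, s in zip(targets, means, stds)]

def calibration_error(cum_probs, levels=10):
    """Mean absolute gap between each nominal level q and the observed
    fraction of cumulative probabilities that fall at or below q."""
    n = len(cum_probs)
    gaps = []
    for k in range(1, levels):
        q = k / levels
        observed = sum(1 for p in cum_probs if p <= q) / n
        gaps.append(abs(observed - q))
    return sum(gaps) / len(gaps)

def recalibrate(cum_probs_fit):
    """Return a map from a predicted cumulative probability to the empirical
    frequency with which held-out values fell at or below it."""
    ordered = sorted(cum_probs_fit)
    n = len(ordered)

    def mapping(p):
        return bisect.bisect_right(ordered, p) / n

    return mapping

random.seed(0)
targets = [random.gauss(0.0, 1.0) for _ in range(4000)]
# Hypothetical miscalibrated model: correct mean, standard deviation 3x too large.
cum_probs = pit_values(targets, [0.0] * 4000, [3.0] * 4000)
fit_part, eval_part = cum_probs[:2000], cum_probs[2000:]

mapping = recalibrate(fit_part)
before = calibration_error(eval_part)
after = calibration_error([mapping(p) for p in eval_part])
```

For a well-calibrated model the cumulative probabilities at the observed targets are uniformly distributed, so `before` measures how far the overly wide predictions are from that ideal and `after` measures the same gap once the empirical map is applied.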