 Undirected Networks


Data Science Skills Set – Cyber Tales – Medium

#artificialintelligence

This is not meant to be an exhaustive list of skills for data scientists, because the field is moving at stellar speed (and a tool that is relevant today might not be relevant in six months). It is rather an attempt to provide an extensive list of skills and tools that are useful in developing data science projects; of course, lacking one of those skills does not preclude someone from being identified as a data scientist. Note: the above is an adapted excerpt from my book "Big Data Analytics: A Management Perspective" (Springer, 2016).


Credit Scoring Models - Open Risk Manual

#artificialintelligence

This page aims to be a comprehensive collection of publicly available models and algorithms used for credit scoring. The credit scoring model collection focuses on the classic one period credit assessment / classification problem that typically produces a credit score and/or a probabilistic estimate of credit risk on the basis of selected characteristics of a borrower. Credit scoring models have been used globally for decades and in a variety of contexts. The significant overlap of credit scoring methodology with other statistical disciplines means that the entire arsenal of statistical methods has been available and tried with varying degrees of success, usability and adoption. We identify here some key model attributes that can help categorize the variety of models.


Linear Additive Markov Processes

arXiv.org Machine Learning

We introduce LAMP: the Linear Additive Markov Process. Transitions in LAMP may be influenced by states visited in the distant history of the process, but unlike higher-order Markov processes, LAMP retains an efficient parameterization. LAMP also allows the specific dependence on history to be learned efficiently from data. We characterize some theoretical properties of LAMP, including its steady-state and mixing time. We then give an algorithm based on alternating minimization to learn LAMP models from data. Finally, we perform a series of real-world experiments to show that LAMP is more powerful than first-order Markov processes, and even holds its own against deep sequential models (LSTMs) with a negligible increase in parameter complexity.
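A minimal sketch of how a LAMP-style transition could be simulated, assuming the following reading of the abstract: the next state is drawn by first picking a lag from a learned weight vector and then applying a single shared first-order transition matrix to the state visited that many steps ago. The names `lamp_step`, `simulate`, `weights`, and `P` are hypothetical, and the fallback to the first state when the history is shorter than the lag is an assumption of this sketch, not something the abstract specifies.

```python
import random


def lamp_step(history, weights, P, rng):
    """Draw the next state of a LAMP-style process.

    history: list of visited states, most recent last
    weights: weights[i] is the probability of looking back i + 1 steps
    P:       row-stochastic transition matrix as a list of lists
    """
    # Pick how far back to look (lag 1 = the current state).
    lag = rng.choices(range(1, len(weights) + 1), weights=weights)[0]
    # Assumption: if the history is too short, fall back to the first state.
    source = history[max(len(history) - lag, 0)]
    # Transition from the chosen past state using the shared matrix P.
    return rng.choices(range(len(P)), weights=P[source])[0]


def simulate(start, steps, weights, P, seed=0):
    """Generate a trajectory of `steps` transitions starting from `start`."""
    rng = random.Random(seed)
    history = [start]
    for _ in range(steps):
        history.append(lamp_step(history, weights, P, rng))
    return history
```

With `weights = [1.0]` this reduces to an ordinary first-order Markov chain. Under this reading, a model with n states and k lags has the n × n entries of `P` plus k weights, which is consistent with the abstract's claim of a small parameter overhead compared with a k-th order chain.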


Configurable, Photorealistic Image Rendering and Ground Truth Synthesis by Sampling Stochastic Grammars Representing Indoor Scenes

arXiv.org Machine Learning

We propose the configurable rendering of massive quantities of photorealistic images with ground truth for the purposes of training, benchmarking, and diagnosing computer vision models. In contrast to the conventional (crowd-sourced) manual labeling of ground truth for a relatively modest number of RGB-D images captured by Kinect-like sensors, we devise a non-trivial configurable pipeline of algorithms capable of generating a potentially infinite variety of indoor scenes using a stochastic grammar, specifically, one represented by an attributed spatial And-Or graph. We employ physics-based rendering to synthesize photorealistic RGB images while automatically synthesizing detailed, per-pixel ground truth data, including visible surface depth and normal, object identity and material information, as well as illumination. Our pipeline is configurable inasmuch as it enables the precise customization and control of important attributes of the generated scenes. We demonstrate that our generated scenes achieve a performance similar to the NYU v2 Dataset on pre-trained deep learning models. By modifying pipeline components in a controllable manner, we furthermore provide diagnostics on common scene understanding tasks, e.g., depth and surface normal prediction, semantic segmentation, etc.


Spectral Methods for Nonparametric Models

arXiv.org Machine Learning

Nonparametric models are a versatile, albeit computationally expensive, tool for modeling mixture models. In this paper, we introduce spectral methods for the two most popular nonparametric models: the Indian Buffet Process (IBP) and the Hierarchical Dirichlet Process (HDP). We show that using spectral methods for the inference of nonparametric models is computationally and statistically efficient. In particular, we derive the lower-order moments of the IBP and the HDP, propose spectral algorithms for both models, and provide reconstruction guarantees for the algorithms. For the HDP, we further show that applying hierarchical models to datasets with hierarchical structure, which can be solved with the generalized spectral HDP, produces better solutions than flat models in terms of likelihood.


Combinatorial Multi-armed Bandits for Real-Time Strategy Games

Journal of Artificial Intelligence Research

Games with large branching factors pose a significant challenge for game tree search algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo Tree Search (MCTS) algorithms called "naive sampling", based on a variant of the Multi-armed Bandit problem called "Combinatorial Multi-armed Bandits" (CMAB). We analyze the theoretical properties of several variants of naive sampling, and empirically compare it against the other existing strategies in the literature for CMABs. We then evaluate these strategies in the context of real-time strategy (RTS) games, a genre of computer games characterized by their very large branching factors. Our results show that as the branching factor grows, naive sampling outperforms the other sampling strategies.
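The sampling idea can be sketched roughly as follows, under assumptions that go beyond the abstract: a macro-arm is a tuple with one value per decision variable, exploration picks each variable's value independently with a local epsilon-greedy rule (the "naive" assumption that the reward decomposes over variables), and exploitation greedily picks the best macro-arm tried so far. The class name `NaiveSampling` and the parameters `eps0` and `eps_local` are hypothetical, and this is not the authors' implementation.

```python
import random


class NaiveSampling:
    """Illustrative naive-sampling-style policy for a combinatorial bandit."""

    def __init__(self, arm_sizes, eps0=0.25, eps_local=0.25, seed=0):
        self.eps0 = eps0            # probability of exploring a new macro-arm
        self.eps_local = eps_local  # exploration rate of each local bandit
        self.rng = random.Random(seed)
        # Per variable and value: [pull count, total reward].
        self.local = [[[0, 0.0] for _ in range(k)] for k in arm_sizes]
        # Per macro-arm (tuple of values): [pull count, total reward].
        self.global_stats = {}

    @staticmethod
    def _mean(stat):
        return stat[1] / stat[0] if stat[0] else 0.0

    def _explore(self):
        """Build a macro-arm by querying each local bandit independently."""
        arm = []
        for stats in self.local:
            if self.rng.random() < self.eps_local:
                arm.append(self.rng.randrange(len(stats)))
            else:
                arm.append(max(range(len(stats)),
                               key=lambda v: self._mean(stats[v])))
        return tuple(arm)

    def select(self):
        """Explore with probability eps0, otherwise exploit the best known macro-arm."""
        if not self.global_stats or self.rng.random() < self.eps0:
            return self._explore()
        return max(self.global_stats,
                   key=lambda a: self._mean(self.global_stats[a]))

    def update(self, arm, reward):
        """Credit the reward to every local value and to the macro-arm itself."""
        for stats, value in zip(self.local, arm):
            stats[value][0] += 1
            stats[value][1] += reward
        entry = self.global_stats.setdefault(arm, [0, 0.0])
        entry[0] += 1
        entry[1] += reward
```

A typical loop would call `select()`, evaluate the returned macro-arm (for example, by a playout), and pass the result to `update()`. The point of the design is that the number of local statistics grows with the sum of the variable domain sizes, not with their product.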


Optimal Policies for Observing Time Series and Related Restless Bandit Problems

arXiv.org Machine Learning

The trade-off between the cost of acquiring and processing data, and uncertainty due to a lack of data is fundamental in machine learning. A basic instance of this trade-off is the problem of deciding when to make noisy and costly observations of a discrete-time Gaussian random walk, so as to minimise the posterior variance plus observation costs. We present the first proof that a simple policy, which observes when the posterior variance exceeds a threshold, is optimal for this problem. The proof generalises to a wide range of cost functions other than the posterior variance. This result implies that optimal policies for linear-quadratic-Gaussian control with costly observations have a threshold structure. It also implies that the restless bandit problem of observing multiple such time series has a well-defined Whittle index. We discuss computation of that index, give closed-form formulae for it, and compare the performance of the associated index policy with heuristic policies. The proof is based on a new verification theorem that demonstrates threshold structure for Markov decision processes, and on the relation between binary sequences known as mechanical words and the dynamics of discontinuous nonlinear maps, which frequently arise in physics, control and biology.
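As an illustration of what a threshold policy looks like in this setting, the sketch below tracks the posterior variance of a scalar Gaussian random walk using the standard Kalman variance recursion. The function name, the parameters `q` (process noise variance) and `r` (observation noise variance), and the timing convention (the decision is made on the variance after the random-walk step) are assumptions of this sketch; the threshold value is arbitrary and not derived from the paper's results.

```python
def variance_trajectory(threshold, steps, q=1.0, r=1.0, v0=0.0):
    """Simulate the posterior variance under a threshold observation policy.

    Returns a list of (observed, variance_after_step) pairs.
    """
    v = v0
    trajectory = []
    for _ in range(steps):
        # The random walk step inflates the variance by the process noise.
        v = v + q
        # Threshold rule: pay for an observation only when uncertainty is high.
        observe = v > threshold
        if observe:
            # Kalman update for a scalar state observed with noise variance r.
            v = v * r / (v + r)
        trajectory.append((observe, v))
    return trajectory
```

Because the variance recursion is deterministic, the policy produces a fixed pattern of observe and skip decisions for a given threshold, which is the kind of binary sequence the abstract relates to mechanical words.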


The Top 10 Topics in Machine Learning Revisited: A Quantitative Meta-Study

arXiv.org Machine Learning

Which topics of machine learning are most commonly addressed in research? This question was initially answered in 2007 by doing a qualitative survey among distinguished researchers. In our study, we revisit this question from a quantitative perspective. Concretely, we collect 54K abstracts of papers published between 2007 and 2016 in leading machine learning journals and conferences. We then use machine learning in order to determine the top 10 topics in machine learning. We not only include models, but provide a holistic view across optimization, data, features, etc. This quantitative approach allows reducing the bias of surveys. It reveals new and up-to-date insights into what the 10 most prolific topics in machine learning research are. This allows researchers to identify popular topics as well as new and rising topics for their research.


Particle Filtering for PLCA model with Application to Music Transcription

arXiv.org Machine Learning

Automatic Music Transcription (AMT) consists of automatically estimating the notes in an audio recording through three attributes: onset time, duration and pitch. Probabilistic Latent Component Analysis (PLCA) has become very popular for this task. PLCA is a spectrogram factorization method that models a magnitude spectrogram as a linear combination of spectral vectors from a dictionary. Such methods use the Expectation-Maximization (EM) algorithm to estimate the parameters of the acoustic model. This algorithm has well-known inherent flaws (local convergence, initialization dependency), which limit the application of EM-based systems to AMT, particularly with regard to the mathematical form and number of priors. To overcome these limits, we propose in this paper a different estimation framework based on Particle Filtering (PF), which consists of sampling the posterior distribution over larger parameter ranges. This framework proves to be more robust in parameter estimation, and more flexible and unifying in the integration of prior knowledge in the system. Note-level transcription accuracies of 61.8% and 59.5% were achieved on evaluation sound datasets of two different instrument repertoires, including the classical piano (from the MAPS dataset) and the marovany zither, and direct comparisons to previous PLCA-based approaches are provided. Steps for further development are also outlined.


Unifying the Stochastic Spectral Descent for Restricted Boltzmann Machines with Bernoulli or Gaussian Inputs

arXiv.org Machine Learning

Stochastic gradient descent based algorithms are typically used as the general optimization tools for most deep learning models. A Restricted Boltzmann Machine (RBM) is a probabilistic generative model that can be stacked to construct deep architectures. For RBMs with Bernoulli inputs, a non-Euclidean algorithm, stochastic spectral descent (SSD), has been specifically designed to speed up convergence through improved use of the gradient estimated by sampling methods. However, the existing algorithm and its theoretical justification depend on the assumption that the possible configurations of inputs are finite, as with binary variables. The purpose of this paper is to generalize SSD to Gaussian RBMs, which are capable of modeling continuous data, without relying on that assumption. We propose gradient descent methods in a non-Euclidean parameter space by deriving upper bounds on the logarithmic partition function of RBMs based on the Schatten-infinity norm. We empirically show the advantage and improvement of SSD over stochastic gradient descent (SGD).