Goto

Collaborating Authors

 Directed Networks


A Gentle Introduction to Logistic Regression With Maximum Likelihood Estimation

#artificialintelligence

Logistic regression is a model for binary classification predictive modeling. The parameters of a logistic regression model can be estimated by the probabilistic framework called maximum likelihood estimation. Under this framework, a probability distribution for the target variable (class label) must be assumed and then a likelihood function defined that calculates the probability of observing the outcome given the input data and the model. This function can then be optimized to find the set of parameters that results in the largest sum likelihood over the training dataset. The maximum likelihood approach to fitting a logistic regression model both aids in better understanding the form of the logistic regression model and provides a template that can be used for fitting classification models more generally.


An Introduction to the Powerful Bayes' Theorem for Data Science Professionals

#artificialintelligence

Probability is at the very core of a lot of data science algorithms. In fact, the solutions to so many data science problems are probabilistic in nature โ€“ hence I always advice focusing on learning statistics and probability before jumping into the algorithms. But I've seen a lot of aspiring data scientists shunning statistics, especially Bayesian statistics. It remains incomprehensible to a lot of analysts and data scientists. I'm sure a lot of you are nodding your head at this! Bayes' Theorem, a major aspect of Bayesian Statistics, was created by Thomas Bayes, a monk who lived during the eighteenth century. The very fact that we're still learning about it shows how influential his work has been across centuries!


Prior specification via prior predictive matching: Poisson matrix factorization and beyond

arXiv.org Machine Learning

Hyperparameter optimization for machine learning models is typically carried out by some sort of cross-validation procedure or global optimization, both of which require running the learning algorithm numerous times. We show that for Bayesian hierarchical models there is an appealing alternative that allows selecting good hyperparameters without learning the model parameters during the process at all, facilitated by the prior predictive distribution that marginalizes out the model parameters. We propose an approach that matches suitable statistics of the prior predictive distribution with ones provided by an expert and apply the general concept for matrix factorization models. For some Poisson matrix factorization models we can analytically obtain exact hyperparameters, including the number of factors, and for more complex models we propose a model-independent optimization procedure.


An Active Approach for Model Interpretation

arXiv.org Artificial Intelligence

Model interpretation, or explanation of a machine learning classifier, aims to extract generalizable knowledge from a trained classifier into a human-understandable format, for various purposes such as model assessment, debugging and trust. From a computaional viewpoint, it is formulated as approximating the target classifier using a simpler interpretable model, such as rule models like a decision set/list/tree. Often, this approximation is handled as standard supervised learning and the only difference is that the labels are provided by the target classifier instead of ground truth. This paradigm is particularly popular because there exists a variety of well-studied supervised algorithms for learning an interpretable classifier. However, we argue that this paradigm is suboptimal for it does not utilize the unique property of the model interpretation problem, that is, the ability to generate synthetic instances and query the target classifier for their labels. We call this the active-query property, suggesting that we should consider model interpretation from an active learning perspective. Following this insight, we argue that the active-query property should be employed when designing a model interpretation algorithm, and that the generation of synthetic instances should be integrated seamlessly with the algorithm that learns the model interpretation. In this paper, we demonstrate that by doing so, it is possible to achieve more faithful interpretation with simpler model complexity. As a technical contribution, we present an active algorithm Active Decision Set Induction (ADS) to learn a decision set, a set of if-else rules, for model interpretation. ADS performs a local search over the space of all decision sets. In every iteration, ADS computes confidence intervals for the value of the objective function of all local actions and utilizes active-query to determine the best one.


The Empirical Derivation of the Bayesian Formula Open Data Science Conference

#artificialintelligence

Editor's note: James is a speaker for ODSC London this November! Be sure to check out his talk, "The How, Why, and When of Replacing Engineering Work with Compute Power" there. Deep learning has been made practical through modern computing power, but it is not the only technique benefiting from this large increase in power. Bayesian inference is up and coming technique whose recent progress is powered by the increase in computing power. We can explain the mathematical expression of Bayes formula using an example similar to the great Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference to a financial context and using mathematical concepts intuitively arise from code.


A Gentle Introduction to Cross-Entropy for Machine Learning

#artificialintelligence

Cross-entropy is commonly used in machine learning as a loss function. Cross-entropy is a measure from the field of information theory, building upon entropy and generally calculating the difference between two probability distributions. It is closely related to but is different from KL divergence that calculates the relative entropy between two probability distributions, whereas cross-entropy can be thought to calculate the total entropy between the distributions. Cross-entropy is also related to and often confused with logistic loss, called log loss. Although the two measures are derived from a different source, when used as loss functions for classification models, both measures calculate the same quantity and can be used interchangeably. In this tutorial, you will discover cross-entropy for machine learning.


A Gentle Introduction to Cross-Entropy for Machine Learning

#artificialintelligence

Cross-entropy is commonly used in machine learning as a loss function. Cross-entropy is a measure from the field of information theory, building upon entropy and generally calculating the difference between two probability distributions. It is closely related to but is different from KL divergence that calculates the relative entropy between two probability distributions, whereas cross-entropy can be thought to calculate the total entropy between the distributions. Cross-entropy is also related to and often confused with logistic loss, called log loss. Although the two measures are derived from a different source, when used as loss functions for classification models, both measures calculate the same quantity and can be used interchangeably. In this tutorial, you will discover cross-entropy for machine learning.


A Gentle Introduction to Maximum Likelihood Estimation for Machine Learning

#artificialintelligence

Density estimation is the problem of estimating the probability distribution for a sample of observations from a problem domain. There are many techniques for solving density estimation, although a common framework used throughout the field of machine learning is maximum likelihood estimation. Maximum likelihood estimation involves defining a likelihood function for calculating the conditional probability of observing the data sample given a probability distribution and distribution parameters. This approach can be used to search a space of possible distributions and parameters. This flexible probabilistic framework also provides the foundation for many machine learning algorithms, including important methods such as linear regression and logistic regression for predicting numeric values and class labels respectively, but also more generally for deep learning artificial neural networks.


Fair Generative Modeling via Weak Supervision

arXiv.org Machine Learning

Real-world datasets are often biased with respect to key demographic factors such as race and gender. Due to the latent nature of the underlying factors, detecting and mitigating bias is especially challenging for unsupervised machine learning. We present a weakly supervised algorithm for overcoming dataset bias for deep generative models. Our approach requires access to an additional small, unlabeled but unbiased dataset as the supervision signal, thus sidestepping the need for explicit labels on the underlying bias factors. Using this supplementary dataset, we detect the bias in existing datasets via a density ratio technique and learn generative models which efficiently achieve the twin goals of: 1) data efficiency by using training examples from both biased and unbiased datasets for learning, 2) unbiased data generation at test time. Empirically, we demonstrate the efficacy of our approach which reduces bias w.r.t. latent factors by 57.1% on average over baselines for comparable image generation using generative adversarial networks.


Comparison of Different Spike Sorting Subtechniques Based on Rat Brain Basolateral Amygdala Neuronal Activity

arXiv.org Machine Learning

Developing electrophysiological recordings of brain neuronal activity and their analysis provide a basis for exploring the structure of brain function and nervous system investigation. The recorded signals are typically a combination of spikes and noise. High amounts of background noise and possibility of electric signaling recording from several neurons adjacent to the recording site have led scientists to develop neuronal signal processing tools such as spike sorting to facilitate brain data analysis. Spike sorting plays a pivotal role in understanding the electrophysiological activity of neuronal networks. This process prepares recorded data for interpretations of neurons interactions and understanding the overall structure of brain functions. Spike sorting consists of three steps: spike detection, feature extraction, and spike clustering. There are several methods to implement each of spike sorting steps. This paper provides a systematic comparison of various spike sorting sub-techniques applied to real extracellularly recorded data from a rat brain basolateral amygdala. An efficient sorted data resulted from careful choice of spike sorting sub-methods leads to better interpretation of the brain structures connectivity under different conditions, which is a very sensitive concept in diagnosis and treatment of neurological disorders. Here, spike detection is performed by appropriate choice of threshold level via three different approaches. Feature extraction is done through PCA and Kernel PCA methods, which Kernel PCA outperforms. We have applied four different algorithms for spike clustering including K-means, Fuzzy C-means, Bayesian and Fuzzy maximum likelihood estimation. As one requirement of most clustering algorithms, optimal number of clusters is achieved through validity indices for each method. Finally, the sorting results are evaluated using inter-spike interval histograms.