
 Learning Graphical Models


maximum likelihood estimate and logistic regression simplified

#artificialintelligence

Least squares regression can produce impossible estimates, such as probabilities less than zero or greater than 1. So when the predicted value is measured as a probability, use logistic regression. We use the log of the odds rather than the odds directly because odds cannot be negative (they range from 0 to infinity), but their log can take any real value, just as a linear combination of predictors can. Notice that we have randomly initialized our coefficients for income and the other predictors. These will be adjusted by Solver based on a likelihood function, which we will cover later. Column H tells us the predicted probability of the borrower's actual behavior, whether that behavior is repayment or default, not simply, as in Column G, the predicted probability of defaulting on the loan. One property of logarithms is that their sum equals the logarithm of the product of the numbers on which they're based. The logarithms of probabilities are never positive (they are negative for any probability below 1.0), and the closer a probability is to 1.0, the closer its logarithm is to 0.0. I haven't covered cross-validation, which is commonly used to validate a logistic regression equation. If you don't have a large number of cases to work with, a different approach is to use statistical inference.
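The spreadsheet workflow described above can be sketched in a few lines of code. This is a minimal illustration, not the article's workbook: the ten borrowers, the single `income` predictor (in $10k units), and the function names are hypothetical, the coefficients start at zero rather than random values so that runs are reproducible, and plain gradient ascent stands in for Excel's Solver. `p_default` plays the role of Column G, `p_actual` plays the role of Column H, and the quantity being maximized is the sum of the logs of the Column H values.

```python
import math

# Hypothetical borrowers: (income in $10k units, defaulted? 1 = yes, 0 = repaid)
data = [(2.0, 1), (2.5, 1), (3.0, 0), (3.5, 1), (4.0, 0),
        (4.5, 0), (5.0, 1), (5.5, 0), (6.0, 0), (7.0, 0)]

def p_default(b0, b1, income):
    """Column G analogue: predicted probability of default.
    The model is linear in the log of the odds; the logistic
    function maps the log odds back into the interval (0, 1)."""
    log_odds = b0 + b1 * income
    return 1.0 / (1.0 + math.exp(-log_odds))

def p_actual(b0, b1, income, defaulted):
    """Column H analogue: predicted probability of the borrower's actual behavior."""
    p = p_default(b0, b1, income)
    return p if defaulted else 1.0 - p

def log_likelihood(b0, b1):
    """Sum of the logs of the Column H probabilities, which equals the log of their product."""
    return sum(math.log(p_actual(b0, b1, x, y)) for x, y in data)

def fit(steps=20000, lr=0.01):
    """Maximize the log-likelihood by gradient ascent (a stand-in for Solver)."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in data:
            err = y - p_default(b0, b1, x)  # derivative of the log-likelihood w.r.t. log odds
            g0 += err
            g1 += err * x
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

if __name__ == "__main__":
    b0, b1 = fit()
    print(f"intercept={b0:.3f} income coefficient={b1:.3f} "
          f"log-likelihood={log_likelihood(b0, b1):.3f}")
```

Because every Column H value lies between 0 and 1, each log is negative and the total is negative too; fitting pushes that total toward 0. With these made-up borrowers, defaulters have lower incomes on average, so the fitted income coefficient comes out negative.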


How To Use Classification Machine Learning Algorithms in Weka - Machine Learning Mastery

#artificialintelligence

Weka makes a large number of classification algorithms available, and that breadth is one of the benefits of using the Weka platform to work through your machine learning problems. In this post, you will discover how to use five top classification algorithms in Weka by taking a tour of each one.


Identifying Depression on Twitter

arXiv.org Machine Learning

Social media has recently emerged as a premier method of disseminating information online. Through these online networks, tens of millions of individuals communicate their thoughts, personal experiences, and social ideals. We therefore explore the potential of social media to predict Major Depressive Disorder (MDD) in online personas, even prior to onset. We employ a crowdsourced method to compile a list of Twitter users who report having been diagnosed with depression. Using up to a year of their prior social media postings, we apply a Bag of Words approach to quantify each tweet. Lastly, we leverage several statistical classifiers to estimate the risk of depression. Our work proposes a new methodology for constructing our classifier by treating depression detection on social media as a text-classification problem rather than a behavioral one. Using a corpus of 2.5M tweets, we achieved an 81% accuracy rate in classification, with a precision of 0.86. We believe this method may be helpful in developing tools that estimate an individual's risk of depression. Such tools could be employed by physicians, concerned individuals, and healthcare agencies to aid in diagnosis, and might even enable people suffering from depression to be more proactive about their recovery.
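To make the Bag of Words plus statistical classifier setup concrete, here is a small sketch. The abstract does not say which classifiers were used, so multinomial naive Bayes with add-one smoothing is a swapped-in example chosen because it is a standard Bag of Words classifier. The four training posts and the labels (1 = depressed, 0 = not) are invented, and the helper names are hypothetical. `accuracy_precision` computes the two metrics the abstract reports.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Bare-bones Bag of Words: lowercase, split on whitespace, count tokens."""
    return Counter(text.lower().split())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(bag_of_words(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        bow = bag_of_words(doc)

        def log_score(c):
            # log P(class) + sum over words of count * log P(word | class)
            s = math.log(self.prior[c])
            for word, n in bow.items():
                p = (self.counts[c][word] + 1) / (self.totals[c] + len(self.vocab))
                s += n * math.log(p)
            return s

        return max(self.classes, key=log_score)

def accuracy_precision(y_true, y_pred, positive=1):
    """Accuracy over all cases; precision = true positives / predicted positives."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == positive for p in y_pred)
    precision = tp / predicted_pos if predicted_pos else 0.0
    return correct / len(y_true), precision

if __name__ == "__main__":
    train = ["feeling hopeless and tired again", "cannot sleep feeling empty",
             "great day at the beach", "excited for the game tonight"]
    labels = [1, 1, 0, 0]
    model = NaiveBayes().fit(train, labels)
    print(model.predict("so tired and empty"), model.predict("the game at the beach"))
```

Precision counts only the posts the model flags, so a screening tool can keep precision high by flagging cautiously, even if that costs some recall.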


Latent Variable Discovery Using Dependency Patterns

arXiv.org Machine Learning

The causal discovery of Bayesian networks is an active and important research area. It is based on searching the space of causal models for those that best explain a pattern of probabilistic dependencies shown in the data. However, some of those dependencies are generated by causal structures involving variables that have not been measured, i.e., latent variables. Some such dependency patterns "reveal" themselves, in that no model based solely on the observed variables can explain them as well as a model using a latent variable. Latent variable discovery is based on exactly these patterns. Here we search for them systematically, so that they can be applied to latent variable discovery in a more rigorous fashion.
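The raw material for this search is a dependency among measured variables that an unmeasured cause produces. The simulation below illustrates only that basic phenomenon. It is not the paper's systematic search, and a two-variable pattern like this one does not "reveal" a latent variable by itself, because a direct edge between A and B would explain it equally well. The model (a hidden binary L that A and B each copy with 10% noise) is hypothetical. The point is that A and B are clearly dependent when L is ignored, and the dependence vanishes once L is held fixed.

```python
import random

def simulate(n, seed=0):
    """Hypothetical model: latent L causes observed A and B. L is never 'measured'."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = rng.random() < 0.5
        a = l if rng.random() < 0.9 else not l  # A copies L with 10% noise
        b = l if rng.random() < 0.9 else not l  # B copies L with 10% noise
        rows.append((l, a, b))
    return rows

def dependence(pairs):
    """P(A=1, B=1) - P(A=1) P(B=1); this is zero when A and B are independent."""
    n = len(pairs)
    pa = sum(a for a, _ in pairs) / n
    pb = sum(b for _, b in pairs) / n
    pab = sum(a and b for a, b in pairs) / n
    return pab - pa * pb

if __name__ == "__main__":
    rows = simulate(20000)
    print("marginal:", dependence([(a, b) for _, a, b in rows]))            # about 0.41 - 0.25
    print("given L=1:", dependence([(a, b) for l, a, b in rows if l]))      # near 0
    print("given L=0:", dependence([(a, b) for l, a, b in rows if not l]))  # near 0
```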


Stochastic Neural Networks with Monotonic Activation Functions

arXiv.org Machine Learning

Siamak Ravanbakhsh, Barnabás Póczos, Jeff Schneider¹ and Dale Schuurmans, Russell Greiner²

¹ Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213
² University of Alberta, Edmonton, AB T6G 2E8, Canada

Appearing in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS) 2016, Cadiz, Spain.

Abstract

We propose a Laplace approximation that creates a stochastic unit from any smooth monotonic activation function, using only Gaussian noise. This paper investigates the application of this stochastic approximation in training a family of Restricted Boltzmann Machines (RBM) that are closely linked to Bregman divergences. This family, which we call exponential family RBM (Exp-RBM), is a subset of the exponential family Harmoniums that expresses family members through a choice of smooth monotonic non-linearity for each neuron. Using contrastive divergence along with our Gaussian approximation, we show that Exp-RBM can learn useful representations using novel stochastic units.

1 Introduction

Deep neural networks (LeCun et al., 2015; Bengio, 2009) have produced some of the best results in complex pattern recognition tasks where training data is abundant. Here, we are interested in deep learning for generative modeling. Recent years have witnessed a surge of interest in directed generative models that are trained using (stochastic) back-propagation (e.g., Kingma and Welling, 2013; Rezende et al., 2014; Goodfellow et al., 2014). These models are distinct from deep energy-based models, including the deep Boltzmann machine (Hinton et al., 2006) and the (convolutional) deep belief network …

Although the stochastic units that we introduce in this paper use Gaussian noise and could therefore potentially be used with stochastic back-propagation, this paper is limited to applications in RBMs.
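A stochastic unit of this kind can be sketched as follows. The abstract says only that the unit is built from a smooth monotonic activation using Gaussian noise. This sketch assumes that the unit outputs a sample from a normal distribution with mean f(η) and variance f′(η), where η is the unit's input, f is the activation, and f′ is its derivative. Treat that parameterization as our reading for illustration, not a quotation from the paper. The function names are hypothetical.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def softplus(x):
    """Numerically stable log(1 + exp(x)); its derivative is the sigmoid."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def stochastic_unit(eta, f, f_prime, rng):
    """Gaussian stochastic unit for a smooth monotonic activation f.
    ASSUMPTION: mean f(eta) and variance f'(eta)."""
    return rng.gauss(f(eta), math.sqrt(f_prime(eta)))

if __name__ == "__main__":
    rng = random.Random(0)
    eta = 0.5
    samples = [stochastic_unit(eta, softplus, sigmoid, rng) for _ in range(50000)]
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(f"softplus unit: mean {mean:.3f} (f={softplus(eta):.3f}), "
          f"var {var:.3f} (f'={sigmoid(eta):.3f})")
```

One check suggests this parameterization is sensible. For the sigmoid activation, the variance f′(η) = σ(η)(1 − σ(η)) is exactly the variance of a Bernoulli unit with mean σ(η), which is the usual binary RBM hidden unit.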


RAND-WALK: A Latent Variable Model Approach to Word Embeddings

arXiv.org Machine Learning

Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods. Many use nonlinear operations on co-occurrence statistics, and have hand-tuned hyperparameters and reweighting methods. This paper proposes a new generative model, a dynamic version of the log-linear topic model of Mnih and Hinton (2007). The methodological novelty is to use the prior to compute closed-form expressions for word statistics. This provides a theoretical justification for nonlinear models like PMI, word2vec, and GloVe, as well as some hyperparameter choices. It also helps explain why low-dimensional semantic embeddings contain linear algebraic structure that allows solution of word analogies, as shown by Mikolov et al. (2013) and many subsequent papers. Experimental support is provided for the generative model assumptions, the most important of which is that latent word vectors are fairly uniformly dispersed in space.
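PMI (pointwise mutual information) is one of the nonlinear transformations of co-occurrence statistics mentioned above: PMI(w, w′) = log p(w, w′) / (p(w) p(w′)). The sketch below computes it from a symmetric co-occurrence count table. The three words and their counts are made up for illustration, and this is only the statistic itself, not the RAND-WALK generative model.

```python
import math

def pmi(cooc, w1, w2):
    """Pointwise mutual information log p(w1, w2) / (p(w1) p(w2)).
    cooc maps (word, context) pairs to counts and is assumed symmetric,
    i.e. it stores both (a, b) and (b, a)."""
    total = sum(cooc.values())

    def p_word(w):
        # Marginal probability: mass of all pairs whose first element is w.
        return sum(n for (a, _), n in cooc.items() if a == w) / total

    p_joint = cooc.get((w1, w2), 0) / total
    if p_joint == 0:
        return float("-inf")  # the pair never co-occurs
    return math.log(p_joint / (p_word(w1) * p_word(w2)))

# Hypothetical symmetric counts.
counts = {
    ("king", "queen"): 8, ("queen", "king"): 8,
    ("king", "apple"): 1, ("apple", "king"): 1,
    ("queen", "apple"): 1, ("apple", "queen"): 1,
}

if __name__ == "__main__":
    print(pmi(counts, "king", "queen"))  # log(0.4 / 0.45**2), about 0.68
    print(pmi(counts, "king", "apple"))  # log(0.05 / 0.045), about 0.11
```

A positive PMI means that the two words co-occur more often than they would if they were independent. Words that co-occur frequently, such as "king" and "queen" in these toy counts, score highest.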


Artificial intelligence - Wikipedia, the free encyclopedia

#artificialintelligence

Artificial intelligence (AI) is intelligence exhibited by machines. In computer science, an ideal "intelligent" machine is a flexible rational agent that perceives its environment and takes actions that maximize its chance of success at some goal.[1] Colloquially, the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving".[2] As machines become increasingly capable, tasks once thought to require intelligence are removed from the definition. For example, optical character recognition is no longer perceived as an exemplar of "artificial intelligence", having become a routine technology.[3] Capabilities still classified as AI include advanced Chess and Go systems and self-driving cars. AI research is divided into subfields[4] that focus on specific problems, specific approaches, the use of a particular tool, or particular applications. The central problems (or goals) of AI research include reasoning, knowledge, planning, learning, natural language processing (communication), perception, and the ability to move and manipulate objects.[5] General intelligence is among the field's long-term goals.[6] Approaches include statistical methods, computational intelligence, soft computing (e.g. machine learning), and traditional symbolic AI. Many tools are used in AI, including versions of search and mathematical optimization, logic, and methods based on probability and economics. The AI field draws upon computer science, mathematics, psychology, linguistics, philosophy, neuroscience, and artificial psychology.
The field was founded on the claim that human intelligence "can be so precisely described that a machine can be made to simulate it."[7] This raises philosophical arguments about the nature of the mind and the ethics of creating artificial beings endowed with human-like intelligence, issues which have been explored by myth, fiction and philosophy since antiquity.[8] Attempts to create artificial intelligence have experienced many setbacks, including the ALPAC report of 1966, the abandonment of perceptrons in 1970, the Lighthill Report of 1973 and the collapse of the Lisp machine market in 1987. In the twenty-first century, AI techniques became an essential part of the technology industry, helping to solve many challenging problems in computer science.[9]



Multimodal, high-dimensional, model-based, Bayesian inverse problems with applications in biomechanics

arXiv.org Machine Learning

This paper is concerned with the numerical solution of model-based, Bayesian inverse problems. We are particularly interested in cases where each likelihood evaluation (forward-model call) is expensive and the number of unknown (latent) variables is high. This is the setting in many problems in computational physics, where forward models with nonlinear PDEs are used and the parameters to be calibrated involve spatio-temporally varying coefficients, which upon discretization give rise to a high-dimensional vector of unknowns. One consequence of the well-documented ill-posedness of inverse problems is the possibility of multiple solutions. While such information is contained in the posterior density in Bayesian formulations, discovering a single mode, let alone multiple modes, is a formidable task. The goal of the present paper is twofold. On one hand, we propose approximate, adaptive inference strategies using mixture densities to capture multimodal posteriors; on the other, we extend our work in [1] with regard to effective dimensionality reduction techniques that reveal low-dimensional subspaces where the posterior variance is mostly concentrated. We validate the proposed model using importance sampling, which confirms that the bias introduced is small and can be efficiently corrected if the analyst wishes to do so. We demonstrate the performance of the proposed strategy in nonlinear elastography, where identifying the mechanical properties of biological materials can inform non-invasive medical diagnosis. The discovery of multiple modes (solutions) in such problems is critical to achieving the diagnostic objectives.
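The importance-sampling step can be illustrated in one dimension. Samples are drawn from an approximate mixture density and reweighted by target/proposal so that estimates refer to the true posterior rather than to the approximation. Everything here is hypothetical and stands in for the paper's high-dimensional setting. The bimodal "posterior" is known only up to a constant, and the two-component Gaussian mixture plays the role of a fitted multimodal approximation. The estimator is the standard self-normalized form, and its true posterior mean is 0.3·(−2) + 0.7·3 = 1.5.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def target_unnormalized(x):
    """Hypothetical bimodal posterior, known only up to a constant (here scaled by 5)."""
    return 5.0 * (0.3 * normal_pdf(x, -2.0, 0.5) + 0.7 * normal_pdf(x, 3.0, 1.0))

# Mixture proposal standing in for a fitted multimodal approximation: (weight, mean, std).
PROPOSAL = [(0.5, -2.0, 1.0), (0.5, 3.0, 1.5)]

def proposal_pdf(x):
    return sum(w * normal_pdf(x, m, s) for w, m, s in PROPOSAL)

def proposal_sample(rng):
    """Pick a mixture component by its weight, then sample from that Gaussian."""
    u, acc = rng.random(), 0.0
    for w, m, s in PROPOSAL:
        acc += w
        if u < acc:
            return rng.gauss(m, s)
    _, m, s = PROPOSAL[-1]
    return rng.gauss(m, s)

def snis_mean(n, seed=0):
    """Self-normalized importance sampling estimate of the posterior mean,
    together with the effective sample size (ESS) of the weights."""
    rng = random.Random(seed)
    xs = [proposal_sample(rng) for _ in range(n)]
    ws = [target_unnormalized(x) / proposal_pdf(x) for x in xs]
    total = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / total
    ess = total ** 2 / sum(w * w for w in ws)
    return mean, ess

if __name__ == "__main__":
    mean, ess = snis_mean(20000)
    print(f"estimated mean {mean:.3f} (true 1.5), ESS {ess:.0f} of 20000")
```

The unknown normalizing constant cancels in the self-normalized ratio. The ESS gives a quick sense of how closely the approximation matches the posterior: the closer it is to n, the smaller the correction the weights have to make.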


Exploiting Big Data in Logistics Risk Assessment via Bayesian Nonparametrics

arXiv.org Machine Learning

In cargo logistics, a key performance measure is transport risk, defined as the deviation of the actual arrival time from the planned arrival time. Neither earliness nor tardiness is desirable for customers or freight forwarders. In this paper, we investigate ways to assess and forecast transport risks using six months of air cargo data, provided by a leading forwarder, covering 1336 routes served by 20 airlines. Interestingly, our preliminary data analysis shows strong multimodality in the transport risks, driven by unobserved events such as cargo missing flights. To accommodate this feature, we introduce a Bayesian nonparametric model, the probit stick-breaking process (PSBP) mixture model, for flexible estimation of the conditional (i.e., state-dependent) density function of transport risk. We demonstrate that simpler methods, such as OLS linear regression, can lead to misleading inferences. Our model provides a tool for the forwarder to offer customized price and service quotes. It can also generate baseline airline performance to enable fair supplier evaluation. Furthermore, the method allows us to separate recurrent risks from disruption risks. This is important because hedging strategies for these two kinds of risks are often drastically different.
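The core of a probit stick-breaking mixture is the construction of its covariate-dependent weights. At covariate x, component k takes the fraction Φ(α_k(x)) of whatever stick remains after components 1 to k−1, where Φ is the standard normal CDF. The sketch below truncates the process at three components, lets the last component absorb the leftover mass, and uses linear predictors α_k(x) = a_k + b_k·x. The truncation, the linear form, the coefficients, and the Gaussian kernels (transport risk in hours, negative meaning early) are all hypothetical illustration choices, not the paper's fitted model.

```python
import math

def probit(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def psbp_weights(x, coefs):
    """Truncated probit stick-breaking weights at covariate x.
    coefs: hypothetical (a_k, b_k) pairs for the first K-1 sticks;
    the final component receives whatever stick is left."""
    weights, remaining = [], 1.0
    for a, b in coefs:
        v = probit(a + b * x)          # fraction of the remaining stick taken
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights

def conditional_density(y, x, coefs, kernels):
    """Mixture density f(y | x) = sum_k w_k(x) * Normal(y; mu_k, sigma_k)."""
    ws = psbp_weights(x, coefs)
    return sum(w * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, (m, s) in zip(ws, kernels))

# Hypothetical parameters: on time, late (e.g. a missed flight), early.
COEFS = [(1.0, -0.5), (0.0, 0.3)]
KERNELS = [(0.0, 2.0), (6.0, 3.0), (-4.0, 2.0)]

if __name__ == "__main__":
    for x in (0.0, 2.0, 4.0):
        print(x, [round(w, 3) for w in psbp_weights(x, COEFS)])
```

Because the weights change with x while the kernels stay fixed, the shape of the conditional density, including how many modes it shows, can change from one state to another. A single OLS regression line has no way to capture that.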