Directed Networks
From both sides now: the math of linear regression
Linear regression is the most basic and the most widely used technique in machine learning; yet for all its simplicity, studying it can unlock some of the most important concepts in statistics. If you have a basic understanding of linear regression expressed as \hat{Y} = \theta_0 + \theta_1 X, but don't have a background in statistics and find statements like "ridge regression is equivalent to the maximum a posteriori (MAP) estimate with a zero-mean Gaussian prior" bewildering, then this post is for you. With a superficial goal of understanding that somewhat obtuse statement, its main objective is to explore the topic, starting from the standard formulation of linear regression, moving on to the probabilistic approach (maximum likelihood formulation) and from there to Bayesian linear regression. I'll use the \theta character throughout to refer to the coefficients (weights) of a regression model, either explicitly broken out as \theta_0 and \theta_1 for intercept and slope respectively, or just \theta referring to the vector of coefficients. I'll usually use the expression \theta^T x_i for the prediction a model gives at x_i, the assumption being that a 1 has been added to the vector of values at x_i. In the single predictor case, we know that the least squares fit is the line that minimizes the sum of the squared distances between observed data and predicted values, i.e. it minimizes the Residual Sum of Squares (RSS): RSS(\theta) = \sum_{i=1}^{n} (y_i - \theta^T x_i)^2. These residuals are pretty important in how we reason about our model.
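The least squares fit and the ridge estimate mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the post: the helper names (add_intercept, rss, fit_least_squares, fit_ridge), the synthetic data and the penalty value lam=10.0 are all assumptions made for the example, and for simplicity the ridge penalty is applied to every coefficient, intercept included.

```python
import numpy as np


def add_intercept(x):
    """Prepend a column of ones so that theta[0] acts as the intercept."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    return np.hstack([np.ones((x.shape[0], 1)), x])


def rss(theta, X, y):
    """Residual Sum of Squares: sum_i (y_i - theta^T x_i)^2."""
    residuals = y - X @ theta
    return float(residuals @ residuals)


def fit_least_squares(X, y):
    """Coefficients that minimize the RSS (ordinary least squares)."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta


def fit_ridge(X, y, lam):
    """Closed-form ridge estimate: (X^T X + lam * I)^{-1} X^T y.

    Assumption for this sketch: the penalty covers all coefficients,
    which matches a zero-mean Gaussian prior on the whole theta vector.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


# Hypothetical synthetic data: y = 1.5 + 2.0 * x + Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=50)
y = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=50)

X = add_intercept(x)
theta_ols = fit_least_squares(X, y)
theta_ridge = fit_ridge(X, y, lam=10.0)

print("OLS   theta:", theta_ols, "RSS:", rss(theta_ols, X, y))
print("ridge theta:", theta_ridge, "RSS:", rss(theta_ridge, X, y))
```

The ridge coefficients are shrunk toward zero relative to the least squares ones, at the cost of a somewhat larger RSS. Under Gaussian noise with variance \sigma^2 and a zero-mean Gaussian prior with variance \tau^2 on each coefficient, the MAP estimate has this same closed form with lam = \sigma^2 / \tau^2, which is the equivalence the quoted statement refers to.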
Introduction to Machine Learning & Face Detection in Python
This course covers the fundamental concepts of machine learning, focusing on neural networks, SVMs and decision trees. These topics are in high demand because learning algorithms can be used in fields ranging from software engineering to investment banking. Learning algorithms can recognize patterns that help detect cancer, for example, or we can construct algorithms that make good guesses about stock price movements in the market. In each section we will cover the theoretical background of these algorithms and then implement them together. The first chapter is about regression: a very simple yet powerful and widely used machine learning technique.
A Deep Bag-of-Features Model for Music Auto-Tagging
Nam, Juhan, Herrera, Jorge, Lee, Kyogu
Feature learning and deep learning have drawn great attention in recent years as a way of transforming input data into more effective representations using learning algorithms. Such interest has grown in the area of music information retrieval (MIR) as well, particularly in music audio classification tasks such as auto-tagging. In this paper, we present a two-stage learning model to effectively predict multiple labels from music audio. The first stage learns to project local spectral patterns of an audio track onto a high-dimensional sparse space in an unsupervised manner and summarizes the audio track as a bag-of-features. The second stage successively performs the unsupervised learning on the bag-of-features in a layer-by-layer manner to initialize a deep neural network and finally fine-tunes it with the tag labels. Through the experiment, we rigorously examine training choices and tuning parameters, and show that the model achieves high performance on Magnatagatune, a popularly used dataset in music auto-tagging.
Asymptotic Analysis of Objectives based on Fisher Information in Active Learning
Sourati, Jamshid, Akcakaya, Murat, Leen, Todd K., Erdogmus, Deniz, Dy, Jennifer G.
Obtaining labels can be costly and time-consuming. Active learning allows a learning algorithm to intelligently query samples to be labeled for efficient learning. Fisher information ratio (FIR) has been used as an objective for selecting queries in active learning. However, little is known about the theory behind the use of FIR for active learning. There is a gap between the underlying theory and the motivation of its usage in practice. In this paper, we attempt to fill this gap and provide a rigorous framework for analyzing existing FIR-based active learning methods. In particular, we show that FIR can be asymptotically viewed as an upper bound of the expected variance of the log-likelihood ratio. Additionally, our analysis suggests a unifying framework that not only enables us to make theoretical comparisons among the existing querying methods based on FIR, but also allows us to give insight into the development of new active learning approaches based on this objective.
Bayesian multi-tensor factorization
Khan, Suleiman A., Leppäaho, Eemeli, Kaski, Samuel
We introduce Bayesian multi-tensor factorization, a model that is the first Bayesian formulation for joint factorization of multiple matrices and tensors. The research problem generalizes the joint matrix-tensor factorization problem to arbitrary sets of tensors of any depth, including matrices, can be interpreted as unsupervised multi-view learning from multiple data tensors, and can be generalized to relax the usual trilinear tensor factorization assumptions. The result is a factorization of the set of tensors into factors shared by any subsets of the tensors, and factors private to individual tensors. We demonstrate the performance against existing baselines in multiple tensor factorization tasks in structural toxicogenomics and functional neuroimaging.
The 10 Algorithms Machine Learning Engineers Need to Know
There is no doubt that the sub-field of machine learning / artificial intelligence has gained popularity in the past couple of years. As Big Data is the hottest trend in the tech industry at the moment, machine learning is an incredibly powerful way to make predictions or calculated suggestions based on large amounts of data. Some of the most common examples of machine learning are Netflix's algorithms, which suggest movies based on movies you have watched in the past, or Amazon's algorithms, which recommend books based on books you have bought before. So if you want to learn more about machine learning, how do you start? For me, my first introduction was an Artificial Intelligence class I took while studying abroad in Copenhagen. My lecturer was a full-time Applied Math and CS professor at the Technical University of Denmark whose research areas are logic and artificial intelligence, focusing primarily on the use of logic to model human-like planning, reasoning and problem solving.
Static & DYNAMICAL Machine Learning – What is the Difference?
In an earlier blog, "Need for DYNAMICAL Machine Learning: Bayesian exact recursive estimation", I introduced the need for Dynamical ML as we now enter the "Walk" stage of the "Crawl-Walk-Run" evolution of machine learning. First, I defined Static ML as follows: Given a set of inputs and outputs, find a static map between the two during supervised "Training" and use this static map for business purposes during "Operation". I made the following points using IoT as an example. A Dynamical ML solution involves a State-Space data model (more below). What more does a Dynamical ML solution offer?
Columbia University Free Online Course on Machine Learning
Columbia University is offering a free online course on Machine Learning. Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. In this course, applicants will master the essentials of machine learning and algorithms to help improve learning from data without human intervention. The course will start on January 16, 2017. Columbia University is one of the world's most important centers of research and at the same time a distinctive and distinguished learning environment for undergraduates and graduate students in many scholarly and professional fields.
Learning Bayesian Networks with Incomplete Data by Augmentation
Adel, Tameem, de Campos, Cassio P.
An exact Bayesian network learning algorithm is obtained by recasting the problem into a standard Bayesian network learning problem without missing data. To the best of our knowledge, this is the first exact algorithm for this problem. As expected, the exact algorithm does not scale to large domains. We build on the exact method to create an approximate algorithm using a hill-climbing technique. This algorithm scales to large domains so long as a suitable standard structure learning method for complete data is available. We perform a wide range of experiments to demonstrate the benefits of learning Bayesian networks with this new approach.
A Bayesian Information Criterion for Singular Models
On Wednesday, Mathias Drton and I will be presenting a read paper on Bayesian model choice for singular models at the Royal Statistical Society in London. You can read more about it on the RSS web site, where you can also download a preprint. The paper is scheduled to appear, with the discussion, in Series B of the Journal of the Royal Statistical Society next year. The CRAN package sBIC by Luca Weihs implements the ideas in the paper and includes a series of vignettes that allow you to step through some of the examples in the paper.