Regression
Untangling AdaBoost-based Cost-Sensitive Classification. Part I: Theoretical Perspective
Landesa-Vázquez, Iago, Alba-Castro, José Luis
Boosting algorithms have been widely used to tackle a plethora of problems. In the last few years, many approaches have been proposed to provide standard AdaBoost with cost-sensitive capabilities, each with a different focus. However, for the researcher, these algorithms form a tangled set with diffuse differences and properties, and there is no unifying analysis to jointly compare, classify, evaluate and discuss them on a common basis. In this series of two papers we aim to revisit the various proposals, both from theoretical (Part I) and practical (Part II) perspectives, in order to analyze their specific properties and behavior, with the final goal of identifying the algorithm providing the best and soundest results.
How To Use Regression Machine Learning Algorithms in Weka
Weka has a large number of regression algorithms available on the platform. The large number of machine learning algorithms supported by Weka is one of the biggest benefits of using the platform. In this post you will discover how to use top regression machine learning algorithms in Weka. We are going to take a tour of 5 top regression algorithms in Weka.
Information-theoretical label embeddings for large-scale image classification
We consider the problem of predicting to which classes an image belongs, where the number of classes is large (many thousands or tens of thousands) and where each image typically belongs to multiple classes that should all be properly identified: multi-label, massively multi-class classification. In such classification problems, the best practice until now (for instance in use at Google, Inc.) has been to use a deep convolutional neural network such as the ones described in [19] or [18], culminating in a logistic regression layer with a sigmoid cross-entropy loss, with target labels encoded as high-dimensional sparse binary vectors. The use of logistic regression implies an important yet oft-overlooked assumption about the label space: the classes are considered to be statistically independent, each class being treated as an independent dimension in the label space. This is generally not the case in practice: mirroring statistical dependencies found in the real world, label spaces often have a well-defined internal structure, with some labels being more likely to co-occur than others. For instance, "sky" and "beach" frequently co-occur, while "crane" and "manta ray" rarely do. The sigmoid cross-entropy loss with sparse binary targets does not allow us to leverage such observations about the structure of the label space. There is therefore an opportunity to exploit the internal structure of the label space for gains in training speed, precision, and recall. One simple way to achieve this is to project the labels onto a lower-dimensional manifold (an embedding space) where a distance function between embedded labels would capture useful statistical dependencies. An appropriate loss function may then allow a parametric model trained via stochastic gradient descent to benefit from the structure of the manifold during training and inference.
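To make the independence assumption concrete, the sigmoid cross-entropy loss can be written out term by term. The notation here is assumed for illustration and is not taken from the paper: $z_k$ is the logit for class $k$, $y_k \in \{0, 1\}$ is the binary target, $K$ is the number of classes, and $\sigma$ is the logistic sigmoid. The loss is a sum of per-class terms, which is the negative log-likelihood of a model that factorizes over labels:

$$
\mathcal{L}(z, y) = -\sum_{k=1}^{K} \Bigl[ y_k \log \sigma(z_k) + (1 - y_k) \log\bigl(1 - \sigma(z_k)\bigr) \Bigr]
\quad\Longleftrightarrow\quad
p(y \mid x) = \prod_{k=1}^{K} \sigma(z_k)^{y_k} \bigl(1 - \sigma(z_k)\bigr)^{1 - y_k}
$$

Because the product contains no cross-terms between classes, the labels are treated as independent given the input, and the loss itself carries no information about which labels tend to co-occur.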
Time-Contrastive Learning for Latent Variable Models
"Aapo did it again!" - I exclaimed while reading this paper yesterday on the train back home (or at least I thought I was going home until I realised I was sitting on the wrong train the whole time. This gave me a couple more hours to think while traveling on a variety of long-distance buses...) Aapo Hyvärinen is one of my heroes - he has done tons of cool work and is probably most famous for pseudo-likelihood, score matching and ICA. Time-contrastive learning (TCL) is a technique for learning to extract nonlinear representations from time series data. First, the time series is sliced up into a number of non-overlapping chunks, indexed by $\tau$. Then, a multivariate logistic regression classifier is trained in a supervised manner to look at a sample taken from the series at an unknown time and predict $\tau$, the index of the chunk it came from.
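The two steps just described (chunk the series, then classify samples by chunk index) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's implementation: the function names, the synthetic data, and the hand-picked log-energy features are all made up here, and a fixed feature map stands in for the nonlinear feature extractor that TCL would learn jointly with the classifier.

```python
import numpy as np

def make_tcl_dataset(series, n_segments):
    """Slice a (T, d) series into equal non-overlapping chunks and
    label every sample with the index tau of the chunk it falls in."""
    series = np.asarray(series, dtype=float)
    seg_len = series.shape[0] // n_segments
    usable = seg_len * n_segments          # drop any leftover samples at the end
    tau = np.repeat(np.arange(n_segments), seg_len)
    return series[:usable], tau

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax_regression(F, y, n_classes, lr=0.5, n_iter=500):
    """Multinomial logistic regression fitted by plain gradient descent."""
    n, d = F.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]               # one-hot targets
    for _ in range(n_iter):
        G = (softmax(F @ W + b) - Y) / n   # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (F.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b

def predict(F, W, b):
    return np.argmax(F @ W + b, axis=1)

# Hypothetical toy data: four chunks whose noise scale differs (nonstationary variance).
rng = np.random.default_rng(0)
scales = [0.1, 1.0, 10.0, 100.0]
series = np.concatenate([s * rng.standard_normal((500, 2)) for s in scales])

X, tau = make_tcl_dataset(series, n_segments=4)

# Assumption: fixed log-energy features replace the learned feature extractor.
F = np.log(X**2 + 1e-12)
F = (F - F.mean(axis=0)) / F.std(axis=0)

W, b = fit_softmax_regression(F, tau, n_classes=4)
accuracy = np.mean(predict(F, W, b) == tau)
print(f"training accuracy at predicting tau: {accuracy:.2f} (chance is 0.25)")
```

The classifier can only beat chance if the features pick up on what differs between chunks, which is the intuition behind using the chunk index as a supervised signal.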
How a Technical Co-founder Spends his Time: Minute-by-minute Data for a Year
I'm co-founder and CTO at Overleaf, a successful SaaS startup based in London. From August 2014 to December 2015, I manually tracked all of my work time, minute-by-minute, and analysed the data in R. Like most people who track their time, my goal was to improve my productivity. It gave me data to answer questions about whether I was spending too much or too little time on particular activities, for example user support or client projects. The data showed that my intuition on these questions was often wrong. There were also some less tangible benefits. It was reassuring on a Friday to have an answer to that usually rhetorical question, "where did this week go?" I feel like it also reduced context switching: if I stopped what I was doing to answer a chat message or email, I had to take the time to record it in my time tracker. I think this added friction was a win for overall productivity, perhaps paradoxically. This post documents the (simple) system I built to record my time, how I analysed the data, and the results.
GP-select: Accelerating EM using adaptive subspace preselection
Shelton, Jacquelyn A., Gasthaus, Jan, Dai, Zhenwen, Luecke, Joerg, Gretton, Arthur
We propose a nonparametric procedure to achieve fast inference in generative graphical models when the number of latent states is very large. The approach is based on iterative latent variable preselection, where we alternate between learning a 'selection function' to reveal the relevant latent variables and using it to obtain a compact approximation of the posterior distribution for EM; this can make inference possible where the number of possible latent states is, for example, exponential in the number of latent variables, whereas an exact approach would be computationally infeasible. We learn the selection function entirely from the observed data and current EM state via Gaussian process regression. This contrasts with earlier approaches, where selection functions were manually designed for each problem setting. We show that our approach performs as well as these bespoke selection functions on a wide variety of inference problems: in particular, for the challenging case of a hierarchical model for object localization with occlusion, we achieve results that match a customized state-of-the-art selection method, at a far lower computational cost.
Learning Shallow Detection Cascades for Wearable Sensor-Based Mobile Health Applications
Dadkhahi, Hamid, Saleheen, Nazir, Kumar, Santosh, Marlin, Benjamin
The field of mobile health aims to leverage recent advances in wearable on-body sensing technology and smartphone computing capabilities to develop systems that can monitor health states and deliver just-in-time adaptive interventions. However, existing work has largely focused on analyzing collected data in the off-line setting. In this paper, we propose a novel approach to learning shallow detection cascades developed explicitly for use in real-time wearable-phone or wearable-phone-cloud systems. We apply our approach to the problem of cigarette smoking detection from a combination of wrist-worn actigraphy data and respiration chest band data using two- and three-stage cascades.
Multiple-Instance Logistic Regression with LASSO Penalty
Chen, Ray-Bing, Cheng, Kuang-Hung, Chang, Sheng-Mao, Jeng, Shuen-Lin, Chen, Ping-Yang, Yang, Chun-Hao, Hsia, Chi-Chun
In this work, we consider a manufacturing process which can be described by a multiple-instance logistic regression model. To compute the maximum likelihood estimate of the unknown coefficients, an expectation-maximization algorithm is proposed, and the proposed modeling approach can be extended to identify the important covariates by adding a coefficient penalty term to the likelihood function. In addition to essential technical details, we demonstrate the usefulness of the proposed method by simulations and real examples.
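As a rough illustration of what such a model can look like, here is a sketch under the standard multiple-instance assumption that a bag is positive when at least one of its instances is positive. The abstract does not spell the model out, so this assumption and all notation are introduced here: $x_{ij}$ is the covariate vector of instance $j$ in bag $i$, $m_i$ is the bag size, $Y_i$ is the observed bag label, $\beta$ is the coefficient vector, and $\lambda \ge 0$ is the penalty weight.

$$
p_{ij} = \frac{1}{1 + \exp\bigl(-x_{ij}^{\top} \beta\bigr)},
\qquad
P(Y_i = 1) = 1 - \prod_{j=1}^{m_i} \bigl(1 - p_{ij}\bigr)
$$

$$
\hat{\beta} = \arg\max_{\beta} \; \sum_{i} \Bigl[ Y_i \log P(Y_i = 1) + (1 - Y_i) \log\bigl(1 - P(Y_i = 1)\bigr) \Bigr] - \lambda \sum_{k} \lvert \beta_k \rvert
$$

The instance-level labels are unobserved, which is what makes an expectation-maximization scheme a natural fit, and the $\ell_1$ term shrinks the coefficients of unimportant covariates to exactly zero.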
Sequential Design for Ranking Response Surfaces
We propose and analyze sequential design methods for the problem of ranking several response surfaces. Namely, given $L \ge 2$ response surfaces over a continuous input space $\cal X$, the aim is to efficiently find the index of the minimal response across the entire $\cal X$. The response surfaces are not known and have to be noisily sampled one-at-a-time. This setting is motivated by stochastic control applications and requires joint experimental design both in space and response-index dimensions. To generate sequential design heuristics we investigate stepwise uncertainty reduction approaches, as well as sampling based on posterior classification complexity. We also make connections between our continuous-input formulation and the discrete framework of pure regret in multi-armed bandits. To model the response surfaces we utilize kriging surrogates. Several numerical examples using both synthetic data and an epidemics control problem are provided to illustrate our approach and the efficacy of respective adaptive designs.
From Dependence to Causation
Machine learning is the science of discovering statistical dependencies in data, and the use of those dependencies to perform predictions. During the last decade, machine learning has made spectacular progress, surpassing human performance in complex tasks such as object recognition, car driving, and computer gaming. However, the central role of prediction in machine learning hinders progress towards general-purpose artificial intelligence. As one way forward, we argue that causal inference is a fundamental component of human intelligence, yet ignored by learning algorithms. Causal inference is the problem of uncovering the cause-effect relationships between the variables of a data generating system. Causal structures provide understanding about how these systems behave under changing, unseen environments. In turn, knowledge about these causal dynamics allows us to answer "what if" questions, describing the potential responses of the system under hypothetical manipulations and interventions. Thus, understanding cause and effect is one step from machine learning towards machine reasoning and machine intelligence. However, currently available causal inference algorithms operate in specific regimes, and rely on assumptions that are difficult to verify in practice. This thesis advances the art of causal inference in three different ways. First, we develop a framework for the study of statistical dependence based on copulas and random features. Second, we build on this framework to interpret the problem of causal inference as the task of distribution classification, yielding a family of novel causal inference algorithms. Third, we discover causal structures in convolutional neural network features using our algorithms. The algorithms presented in this thesis are scalable, exhibit strong theoretical guarantees, and achieve state-of-the-art performance in a variety of real-world benchmarks.