Goto

Collaborating Authors

 Bayesian Learning


On effective human robot interaction based on recognition and association

arXiv.org Artificial Intelligence

Faces play a magnificent role in human robot interaction, as they do in our daily life. The inherent ability of the human mind facilitates us to recognize a person by exploiting various challenges such as bad illumination, occlusions, pose variation etc. which are involved in face recognition. But it is a very complex task in nature to identify a human face by humanoid robots. The recent literatures on face biometric recognition are extremely rich in its application on structured environment for solving human identification problem. But the application of face biometric on mobile robotics is limited for its inability to produce accurate identification in uneven circumstances. The existing face recognition problem has been tackled with our proposed component based fragmented face recognition framework. The proposed framework uses only a subset of the full face such as eyes, nose and mouth to recognize a person. It's less searching cost, encouraging accuracy and ability to handle various challenges of face recognition offers its applicability on humanoid robots. The second problem in face recognition is the face spoofing, in which a face recognition system is not able to distinguish between a person and an imposter (photo/video of the genuine user). The problem will become more detrimental when robots are used as an authenticator. A depth analysis method has been investigated in our research work to test the liveness of imposters to discriminate them from the legitimate users. The implication of the previous earned techniques has been used with respect to criminal identification with NAO robot. An eyewitness can interact with NAO through a user interface. NAO asks several questions about the suspect, such as age, height, her/his facial shape and size etc., and then making a guess about her/his face.


Sampling-based Bayesian Inference with gradient uncertainty

arXiv.org Artificial Intelligence

Deep neural networks(NNs) have achieved impressive performance, often exceed human performance on many computer vision tasks. However, one of the most challenging issues that still remains is that NNs are overconfident in their predictions, which can be very harmful when this arises in safety critical applications. In this paper, we show that predictive uncertainty can be efficiently estimated when we incorporate the concept of gradients uncertainty into posterior sampling. The proposed method is tested on two different datasets, MNIST for in-distribution confusing examples and notMNIST for out-of-distribution data. We show that our method is able to efficiently represent predictive uncertainty on both datasets.


Training Complex Models with Multi-Task Weak Supervision

arXiv.org Machine Learning

As machine learning models continue to increase in complexity, collecting large hand-labeled training sets has become one of the biggest roadblocks in practice. Instead, weaker forms of supervision that provide noisier but cheaper labels are often used. However, these weak supervision sources have diverse and unknown accuracies, may output correlated labels, and may label different tasks or apply at different levels of granularity. We propose a framework for integrating and modeling such weak supervision sources by viewing them as labeling different related sub-tasks of a problem, which we refer to as the multi-task weak supervision setting. We show that by solving a matrix completion-style problem, we can recover the accuracies of these multi-task sources given their dependency structure, but without any labeled data, leading to higher-quality supervision for training an end model. Theoretically, we show that the generalization error of models trained with this approach improves with the number of unlabeled data points, and characterize the scaling with respect to the task and dependency structures. On three fine-grained classification problems, we show that our approach leads to average gains of 20.2 points in accuracy over a traditional supervised approach, 6.8 points over a majority vote baseline, and 4.1 points over a previously proposed weak supervision method that models tasks separately.


On Marginally Correct Approximations of Dempster-Shafer Belief Functions from Data

arXiv.org Artificial Intelligence

Mathematical Theory of Evidence (MTE), a foundation for reasoning under partial ignorance, is blamed to leave frequencies outside (or aside of) its framework. The seriousness of this accusation is obvious: no experiment may be run to compare the performance of MTE-based models of real world processes against real world data. In this paper we consider this problem from the point of view of conditioning in the MTE. We describe the class of belief functions for which marginal consistency with observed frequencies may be achieved and conditional belief functions are proper belief functions,%\ and deal with implications for (marginal) approximation of general belief functions by this class of belief functions and for inference models in MTE.


MLIC: A MaxSAT-Based framework for learning interpretable classification rules

arXiv.org Artificial Intelligence

The wide adoption of machine learning approaches in the industry, government, medicine and science has renewed the interest in interpretable machine learning: many decisions are too important to be delegated to black-box techniques such as deep neural networks or kernel SVMs. Historically, problems of learning interpretable classifiers, including classification rules or decision trees, have been approached by greedy heuristic methods as essentially all the exact optimization formulations are NP-hard. Our primary contribution is a MaxSAT-based framework, called MLIC, which allows principled search for interpretable classification rules expressible in propositional logic. Our approach benefits from the revolutionary advances in the constraint satisfaction community to solve large-scale instances of such problems. In experimental evaluations over a collection of benchmarks arising from practical scenarios, we demonstrate its effectiveness: we show that the formulation can solve large classification problems with tens or hundreds of thousands of examples and thousands of features, and to provide a tunable balance of accuracy vs. interpretability. Furthermore, we show that in many problems interpretability can be obtained at only a minor cost in accuracy. The primary objective of the paper is to show that recent advances in the MaxSAT literature make it realistic to find optimal (or very high quality near-optimal) solutions to large-scale classification problems. The key goal of the paper is to excite researchers in both interpretable classification and in the CP community to take it further and propose richer formulations, and to develop bespoke solvers attuned to the problem of interpretable ML.


Efficient and Robust Machine Learning for Real-World Systems

arXiv.org Machine Learning

While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation and the vision of the Internet-of-Things fuel the interest in resource efficient approaches. These approaches require a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. On top of this, it is crucial to treat uncertainty in a consistent manner in all but the simplest applications of machine learning systems. In particular, a desideratum for any real-world system is to be robust in the presence of outliers and corrupted data, as well as being `aware' of its limits, i.e.\ the system should maintain and provide an uncertainty estimate over its own predictions. These complex demands are among the major challenges in current machine learning research and key to ensure a smooth transition of machine learning technology into every day's applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. First we provide a comprehensive review of resource-efficiency in deep neural networks with focus on techniques for model size reduction, compression and reduced precision. These techniques can be applied during training or as post-processing and are widely used to reduce both computational complexity and memory footprint. As most (practical) neural networks are limited in their ways to treat uncertainty, we contrast them with probabilistic graphical models, which readily serve these desiderata by means of probabilistic inference. In that way, we provide an extensive overview of the current state-of-the-art of robust and efficient machine learning for real-world systems.


Estimation of multivariate asymmetric power GARCH models

arXiv.org Machine Learning

It is now widely accepted that volatility models have to incorporate the so-called leverage effect in order to to model the dynamics of daily financial returns. We suggest a new class of multivariate power transformed asymmetric models. It includes several functional forms of multivariate GARCH models which are of great interest in financial modeling and time series literature. We provide an explicit necessary and sufficient condition to establish the strict stationarity of the model. We derive the asymptotic properties of the quasi-maximum likelihood estimator of the parameters. These properties are established both when the power of the transformation is known or is unknown. The asymptotic results are illustrated by Monte Carlo experiments. An application to real financial data is also proposed. Introduction The ARCH (AutoRegressive Conditional Heteroscedastic) model has been introduced by Engle (1982) in an univariate context. Since this work a lot of extensions have been proposed. A first one has been suggested four years latter, namely the GARCH (Generalised ARCH) model by Bollerslev (1986). This model had for goal to improve modeling by considering the past conditional variance (volatility). Their concept are based on the past conditional heteroscedasticity which depends on the past values of the return. A consequence is the volatility has the same magnitude for a negative or positive return. Financial series have their own characteristics which are usually difficult to reproduce artificially. An important characteristic is the leverage effect which consider negative returns differently than the positive returns. This is in contradiction with the construction of the GARCH model, because it cannot consider the asymmetry. The TGARCH (Threshold GARCH) model introduced by Rabemananjara and Zakoรฏan (1993) improve modeling because it considers the asymmetry since the volatility is determined by the past negative observations and the past positive observations with different weights. Various asymmetric GARCH processes are introduced in the econometric literature, for instance the EGARCH (Exponential GARCH) and the log GARCH models (see Francq et al. (2013) who studied the asymptotic properties of an EGARCH (1, 1) models).


Information geometry for approximate Bayesian computation

arXiv.org Machine Learning

The goal of this paper is to explore the basic Approximate Bayesian Computation (ABC) algorithm via the lens of information theory. ABC is a widely used algorithm in cases where the likelihood of the data is hard to work with or intractable, but one can simulate from it. We use relative entropy ideas to analyze the behavior of the algorithm as a function of the thresholding parameter and of the size of the data. Relative entropy here is data driven as it depends on the values of the observed statistics. We allow different thresholding parameters for each different direction (i.e. for different observed statistic) and compute the weighted effect on each direction. The latter allows to find important directions via sensitivity analysis leading to potentially larger acceptance regions, which in turn brings the computational cost of the algorithm down for the same level of accuracy. In addition, we also investigate the bias of the estimators for generic observables as a function of both the thresholding parameters and the size of the data. Our analysis provides error bounds on performance for positive tolerances and finite sample sizes. Simulation studies complement and illustrate the theoretical results.


An empirical study on hyperparameter tuning of decision trees

arXiv.org Machine Learning

Machine learning algorithms often contain many hyperparameters whose values affect the predictive performance of the induced models in intricate ways. Due to the high number of possibilities for these hyperparameter configurations, and their complex interactions, it is common to use optimization techniques to find settings that lead to high predictive accuracy. However, we lack insight into how to efficiently explore this vast space of configurations: which are the best optimization techniques, how should we use them, and how significant is their effect on predictive or runtime performance? This paper provides a comprehensive approach for investigating the effects of hyperparameter tuning on three Decision Tree induction algorithms, CART, C4.5 and CTree. These algorithms were selected because they are based on similar principles, have presented a high predictive performance in several previous works and induce interpretable classification models. Additionally, they contain many interacting hyperparameters to be adjusted. Experiments were carried out with different tuning strategies to induce models and evaluate the relevance of hyperparameters using 94 classification datasets from OpenML. Experimental results indicate that hyperparameter tuning provides statistically significant improvements for C4.5 and CTree in only one-third of the datasets, and in most of the datasets for CART. Different tree algorithms may present different tuning scenarios, but in general, the tuning techniques required relatively few iterations to find accurate solutions. Furthermore, the best technique for all the algorithms was the Irace. Finally, we find that tuning a specific small subset of hyperparameters contributes most of the achievable optimal predictive performance.


A Technical Survey on Statistical Modelling and Design Methods for Crowdsourcing Quality Control

arXiv.org Machine Learning

Online crowdsourcing provides a scalable and inexpensive means to collect knowledge (e.g. labels) about various types of data items (e.g. text, audio, video). However, it is also known to result in large variance in the quality of recorded responses which often cannot be directly used for training machine learning systems. To resolve this issue, a lot of work has been conducted to control the response quality such that low-quality responses cannot adversely affect the performance of the machine learning systems. Such work is referred to as the quality control for crowdsourcing. Past quality control research can be divided into two major branches: quality control mechanism design and statistical models. The first branch focuses on designing measures, thresholds, interfaces and workflows for payment, gamification, question assignment and other mechanisms that influence workers' behaviour. The second branch focuses on developing statistical models to perform effective aggregation of responses to infer correct responses. The two branches are connected as statistical models (i) provide parameter estimates to support the measure and threshold calculation, and (ii) encode modelling assumptions used to derive (theoretical) performance guarantees for the mechanisms. There are surveys regarding each branch but they lack technical details about the other branch. Our survey is the first to bridge the two branches by providing technical details on how they work together under frameworks that systematically unify crowdsourcing aspects modelled by both of them to determine the response quality. We are also the first to provide taxonomies of quality control papers based on the proposed frameworks. Finally, we specify the current limitations and the corresponding future directions for the quality control research.