 Performance Analysis


Personalization Effect on Emotion Recognition from Physiological Data: An Investigation of Performance on Different Setups and Classifiers

arXiv.org Machine Learning

The problem of machine emotional intelligence is very broad and multifaceted; one of its challenges is the very fact that it is hard to define in an unambiguous way. There is no unique definition of emotion, and there is neither a specific method nor a particular required dataset that is guaranteed to capture it. One of the most popular emotion definitions is that of the six basic emotions by Paul Ekman [1]. The original six emotions he proposed are anger, disgust, fear, happiness, sadness and surprise. Another very popular approach is the 2-dimensional emotion map, where each emotional state is projected onto the orthogonal axes of valence and arousal [2]. A third dimension can be added to this space with the axis of dominance; see [3] and its related references.


Text Classification in Microsoft's Azure Machine Learning Studio CrowdFlower

#artificialintelligence

There are lots of great tools out there for building machine learning models and data processing pipelines. Most of these tools, like R, scikit-learn, spark.ml At CrowdFlower, we use many of these resources to varying degrees. However, we also recognize that many people will prefer to approach model building and deployment in a hands-on integrated environment supported by a graphical interface. To this end, we are pleased to showcase an end-to-end model construction process in Microsoft's Azure Machine Learning Studio.


Geometric Mean Metric Learning

arXiv.org Machine Learning

We revisit the task of learning a Euclidean metric from data. We approach this problem from first principles and formulate it as a surprisingly simple optimization problem. Indeed, our formulation even admits a closed form solution. This solution possesses several very attractive properties: (i) an innate geometric appeal through the Riemannian geometry of positive definite matrices; (ii) ease of interpretability; and (iii) computational speed several orders of magnitude faster than the widely used LMNN and ITML methods. Furthermore, on standard benchmark datasets, our closed-form solution consistently attains higher classification accuracy.
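The abstract does not state the closed-form solution itself, so the following is only a minimal sketch of the underlying Riemannian construct: the geometric mean of two symmetric positive definite matrices, A^{1/2} (A^{-1/2} B A^{-1/2})^{t} A^{1/2} with t = 1/2. The helper names `spd_power` and `geometric_mean` and the example matrices are assumptions made for illustration, and how the paper builds its input matrices from data is not shown here.

```python
import numpy as np


def spd_power(m, p):
    """Raise a symmetric positive definite matrix to a real power p
    using its eigendecomposition."""
    w, v = np.linalg.eigh(m)
    # Scale each eigenvector column by its eigenvalue raised to p.
    return (v * w**p) @ v.T


def geometric_mean(a, b, t=0.5):
    """Point at parameter t on the geodesic between SPD matrices a and b.

    t = 0.5 gives the matrix geometric mean of a and b.
    """
    a_half = spd_power(a, 0.5)
    a_inv_half = spd_power(a, -0.5)
    middle = spd_power(a_inv_half @ b @ a_inv_half, t)
    return a_half @ middle @ a_half


# Illustrative SPD matrices (hypothetical values, not from the paper).
a = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([[1.0, -0.2], [-0.2, 3.0]])
g = geometric_mean(a, b)
```

At t = 1/2 the result `g` satisfies the defining identity g a⁻¹ g = b, which offers a quick way to check the computation numerically.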


[In Depth] Brain scans are prone to false positives, study says

Science

A new study suggests that common settings used in software for analyzing brain scans may lead to false positive results. Researchers led by Anders Eklund, an electrical engineer at Linköping University in Sweden, analyzed functional magnetic resonance imaging (fMRI) data from several public databases. Certain software settings, the team found, could give rise to a false positive result up to 70% of the time. In the context of a typical fMRI experiment, that could lead researchers to wrongly conclude that activity in a certain area of the brain plays a role in a cognitive function such as perception or memory.


AI Boosts Cancer Screens to Nearly 100 Percent Accuracy

#artificialintelligence

Diagnosing cancer is about to get more accurate, with the help of artificial intelligence. Pathologists have diagnosed diseases in more or less the same way for the past 100 years, by laboring over a microscope reviewing biopsy samples on little glass slides. Working almost robotically, they sift through millions of normal cells to identify just a few diseased ones. The task is tedious and prone to human error. But now, scientists and engineers have created a technique that uses artificial intelligence (AI) and can differentiate cancer cells from normal cells almost as well as a top-notch pathologist.


Fifty Shades of Ratings: How to Benefit from a Negative Feedback in Top-N Recommendations Tasks

arXiv.org Machine Learning

Conventional collaborative filtering techniques treat the top-n recommendation problem as a task of generating a list of the most relevant items. This formulation, however, disregards the opposite goal: avoiding recommendations of completely irrelevant items. Because of that bias, standard algorithms, as well as commonly used evaluation metrics, become insensitive to negative feedback. To resolve this problem, we propose to treat user feedback as a categorical variable and model it with users and items in a ternary way. We employ a third-order tensor factorization technique and implement a higher-order folding-in method to support online recommendations. The method is equally sensitive to the entire spectrum of user ratings and is able to accurately predict relevant items even from negative-only feedback. Our method may partially eliminate the need for a complicated rating elicitation process, as it provides a means for personalized recommendations from the very beginning of an interaction with a recommender system. We also propose a modification of standard metrics that helps to reveal unwanted biases and to account for sensitivity to negative feedback. Our model achieves state-of-the-art quality in standard recommendation tasks while significantly outperforming other methods in the cold-start "no-positive-feedback" scenarios.


Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling

arXiv.org Machine Learning

Causal modeling has long been an attractive topic for many researchers, and recent decades have seen a surge in theoretical development and discovery algorithms. Generally, discovery algorithms can be divided into two approaches: constraint-based and score-based. The constraint-based approach is able to detect common causes of the observed variables, but the use of independence tests makes it less reliable. The score-based approach produces a result that is easier to interpret, as it also measures the reliability of the inferred causal relationships, but it is unable to detect common confounders of the observed variables. A drawback of both score-based and constraint-based approaches is the inherent instability in structure estimation: with finite samples, small changes in the data can lead to completely different optimal structures. The present work introduces a new hypothesis-free score-based causal discovery algorithm, called stable specification search, that is robust for finite samples, based on recent advances in stability selection using subsampling and selection algorithms. Structure search is performed over Structural Equation Models. Our approach uses exploratory search but allows incorporation of prior background knowledge. We validated our approach on one simulated data set, which we compare to the known ground truth, and two real-world data sets for Chronic Fatigue Syndrome and Attention Deficit Hyperactivity Disorder, which we compare to earlier medical studies. The results on the simulated data set show significant improvement over alternative approaches, and the results on the real-world data sets show consistency with the hypothesis-driven models constructed by medical experts.
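The general idea of stability selection by subsampling mentioned above can be sketched as follows: run a selection procedure on many random subsamples and keep only what is selected in most of them. This is a toy illustration, not the paper's algorithm. The base selector `top_k_by_correlation`, the 0.8 threshold, and the synthetic data are all assumptions introduced here, whereas the paper searches over Structural Equation Models.

```python
import numpy as np


def selection_frequencies(x, y, select, n_rounds=100, frac=0.5, seed=0):
    """Fraction of random subsamples in which each feature is selected."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    m = int(frac * n)
    counts = np.zeros(p)
    for _ in range(n_rounds):
        # Draw a subsample without replacement and rerun the selector on it.
        idx = rng.choice(n, size=m, replace=False)
        counts[select(x[idx], y[idx])] += 1
    return counts / n_rounds


def top_k_by_correlation(x, y, k=2):
    """Hypothetical base selector: the k features most correlated with y."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(xc.T @ yc) / (np.linalg.norm(xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(corr)[-k:]


# Synthetic data (assumption): only features 0 and 3 influence y.
data_rng = np.random.default_rng(1)
x = data_rng.normal(size=(200, 6))
y = 2.0 * x[:, 0] - 1.5 * x[:, 3] + 0.1 * data_rng.normal(size=200)

freq = selection_frequencies(x, y, top_k_by_correlation)
# Keep features selected in at least 80% of subsamples (threshold is an assumption).
stable = np.flatnonzero(freq >= 0.8)
```

Features that are selected only because of sampling noise tend to have low selection frequencies, so thresholding `freq` filters out structure that changes under small perturbations of the data.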


Feature Extraction and Automated Classification of Heartbeats by Machine Learning

arXiv.org Machine Learning

We present algorithms for the detection of a class of heart arrhythmias with the goal of eventual adoption by practicing cardiologists. In clinical practice, detection is based on a small number of meaningful features extracted from the heartbeat cycle. However, techniques proposed in the literature use high-dimensional vectors consisting of morphological and time-based features for detection. Using electrocardiogram (ECG) signals, we found smaller subsets of features sufficient to detect arrhythmias with high accuracy. The features were found by an iterative step-wise feature selection method. We depart from the common literature in the following aspects: (1) as opposed to high-dimensional feature vectors, we use a small set of features with meaningful clinical interpretation; (2) we eliminate the necessity of short-duration patient-specific ECG data to append to the global training data for classification; (3) we apply semi-parametric classification procedures (in an ensemble framework) for arrhythmia detection; and (4) our approach is based on a reduced sampling rate of ~115 Hz as opposed to 360 Hz in the standard literature.


Causal Discovery from Subsampled Time Series Data by Constraint Optimization

arXiv.org Artificial Intelligence

This paper focuses on causal structure estimation from time series data in which measurements are obtained at a coarser timescale than the causal timescale of the underlying system. Previous work has shown that such subsampling can lead to significant errors about the system's causal structure if not properly taken into account. In this paper, we first consider the search for the system timescale causal structures that correspond to a given measurement timescale structure. We provide a constraint satisfaction procedure whose computational performance is several orders of magnitude better than previous approaches. We then consider finite-sample data as input, and propose the first constraint optimization approach for recovering the system timescale causal structure. This algorithm optimally recovers from possible conflicts due to statistical errors. More generally, these advances allow for a robust and non-parametric estimation of system timescale causal structures from subsampled time series data.
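To illustrate why subsampling can mislead, here is a minimal sketch under the usual assumption that, when only every u-th time step is observed, a directed edge i -> j appears exactly when the system-timescale graph has a directed path of length u from i to j. The function name `subsampled_directed_edges` and the example graph are hypothetical. The sketch covers directed edges only and omits the additional confounding edges that unobserved intermediate steps can induce.

```python
import numpy as np


def subsampled_directed_edges(adj, u):
    """Directed edges seen when only every u-th time step is measured.

    adj[i, j] is True when i -> j holds at the system timescale.
    The result has an edge i -> j iff adj contains a directed path
    of length exactly u from i to j (boolean matrix power).
    """
    step = adj.astype(int)
    reach = np.eye(len(adj), dtype=int)
    for _ in range(u):
        reach = (reach @ step > 0).astype(int)
    return reach.astype(bool)


# Hypothetical system-timescale structure: a 3-cycle 0 -> 1 -> 2 -> 0.
adj = np.array(
    [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]],
    dtype=bool,
)
every_second = subsampled_directed_edges(adj, 2)
```

For this cycle, measuring every second step yields the edges 0 -> 2, 2 -> 1 and 1 -> 0, so the cycle appears to run in the opposite direction, a small example of the structural errors that subsampling can introduce.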


From Dependence to Causation

arXiv.org Machine Learning

Machine learning is the science of discovering statistical dependencies in data and using those dependencies to make predictions. During the last decade, machine learning has made spectacular progress, surpassing human performance in complex tasks such as object recognition, car driving, and computer gaming. However, the central role of prediction in machine learning hinders progress towards general-purpose artificial intelligence. As one way forward, we argue that causal inference is a fundamental component of human intelligence, yet one ignored by learning algorithms. Causal inference is the problem of uncovering the cause-effect relationships between the variables of a data-generating system. Causal structures provide understanding about how these systems behave under changing, unseen environments. In turn, knowledge about these causal dynamics allows us to answer "what if" questions, describing the potential responses of the system under hypothetical manipulations and interventions. Thus, understanding cause and effect is one step from machine learning towards machine reasoning and machine intelligence. But currently available causal inference algorithms operate in specific regimes and rely on assumptions that are difficult to verify in practice. This thesis advances the art of causal inference in three different ways. First, we develop a framework for the study of statistical dependence based on copulas and random features. Second, we build on this framework to interpret the problem of causal inference as the task of distribution classification, yielding a family of novel causal inference algorithms. Third, we discover causal structures in convolutional neural network features using our algorithms. The algorithms presented in this thesis are scalable, exhibit strong theoretical guarantees, and achieve state-of-the-art performance in a variety of real-world benchmarks.