
Collaborating Authors: beholder


Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization

Shapovalova, Nataliya, Raptis, Michalis, Sigal, Leonid, Mori, Greg

Neural Information Processing Systems

We propose a new weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video. As part of the proposed approach we develop a generalization of the Max-Path search algorithm, which allows us to efficiently search over a structured space of multiple spatio-temporal paths, while also allowing us to incorporate context information into the model. Instead of using spatial annotations, in the form of bounding boxes, to guide the latent model during training, we utilize human gaze data as a weak supervisory signal. This is achieved by incorporating gaze, along with the classification, into the structured loss within the latent SVM learning framework. Experiments on a challenging benchmark dataset, UCF-Sports, show that our model is more accurate, in terms of classification, and achieves state-of-the-art results in localization. In addition, we show how our model can produce top-down saliency maps conditioned on the classification label and localized latent paths.
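
The structured loss described above can be sketched in miniature. The toy Python below is purely illustrative (the function names, the box/gaze representation, and the weight `lam` are all assumptions, not the authors' implementation): it combines a 0/1 classification term with a penalty for frames whose latent spatio-temporal path misses the observed human gaze.

```python
def gaze_overlap(path_box, gaze_points):
    """Fraction of gaze fixations falling inside a candidate box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = path_box
    inside = sum(1 for (gx, gy) in gaze_points if x1 <= gx <= x2 and y1 <= gy <= y2)
    return inside / len(gaze_points) if gaze_points else 0.0

def structured_loss(true_label, pred_label, path_boxes, gaze_per_frame, lam=0.5):
    """0/1 classification loss plus a gaze-agreement penalty averaged over frames.

    path_boxes: one latent box per frame; gaze_per_frame: list of fixation
    points per frame. `lam` trades off the two terms (illustrative value).
    """
    cls_loss = 0.0 if pred_label == true_label else 1.0
    gaze_loss = sum(1.0 - gaze_overlap(b, g)
                    for b, g in zip(path_boxes, gaze_per_frame)) / len(path_boxes)
    return cls_loss + lam * gaze_loss
```

For example, a correct label with a path that covers all fixations incurs zero loss, while a wrong label whose path misses the gaze pays both terms.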


In the Eye of the Beholder: Robust Prediction with Causal User Modeling

Neural Information Processing Systems

Accurately predicting the relevance of items to users is crucial to the success of many social platforms. Conventional approaches train models on logged historical data, but recommendation systems, media services, and online marketplaces all exhibit a constant influx of new content, making relevance a moving target to which standard predictive models are not robust. In this paper, we propose a learning framework for relevance prediction that is robust to changes in the data distribution. Our key observation is that robustness can be obtained by accounting for how users causally perceive the environment. We model users as boundedly-rational decision makers whose causal beliefs are encoded by a causal graph, and show how minimal information regarding the graph can be used to contend with distributional changes. Experiments in multiple settings demonstrate the effectiveness of our approach.
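
A toy illustration of the key observation (entirely hypothetical names and graph encoding; the paper's actual mechanism is more involved): reading a user's causal graph to keep only the direct parents of the relevance variable, so that predictions do not lean on features whose distribution may shift.

```python
def parents_of(graph, target):
    """Direct causes of `target` in a causal graph given as {child: [parents]}."""
    return graph.get(target, [])

def robust_features(example, graph, target="relevance"):
    """Keep only the features the causal graph marks as parents of the target.

    Features outside the parent set (e.g. a popularity signal that drifts as
    new content arrives) are dropped before the predictor ever sees them.
    """
    keep = set(parents_of(graph, target))
    return {k: v for k, v in example.items() if k in keep}
```

Under this sketch, a model trained only on the retained parent features is insensitive to shifts in the discarded ones, which is the flavor of robustness the abstract describes.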


Interpretability is in the Mind of the Beholder: A Causal Framework for Human-interpretable Representation Learning

Marconato, Emanuele, Passerini, Andrea, Teso, Stefano

arXiv.org Artificial Intelligence

The focus in Explainable AI is shifting from explanations defined in terms of low-level elements, such as input features, to explanations encoded in terms of interpretable concepts learned from data. How to reliably acquire such concepts is, however, still fundamentally unclear. An agreed-upon notion of concept interpretability is missing, with the result that concepts used by both post-hoc explainers and concept-based neural networks are acquired through a variety of mutually incompatible strategies. Critically, most of these neglect the human side of the problem: a representation is understandable only insofar as it can be understood by the human at the receiving end. The key challenge in Human-interpretable Representation Learning (HRL) is how to model and operationalize this human element. In this work, we propose a mathematical framework for acquiring interpretable representations suitable for both post-hoc explainers and concept-based neural networks. Our formalization of HRL builds on recent advances in causal representation learning and explicitly models a human stakeholder as an external observer. This allows us to derive a principled notion of alignment between the machine representation and the vocabulary of concepts understood by the human. In doing so, we link alignment and interpretability through a simple and intuitive name transfer game, and clarify the relationship between alignment and a well-known property of representations, namely disentanglement. We also show that alignment is linked to the issue of undesirable correlations among concepts, also known as concept leakage, and to content-style separation, all through a general information-theoretic reformulation of these properties. Our conceptualization aims to bridge the gap between the human and algorithmic sides of interpretability and establish a stepping stone for new research on human-interpretable representations.


Training the Untrained Eye with AI to Classify Fine Art

#artificialintelligence

Beauty, it is said, resides in the eye of the beholder. What if that beholder is a machine learning model being trained to describe and classify fine works of art? That's what AI researchers at Zhejiang University of Technology in China are attempting to find out by comparing the ability of different models trained on a growing list of image data sets to classify artwork by genre and style. Whether these models can be trained to respond emotionally remains to be seen. Preliminary results from one study published earlier this month in the journal of the Public Library of Science highlighted the utility of using convolutional neural networks (CNNs) for demanding tasks like art classification.
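
For readers unfamiliar with the building block the article mentions, here is a minimal pure-Python sketch of the 2D convolution at the heart of a convolutional neural network; it is purely illustrative and has no relation to the study's actual models.

```python
def conv2d(image, kernel):
    """Slide a k-by-k kernel over a 2D grayscale image (no padding, stride 1).

    Each output cell is the sum of elementwise products between the kernel
    and the image patch under it; stacks of such filters, with learned
    kernels, are what CNNs use to pick up strokes, textures, and edges.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out
```

Sliding a vertical-edge kernel such as three rows of [-1, 0, 1] over an image whose left half is dark and right half is bright produces strong responses exactly at the boundary, which is the kind of low-level feature early CNN layers learn.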


The AI that can tell how attractive ANYONE is

Daily Mail - Science & tech

It is an age-old question – what makes someone attractive? We often say things like 'beauty is in the eye of the beholder', but while this romantic notion may bring comfort to those dealt a poor hand in life, it also gives the impression that the foundations of attractiveness are elusive and unpredictable. It suggests that what each of us sees as an attractive trait – whether physical or psychological – is so variable that everyone must be looking for something different. Is beauty in the eye of the beholder? Researchers plan to measure dozens of volunteers' characteristics – including humour, intelligence, impulsivity, facial symmetry, strength, and more.


Exploring How to Change the Way the World Literally Sees You

WIRED

Beauty, to borrow a cliché, is in the eye of the beholder. But what if your beholder's eyes could be hacked? In Reality, they can be. The short film, from Revenge writer-director Coralie Fargeat, imagines a future where people can buy an implant that allows them to live in an alternative reality where they can be seen as they want to be seen. Reality, which you can watch in full above, is set in Paris in the future.


In the AI of the beholder: The RobotArt competition entries for 2018 will amaze you

#artificialintelligence

The 3rd annual RobotArt competition is currently underway. Dozens of physical paintings, created by machines, will be judged by professional art critics and the public at large to determine which team of developers will walk away with the top prize. What it is: RobotArt is the passion project of founder Andrew Conru. The competition runs each year and solicits roboticists and machine learning developers to create physical robot systems capable of painting with brushes and ink. Over the years, developers have used various systems, ranging from neural networks that operate robot arms to software that translates human brush strokes in real time to a robot, which then attempts to imitate them.