Communications of the ACM

"Like, I thought coding was going to be boring and kind of just make me super-mad. It was going to be like tragic. But now that I've taken this class and I've seen all the things I can do with EarSketch and how that can be applied--like the same general concepts can be applied and expanded on to all these other aspects and different fields--it kind of opened up and made me kind of rethink my career choices like, 'Oh, maybe I actually want to pursue something in like IT or computer science.' Normally, you have like a one-sided opinion or view of coding. You don't really see it as being something creative and so personable. This is a reflection from a high school student in an introductory computer science course.



Today, there are two major paradigms for vision-based autonomous driving systems: mediated perception approaches that parse an entire scene to make a driving decision, and behavior reflex approaches that directly map an input image to a driving action by a regressor. In this paper, we propose a third paradigm: a direct perception based approach to estimate the affordance for driving. We propose to map an input image to a small number of key perception indicators that directly relate to the affordance of a road/traffic state for driving. Our representation provides a set of compact yet complete descriptions of the scene to enable a simple controller to drive autonomously. Falling in between the two extremes of mediated perception and behavior reflex, we argue that our direct perception representation provides the right level of abstraction.

User-Centric Affective Computing of Image Emotion Perceptions

AAAI Conferences

We propose to predict the personalized emotion perceptions of images for each viewer. Different factors that may influence emotion perceptions, including visual content, social context, temporal evolution, and location influence are jointly investigated via the presented rolling multi-task hypergraph learning. For evaluation, we set up a large scale image emotion dataset from Flickr, named Image-Emotion-Social-Net, with over 1 million images and about 8,000 users. Experiments conducted on this dataset demonstrate the superiority of the proposed method, as compared to state-of-the-art.


AAAI Conferences

Emotion affects our understanding of the opinions and sentiments of others. Research has demonstrated that humans are able to recognize emotions in various domains, including speech and music, and that there are potential shared features that shape the emotion in both domains. In this paper, we investigate acoustic and visual features that are relevant to emotion perception in the domains of singing and speaking. We train regression models using two paradigms: (1) within-domain, in which models are trained and tested on the same domain and (2) cross-domain, in which models are trained on one domain and tested on the other domain. This strategy allows us to analyze the similarities and differences underlying the relationship between audio-visual feature expression and emotion perception and how this relationship is affected by domain of expression.

Modeling Clutter Perception using Parametric Proto-object Partitioning

Neural Information Processing Systems

Visual clutter, the perception of an image as being crowded and disordered, affects aspects of our lives ranging from object detection to aesthetics, yet relatively little effort has been made to model this important and ubiquitous percept. Our approach models clutter as the number of proto-objects segmented from an image, with proto-objects defined as groupings of superpixels that are similar in intensity, color, and gradient orientation features. We introduce a novel parametric method of merging superpixels by modeling mixture of Weibull distributions on similarity distance statistics, then taking the normalized number of proto-objects following partitioning as our estimate of clutter perception. We validated this model using a new $\text{90}-$image dataset of realistic scenes rank ordered by human raters for clutter, and showed that our method not only predicted clutter extremely well (Spearman's $\rho 0.81$, $p 0.05$), but also outperformed all existing clutter perception models and even a behavioral object segmentation ground truth. We conclude that the number of proto-objects in an image affects clutter perception more than the number of objects or features.