Rochester Institute of Technology
Multimodal Alignment for Affective Content
Haduong, Nikita (Indiana University) | Nester, David (Eastern Mennonite University) | Vaidyanathan, Preethi (Rochester Institute of Technology) | Prud'hommeaux, Emily (Rochester Institute of Technology) | Bailey, Reynold (Rochester Institute of Technology) | Alm, Cecilia O. (Rochester Institute of Technology)
Humans routinely extract important information from images and videos, relying on their gaze. In contrast, computational systems still have difficulty annotating important visual information in a human-like manner, in part because human gaze is often not included in the modeling process. Human input is also particularly relevant for processing and interpreting affective visual information. To address this challenge, we captured human gaze, spoken language, and facial expressions simultaneously in an experiment with visual stimuli characterized by subjective and affective content. Observers described the content of complex emotional images and videos depicting positive and negative scenarios, as well as their feelings about the imagery being viewed. We explore patterns across these modalities, for example by comparing the affective nature of participant-elicited linguistic tokens with image valence. Additionally, we expand a framework for generating automatic alignments between the gaze and spoken language modalities for visual annotation of images. Multimodal alignment is challenging because of the varying temporal offsets between modalities. We explore alignment robustness when images have affective content and whether image valence influences alignment results. We also study whether word frequency-based filtering affects results: both the unfiltered and filtered scenarios perform better than baseline comparisons, and filtering yields a substantial decrease in alignment error rate. We provide visualizations of the resulting annotations from multimodal alignment. This work has implications for areas such as image understanding, media accessibility, and multimodal data fusion.
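To give a concrete sense of the alignment problem described above, the sketch below pairs each spoken token with the image region fixated at the token's onset time. This is only a simple temporal baseline, not the expanded alignment framework used in this work; the tokens, timestamps, and region labels are illustrative assumptions.

```python
# Toy temporal baseline for gaze-language pairing: each spoken word is matched
# to whichever image region was being fixated at the word's onset time.
# All data below (tokens, onsets, fixation windows, region ids) are made up.
words = [("dog", 1.2), ("running", 1.8), ("beach", 2.9)]            # (token, onset in seconds)
fixations = [(0.9, 1.6, "region_3"), (1.6, 2.5, "region_3"), (2.5, 3.4, "region_7")]

def temporal_align(words, fixations):
    pairs = []
    for token, onset in words:
        for start, end, region in fixations:
            if start <= onset < end:
                pairs.append((token, region))
                break
    return pairs

print(temporal_align(words, fixations))
# [('dog', 'region_3'), ('running', 'region_3'), ('beach', 'region_7')]
```

A baseline of this kind ignores the varying temporal offsets between looking and speaking, which is exactly what a learned alignment framework aims to handle better.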
Learning Disentangled Representation from 12-Lead Electrograms: Application in Localizing the Origin of Ventricular Tachycardia
Gyawali, Prashnna K. (Rochester Institute of Technology) | Horacek, B. Milan (Dalhousie University) | Sapp, John L. (Dalhousie University) | Wang, Linwei (Rochester Institute of Technology)
The increasing availability of electrocardiogram (ECG) data has motivated the use of data-driven models for automating various clinical tasks based on ECG data. The development of subject-specific models is limited by the cost and difficulty of obtaining sufficient training data for each individual. The alternative of a population model, however, faces challenges caused by the significant inter-subject variations within the ECG data. We address this challenge by investigating, for the first time, the problem of learning representations for clinically informative variables while disentangling other factors of variation within the ECG data. In this work, we present a conditional variational autoencoder (VAE) to extract the subject-specific adjustment to the ECG data, conditioned on task-specific representations learned from a deterministic encoder. To encourage the representation of inter-subject variations to be independent of the task-specific representation, maximum mean discrepancy is used to match all the moments between the distributions learned by the VAE conditioned on the code from the deterministic encoder. The learning of the task-specific representation is regularized by weak supervision in the form of contrastive regularization. We apply the proposed method to a novel yet important clinical task of classifying the origin of ventricular tachycardia (VT) into pre-defined segments, demonstrating the efficacy of the proposed method against the standard VAE.
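As a rough illustration of this kind of architecture, the PyTorch sketch below combines a deterministic task encoder, a conditional VAE branch for the subject-specific adjustment, and a maximum mean discrepancy penalty. It is a minimal sketch under assumed dimensions and module names, and it uses the simpler prior-matching MMD variant rather than the paper's scheme of matching distributions conditioned on the deterministic encoder's code.

```python
# Minimal PyTorch sketch of a conditional VAE with a maximum mean discrepancy
# (MMD) penalty for separating subject-specific from task-specific ECG codes.
# Dimensions, module names, and the prior-matching MMD term are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def rbf_mmd(x, y, sigma=1.0):
    """Squared MMD between two sample sets under an RBF kernel."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

class DisentangledECG(nn.Module):
    def __init__(self, ecg_dim=1200, task_dim=16, subj_dim=16):
        super().__init__()
        # Deterministic encoder for the task-specific (clinically informative) code.
        self.task_enc = nn.Sequential(nn.Linear(ecg_dim, 128), nn.ReLU(),
                                      nn.Linear(128, task_dim))
        # Conditional VAE encoder for the subject-specific adjustment.
        self.subj_enc = nn.Sequential(nn.Linear(ecg_dim + task_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, subj_dim)
        self.logvar = nn.Linear(128, subj_dim)
        # Decoder reconstructs the ECG from both codes.
        self.dec = nn.Sequential(nn.Linear(subj_dim + task_dim, 128), nn.ReLU(),
                                 nn.Linear(128, ecg_dim))

    def forward(self, x):
        z_task = self.task_enc(x)
        h = self.subj_enc(torch.cat([x, z_task], dim=1))
        mu, logvar = self.mu(h), self.logvar(h)
        z_subj = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(torch.cat([z_subj, z_task], dim=1)), z_subj, mu, logvar

def elbo_with_mmd(model, x, mmd_weight=1.0):
    x_hat, z_subj, mu, logvar = model(x)
    recon = F.mse_loss(x_hat, x)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Match the aggregated subject code to the prior so it carries little task signal.
    mmd = rbf_mmd(z_subj, torch.randn_like(z_subj))
    return recon + kld + mmd_weight * mmd
```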
Constraint Satisfaction Techniques for Combinatorial Problems
Narváez, David E. (Rochester Institute of Technology)
In recent years, constraint satisfaction problems (CSPs) have drawn much attention due to their applications to several areas of industrial research. This research focus has brought a torrent of positive results in areas like SAT solvers, satisfiability modulo theories, answer set programming, etc. These results often rely on the fact that even though determining the satisfiability of a constraint program is NP-hard, many industrial applications exhibit constraints that computers are able to deal with easily. Benchmarks stemming from these applications often showcase the advantages of the different techniques presented, and seldom are there references. In its more general form, a constraint satisfaction problem consists of a set of variables X, each taking values in a domain D, and a set of constraints C involving variables in X and operations over these variables. For instance, in Boolean satisfiability problems the domain D takes the form {true, false} and the constraints are expressed over the operations ∧, ∨, and ¬. In the case of integer linear programs (ILP), the domain of the variables is the set of integers, and the constraints are inequalities over the operations of addition and multiplication.
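To make the definition concrete, the toy sketch below states a small Boolean CSP as variables over a domain plus constraint predicates, and brute-forces a satisfying assignment; the formula itself is a made-up example, not one from the paper.

```python
# Toy illustration of the CSP view described above: variables over a domain,
# constraints over those variables, and a brute-force satisfiability check.
from itertools import product

variables = ["x1", "x2", "x3"]
domain = [False, True]  # Boolean CSP: D = {true, false}

constraints = [
    lambda a: a["x1"] or a["x2"],          # x1 ∨ x2
    lambda a: (not a["x2"]) or a["x3"],    # ¬x2 ∨ x3
    lambda a: not (a["x1"] and a["x3"]),   # ¬(x1 ∧ x3)
]

def satisfiable():
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(c(assignment) for c in constraints):
            return assignment
    return None

print(satisfiable())  # {'x1': False, 'x2': True, 'x3': True}
```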
Exploring the Use of Shatter for AllSAT Through Ramsey-Type Problems
Narváez, David E. (Rochester Institute of Technology)
In the context of SAT solvers, Shatter is a popular tool for symmetry breaking on CNF formulas. Nevertheless, little has been said about its use in the context of AllSAT problems. AllSAT has gained much popularity in recent years due to its many applications in domains like model checking, data mining, etc. One example of a particularly transparent application of AllSAT to other fields of computer science is computational Ramsey theory. In this paper we study the effect of incorporating Shatter into the workflow of using Boolean formulas to generate all possible edge colorings of a graph avoiding prescribed monochromatic subgraphs. We identify two drawbacks in the naïve use of Shatter to break the symmetries of Boolean formulas encoding Ramsey-type problems for graphs.
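As a rough illustration of the encoding style in question, the sketch below builds a CNF formula whose variables are the edge colors of K_n and whose clauses forbid monochromatic triangles; the resulting DIMACS output is the kind of formula one could pass through Shatter for symmetry breaking and then to an AllSAT enumerator. Parameters and output handling are illustrative assumptions, not the paper's exact setup.

```python
# One Boolean variable per edge of K_n (true = color 1, false = color 2),
# with two clauses per triangle forbidding monochromatic triangles.
from itertools import combinations

def ramsey_triangle_cnf(n):
    var = {e: i + 1 for i, e in enumerate(combinations(range(n), 2))}
    clauses = []
    for i, j, k in combinations(range(n), 3):
        edges = [var[(i, j)], var[(i, k)], var[(j, k)]]
        clauses.append(edges)                 # not all three edges get color 2
        clauses.append([-v for v in edges])   # not all three edges get color 1
    return len(var), clauses

def to_dimacs(n_vars, clauses):
    lines = [f"p cnf {n_vars} {len(clauses)}"]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines)

n_vars, clauses = ramsey_triangle_cnf(5)  # K_5 admits such colorings; K_6 does not
print(to_dimacs(n_vars, clauses))
```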
Measuring Catastrophic Forgetting in Neural Networks
Kemker, Ronald (Rochester Institute of Technology) | McClure, Marc (Rochester Institute of Technology) | Abitino, Angelina (Swarthmore College) | Hayes, Tyler L. (Rochester Institute of Technology) | Kanan, Christopher (Rochester Institute of Technology)
Deep neural networks are used in many state-of-the-art systems for machine perception. Once a network is trained to do a specific task, e.g., bird classification, it cannot easily be trained to do new tasks, e.g., incrementally learning to recognize additional bird species or learning an entirely different task such as flower recognition. When new tasks are added, typical deep neural networks are prone to catastrophically forgetting previous tasks. Networks that are capable of assimilating new information incrementally, much like how humans form new memories over time, will be more efficient than re-training the model from scratch each time a new task needs to be learned. There have been multiple attempts to develop schemes that mitigate catastrophic forgetting, but these methods have not been directly compared, the tests used to evaluate them vary considerably, and these methods have only been evaluated on small-scale problems (e.g., MNIST). In this paper, we introduce new metrics and benchmarks for directly comparing five different mechanisms designed to mitigate catastrophic forgetting in neural networks: regularization, ensembling, rehearsal, dual-memory, and sparse-coding. Our experiments on real-world images and sounds show that the mechanism(s) that are critical for optimal performance vary based on the incremental training paradigm and type of data being used, but they all demonstrate that the catastrophic forgetting problem is not yet solved.
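One simple way to quantify this behavior, in the spirit (though not necessarily the exact form) of the metrics introduced here, is to re-evaluate the model on earlier tasks after each incremental training session and normalize by an offline model's accuracy. The sketch below assumes hypothetical train_session, evaluate, and ideal_acc helpers.

```python
# Sketch of measuring forgetting during incremental training: after each new
# session, re-test on the first (base) task and on all tasks seen so far, and
# normalize by the accuracy of an offline model trained on all data at once.
# `train_session`, `evaluate`, and `ideal_acc` are assumed (hypothetical) helpers;
# this mirrors the flavor of such retention metrics, not an exact definition.

def forgetting_metrics(model, sessions, ideal_acc, train_session, evaluate):
    base_task = sessions[0]
    base_scores, all_scores = [], []
    for t, session in enumerate(sessions):
        train_session(model, session)          # incrementally learn session t
        if t == 0:
            continue                           # collect scores from the second session onward
        base_scores.append(evaluate(model, base_task) / ideal_acc)
        seen = sessions[: t + 1]
        mean_acc = sum(evaluate(model, s) for s in seen) / len(seen)
        all_scores.append(mean_acc / ideal_acc)
    omega_base = sum(base_scores) / len(base_scores)   # retention of the base task
    omega_all = sum(all_scores) / len(all_scores)      # retention across all seen tasks
    return omega_base, omega_all
```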
Understanding Social Interpersonal Interaction via Synchronization Templates of Facial Events
Li, Rui (Rochester Institute of Technology) | Curhan, Jared (Massachusetts Institute of Technology) | Hoque, Mohammed Ehsan (University of Rochester)
Automatic facial expression analysis in interpersonal communication is challenging, not only because conversation partners' facial expressions mutually influence each other, but also because no correct interpretation of facial expressions is possible without taking the social context into account. In this paper, we propose a probabilistic framework to model interactional synchronization between conversation partners based on their facial expressions. Interactional synchronization manifests the temporal dynamics of conversation partners' mutual influence. In particular, the model allows us to discover a set of common and unique facial synchronization templates directly from natural interpersonal interaction without recourse to any predefined labeling schemes. The facial synchronization templates represent periodic facial event coordinations shared by multiple conversation pairs in a specific social context. We test our model on two different types of dyadic conversations: negotiation and job interview. Based on the discovered facial event coordination, we are able to predict conversation outcomes with higher accuracy than HMMs and GMMs.
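The paper models synchronization with a probabilistic template framework; purely for intuition, the much simpler stand-in below measures lagged correlation between two facial action-unit intensity time series, one per conversation partner. The signals, lags, and synthetic data are illustrative assumptions, not the paper's model.

```python
# Simple lagged-correlation view of facial synchrony between two partners.
import numpy as np

def lagged_synchrony(au_a, au_b, max_lag=10):
    """Return the lag (in frames) and correlation where partner B best tracks partner A."""
    best = max(
        ((lag, np.corrcoef(au_a[:len(au_a) - lag], au_b[lag:])[0, 1])
         for lag in range(max_lag + 1)),
        key=lambda t: t[1],
    )
    return best  # (lag, correlation)

rng = np.random.default_rng(0)
smile_a = rng.random(300)
smile_b = np.roll(smile_a, 5) + 0.1 * rng.random(300)  # B mirrors A about 5 frames later
print(lagged_synchrony(smile_a, smile_b))               # peaks near lag 5
```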
Model AI Assignments 2018
Neller, Todd W. (Gettysburg College) | Butler, Zack (Rochester Institute of Technology) | Derbinsky, Nate (Northeastern University) | Furey, Heidi (Manhattan College) | Martin, Fred (University of Massachusetts Lowell) | Guerzhoy, Michael (University of Toronto) | Anders, Ariel (Massachusetts Institute of Technology) | Eckroth, Joshua (Stetson University)
The Model AI Assignments session seeks to gather and disseminate the best assignment designs of the Artificial Intelligence (AI) Education community. Recognizing that assignments form the core of student learning experience, we here present abstracts of seven AI assignments from the 2018 session that are easily adoptable, playfully engaging, and flexible for a variety of instructor needs.
The Complexity of Succinct Elections
Fitzsimmons, Zack (Rochester Institute of Technology) | Hemaspaandra, Edith (Rochester Institute of Technology)
The computational study of elections generally assumes that the preferences of the electorate come in as a list of votes. Depending on the context, it may be much more natural to represent the preferences of the electorate succinctly, as the distinct votes and their counts. Though the succinct representation may be exponentially smaller than the nonsuccinct one, we find only one natural case where the complexity increases, in sharp contrast to the case where each voter has a weight, where the complexity usually increases.
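The two representations can be illustrated in a few lines; the election below is a made-up example.

```python
# Nonsuccinct vs. succinct election input.
from collections import Counter

# Nonsuccinct: one entry per voter (strict preference orders over {a, b, c}).
votes = [("a", "b", "c")] * 500 + [("b", "c", "a")] * 300 + [("c", "a", "b")] * 200

# Succinct: each distinct vote appears once, paired with its count.
succinct = Counter(votes)
print(succinct)
# Counter({('a','b','c'): 500, ('b','c','a'): 300, ('c','a','b'): 200})
# The succinct form can be exponentially smaller when many voters cast the same vote.
```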
The Opacity of Backbones
Hemaspaandra, Lane A. (University of Rochester) | Narváez, David E. (Rochester Institute of Technology)
A backbone of a Boolean formula F is a collection S of its variables for which there is a unique partial assignment a_S such that F[a_S] is satisfiable (Monasson et al. 1999; Williams, Gomes, and Selman 2003). This paper studies the nontransparency of backbones. We show that, under the widely believed assumption that integer factoring is hard, there exist sets of Boolean formulas that have obvious, nontrivial backbones yet finding the values, a_S, of those backbones is intractable. We also show that, under the same assumption, there exist sets of Boolean formulas that obviously have large backbones yet producing such a backbone S is intractable. Further, we show that if integer factoring is not merely worst-case hard but is frequently hard, as is widely believed, then the frequency of hardness in our two results is not too much less than that frequency.
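For intuition, the brute-force sketch below recovers the full backbone of a small formula by checking which variables take the same value in every satisfying assignment; real instances would use a SAT solver, and the formula here is a made-up example.

```python
# Brute-force backbone computation: a variable is a backbone variable when it
# takes the same value in every satisfying assignment of the formula.
from itertools import product

variables = ["x1", "x2", "x3"]

def formula(a):  # (x1 ∨ x2) ∧ (¬x2 ∨ x3) ∧ x1
    return (a["x1"] or a["x2"]) and ((not a["x2"]) or a["x3"]) and a["x1"]

models = [dict(zip(variables, vals))
          for vals in product([False, True], repeat=len(variables))
          if formula(dict(zip(variables, vals)))]

backbone = {v: models[0][v] for v in variables
            if all(m[v] == models[0][v] for m in models)}
print(backbone)  # {'x1': True}: x1 is forced; x2 and x3 are not backbone variables
```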
Using Co-Captured Face, Gaze, and Verbal Reactions to Images of Varying Emotional Content for Analysis and Semantic Alignment
Gangji, Aliya (Muhlenberg College) | Walden, Trevor (Rochester Institute of Technology) | Vaidyanathan, Preethi (Rochester Institute of Technology) | Prud'hommeaux, Emily (Rochester Institute of Technology) | Bailey, Reynold (Rochester Institute of Technology) | Alm, Cecilia O. (Rochester Institute of Technology)
Analyzing different modalities of expression can provide insights into the ways that humans interpret, label, and react to images. Such insights have the potential not only to advance our understanding of how humans coordinate these expressive modalities but also to enhance existing methodologies for common AI tasks such as image annotation and classification. We conducted an experiment that co-captured the facial expressions, eye movements, and spoken language data that observers produce while examining images of varying emotional content and responding to description-oriented vs. affect-oriented questions about those images. We analyzed the facial expressions produced by the observers in order to determine the connection between those expressions and an image's emotional content. We also explored the relationship between the valence of an image and the verbal responses to that image, and how that relationship relates to the nature of the prompt, using low-level lexical features and more complex affective features extracted from the observers' verbal responses. Finally, in order to integrate this multimodal data, we extended an existing bitext alignment framework to create meaningful pairings between narrated observations about images and the image regions indicated by eye movement data. The resulting annotations of image regions with words from observers' responses demonstrate the potential of bitext alignment for multimodal data integration and, from an application perspective, for annotation of open-domain images. In addition, we found that while responses to affect-oriented questions appear useful for image understanding, their holistic nature seems less helpful for image region annotation.