Collaborating Authors

 Shaikh, Samira


The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

arXiv.org Artificial Intelligence

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. However, due to this moving target, new models often still evaluate on divergent Anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of corpora and evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper describes the initial release, for which we are organizing a shared task at our ACL 2021 Workshop and in which we invite the entire NLG community to participate.


Modeling Leadership Behavior of Players in Virtual Worlds

AAAI Conferences

In this article, we describe our method of modeling sociolinguistic behaviors of players in massively multi-player online games. Its focus is leadership, as manifested by participants engaged in discussion, and the automated modeling of this complex behavior in virtual worlds. We first approach the research question of modeling from a social science perspective, and ground our models in theories from human communication literature. We then adapt a two-tiered algorithmic model that derives certain mid-level sociolinguistic behaviors, such as Task Control, Topic Control and Disagreement, from discourse linguistic indicators and combines these in a weighted model to reveal the complex role of Leadership. The algorithm is evaluated by comparing its prediction of leaders against ground truth: the participants’ own ratings of leadership of themselves and their conversation peers. We find the algorithm performance to be considerably better than baseline.
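The second tier described above, a weighted combination of mid-level behavior scores, can be illustrated with a minimal sketch. The weights, the score range, and the player data below are hypothetical placeholders chosen for illustration; the abstract does not state the actual values or how the first-tier scores are computed.

```python
from typing import Dict, List

# Hypothetical weights for the mid-level behaviors named in the abstract.
# The real weighting scheme is not given in the source.
WEIGHTS: Dict[str, float] = {
    "task_control": 0.4,
    "topic_control": 0.4,
    "disagreement": 0.2,
}


def leadership_score(behaviors: Dict[str, float],
                     weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of mid-level behavior scores (each assumed to lie in [0, 1])."""
    return sum(weights[name] * behaviors.get(name, 0.0) for name in weights)


def rank_leaders(players: Dict[str, Dict[str, float]]) -> List[str]:
    """Return player names ordered from most to least leader-like."""
    return sorted(players, key=lambda p: leadership_score(players[p]), reverse=True)


# Toy first-tier outputs for three invented participants.
players = {
    "alice": {"task_control": 0.9, "topic_control": 0.7, "disagreement": 0.2},
    "bob": {"task_control": 0.1, "topic_control": 0.3, "disagreement": 0.6},
    "carol": {"task_control": 0.5, "topic_control": 0.5, "disagreement": 0.1},
}
ranking = rank_leaders(players)
```

A ranking like this could then be compared against participants' own leadership ratings, which is the kind of evaluation the abstract describes.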


Hedge Detection Using a Rewards and Penalties Approach

AAAI Conferences

Semantic and syntactic features found in text can be used in combination to statistically predict linguistic devices such as hedges in online chat. Some features are better indicators than others, and there are cases when multiple features need to be considered together to be useful. Once the features are identified, finding the best division of data becomes an optimization problem. We have devised a genetic algorithm approach to detecting hedges in online multi-party chat discourse. The system assigns rewards and penalties for matching features in tokenized text, so optimizing the reward and penalty amounts is the main challenge. Genetic algorithms, a subset of Evolutionary Algorithms, are well suited to optimization because they are massively parallel directed searches, and are therefore suited to finding the best ratio of integer rewards and penalties. “Evolutionary algorithms (EAs) utilize principles of natural selection and are robust adaptive search schemes suitable for searching nonlinear, discontinuous, and high-dimensional spaces. This class of algorithms is being increasingly applied to obtain optimal or near-optimal solutions to many complex real-world optimization problems” (Bonissone et al. 2006). We show results using 10-fold cross validation, as commonly used in traditional machine learning. The best performance without further fine-tuning is 79% in classifying whether an utterance in chat contains a hedge or not.
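To make the rewards-and-penalties idea concrete, here is a minimal sketch of a genetic algorithm that searches for integer feature weights, where a positive weight acts as a reward and a negative weight as a penalty. Everything specific in it is an assumption for illustration: the cue phrases, the toy utterances, the zero decision threshold, and the GA settings (elitism, tournament selection, one-point crossover, random-reset mutation) are not taken from the paper, and the sketch scores on its training data rather than using 10-fold cross validation.

```python
import random

# Hypothetical cue phrases and toy labeled utterances (True = contains a hedge).
FEATURES = ["maybe", "i think", "sort of", "definitely"]
DATA = [
    ("maybe we should go left", True),
    ("i think that is the key", True),
    ("it is sort of broken", True),
    ("go left now", False),
    ("that is definitely the key", False),
    ("i think it is definitely broken", False),
]


def score(weights, utterance):
    """Sum the integer reward (positive) or penalty (negative) of each matching cue."""
    return sum(w for w, cue in zip(weights, FEATURES) if cue in utterance)


def fitness(weights):
    """Fraction of utterances classified correctly; a positive score means 'hedge'."""
    correct = sum((score(weights, text) > 0) == label for text, label in DATA)
    return correct / len(DATA)


def evolve(generations=30, pop_size=20, low=-5, high=5, seed=0):
    """Evolve integer weight vectors; returns the best vector and per-generation best fitness."""
    rng = random.Random(seed)
    n = len(FEATURES)
    pop = [[rng.randint(low, high) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: carry the current best forward unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three randomly sampled individuals.
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:  # mutation: reset one weight at random
                child[rng.randrange(n)] = rng.randint(low, high)
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best_weights, best_history = evolve()
```

Because the best individual is always carried forward, the best fitness in `best_history` never decreases from one generation to the next. A fuller experiment would hold out data in each fold before measuring accuracy.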


Modeling Socio-Cultural Phenomena in Online Multi-Party Discourse

AAAI Conferences

In this paper, we present the application of a novel approach to the computational modeling, understanding and detection of social phenomena in online multi-party discourse. A two-tiered approach was developed to detect a collection of social phenomena deployed by participants, such as topic control, task control, disagreement and involvement. We discuss how these mid-level social phenomena can be reliably detected in discourse and how the resulting measures can be used to differentiate participants in online discourse. Our approach works across different types of online chat, and we show results on two specific data sets.