Collaborating Authors

 bennet


IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark

Manikantan, Kawshik, Tapaswi, Makarand, Gandhi, Vineet, Toshniwal, Shubham

arXiv.org Artificial Intelligence

Recent evaluations of LLMs on coreference resolution have revealed that traditional output formats and evaluation metrics do not fully capture the models' referential understanding. To address this, we introduce IdentifyMe, a new benchmark for mention resolution presented in a multiple-choice question (MCQ) format, commonly used for evaluating LLMs. IdentifyMe features long narratives and employs heuristics to exclude easily identifiable mentions, creating a more challenging task. The benchmark also consists of a curated mixture of different mention types and corresponding entities, allowing for a fine-grained analysis of model performance. We evaluate both closed- and open-source LLMs on IdentifyMe and observe a significant performance gap (20-30%) between the state-of-the-art sub-10B open models and closed ones. We observe that pronominal mentions, which have limited surface information, are typically much harder for models to resolve than nominal mentions. Additionally, we find that LLMs often confuse entities when their mentions overlap in nested structures. The highest-scoring model, GPT-4o, achieves 81.9% accuracy, highlighting the strong referential capabilities of state-of-the-art LLMs while also indicating room for further improvement.


Improving Automatic Quotation Attribution in Literary Novels

Vishnubhotla, Krishnapriya, Rudzicz, Frank, Hirst, Graeme, Hammond, Adam

arXiv.org Artificial Intelligence

Current models for quotation attribution in literary novels assume varying levels of available information in their training and test data, which poses a challenge for in-the-wild inference. Here, we approach quotation attribution as a set of four interconnected sub-tasks: character identification, coreference resolution, quotation identification, and speaker attribution. We benchmark state-of-the-art models on each of these sub-tasks independently, using a large dataset of annotated coreferences and quotations in literary novels (the Project Dialogism Novel Corpus). We also train and evaluate models for the speaker attribution task in particular, showing that a simple sequential prediction model achieves accuracy scores on par with state-of-the-art models.


Expert argues against federal AI agency despite growing momentum for idea on Capitol Hill

FOX News

Center for A.I. Safety Director Dan Hendrycks explains concerns about how the rapid growth of artificial intelligence could impact society. People need to change how they're thinking about regulating artificial intelligence, according to a prominent expert in the field, who pushed back on an idea gaining traction among lawmakers to create a new government agency to regulate AI. "Regulation is a really hard question," Andres Sawicki, a professor of law and director of the business of innovation, law, and technology (BILT) concentration at the University of Miami, told Fox News Digital. "The topic of AI is too big to be handled in one big coherent manner." Rather than tackling AI in a sweeping, comprehensive way, Sawicki recommended a more pragmatic, piecemeal approach. "Look specifically and concretely at effects the technology is having, the impact of AI on this or that issue. There shouldn't be a Department of AI to handle this in one big swoop."



Podcast: AI finds its voice

MIT Technology Review

Today's voice assistants are still a far cry from the hyper-intelligent thinking machines we've been musing about for decades. And it's because that technology is actually the combination of three different skills: speech recognition, natural language processing and voice generation. Each of these skills already presents huge challenges. In order to master just the natural language processing part? You pretty much have to recreate human-level intelligence. Deep learning, the technology driving the current AI boom, can train machines to become masters at all sorts of tasks. But it can only learn one at a time. And because most AI models train their skillset on thousands or millions of existing examples, they end up replicating patterns within historical data--including the many bad decisions people have made, like marginalizing people of color and women. Still, systems like the board-game champion AlphaZero and the increasingly convincing fake-text generator GPT-3 have stoked the flames of debate regarding when humans will create an artificial general intelligence--machines that can multitask, think, and reason for themselves. In this episode, we explore how machines learn to communicate--and what it means for the humans on the other end of the conversation. This episode was produced by Jennifer Strong, Emma Cillekens, Anthony Green, Karen Hao and Charlotte Jee.


Colorado at the forefront of AI and what it means for jobs of the future

#artificialintelligence

A group of MIT researchers visited Lockheed Martin this month for a chance to talk about the future of artificial intelligence and automation. Liz Reynolds is the executive director of the MIT Task Force on the Work of the Future and says her job is to focus on the relationship between new technologies and how they will affect jobs. "Colorado is at the forefront of thinking about these things," Reynolds said. "All jobs will be affected by this technology." Earlier this year, U.S. Sen. Michael Bennet, D-Colo., created an artificial intelligence strategy group to take a closer look at how AI is being used in the state and how that will change in the future.


Atari founder, governor play pong with the future of work in the artificial intelligence age

#artificialintelligence

Take it from Gov. Jared Polis, a veteran of the tech startup scene introduced as Colorado's "innovator in chief" Wednesday at a Denver Startup Week panel on the evolution of technology and its impact on everyday life. "I was just at Amazon's new facility in Thornton," Polis said. "Inside, where we used to see human-operated forklifts they have little intelligent robots that are carrying the crates around." The question now, Polis said, is how will public policy take shape around that AI technology so that it supports innovation but keeps human beings relevant in the economy going forward? Polis sat opposite Nolan Bushnell during the session.