roommate
Charlie Kirk Shooting Suspect Charged as Prosecutor Seeks Death Penalty
In the indictment, prosecutors claim Tyler Robinson planned Kirk's killing in advance, citing rooftop surveillance, engraved bullets, and a written note as they seek the death penalty. Utah County prosecutors on Tuesday charged Tyler Robinson in the shooting death of conservative activist Charlie Kirk at Utah Valley University, a murder officials say was politically motivated. They intend to seek the death penalty. Utah County Attorney Jeff Gray announced the indictment at a midday news conference, listing charges of aggravated murder, felony discharge of a firearm causing serious bodily injury, and commission of a violent offense in the presence of a child.
- North America > United States > Utah > Utah County > Orem (0.25)
- North America > United States > Texas (0.05)
- North America > United States > New York (0.05)
Finding Personalized Good-Enough Solutions to Unsatisfiable Stable Roommates Problems
The Stable Roommates problems are characterized by the preferences of agents over other agents as roommates. A solution is a partition of the agents into pairs that are acceptable to each other (i.e., they are in the preference lists of each other), and the matching is stable (i.e., there do not exist any two agents who prefer each other to their roommates, and thus block the matching). Motivated by real-world applications, and considering that stable roommates problems do not always have solutions, we continue our studies to compute "good-enough" matchings. In addition to the agents' habits and habitual preferences, we consider their networks of preferred friends, and introduce a method to generate personalized solutions to stable roommates problems. We illustrate the usefulness of our method with examples and empirical evaluations.
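The stability condition described above can be made concrete with a small check. The following is a minimal sketch, not the authors' method: the function names `blocking_pairs` and `is_stable` and the dictionary-based data layout are assumptions chosen for illustration. It tests acceptability (each matched pair appears in both preference lists) and searches for blocking pairs (two agents who each prefer the other to their current roommate).

```python
from itertools import combinations


def blocking_pairs(prefs, matching):
    """Return all pairs (a, b) that block the matching.

    prefs:    dict mapping each agent to a list of acceptable roommates,
              most preferred first (hypothetical layout).
    matching: symmetric dict mapping an agent to its roommate;
              unmatched agents are simply absent.
    """
    def prefers(agent, candidate):
        ranking = prefs[agent]
        if candidate not in ranking:
            return False  # an unacceptable agent can never be preferred
        current = matching.get(agent)
        if current not in ranking:
            return True  # unmatched, or matched to an unacceptable agent
        return ranking.index(candidate) < ranking.index(current)

    return [
        (a, b)
        for a, b in combinations(sorted(prefs), 2)
        if matching.get(a) != b and prefers(a, b) and prefers(b, a)
    ]


def is_stable(prefs, matching):
    """True if every matched pair is mutually acceptable and nothing blocks."""
    acceptable = all(
        matching.get(b) == a and b in prefs[a] and a in prefs[b]
        for a, b in matching.items()
    )
    return acceptable and not blocking_pairs(prefs, matching)


# A four-agent instance with no stable matching: D is everyone's last
# choice, and A, B, C prefer each other cyclically.
unsolvable = {
    "A": ["B", "C", "D"],
    "B": ["C", "A", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}
ab_cd = {"A": "B", "B": "A", "C": "D", "D": "C"}
ac_bd = {"A": "C", "C": "A", "B": "D", "D": "B"}
ad_bc = {"A": "D", "D": "A", "B": "C", "C": "B"}

# An instance where pairing A-B and C-D gives everyone their first choice.
solvable = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["D", "A", "B"],
    "D": ["C", "A", "B"],
}
```

In the `unsolvable` instance, each of the three possible pairings is blocked by some pair, which is the kind of situation that motivates looking for "good-enough" matchings instead.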
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Questionnaire & Opinion Survey (1.00)
- Personal > Interview (1.00)
PredictaBoard: Benchmarking LLM Score Predictability
Pacchiardi, Lorenzo, Voudouris, Konstantinos, Slater, Ben, Martínez-Plumed, Fernando, Hernández-Orallo, José, Zhou, Lexin, Schellaert, Wout
Despite possessing impressive skills, Large Language Models (LLMs) often fail unpredictably, demonstrating inconsistent success in even basic common sense reasoning tasks. This unpredictability poses a significant challenge to ensuring their safe deployment, as identifying and operating within a reliable "safe zone" is essential for mitigating risks. To address this, we present PredictaBoard, a novel collaborative benchmarking framework designed to evaluate the ability of score predictors (referred to as assessors) to anticipate LLM errors on specific task instances (i.e., prompts) from existing datasets. PredictaBoard evaluates pairs of LLMs and assessors by considering the rejection rate at different tolerance errors. As such, PredictaBoard stimulates research into developing better assessors and making LLMs more predictable, not only with a higher average performance. We conduct illustrative experiments using baseline assessors and state-of-the-art LLMs. PredictaBoard highlights the critical need to evaluate predictability alongside performance, paving the way for safer AI systems where errors are not only minimised but also anticipated and effectively mitigated. Code for our benchmark can be found at https://github.com/Kinds-of-Intelligence-CFI/PredictaBoard
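To illustrate the idea of a rejection rate at a given tolerance error, here is a minimal sketch under stated assumptions. It is one plausible reading of the metric, not the benchmark's actual implementation, and the function name `min_rejection_rate` and its arguments are hypothetical. It assumes the assessor gives each instance a predicted success score, that instances scoring below some threshold are rejected, and that the quantity of interest is the smallest fraction of rejected instances for which the LLM's error rate on the accepted instances stays within the tolerance.

```python
def min_rejection_rate(scores, correct, tolerance):
    """Smallest rejection rate keeping accepted-set error <= tolerance.

    scores:    assessor's predicted success score per instance (assumed input).
    correct:   whether the LLM actually succeeded on each instance.
    tolerance: maximum allowed error rate among accepted instances.
    """
    n = len(scores)
    if n == 0 or n != len(correct):
        raise ValueError("scores and correct must be non-empty and equal length")

    # Accept instances in order of decreasing assessor score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    errors = 0
    best_accepted = 0
    for k, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        # A score threshold must accept tied instances together, so only
        # evaluate a cut where the next score is strictly lower.
        at_boundary = k == n or scores[order[k]] < scores[i]
        if at_boundary and errors / k <= tolerance:
            best_accepted = k
    return (n - best_accepted) / n


# Toy data: five instances, assessor scores and actual LLM outcomes.
example_scores = [0.9, 0.8, 0.6, 0.4, 0.2]
example_correct = [True, True, False, True, False]
```

Under this reading, a better assessor ranks failures lower, so fewer instances need to be rejected to reach the same tolerance.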
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > Canada (0.04)
- North America > United States > Massachusetts (0.04)
- Law (1.00)
- Health & Medicine (0.67)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Reinforcement Learning for Long-Horizon Interactive LLM Agents
Chen, Kevin, Cusumano-Towner, Marco, Huval, Brody, Petrenko, Aleksei, Hamburger, Jackson, Koltun, Vladlen, Krähenbühl, Philipp
Interactive digital agents (IDAs) leverage APIs of stateful digital environments to perform tasks in response to user requests. While IDAs powered by instruction-tuned large language models (LLMs) can react to feedback from interface invocations in multi-step exchanges, they have not been trained in their respective digital environments. Prior methods accomplish less than half of tasks in sophisticated benchmarks such as AppWorld. We present a reinforcement learning (RL) approach that trains IDAs directly in their target environments. We formalize this training as a partially observable Markov decision process and derive LOOP, a data- and memory-efficient variant of proximal policy optimization. LOOP uses no value network and maintains exactly one copy of the underlying LLM in memory, making its implementation straightforward and as memory-efficient as fine-tuning a single LLM. A 32-billion-parameter agent trained with LOOP in the AppWorld environment outperforms the much larger OpenAI o1 agent by 9 percentage points (15% relative). To our knowledge, this is the first reported application of RL to IDAs that interact with a stateful, multi-domain, multi-app environment via direct API calls. Our analysis sheds light on the effectiveness of RL in this area, showing that the agent learns to consult the API documentation, avoid unwarranted assumptions, minimize confabulation, and recover from setbacks.
- Workflow (0.95)
- Research Report (0.64)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Synergistic Simulations: Multi-Agent Problem Solving with Large Language Models
Sprigler, Asher, Drobek, Alexander, Weinstock, Keagan, Tapsoba, Wendpanga, Childress, Gavin, Dao, Andy, Gral, Lucas
Large Language Models (LLMs) have increasingly demonstrated the ability to facilitate the development of multi-agent systems that allow the interpretation of thoughts and actions generated by each individual. Promising advancements have also been made in LLM-based interaction with existing worlds, particularly in interacting with simulated environments. This paper aims to integrate both aforementioned topics (agents & world interaction) into a single simulation where multiple agents can work together to solve a problem, modeling how groups of humans can often solve problems better than individuals. Showing whether LLMs demonstrate the synergy of human collaboration could lead to advancements in the applications of LLMs. We implemented two simulations: a physical studio apartment with two roommates, and another where agents collaborate to complete a programming task. We provide a multi-agent framework, discuss the performance of the agents in each simulation, and discuss potential future additions.
DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents
Kim, Jiho, Chay, Woosog, Hwang, Hyeonji, Kyung, Daeun, Chung, Hyunseung, Cho, Eunbyeol, Jo, Yohan, Choi, Edward
Recent advancements in Large Language Models (LLMs) have significantly enhanced the capabilities of conversational agents, making them applicable to various fields (e.g., education). Despite their progress, the evaluation of the agents often overlooks the complexities of real-world conversations, such as real-time interactions, multi-party dialogues, and extended contextual dependencies. To bridge this gap, we introduce DialSim, a real-time dialogue simulator. In this simulator, an agent is assigned the role of a character from popular TV shows, requiring it to respond to spontaneous questions using past dialogue information and to distinguish between known and unknown information. Key features of DialSim include evaluating the agent's ability to respond within a reasonable time limit, handling long-term multi-party dialogues, and managing adversarial settings (e.g., swap character names) to challenge the agent's reliance on pre-trained knowledge. We utilized this simulator to evaluate the latest conversational agents and analyze their limitations. Our experiments highlight both the strengths and weaknesses of these agents, providing valuable insights for future improvements in the field of conversational AI.
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Media > Television (1.00)
- Leisure & Entertainment (1.00)
- Information Technology (1.00)
- Education (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Lawn and order: the evergreen appeal of grass-cutting in video games
Jessica used to come for tea on Tuesdays, and all she wanted to do was cut grass. Because she was a couple of years younger than me, she couldn't encounter a ChuChu or a Bokoblin without dying, so instead she'd spend hours slicing at virtual greenery. At the time, I found it a little annoying. In hindsight, I understand that Jessica was simply following in the footsteps of our ancestors. Grass-cutting has been a mainstay of video games for decades.
How Jensen Huang's Nvidia Is Powering the A.I. Revolution
The revelation that ChatGPT, the astonishing artificial-intelligence chatbot, had been trained on an Nvidia supercomputer spurred one of the largest single-day gains in stock-market history. When the Nasdaq opened on May 25, 2023, Nvidia's value increased by about two hundred billion dollars. A few months earlier, Jensen Huang, Nvidia's C.E.O., had informed investors that Nvidia had sold similar supercomputers to fifty of America's hundred largest companies. By the close of trading, Nvidia was the sixth most valuable corporation on earth, worth more than Walmart and ExxonMobil combined. Huang's business position can be compared to that of Samuel Brannan, the celebrated vender of prospecting supplies in San Francisco in the late eighteen-forties.
- North America > United States > California > San Francisco County > San Francisco (0.25)
- North America > United States > Oregon (0.05)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.05)
- Information Technology > Hardware (1.00)
- Education (1.00)