Pioneers of Reinforcement Learning Win the Turing Award

WIRED

In the 1980s, Andrew Barto and Rich Sutton were considered eccentric devotees to an elegant but ultimately doomed idea--having machines learn, as humans and animals do, from experience. Decades on, with the technique they pioneered now increasingly critical to modern artificial intelligence and programs like ChatGPT, Barto and Sutton have been awarded the Turing Award, the highest honor in the field of computer science. Barto, a professor emeritus at the University of Massachusetts Amherst, and Sutton, a professor at the University of Alberta, trailblazed a technique known as reinforcement learning, which involves coaxing a computer to perform tasks through experimentation combined with either positive or negative feedback. "When this work started for me, it was extremely unfashionable," Barto recalls with a smile, speaking over Zoom from his home in Massachusetts. "It's been remarkable that [it has] achieved some influence and some attention," Barto adds.


Open-Source Large Language Models as Multilingual Crowdworkers: Synthesizing Open-Domain Dialogues in Several Languages With No Examples in Targets and No Machine Translation

arXiv.org Artificial Intelligence

The prevailing paradigm in the domain of Open-Domain Dialogue agents predominantly focuses on the English language, encompassing both models and datasets. Furthermore, the financial and temporal investments required for crowdsourcing such datasets for finetuning are substantial, particularly when multiple languages are involved. Fortunately, advancements in Large Language Models (LLMs) have unveiled a plethora of possibilities across diverse tasks. Specifically, instruction-tuning has enabled LLMs to execute tasks based on natural language instructions, occasionally surpassing the performance of human crowdworkers. Additionally, these models possess the capability to function in various languages within a single thread. Consequently, to generate new samples in different languages, we propose leveraging these capabilities to replicate the data collection process. We introduce a pipeline for generating Open-Domain Dialogue data in multiple Target Languages using LLMs, with demonstrations provided in a unique Source Language. By eschewing explicit Machine Translation in this approach, we enhance the adherence to language-specific nuances. We apply this methodology to the PersonaChat dataset. To enhance the openness of generated dialogues and mimic real-life scenarios, we added the notion of speech events, corresponding to the type of conversation the speakers are involved in, and that of common ground, which represents the premises of a conversation.


You Are the Best Reviewer of Your Own Papers: The Isotonic Mechanism

arXiv.org Artificial Intelligence

Machine learning (ML) and artificial intelligence (AI) conferences including NeurIPS and ICML have experienced a significant decline in peer review quality in recent years. To address this growing challenge, we introduce the Isotonic Mechanism, a computationally efficient approach to enhancing the accuracy of noisy review scores by incorporating authors' private assessments of their submissions. Under this mechanism, authors with multiple submissions are required to rank their papers in descending order of perceived quality. Subsequently, the raw review scores are calibrated based on this ranking to produce adjusted scores. We prove that authors are incentivized to truthfully report their rankings because doing so maximizes their expected utility, modeled as an additive convex function over the adjusted scores. Moreover, the adjusted scores are shown to be more accurate than the raw scores, with improvements being particularly significant when the noise level is high and the author has many submissions -- a scenario increasingly prevalent at large-scale ML/AI conferences. We further investigate whether submission quality information beyond a simple ranking can be truthfully elicited from authors. We establish that a necessary condition for truthful elicitation is that the mechanism be based on pairwise comparisons of the author's submissions. This result underscores the optimality of the Isotonic Mechanism, as it elicits the most fine-grained truthful information among all mechanisms we consider. We then present several extensions, including a demonstration that the mechanism maintains truthfulness even when authors have only partial rather than complete information about their submission quality. Finally, we discuss future research directions, focusing on the practical implementation of the mechanism and the further development of a theoretical framework inspired by our mechanism.
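
The calibration step described above can be illustrated with a small sketch. It assumes the adjusted scores are the least-squares projection of the raw scores onto the set of scores consistent with the author's ranking (isotonic regression, computed here with the pool-adjacent-violators procedure). The abstract does not spell out the exact objective, so the function name `isotonic_adjust` and the squared-error criterion are illustrative assumptions, not the paper's stated implementation.

```python
def isotonic_adjust(raw_scores, ranking):
    """Adjust raw review scores so they respect an author-reported ranking.

    raw_scores[i] is the raw score of paper i (hypothetical input format).
    ranking lists paper indices from best to worst, as reported by the author.
    Returns the scores closest to raw_scores in squared error that are
    non-increasing along the ranking (an assumed calibration objective).
    """
    ordered = [raw_scores[i] for i in ranking]

    # Pool adjacent violators: each block stores [sum, count]. Whenever a
    # later block has a higher mean than the block before it, the ranking is
    # violated, so the two blocks are merged and share one averaged score.
    blocks = []
    for y in ordered:
        blocks.append([y, 1])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c

    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)

    # Map the fitted values back to the original paper indices.
    adjusted = [0.0] * len(raw_scores)
    for position, paper in enumerate(ranking):
        adjusted[paper] = fitted[position]
    return adjusted
```

For example, with raw scores `[6.0, 8.0, 5.0]` and an author ranking `[0, 1, 2]` (paper 0 claimed best), the first two papers contradict the ranking and are averaged, giving `[7.0, 7.0, 5.0]`. If the ranking already agrees with the raw scores, the scores are returned unchanged.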


SAGE: Steering and Refining Dialog Generation with State-Action Augmentation

arXiv.org Artificial Intelligence

Recent advances in large language models have demonstrated impressive capabilities in task-oriented applications, yet building emotionally intelligent chatbots that can engage in natural, strategic conversations remains a challenge. We present a novel approach called SAGE that uses latent variables to control long-horizon behavior in dialogue generation. At the core of our method is the State-Action Chain (SAC), which augments standard language model fine-tuning by introducing latent variables that encapsulate emotional states and conversational strategies between dialogue turns. During inference, these variables are generated before each response, enabling coarse-grained control over dialogue progression while maintaining natural interaction patterns. We also introduce a self-improvement pipeline that leverages dialogue tree search, LLM-based reward modeling, and targeted fine-tuning to optimize conversational trajectories. Our experimental results show that models trained with this approach demonstrate improved performance in emotional intelligence metrics while maintaining strong capabilities on LLM benchmarks. The discrete nature of our latent variables facilitates search-based strategies and provides a foundation for future applications of reinforcement learning to dialogue systems, where learning can occur at the state level rather than the token level.


The New Yorker Film "I'm Not a Robot" Wins a 2025 Academy Award

The New Yorker

A film released by The New Yorker was among the winners at Sunday's Academy Awards. "I'm Not a Robot," a darkly comic portrayal of a woman trying to convince her computer that she is human, claimed the prize for Best Live Action Short. It is the second film released by the magazine to be honored with an Oscar. The film, written and directed by Victoria Warmerdam, opens with a seemingly typical office scene that quickly unravels. When the protagonist, a music producer, fails a series of CAPTCHA tests, she begins to question her own grip on reality.


Interactive Debugging and Steering of Multi-Agent AI Systems

arXiv.org Artificial Intelligence

Fully autonomous teams of LLM-powered AI agents are emerging that collaborate to perform complex tasks for users. What challenges do developers face when trying to build and debug these AI agent teams? In formative interviews with five AI agent developers, we identify core challenges: difficulty reviewing long agent conversations to localize errors, lack of support in current tools for interactive debugging, and the need for tool support to iterate on agent configuration. Based on these needs, we developed an interactive multi-agent debugging tool, AGDebugger, with a UI for browsing and sending messages, the ability to edit and reset prior agent messages, and an overview visualization for navigating complex message histories. In a two-part user study with 14 participants, we identify common user strategies for steering agents and highlight the importance of interactive message resets for debugging. Our studies deepen understanding of interfaces for debugging increasingly important agentic workflows.


Can AI Model the Complexities of Human Moral Decision-Making? A Qualitative Study of Kidney Allocation Decisions

arXiv.org Artificial Intelligence

A growing body of work in Ethical AI attempts to capture human moral judgments through simple computational models. The key question we address in this work is whether such simple AI models capture the critical nuances of moral decision-making by focusing on the use case of kidney allocation. We conducted twenty interviews where participants explained their rationale for their judgments about who should receive a kidney. We observe participants: (a) value patients' morally-relevant attributes to different degrees; (b) use diverse decision-making processes, citing heuristics to reduce decision complexity; (c) can change their opinions; (d) sometimes lack confidence in their decisions (e.g., due to incomplete information); and (e) express enthusiasm and concern regarding AI assisting humans in kidney allocation decisions. Based on these findings, we discuss challenges of computationally modeling moral judgments as a stand-in for human input, highlight drawbacks of current approaches, and suggest future directions to address these issues.


Congratulations to the #AAAI2025 outstanding paper award winners

AIHub

The AAAI 2025 outstanding paper awards were announced during the opening ceremony of the 39th Annual AAAI Conference on Artificial Intelligence on Thursday 27 February. Papers are recommended for consideration during the review process by members of the Program Committee. This year, three papers have been selected as outstanding papers, with a further paper being recognised in the special track on AI for social impact. Abstract: A fundamental task in multi-agent systems is to match agents to alternatives (e.g., resources or tasks). Often, this is accomplished by eliciting agents' ordinal rankings over the alternatives instead of their exact numerical utilities.


Reservoir Network with Structural Plasticity for Human Activity Recognition

arXiv.org Artificial Intelligence

The unprecedented dissemination of edge devices is accompanied by a growing demand for neuromorphic chips that can process time-series data natively without cloud support. The echo state network (ESN) is a class of recurrent neural networks that can be used to identify unique patterns in time-series data and predict future events. It is known for minimal computing resource requirements and fast training, owing to the use of linear optimization solely at the readout stage. In this work, a custom-designed neuromorphic chip based on ESN targeting edge devices is proposed. The proposed system supports various learning mechanisms, including structural plasticity and synaptic plasticity, locally on-chip. This provides the network with an additional degree of freedom to continuously learn, adapt, and alter its structure and sparsity level, ensuring high performance and continuous stability. We demonstrate the performance of the proposed system as well as its robustness to noise against real-world time-series datasets while considering various topologies of data movement. Average accuracies of 95.95% and 85.24% are achieved on human activity recognition and prosthetic finger control, respectively.
The last decade has seen significant advancement in neuromorphic computing, with a major thrust centered around processing streaming data using recurrent neural networks (RNNs). Although RNNs demonstrate promising performance in numerous domains, including speech recognition [1], computer vision [2], stock trading [3], and medical diagnosis [4], such networks suffer from slow convergence and intensive computations [5]. To bypass these challenges, Jaeger and Maass suggest leveraging the rich dynamics offered by the networks' recurrent connections and random parameters and limiting training to the network's final layers, particularly the readout layer [7]-[9]. With that, the network training and its computational complexity are significantly simplified.
Three classes of RNNs are trained using this approach: the liquid state machine (LSM) [7], the delayed-feedback reservoir [10], [11], and the echo state network (ESN), which is the focus of this work. ESNs have been demonstrated in a variety of tasks, including pattern recognition, anomaly detection [12], spatial-temporal forecasting [13], and modeling dynamic motions in biomimetic robots [14].
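
To make the "train only the readout" idea concrete, here is a minimal software sketch of an echo state network: a fixed random recurrent reservoir whose states feed a linear readout fitted by ridge regression. It is a generic illustration, not the chip design described above. The reservoir size, the row-sum scaling, the sine-wave task, and all function names are assumptions chosen for brevity.

```python
import math
import random


def make_reservoir(n, density, row_sum_bound, rng):
    """Sparse random recurrent weights, fixed after creation (never trained)."""
    w = [[rng.uniform(-1.0, 1.0) if rng.random() < density else 0.0
          for _ in range(n)] for _ in range(n)]
    # Keeping every absolute row sum below 1 keeps the tanh state update
    # contractive, a simple (conservative) way to obtain fading memory.
    largest = max(sum(abs(v) for v in row) for row in w) or 1.0
    return [[v * row_sum_bound / largest for v in row] for row in w]


def run_reservoir(w, w_in, inputs):
    """Drive the reservoir with a scalar input series and collect its states."""
    n = len(w)
    x = [0.0] * n
    states = []
    for u in inputs:
        x = [math.tanh(w_in[i] * u + sum(w[i][j] * x[j] for j in range(n)))
             for i in range(n)]
        states.append(x + [1.0])  # trailing 1.0 acts as a bias feature
    return states


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def train_readout(states, targets, ridge=1e-4):
    """Ridge regression on reservoir states: the only trained part of the ESN."""
    d = len(states[0])
    a = [[sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(s[i] * t for s, t in zip(states, targets)) for i in range(d)]
    return solve(a, b)


def demo(seed=0, n=10, steps=200, dt=0.2):
    """One-step-ahead prediction of a sine wave; returns (train MSE, target variance)."""
    rng = random.Random(seed)
    w = make_reservoir(n, density=0.3, row_sum_bound=0.9, rng=rng)
    w_in = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    series = [math.sin(dt * t) for t in range(steps + 1)]
    states = run_reservoir(w, w_in, series[:-1])
    targets = series[1:]
    w_out = train_readout(states, targets)
    preds = [sum(wo * s for wo, s in zip(w_out, st)) for st in states]
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
    mean = sum(targets) / len(targets)
    variance = sum((t - mean) ** 2 for t in targets) / len(targets)
    return mse, variance
```

Calling `demo()` returns the training error of the readout alongside the variance of the targets, so the two can be compared. Because the recurrent weights stay fixed, training reduces to a single linear solve, which is the property that keeps this approach cheap.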


Engadget Podcast: iPhone 16e review and Amazon's AI-powered Alexa

Engadget

The keyword for the iPhone 16e seems to be "compromise." In this episode, Devindra chats with Cherlynn about her iPhone 16e review, and they try to figure out who this phone is actually for. Also, they dive into Amazon's Alexa event, where we finally learned more about the company's AI-powered voice assistant. Alexa seems useful, but can we trust it? Listen below or subscribe on your podcast app of choice. If you've got suggestions or topics you'd like covered on the show, be sure to email us or drop a note in the comments! And be sure to check out our other podcast, Engadget News! Framework unveils a cheap 2-in-1 laptop and a…modular desktop? Devindra: This week, it's the iPhone 16e, which Cherlynn has reviewed. We're going to get her full thoughts on that thing. And also, Amazon held an AI event this week. We expected a lot of devices, but they spent 75 minutes talking about Alexa+, which is the AI-powered Alexa. Cherlynn: We expected a lot of devices. Cherlynn: One, at least one. It's been a while. Devindra: Mr. Panos Panay was there, the father of the Surface, and no devices, just him talking about AI. Cherlynn: Oh, and stay tuned at the end of this episode. Uh, I, we included an interview that I did with, um, the vice president of Alexa to talk more about the new Alexa+. Devindra: Anyway, folks, if you're enjoying the show, please be sure to subscribe to us on iTunes or your podcatcher of choice, leave us a review on iTunes and drop us an email at podcast@engadget.com. You can also join us on our live stream on Thursday mornings, typically around 11 a.m. Um, you'll see our faces. Sometimes we'll do Q&A and show off devices as well. This week, uh, Cherlynn has the iPhone 16e, which is the least, um, impressive thing to show off. It's just like, hey, you have an iPhone from 10 years ago, five, a while ago. Devindra: When, last, was there a single camera back iPhone? Cherlynn: Oh God, before that was 11. So, you know, it's like a flashback.
So let's talk about this thing, Cherlynn. And I checked out your review. First of all, you gave it a really, um, I think serviceable score. Your title is "What's your acceptable compromise?" And really, when we were talking about it last week, "compromise" seemed like the key word. The thing we kept coming back to was like just one camera, no MagSafe, no fast wireless charging. What are your overall thoughts on this thing? Cherlynn: I mean, so that headline is like all thanks to our EIC, Aaron Souppouris, because I was like, where, where do I go from here? How do I, so, so he's right. It is like, instead of what's in your wallet, it's like, what are you willing to take out of your wallet? I'll tell you the story. So yesterday I was at the Amazon devices and services event where there were no devices, and a bunch of other reporters had gathered and we were all like, you know, the, like, review's going up soon, right?