mara
Adaptive Multi-Agent Response Refinement in Conversational Systems
Jeong, Soyeong, Elangovan, Aparna, Yilmaz, Emine, Rokhlenko, Oleg
Large Language Models (LLMs) have demonstrated remarkable success in conversational systems by generating human-like responses. However, they can fall short, especially when required to account for personalization or specific knowledge. In real-life settings, it is impractical to rely on users to detect these errors and request a new response. One way to address this problem is to refine the response before returning it to the user. While existing approaches focus on refining responses within a single LLM, a single model struggles to consider the diverse aspects needed for effective conversations. In this work, we propose refining responses through a multi-agent framework, where each agent is assigned a specific role for each aspect. We focus on three key aspects crucial to conversational quality: factuality, personalization, and coherence. Each agent is responsible for reviewing and refining one of these aspects, and their feedback is then merged to improve the overall response. To enhance collaboration among them, we introduce a dynamic communication strategy. Instead of following a fixed sequence of agents, our approach adaptively selects and coordinates the most relevant agents based on the specific requirements of each query. We validate our framework on challenging conversational datasets, demonstrating that it significantly outperforms relevant baselines, particularly in tasks involving knowledge, a user's persona, or both.
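The adaptive selection described in the abstract above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the names `make_agent`, `select_agents`, and `refine`, the boolean flags used to pick agents, and the canned feedback strings that stand in for LLM calls. It is not the authors' implementation, and the abstract does not say how agents are selected or how feedback is merged.

```python
from typing import Callable, Dict, List

# Hypothetical agent type: (query, draft) -> feedback text.
Agent = Callable[[str, str], str]


def make_agent(aspect: str) -> Agent:
    """Stand-in for an LLM-backed reviewer; returns a canned feedback note."""
    def review(query: str, draft: str) -> str:
        return f"[{aspect}] review the draft against the query"
    return review


# The three aspects named in the abstract.
AGENTS: Dict[str, Agent] = {
    name: make_agent(name)
    for name in ("factuality", "personalization", "coherence")
}


def select_agents(has_knowledge: bool, has_persona: bool) -> List[str]:
    """Pick only the aspects relevant to this query (assumed rule-based here)."""
    chosen = []
    if has_knowledge:
        chosen.append("factuality")
    if has_persona:
        chosen.append("personalization")
    chosen.append("coherence")  # assumed to apply to every query
    return chosen


def refine(query: str, draft: str, has_knowledge: bool, has_persona: bool) -> str:
    """Collect feedback from the selected agents and append it to the draft."""
    names = select_agents(has_knowledge, has_persona)
    feedback = [AGENTS[name](query, draft) for name in names]
    # Placeholder merge step: a real system would rewrite the draft instead.
    return "\n".join([draft] + feedback)
```

Calling `refine("Who wrote Dune?", "Frank Herbert.", has_knowledge=True, has_persona=False)` would consult only the factuality and coherence reviewers, which is the point of selecting agents per query rather than running a fixed sequence.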
Token-level Accept or Reject: A Micro Alignment Approach for Large Language Models
Zhang, Yang, Yu, Yu, Tang, Bo, Zhu, Yu, Sun, Chuxiong, Wei, Wenqiang, Hu, Jie, Xie, Zipeng, Li, Zhiyu, Xiong, Feiyu, Chung, Edward
With the rapid development of Large Language Models (LLMs), aligning these models with human preferences and values is critical to ensuring ethical and safe applications. However, existing alignment techniques such as RLHF or DPO often require direct fine-tuning on LLMs with billions of parameters, resulting in substantial computational costs and inefficiencies. To address this, we propose the Micro token-level Accept-Reject Aligning (MARA) approach, which is designed to operate independently of the language models. MARA simplifies the alignment process by decomposing sentence-level preference learning into token-level binary classification, where a compact three-layer fully-connected network determines whether candidate tokens are "Accepted" or "Rejected" as part of the response. Extensive experiments across seven different LLMs and three open-source datasets show that MARA achieves significant improvements in alignment performance while reducing computational costs. The source code and implementation details are publicly available at https://github.com/IAAR-Shanghai/MARA, and the trained models are released at https://huggingface.co/IAAR-Shanghai/MARA_AGENTS.
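The token-level accept/reject idea in the abstract above can be sketched roughly as follows. This is a toy under stated assumptions, not the released MARA code: the helper names (`make_mlp`, `accept_probability`, `choose_token`), the random untrained weights, the feature vectors, the 0.5 threshold, and the fallback to the base model's top candidate are all hypothetical choices made for the example.

```python
import math
import random


def make_mlp(sizes, seed=0):
    """Build random weights for a fully-connected net with len(sizes) - 1 layers."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def accept_probability(mlp, features):
    """Forward pass: ReLU on hidden layers, sigmoid on the final logit."""
    x = features
    for i, (weights, biases) in enumerate(mlp):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(mlp) - 1:
            x = [max(0.0, v) for v in x]
    return 1.0 / (1.0 + math.exp(-x[0]))


def choose_token(candidates, mlp, threshold=0.5):
    """Return the first candidate token that the classifier accepts.

    `candidates` is a list of (token, feature_vector) pairs ordered by the
    base model's preference. If every candidate is rejected, fall back to
    the base model's top choice (an assumption of this sketch).
    """
    for token, features in candidates:
        if accept_probability(mlp, features) >= threshold:
            return token
    return candidates[0][0]
```

With `make_mlp([4, 8, 8, 1])` the classifier has three fully-connected layers, matching the size described in the abstract. The base language model is never updated in this arrangement; only the small classifier would be trained.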
General Performance Evaluation for Competitive Resource Allocation Games via Unseen Payoff Estimation
Diamond, N'yoma, Murai, Fabricio
Many high-stakes decision-making problems, such as those found within cybersecurity and economics, can be modeled as competitive resource allocation games. In these games, multiple players must allocate limited resources to overcome their opponent(s), while minimizing any induced individual losses. However, existing means of assessing the performance of resource allocation algorithms are highly disparate and problem-dependent. As a result, evaluating such algorithms is unreliable or impossible in many contexts and applications, especially when considering differing levels of feedback. To resolve this problem, we propose a generalized definition of payoff which uses an arbitrary user-provided function. This unifies performance evaluation under all contexts and levels of feedback. Using this definition, we develop metrics for evaluating player performance, and estimators to approximate them under uncertainty (i.e., bandit or semi-bandit feedback). These metrics and their respective estimators provide a problem-agnostic means to contextualize and evaluate algorithm performance. To validate the accuracy of our estimator, we explore the Colonel Blotto ($\mathcal{CB}$) game as an example. To this end, we propose a graph-pruning approach to efficiently identify feasible opponent decisions, which are used in computing our estimation metrics. Using various resource allocation algorithms and game parameters, a suite of $\mathcal{CB}$ games is simulated and used to compute and evaluate the quality of our estimates. These simulations empirically show our approach to be highly accurate at estimating the metrics associated with the unseen outcomes of an opponent's latent behavior.
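As a small illustration of a payoff defined through a user-provided function in a Colonel Blotto setting, the sketch below scores one allocation against another and averages over every feasible opponent allocation. It makes several assumptions that are not in the abstract: integer resource units, a sign-based default score, a uniform average over opponent decisions, and brute-force enumeration in place of the authors' graph-pruning approach. All function names are hypothetical.

```python
from itertools import product


def sign_score(mine, theirs):
    """Default per-battlefield score: +1 for a win, 0 for a tie, -1 for a loss."""
    return (mine > theirs) - (mine < theirs)


def blotto_payoff(mine, theirs, score=sign_score):
    """Sum a user-supplied per-battlefield score over all battlefields."""
    return sum(score(a, b) for a, b in zip(mine, theirs))


def feasible_allocations(budget, fields):
    """Yield every split of `budget` integer units across `fields` battlefields.

    Brute-force enumeration, used here only because it is easy to read.
    """
    for alloc in product(range(budget + 1), repeat=fields):
        if sum(alloc) == budget:
            yield alloc


def mean_payoff(mine, opponent_budget, score=sign_score):
    """Average payoff of `mine` over all feasible opponent allocations."""
    allocs = list(feasible_allocations(opponent_budget, len(mine)))
    return sum(blotto_payoff(mine, t, score) for t in allocs) / len(allocs)
```

Passing a different `score` function, for example one that weights battlefields unequally, changes the payoff definition without touching the rest of the code.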
Artificial intelligence can now decipher 'world's oldest languages' that were carved into 5,000-year-old stones as fast as Google Translate
The mysterious dialect of our ancient ancestors could finally be deciphered in full thanks to artificial intelligence. A million cuneiform tablets still exist in the world, experts estimate, but these writings left behind by ancient Mesopotamians require tedious work by archaeologists to translate and catalog their contents. It has been estimated that 90 percent of cuneiform texts remain untranslated. But now, a team of German researchers has figured out a new way to train computers to recognize cuneiform and even make the contents of millennia-old tablets searchable like a website, making it possible to digitize and assemble larger libraries of these ancient texts. This could unlock previously unknown details about ancient life, as the tablets contained details about feats as significant as temple construction, all the way down to squabbles as petty as customer service complaints.
Prominent Women in Tech Say They Don't Want to Join OpenAI's All-Male Board
Earlier this month, OpenAI's board abruptly fired its popular CEO, Sam Altman. The ouster shocked the tech world and rankled Altman's loyal employees, the vast majority of whom threatened to quit unless their boss was reinstated. After a chaotic five-day exile, Altman got his old job back--with a reconfigured, all-male board overseeing him, led by ex-Salesforce CEO and former Twitter board chair Bret Taylor. Right now, only three people sit on this provisional OpenAI board. Immediately prior to the failed coup, there were six.
Silicon Valley May Never Learn Its Lesson
Over and over during Sam Bankman-Fried's trial, lawyers showed pictures of the FTX founder living his best life. There he was at the Super Bowl flanked by Katy Perry and Orlando Bloom. There he was on a private jet, sleeping with his hands folded. There he was onstage, in shorts and a T-shirt, with Bill Clinton and Tony Blair. The very traits that made him a cause célèbre in Silicon Valley--his intellect, his obsession with scale, his story--turned into liabilities.
Sam Altman is tech's next household name -- if we survive the killer robots
Sam Altman may be tech's next household name, but many Americans probably haven't heard of him. To anyone outside San Francisco, Altman would probably seem like just another young tech CEO. He's a Stanford University dropout who sold a tech startup years ago for a fortune, and he's spent the past decade investing and coaching other entrepreneurs. He posts confident and sunny life advice on Twitter and peppers his conversation with references to line graphs. But in the past three months, Altman, 37, has rocketed to the top of the tech industry's power rankings on the back of OpenAI.
Dan O'Mara: Turning Robotics Education on its Head (Sense Think Act Podcast #19)
In this episode, Audrow Nash speaks to Dan O'Mara, who is the founder and COO of Circuit Launch and Mechlabs. Circuit Launch is a space for hardware entrepreneurs to work in Oakland, California, and Mechlabs is a project-based course to learn robotics. This interview is mostly about Mechlabs, but also covers the origins of Circuit Launch, including how it is not a maker or coworking space and its business model. For Mechlabs, we talk about several of the aspects that make it different from a university education in robotics, including how there are mentors rather than instructors, how projects are scoped, and how people are invited to work on what is most interesting to them. We also talk about the future of Mechlabs and how it fits with current universities.
Facial recognition regulation is surprisingly bipartisan
Bipartisanship in modern politics can seem kind of like an unbelievable, mythical creature. But in recent months, as Congress considered regulation of one of the most controversial topics it faces -- how, when, or if to use facial recognition -- we've gotten glimpses of a political unicorn. In two House Oversight and Reform committee hearings last summer, some of the most prominent Republicans and Democrats in the United States Congress joined together in calls for legislative reform. Proponents of regulation ranged from Rep. Alexandria Ocasio-Cortez (D-NY) to Rep. Jim Jordan (R-OH), a frequent Trump supporter on cable news. On Friday, Jordan was also appointed to the House Intelligence Committee to confront witnesses in public presidential impeachment hearings that begin this week.
gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo
Lopez, Nestor Gonzalez, Nuin, Yue Leire Erro, Moral, Elias Barba, Juan, Lander Usategui San, Rueda, Alejandro Solano, Vilches, Víctor Mayoral, Kojcev, Risto
This paper presents an upgraded, real-world, application-oriented version of gym-gazebo, the Reinforcement Learning (RL) toolkit based on the Robot Operating System (ROS) and Gazebo, which complies with OpenAI Gym. The paper discusses the new ROS 2-based software architecture and summarizes the results obtained using Proximal Policy Optimization (PPO). Ultimately, this work provides a benchmarking system for robotics that allows different techniques and algorithms to be compared under the same virtual conditions. We have evaluated environments with different levels of complexity of the Modular Articulated Robotic Arm (MARA), reaching accuracies in the millimeter scale. The converged results show the feasibility and usefulness of the gym-gazebo2 toolkit, as well as its potential and applicability in industrial use cases with modular robots.