 avenger


The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants

Zhang, Yiqun, Li, Hao, Wang, Chenxu, Chen, Linyao, Zhang, Qiaosheng, Ye, Peng, Feng, Shi, Wang, Daling, Wang, Zhen, Wang, Xinrun, Xu, Jia, Bai, Lei, Ouyang, Wanli, Hu, Shuyue

arXiv.org Artificial Intelligence

Proprietary giants are increasingly dominating the race for ever-larger language models. Can open-source, smaller models remain competitive across a broad range of tasks? In this paper, we present the Avengers -- a simple recipe that leverages the collective intelligence of these smaller models. The Avengers builds upon four lightweight operations: (i) embedding: encode queries using a text embedding model; (ii) clustering: group queries based on their semantic similarity; (iii) scoring: score each model's performance within each cluster; and (iv) voting: improve outputs via repeated sampling and voting. At inference time, each query is embedded and assigned to its nearest cluster. The top-performing model(s) within that cluster are selected to generate the response with repeated sampling. Remarkably, with 10 open-source models (~7B parameters each), the Avengers surpasses GPT-4o, 4.1, and 4.5 in average performance across 15 diverse datasets spanning mathematics, coding, logical reasoning, general knowledge, and affective tasks. In particular, it surpasses GPT-4.1 on mathematics tasks by 18.21% and on code tasks by 7.46%. Furthermore, the Avengers delivers superior out-of-distribution generalization, and remains robust across various embedding models, clustering algorithms, ensemble strategies, and values of its sole parameter -- the number of clusters.
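The four operations in the abstract above can be illustrated with a small toy sketch. It is not the authors' implementation: the function names, the 2-D "embeddings", the precomputed centroids (standing in for the output of any clustering algorithm), and the stub models are all assumptions made for illustration, and it selects a single best model per cluster rather than several.

```python
import math
from collections import Counter

def nearest_cluster(vec, centroids):
    """Return the index of the centroid closest to vec (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))

def score_models(train_vecs, correct, centroids):
    """Scoring step: per-cluster accuracy of each model.

    correct[m][q] is True if model m answered training query q correctly.
    Returns a list indexed by cluster, each entry a dict model -> accuracy.
    """
    totals = [0] * len(centroids)
    hits = [{m: 0 for m in correct} for _ in centroids]
    for q, vec in enumerate(train_vecs):
        c = nearest_cluster(vec, centroids)
        totals[c] += 1
        for m in correct:
            hits[c][m] += int(correct[m][q])
    return [
        {m: (hits[c][m] / totals[c] if totals[c] else 0.0) for m in correct}
        for c in range(len(centroids))
    ]

def route_and_vote(query, query_vec, centroids, scores, models, n_samples=5):
    """Inference: assign the query to its nearest cluster, pick that cluster's
    best-scoring model, sample it n_samples times, and majority-vote."""
    c = nearest_cluster(query_vec, centroids)
    best = max(scores[c], key=scores[c].get)
    answers = [models[best](query) for _ in range(n_samples)]
    return best, Counter(answers).most_common(1)[0][0]

# Toy data (hypothetical): two clusters, two stub "models".
centroids = [(0.0, 0.0), (10.0, 10.0)]
train_vecs = [(0.1, 0.2), (0.3, -0.1), (9.8, 10.1), (10.2, 9.9)]
correct = {
    "math_model": [True, True, False, False],
    "code_model": [False, True, True, True],
}
models = {
    "math_model": lambda q: "42",
    "code_model": lambda q: "print('hi')",
}

scores = score_models(train_vecs, correct, centroids)
chosen, answer = route_and_vote("What is 2 + 40?", (0.2, 0.1), centroids, scores, models)
print(chosen, answer)
```

In a real setting the query vectors would come from a text embedding model and the centroids from a clustering run over a training set; the number of clusters is the one parameter the abstract mentions.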


Netflix's Most Expensive Movie Ever Is Here, and It's a Monumental Disaster

Slate

When he got his first glimpse of a movie studio, Orson Welles excitedly proclaimed it "the biggest electric train set any boy ever had." But with a reported budget of more than $300 million, Joe and Anthony Russo's The Electric State makes Welles' train set look like a busted caboose. The most expensive movie in Netflix's history, it's also among the costliest of all time, joining a list that includes the brothers' own Avengers: Infinity War and Avengers: Endgame. If the Russos are the most profligate creators in history--their Amazon series Citadel is also one of the most expensive TV shows ever made--they're among the most successful too. And yet for all the money they're making, and all that they're allowed to spend, they don't seem to be enjoying themselves very much.


MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems

Katsis, Yannis, Rosenthal, Sara, Fadnis, Kshitij, Gunasekara, Chulaka, Lee, Young-Suk, Popa, Lucian, Shah, Vraj, Zhu, Huaiyu, Contractor, Danish, Danilevsky, Marina

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) has recently become a very popular task for Large Language Models (LLMs). Evaluating them on multi-turn RAG conversations, where the system is asked to generate a response to a question in the context of a preceding conversation, is an important and often overlooked task with several additional challenges. We present MTRAG: an end-to-end human-generated multi-turn RAG benchmark that reflects several real-world properties across diverse dimensions for evaluating the full RAG pipeline. MTRAG contains 110 conversations averaging 7.7 turns each across four domains for a total of 842 tasks. We also explore automation paths via synthetic data and LLM-as-a-Judge evaluation. Our human and automatic evaluations show that even state-of-the-art LLM RAG systems struggle on MTRAG. We demonstrate the need for strong retrieval and generation systems that can handle later turns, unanswerable questions, non-standalone questions, and multiple domains. MTRAG is available at https://github.com/ibm/mt-rag-benchmark.


Robert Downey Jr. won't let AI recreate his likeness in Hollywood: 'I intend to sue'

FOX News

Robert Downey Jr. praised Jon Favreau for being ambitious in his filmmaking, shouting out many films he has directed, including 'The Lion King' and 'The Jungle Book.' Robert Downey Jr. might be devoid of iron, but he's sure got some steel. The Academy Award-winning actor, 59, is speaking out about rapid technological advancements and how he plans to fight back if his name and likeness are manipulated by artificial intelligence. "I intend to sue," he told the "On with Kara Swisher" podcast. Robert Downey Jr. says he plans to sue if someone manipulates his likeness through artificial intelligence. It all comes back to Downey Jr.'s alter ego, Tony Stark, whose own alter ego is Iron Man.


A Day in the Life of the Guy Who Harassed You on a Dating App

The New Yorker

I wake up and immediately open Bumble. I swipe until I match with a woman who writes in her profile that the "Mamma Mia!" movies are better than the "Avengers" films. I promptly send her a message letting her know that she is wrong. The "Avengers" franchise is worth $14.3 billion, and the childish "Mamma Mia!" movies raked in a measly $1.1 billion. I tell her she can thank me for this information by getting a drink with me tonight.


Can an AI program really write a good movie? Here's a test

The Guardian

The rise of AI programs like ChatGPT has triggered a tidal wave of ethical handwringing, most prominently from within the industries that it threatens to destroy. After all, just because you can get a robot to instantly write code or write contracts or provide customer support for free, should you? Well, the answer from the Writers Guild of America is a qualified yes. This week, the Writers Guild of America proposed that ChatGPT would absolutely be allowed to write scripts in the future, provided that the credit (and the money) goes to the human writer who came up with the prompts in the first place. The proposal paints a scary picture of the future; a future in which even the most human of arts are crushed under the wheels of an unthinking technology.


AI-Vengers

#artificialintelligence

At a recent team meeting, different members were assigned Avenger roles reflecting what they brought to the problem at hand. Since I am not much of an Avengers fan, I did what anyone would do in such a situation: ask their children to explain the relevant concepts. As I learned more about the Avengers' roles and responsibilities, I couldn't help but notice many parallels to AI. For example, Nick Fury is the founder of the Avengers, responsible for bringing them together and protecting them from internal strife and external threats. Before the Avengers formally unite in the 2012 film, Nick Fury has a shadowy role in early Marvel Cinematic Universe (MCU) films such as Thor and Iron Man.


An Approach to Inference-Driven Dialogue Management within a Social Chatbot

Finch, Sarah E., Finch, James D., Huryn, Daniil, Hutsell, William, Huang, Xiaoyuan, He, Han, Choi, Jinho D.

arXiv.org Artificial Intelligence

We present a chatbot implementing a novel dialogue management approach based on logical inference. Instead of framing conversation as a sequence of response generation tasks, we model conversation as a collaborative inference process in which speakers share information to synthesize new knowledge in real time. Our chatbot pipeline accomplishes this modelling in three broad stages. The first stage translates user utterances into a symbolic predicate representation. The second stage then uses this structured representation in conjunction with a larger knowledge base to synthesize new predicates using efficient graph matching. In the third and final stage, our bot selects a small subset of predicates and translates them into an English response. This approach lends itself to understanding latent semantics of user inputs, flexible initiative taking, and responses that are novel and coherent with the dialogue context.


Royal Navy unveils concept images for ambitious autonomous fleet

Daily Mail - Science & tech

They may seem like something out of The Avengers film franchise, but these ambitious concepts of revolutionary warships are actually part of the Royal Navy's vision of what the British fleet could look like in the future. Detailed proposals for four potential vehicles, created by young engineers, have been released, including a stealth submarine carrier and a huge flying drone station which would be attached to a helium balloon and based in the stratosphere. The idea is that attack drones shaped like conventional airplanes could then be launched from the station 'at a moment's notice' before shooting down towards Earth and potentially gliding just beneath the water in a stealth mode and smashing into an enemy ship. The Royal Navy hasn't disclosed anticipated costs of bringing to life the newly-revealed concepts, which have been described by one expert involved in British defence and security operations as very much 'in the realm of speculative thinking'. They have been put forward by young engineers from industry and academia as part of a challenge posed by the UK Naval Engineering Science and Technology (UKNEST), aimed at helping the Royal Navy to develop ideas for an autonomous fleet that could shape how it operates over the next 50 years.


Artificial Intelligence in Film Industry is Sophisticating Production

#artificialintelligence

Artificial intelligence in filmmaking might sound futuristic, but we have reached this place. Technology is already making a significant impact on film production. Today, most of the outperforming movies that come under the visual effects category are using machine learning and AI for filmmaking. Significant pictures like 'The Irishman' and 'Avengers: Endgame' are no different. It won't be a wonder if the next movie you watch is written by AI, performed by robots, and animated and rendered by a deep learning algorithm.