
 mastermind


The Guilty Pleasure of the Heist

The New Yorker

Elaborate robberies are a Hollywood staple, and the real-life theft at the Louvre has become a phenomenon. Why are we riveted by this particular type of crime? On October 19th, a group of masked men broke into the Louvre in broad daylight and made off with some of France's crown jewels. Suspects are now in custody, but the online fervor is still going strong. On this episode of Critics at Large, Vinson Cunningham, Naomi Fry, and Alexandra Schwartz discuss the sordid satisfaction of watching a heist play out, both onscreen and off.


MastermindEval: A Simple But Scalable Reasoning Benchmark

Golde, Jonas, Haller, Patrick, Barth, Fabio, Akbik, Alan

arXiv.org Artificial Intelligence

Recent advancements in large language models (LLMs) have led to remarkable performance across a wide range of language understanding and mathematical tasks. As a result, increasing attention has been given to assessing the true reasoning capabilities of LLMs, driving research into commonsense, numerical, logical, and qualitative reasoning. However, with the rapid progress of reasoning-focused models such as OpenAI's o1 and DeepSeek's R1, there has been a growing demand for reasoning benchmarks that can keep pace with ongoing model developments. To this end, we introduce MastermindEval, a simple but scalable reasoning benchmark based on the board game Mastermind. Our benchmark supports two evaluation paradigms: (1) agentic evaluation, in which the model autonomously plays the game, and (2) deductive reasoning evaluation, in which the model is given a pre-played game state with only one possible valid code to infer. In our experimental results, we (1) find that even easy Mastermind instances are difficult for current models and (2) demonstrate that the benchmark can scale to possibly more advanced models in the future. Furthermore, we investigate why models fail to deduce the final solution and find that current models struggle to infer the concealed code as the number of statements they must combine information from increases. Large language models (LLMs) have demonstrated remarkable performance across various text generation tasks, spanning both text and vision modalities (Grattafiori et al., 2024). These models, characterized by their large parameter counts, have proven effective in a wide range of language understanding tasks (Brown et al., 2020; Zhao et al., 2024).
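To make the deductive reasoning paradigm concrete, here is a minimal Python sketch assuming the standard Mastermind scoring rules (black pegs for a right color in the right position, white pegs for a right color in the wrong position). It is not the authors' benchmark code; the function names `feedback` and `consistent_codes` and the sample game are hypothetical. A brute-force filter over all codes shows what "only one possible valid code" means: after the recorded guesses, exactly one candidate survives.

```python
from collections import Counter
from itertools import product


def feedback(secret, guess):
    """Score a guess: (black, white) pegs under standard Mastermind rules."""
    black = sum(s == g for s, g in zip(secret, guess))
    # Colors shared by both codes, counted with multiplicity, minus exact hits.
    common = sum((Counter(secret) & Counter(guess)).values())
    return black, common - black


def consistent_codes(history, colors, length):
    """Every code that would have produced all recorded (guess, feedback) pairs."""
    return [
        code
        for code in product(colors, repeat=length)
        if all(feedback(code, guess) == score for guess, score in history)
    ]


# Hypothetical pre-played game: 3 colors, codes of length 2.
history = [("AA", (1, 0)), ("BC", (0, 1)), ("CA", (0, 1))]
print(consistent_codes(history, "ABC", 2))  # [('A', 'B')]
```

Here the three scored guesses leave a single consistent code, which is the kind of game state the deductive paradigm asks a model to solve. Enumerating all `len(colors) ** length` codes is only practical for small instances.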


Perseus: Tracing the Masterminds Behind Cryptocurrency Pump-and-Dump Schemes

Fu, Honglin, Feng, Yebo, Wu, Cong, Xu, Jiahua

arXiv.org Artificial Intelligence

Masterminds are entities that organize, coordinate, and orchestrate cryptocurrency pump-and-dump schemes, a form of trade-based manipulation that undermines market integrity and causes financial losses for unwitting investors. Previous research detects pump-and-dump activities in the market, predicts the target cryptocurrency, and examines investors and online social network (OSN) entities. However, these solutions do not address the root cause of the problem: there is a critical gap in identifying and tracing the masterminds involved in these schemes. In this research, we develop Perseus, a detection system that collects real-time data from OSNs and cryptocurrency markets. Perseus then constructs temporal attributed graphs that preserve the direction of information diffusion and the structure of the community, and leverages graph neural networks (GNNs) to identify the masterminds behind pump-and-dump activities. Perseus achieves higher F1 scores and precision than the state-of-the-art (SOTA) fraud detection method, with fast training and inference. Deployed in the real world from February 16 to October 9, 2024, Perseus successfully detected 438 masterminds who operate efficiently within the pump-and-dump information diffusion networks. Perseus provides regulators with explanations of the risks posed by masterminds, along with oversight capabilities to mitigate cryptocurrency pump-and-dump schemes.
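The abstract does not specify Perseus's architecture, so the following is only a hypothetical Python sketch of the two ingredients it names: a directed graph whose edges record the earliest time information flowed between two accounts, and one untrained mean-aggregation message-passing step, the basic operation that GNN layers stack (real layers add learned weights and nonlinearities, omitted here). The event format, node names, and feature meanings are illustrative assumptions.

```python
from collections import defaultdict


def build_temporal_graph(events):
    """Map each directed edge (src, dst) to the earliest time information flowed src -> dst.

    `events` are hypothetical (source_account, reposting_account, timestamp) records.
    """
    first_seen = {}
    for src, dst, ts in events:
        if (src, dst) not in first_seen or ts < first_seen[(src, dst)]:
            first_seen[(src, dst)] = ts
    return first_seen


def mean_aggregate(features, edges, until=None):
    """One untrained message-passing step over directed, timestamped edges.

    Each node's new vector averages its own features with the mean of its
    in-neighbours' features; edges newer than `until` are ignored.
    """
    incoming = defaultdict(list)
    for (src, dst), ts in edges.items():
        if until is None or ts <= until:
            incoming[dst].append(features[src])
    updated = {}
    for node, h in features.items():
        msgs = incoming.get(node)
        if not msgs:
            updated[node] = list(h)  # no incoming information yet
            continue
        mean = [sum(col) / len(msgs) for col in zip(*msgs)]
        updated[node] = [(a + b) / 2 for a, b in zip(h, mean)]
    return updated


# Hypothetical node attributes, e.g. [announcement rate, follower share].
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
edges = build_temporal_graph([("a", "b", 10), ("a", "b", 5), ("a", "c", 20), ("b", "c", 30)])
print(mean_aggregate(features, edges, until=25))
# {'a': [1.0, 0.0], 'b': [0.5, 0.5], 'c': [0.5, 0.0]}
```

Filtering edges by timestamp keeps the update causal: a node only receives information from neighbors whose messages had already reached it, which is one simple way to respect the direction and timing of diffusion.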


Deduction Game Framework and Information Set Entropy Search

Meng, Fandi, Lucas, Simon

arXiv.org Artificial Intelligence

We present a game framework tailored for deduction games, enabling structured analysis from the perspective of Shannon entropy variations. Additionally, we introduce a new forward search algorithm, Information Set Entropy Search (ISES), which effectively solves many single-player deduction games. The ISES algorithm, augmented with sampling techniques, allows agents to make decisions within controlled computational resources and time constraints. Experimental results on eight games within our framework show that our method significantly outperforms the Single Observer Information Set Monte Carlo Tree Search (SO-ISMCTS) algorithm under limited decision time. The entropy variation of game states in our framework enables explainable decision-making, which can also be used to analyze the appeal of deduction games and provide insights for game designers.
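The entropy view can be illustrated with a greedy, one-step rule: among the available queries, pick the one whose feedback has the highest Shannon entropy over the states still consistent with past observations, assuming those states are equally likely. This Python sketch is a simplification, not ISES itself, which searches forward over several moves; the optional random subsampling of states only gestures at the sampling techniques mentioned in the abstract, and all names are hypothetical.

```python
import math
import random
from collections import Counter


def outcome_entropy(query, states, feedback):
    """Shannon entropy (bits) of the feedback `query` would produce.

    Assumes every state in `states` is equally likely.
    """
    counts = Counter(feedback(state, query) for state in states)
    n = len(states)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def best_query(queries, states, feedback, sample_size=None, rng=None):
    """Greedy one-step choice: the query whose feedback is most informative.

    If `sample_size` is set, entropy is estimated on a random subset of states
    to bound computation time.
    """
    states = list(states)
    if sample_size is not None and len(states) > sample_size:
        rng = rng or random.Random(0)
        states = rng.sample(states, sample_size)
    return max(queries, key=lambda q: outcome_entropy(q, states, feedback))


# Toy deduction game: secret in 1..16, feedback answers "is secret <= q?".
states = list(range(1, 17))
print(best_query(range(1, 17), states, lambda s, q: s <= q))  # 8
```

In this toy game the rule picks q = 8, the query that splits the 16 states evenly and therefore yields exactly 1 bit of information.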


AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents

Gioacchini, Luca, Siracusano, Giuseppe, Sanvito, Davide, Gashteovski, Kiril, Friede, David, Bifulco, Roberto, Lawrence, Carolin

arXiv.org Artificial Intelligence

The advances made by Large Language Models (LLMs) have led to the pursuit of LLM agents that can solve intricate, multi-step reasoning tasks. As with any research pursuit, benchmarking and evaluation are cornerstones of efficient and reliable progress. However, existing benchmarks are often narrow and simply compute overall task success. To address these issues, we propose AgentQuest, a framework in which (i) both benchmarks and metrics are modular and easily extensible through well-documented, easy-to-use APIs, and (ii) two new evaluation metrics reliably track LLM agent progress while solving a task. We exemplify the utility of the metrics on two use cases in which we identify common failure points and refine the agent architecture to obtain a significant performance increase. Together with the research community, we hope to extend AgentQuest further; we therefore make it available at https://github.com/nec-research/agentquest.
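As an illustration of metrics that track progress during a task rather than only final success, the Python sketch below computes two per-step curves: the fraction of task milestones reached so far and the fraction of actions that repeat an earlier action. The milestone annotations, the exact metric definitions, and the function names are assumptions for illustration; this is not the AgentQuest API.

```python
from collections.abc import Hashable, Iterable, Sequence


def progress_curve(reached_per_step: Sequence[Iterable[str]], total_milestones: int) -> list[float]:
    """Cumulative fraction of task milestones reached after each step."""
    achieved: set[str] = set()
    curve = []
    for reached in reached_per_step:
        achieved.update(reached)
        curve.append(len(achieved) / total_milestones)
    return curve


def repetition_curve(actions: Sequence[Hashable]) -> list[float]:
    """After each step, the fraction of actions so far that repeat an earlier action."""
    seen = set()
    repeats = 0
    curve = []
    for t, action in enumerate(actions, start=1):
        if action in seen:
            repeats += 1
        seen.add(action)
        curve.append(repeats / t)
    return curve


# Hypothetical trace of an agent on a task annotated with 4 milestones.
print(progress_curve([{"m1"}, set(), {"m1", "m2"}, {"m3"}], total_milestones=4))  # [0.25, 0.25, 0.5, 0.75]
print(repetition_curve(["look", "move", "look", "look"]))  # [0.0, 0.0, 0.3333333333333333, 0.5]
```

A progress curve that flattens while the repetition curve climbs is one way such metrics can expose an agent stuck in a loop, a failure that a single success/failure score would hide.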


Iran identifies alleged mastermind behind Soleimani memorial bombings that left nearly 100 dead: report

FOX News

Iran announced Thursday that it has identified the alleged mastermind behind dual suicide bombing attacks that left nearly 100 people dead at a recent memorial for late Gen. Qassem Soleimani, who was killed years ago by a U.S. drone strike. The IRNA news agency carried a statement by the intelligence ministry saying the main suspect who planned the Jan. 3 attack in Kerman, a city southeast of the Iranian capital of Tehran, was a Tajik national known by his alias Abdollah Tajiki. Tajiki reportedly entered the country in mid-December by crossing Iran's southeast border, and left two days before the attack, after making the bombs. One bomber first detonated his explosives at the ceremony in Kerman, then another attacked 20 minutes later as emergency workers and other people tried to help the wounded from the first explosion, according to The Associated Press. The report identified one of the bombers by his family name of Bozrov, saying the man was 24 years old and had Tajik and Israeli nationality.


On Optimal Strategies for Wordle and General Guessing Games

Cunanan, Michael, Thielscher, Michael

arXiv.org Artificial Intelligence

The recent popularity of Wordle has revived interest in guessing games. We develop a general method for finding optimal strategies for guessing games while avoiding an exhaustive search. Our main contributions are several theorems that build towards a general theory to prove the optimality of a strategy for a guessing game. This work is developed to apply to any guessing game, but we use Wordle as an example to present concrete results.
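To show what "optimal" means for a guessing game, the Python sketch below scores Wordle guesses (greens first, then yellows only for letters still unmatched in the secret, so duplicate letters are handled) and finds the minimum expected number of guesses by exhaustive search over a toy word list. This brute-force baseline is exactly what the paper's theorems aim to avoid, and it is feasible only for tiny lists. Restricting guesses to the word list, assuming a uniformly random secret, and the function names are all assumptions.

```python
from collections import Counter, defaultdict
from functools import lru_cache


def wordle_feedback(secret: str, guess: str) -> str:
    """Score a guess: 'G' = right letter, right spot; 'Y' = letter elsewhere; '.' = absent."""
    result = ["."] * len(guess)
    unmatched = Counter()  # secret letters not already matched by a green
    for i, (s, g) in enumerate(zip(secret, guess)):
        if s == g:
            result[i] = "G"
        else:
            unmatched[s] += 1
    for i, g in enumerate(guess):
        # A letter turns yellow only while unmatched copies remain in the secret.
        if result[i] != "G" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def optimal_expected_guesses(words: list[str]) -> float:
    """Minimum expected guesses for a uniformly random secret, guessing only from `words`."""
    words = tuple(sorted(words))

    @lru_cache(maxsize=None)
    def total_cost(candidates: tuple[str, ...]) -> int:
        # Sum, over all secrets in `candidates`, of guesses an optimal strategy needs.
        if len(candidates) == 1:
            return 1
        best = float("inf")
        for guess in words:
            groups = defaultdict(list)
            for secret in candidates:
                groups[wordle_feedback(secret, guess)].append(secret)
            if len(groups) == 1:
                continue  # this guess cannot distinguish any candidates
            cost = len(candidates)  # every secret spends this guess
            for pattern, group in groups.items():
                if pattern != "G" * len(guess):
                    cost += total_cost(tuple(group))
            best = min(best, cost)
        return best

    return total_cost(words) / len(words)


print(wordle_feedback("abbey", "kebab"))  # .YGYY
print(optimal_expected_guesses(["ab", "ac", "bc"]))  # 1.6666666666666667
```

With three words, any first guess solves one secret and separates the other two, so the optimum is 5 total guesses over 3 secrets; the search cost grows explosively with the list size, which is why a theory that proves optimality without exhaustive search is valuable.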


Retail Robots Are on the Rise--at Every Level of the Industry

#artificialintelligence

On our sidewalks, in our skies, in every store… Over the next decade, robots will enter the mainstream of retail. As countless robots work behind the scenes to stock shelves, serve customers, and deliver products to our doorsteps, the speed of retail will accelerate. These changes are already underway. In this blog post, we'll elaborate on how robots are entering the retail ecosystem. On August 3rd, 2016, Domino's Pizza introduced the Domino's Robotic Unit, or "DRU" for short.


After AI, Fashion and Shopping Will Never Be the Same

#artificialintelligence

AI and broadband are eating retail for breakfast. In the first half of 2019, we've seen 19 retailer bankruptcies. And the retail apocalypse is only accelerating. What's coming next is astounding. Why drive when you can speak?


Within 10 Years, We'll Travel by Hyperloop, Rockets, and Avatars

#artificialintelligence

Try Hyperloop, rocket travel, and robotic avatars. Hyperloop is currently working towards 670 mph (1080 kph) passenger pods, capable of zipping us from Los Angeles to downtown Las Vegas in under 30 minutes. Rocket travel (think SpaceX's Starship) promises to deliver you almost anywhere on the planet in under an hour. Think New York to Shanghai in 39 minutes. As 5G connectivity, hyper-realistic virtual reality, and next-gen robotics continue their exponential progress, the emergence of "robotic avatars" will all but nullify the concept of distance, replacing human travel with immediate remote telepresence.