Collaborating Authors

ava


AVA: Attentive VLM Agent for Mastering StarCraft II

Ma, Weiyu, Fu, Yuqian, Zhang, Zecheng, Ghanem, Bernard, Li, Guohao

arXiv.org Artificial Intelligence

We introduce Attentive VLM Agent (AVA), a multimodal StarCraft II agent that aligns artificial agent perception with the human gameplay experience. Traditional frameworks such as SMAC rely on abstract state representations that diverge significantly from human perception, limiting the ecological validity of agent behavior. Our agent addresses this limitation by incorporating RGB visual inputs and natural language observations that more closely simulate human cognitive processes during gameplay. The AVA architecture consists of three integrated components: (1) a vision-language model enhanced with specialized self-attention mechanisms for strategic unit targeting and battlefield assessment, (2) a retrieval-augmented generation system that leverages domain-specific StarCraft II knowledge to inform tactical decisions, and (3) a dynamic role-based task distribution system that enables coordinated multi-agent behavior. The experimental evaluation in our proposed AVACraft environment, which contains 21 multimodal StarCraft II scenarios, demonstrates that AVA powered by foundation models (specifically Qwen-VL and GPT-4o) can execute complex tactical maneuvers without explicit training, achieving comparable performance to traditional MARL methods that require substantial training iterations. This work establishes a foundation for developing human-aligned StarCraft II agents and advances the broader research agenda of multimodal game AI. Our implementation is available at https://github.com/camel-ai/VLM-Play-StarCraft2.
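The decision loop described above (visual/textual observation, retrieval of domain knowledge, VLM-driven action selection) can be sketched as follows. This is an illustrative outline, not the authors' implementation: the knowledge base, `retrieve`, `query_vlm`, and the action strings are all hypothetical stand-ins for the real RAG backend and foundation-model call.

```python
# Hedged sketch of a retrieval-augmented VLM agent step.
# All names and the tiny knowledge base are illustrative, not AVA's code.

KNOWLEDGE = {
    "zealot": "Melee unit; strong vs light ground units, weak to kiting.",
    "marine": "Ranged unit; effective in groups, fragile without support.",
}

def retrieve(observation: str) -> list[str]:
    """Naive keyword lookup standing in for a real RAG retriever."""
    return [fact for unit, fact in KNOWLEDGE.items() if unit in observation.lower()]

def query_vlm(prompt: str) -> str:
    """Stub for a vision-language model call (e.g. Qwen-VL or GPT-4o)."""
    if "weak to kiting" in prompt:
        return "ATTACK_MOVE_AND_KITE"
    return "HOLD_POSITION"

def agent_step(observation: str) -> str:
    """One perception -> retrieval -> decision cycle of the agent."""
    facts = retrieve(observation)
    prompt = f"Observation: {observation}\nKnowledge: {' '.join(facts)}\nAction?"
    return query_vlm(prompt)

action = agent_step("Enemy Zealots approaching our Marines")  # ATTACK_MOVE_AND_KITE
```

In the real system the stubbed `query_vlm` would receive the RGB frame alongside the text prompt, and the retrieved facts would come from the domain-specific StarCraft II knowledge store rather than a hard-coded dict.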


Fine-grained Hallucination Detection and Editing for Language Models

Mishra, Abhika, Asai, Akari, Balachandran, Vidhisha, Wang, Yizhong, Neubig, Graham, Tsvetkov, Yulia, Hajishirzi, Hannaneh

arXiv.org Artificial Intelligence

Large language models (LMs) are prone to generating diverse factually incorrect statements, which are widely called hallucinations. Current approaches predominantly focus on coarse-grained automatic hallucination detection or editing, overlooking nuanced error levels. In this paper, we propose a novel task -- automatic fine-grained hallucination detection -- and present a comprehensive taxonomy encompassing six hierarchically defined types of hallucination. To facilitate evaluation, we introduce a new benchmark that includes fine-grained human judgments on two LM outputs across various domains. Our analysis reveals that ChatGPT and Llama 2-Chat exhibit hallucinations in 60% and 75% of their outputs, respectively, and a majority of these hallucinations fall into categories that have been underexplored. As an initial step to address this, we train FAVA, a retrieval-augmented LM, using carefully designed synthetic data generation to detect and correct fine-grained hallucinations. On our benchmark, our automatic and human evaluations show that FAVA significantly outperforms ChatGPT on fine-grained hallucination detection, though large room for future improvement remains. FAVA's suggested edits also improve the factuality of LM-generated text, resulting in 5-10% FActScore improvements.
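One ingredient of synthetic training data for this kind of detector is corrupting a factual sentence by swapping in a wrong entity, keeping the original as the gold correction. The sketch below shows that idea only; the entity list and the edit-tag format are illustrative assumptions, not FAVA's actual scheme.

```python
# Hedged sketch: generate one synthetic (corrupted, tagged-correction)
# training pair by swapping a known entity. The tag format is a
# hypothetical stand-in for a real fine-grained editing scheme.
import random

ENTITIES = ["Paris", "Berlin", "Madrid"]

def make_entity_error(sentence, rng):
    """Return (corrupted_sentence, tagged_target), or None if no known
    entity appears in the sentence."""
    for ent in ENTITIES:
        if ent in sentence:
            wrong = rng.choice([e for e in ENTITIES if e != ent])
            corrupted = sentence.replace(ent, wrong)
            # Target marks the error span and supplies the correction.
            target = sentence.replace(
                ent, f"<entity><delete>{wrong}</delete>{ent}</entity>"
            )
            return corrupted, target
    return None

rng = random.Random(0)
pair = make_entity_error("The Louvre is in Paris.", rng)
```

A detector trained on such pairs learns both to flag the corrupted span and to emit the correction, which is what enables edit-based factuality improvements downstream.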


AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making

Liu, Shusen, Miao, Haichao, Li, Zhimin, Olson, Matthew, Pascucci, Valerio, Bremer, Peer-Timo

arXiv.org Artificial Intelligence

With recent advances in multi-modal foundation models, previously text-only large language models (LLMs) have evolved to incorporate visual input, opening up unprecedented opportunities for various applications in visualization. Our work explores the utilization of the visual perception ability of multi-modal LLMs to develop Autonomous Visualization Agents (AVAs) that can interpret and accomplish user-defined visualization objectives through natural language. We propose the first framework for the design of AVAs and present several usage scenarios intended to demonstrate the general applicability of the proposed paradigm. The addition of visual perception allows AVAs to act as virtual visualization assistants for domain experts who may lack the knowledge or expertise to fine-tune visualization outputs. Our preliminary exploration and proof-of-concept agents suggest that this approach can be widely applicable whenever the choice of appropriate visualization parameters requires the interpretation of previous visual output. Feedback from unstructured interviews with experts in AI research, medical visualization, and radiology has been incorporated, highlighting the practicality and potential of AVAs. Our study indicates that AVAs represent a general paradigm for designing intelligent visualization systems that can achieve high-level visualization goals, paving the way for developing expert-level visualization agents in the future.


Somehow, Airline Customer Service Is Getting Even Worse

The Atlantic - Technology

In early 2020, when the coronavirus was still a distant concern, my wife and I booked an AirAsia flight to Bali. At the start of lockdown, we scrambled to secure a refund. We called the airline's customer-support line: no dice. We pleaded with its online chatbot, a lobotomized character named AVA. We sent a Twitter message to the brand on March 17 and received a response seven weeks later that read, in full, "Twitter Feedback."


Judging Adam: Studying the Performance of Optimization Methods on ML4SE Tasks

Pasechnyuk, Dmitry, Prazdnichnykh, Anton, Evtikhiev, Mikhail, Bryksin, Timofey

arXiv.org Artificial Intelligence

Solving a problem with a deep learning model requires researchers to optimize the loss function with a certain optimization method. The research community has developed more than a hundred different optimizers, yet there is scarce data on optimizer performance across tasks. In particular, none of the existing benchmarks test the performance of optimizers on source code-related problems. However, existing benchmark data indicates that certain optimizers may be more efficient for particular domains. In this work, we test the performance of various optimizers on deep learning models for source code and find that the choice of an optimizer can have a significant impact on model quality, with up to two-fold score differences between some of the relatively well-performing optimizers. We also find that the RAdam optimizer (and its modification with the Lookahead envelope) almost always performs well on the tasks we consider. Our findings show a need for a more extensive study of optimizers in code-related tasks, and indicate that the ML4SE community should consider using RAdam instead of Adam as the default optimizer for code-related deep learning tasks.
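The "Lookahead envelope" mentioned above wraps any inner optimizer: the inner optimizer takes k "fast" steps, after which a set of "slow" weights is interpolated toward the fast weights and the fast weights are reset. A minimal pure-Python sketch on a toy 1-D quadratic, using plain SGD as the inner optimizer for brevity (the paper pairs Lookahead with RAdam):

```python
# Minimal sketch of the Lookahead update rule on f(w) = (w - 3)^2.
# The inner optimizer is plain SGD here; in practice it would be RAdam.

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, minimized at w = 3.
    return 2.0 * (w - 3.0)

def lookahead_sgd(w0, lr=0.1, k=5, alpha=0.5, steps=200):
    slow = fast = w0
    for t in range(1, steps + 1):
        fast -= lr * grad(fast)          # inner (fast) SGD step
        if t % k == 0:                   # every k steps:
            slow += alpha * (fast - slow)  # pull slow weights toward fast
            fast = slow                    # reset fast weights to slow
    return slow

w_final = lookahead_sgd(10.0)  # converges close to the minimum at 3.0
```

With a deep learning framework one would instead wrap the framework's RAdam optimizer object in a Lookahead wrapper applying the same slow/fast interpolation to each parameter tensor.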


Using coevolution and substitution of the fittest for health and well-being recommender systems

Alcaraz-Herrera, Hugo, Cartlidge, John

arXiv.org Artificial Intelligence

This research explores substitution of the fittest (SF), a technique designed to counteract the problem of disengagement in two-population competitive coevolutionary genetic algorithms. SF is domain-independent and requires no calibration. We first perform a controlled comparative evaluation of SF's ability to maintain engagement and discover optimal solutions in a minimal toy domain. Experimental results demonstrate that SF maintains engagement better than other techniques in the literature. We then address the more complex real-world problem of evolving recommendations for health and well-being, introducing a coevolutionary extension of EvoRecSys, a previously published evolutionary recommender system. Here too, SF maintains engagement better than the alternatives, and the resulting recommendations are of higher quality and greater diversity than those produced by EvoRecSys.
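The "disengagement" problem that SF targets can be made concrete: in two-population competitive coevolution, candidates are scored by how many tests from the opposing population they beat, and when one population dominates entirely, every candidate receives the same score and selection loses its gradient. The sketch below shows only that failure mode; the `beats` rule is a toy stand-in, and SF's specific substitution rule is not reproduced here.

```python
# Sketch of disengagement detection in two-population competitive
# coevolution. A candidate's fitness is the number of opposing tests
# it beats; identical scores across the population mean no selection
# gradient -- the failure mode SF is designed to counteract.
# The integer "beats" rule is illustrative, not the paper's domain.

def beats(candidate: int, test: int) -> bool:
    return candidate > test

def scores(candidates, tests):
    """Fitness of each candidate = number of tests it beats."""
    return [sum(beats(c, t) for t in tests) for c in candidates]

def disengaged(candidates, tests) -> bool:
    """True when all candidates score alike, i.e. selection has no gradient."""
    return len(set(scores(candidates, tests))) == 1

engaged = scores([1, 5, 9], [2, 4, 6])   # [0, 2, 3]: a gradient exists
stalled = disengaged([7, 8, 9], [1, 2, 3])  # True: every candidate beats everything
```

A technique like SF would intervene at the `disengaged` point, modifying one population so that comparative fitness information is restored.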


Artificial virtuous agents in a multiagent tragedy of the commons

Stenseke, Jakob

arXiv.org Artificial Intelligence

Although virtue ethics has repeatedly been proposed as a suitable framework for the development of artificial moral agents (AMAs), it has been proven difficult to approach from a computational perspective. In this work, we present the first technical implementation of artificial virtuous agents (AVAs) in moral simulations. First, we review previous conceptual and technical work in artificial virtue ethics and describe a functionalistic path to AVAs based on dispositional virtues, bottom-up learning, and top-down eudaimonic reward. We then provide the details of a technical implementation in a moral simulation based on a tragedy of the commons scenario. The experimental results show how the AVAs learn to tackle cooperation problems while exhibiting core features of their theoretical counterpart, including moral character, dispositional virtues, learning from experience, and the pursuit of eudaimonia. Ultimately, we argue that virtue ethics provides a compelling path toward morally excellent machines and that our work provides an important starting point for such endeavors.


Ex Machina: Ava The Final Girl

#artificialintelligence

After I watched Men, I went to see what others had to say about it, and the first place I went to was a recorded conversation about the film on Diregentleman's channel. Toward the end of the conversation, Henry Galley says Men further diminished Garland's previous two films. Personally, I didn't get that in regards to Annihilation, but Ex Machina, on the other hand, I hadn't seen before. I did not watch Garland's directorial debut in 2014. And my reason is that I have been obsessed with pop culture about robotic A.I. ever since I was a kid, from Astro Boy (circa.


Study uses AI method for better insight into Crohn's disease - Mental Daily

#artificialintelligence

According to a study published in the journal Genome Medicine, a team of researchers at Rutgers University was able to develop an artificial intelligence (AI) method that may provide more insight into Crohn's disease. Crohn's disease, an inflammatory bowel disease, is characterized by various traits that can affect any part of the gut. It is estimated that the disease could affect close to 800,000 adults in the U.S., according to the study's co-authors. As such, researchers have turned to AI for a more comprehensive approach to identifying and treating Crohn's disease. For the study, the team investigated genetic signatures associated with the illness in 111 participants.


Artificial Intelligence at Netflix - Two Current Use-Cases

#artificialintelligence

Netflix launched in 1997 as a mail-based DVD rental business. Alongside the growing US DVD market in the late 1990s and early 2000s, Netflix's business grew and the company went public in 2002. Netflix posted its first profit a year later. By 2007, Netflix introduced its streaming service, and by 2013, the company began producing original content. Today, Netflix is one of the world's largest entertainment services with over 200 million paid memberships spanning 190 countries, according to the company's 2020 Annual Report.