True Multimodal In-Context Learning Needs Attention to the Visual Context
Chen, Shuo, Liu, Jianzhe, Han, Zhen, Xia, Yan, Cremers, Daniel, Torr, Philip, Tresp, Volker, Gu, Jindong
Multimodal Large Language Models (MLLMs), built on powerful language backbones, have enabled Multimodal In-Context Learning (MICL): adapting to new tasks from a few multimodal demonstrations consisting of images, questions, and answers. Despite showing noticeable improvement on standard vision-language datasets, current MLLMs struggle to leverage the visual information in the demonstrations. Specifically, they tend to neglect visual cues and over-rely on textual patterns, leading to mere text imitation rather than genuine multimodal adaptation. This behavior makes MICL still unimodal and largely restricts its practical utility. More importantly, this limitation is often concealed by improved performance on tasks that do not require understanding the visual context. As a result, how to effectively enhance MICL ability and reliably evaluate MICL performance remains underexplored. To address these issues, we first introduce Dynamic Attention Reallocation (DARA), an efficient fine-tuning strategy that encourages models to attend to the visual context by rebalancing attention across visual and textual tokens. In addition, we present TrueMICL, an MICL-dedicated dataset with both support and test sets that explicitly requires the integration of multimodal information, particularly visual content, for correct task completion. Extensive experiments demonstrate the effectiveness of our holistic solution, showcasing substantial improvements in true multimodal in-context learning capabilities. Code and datasets are available at https://chenxshuo.github.io/true-micl-colm.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > China (0.04)
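The attention-rebalancing idea in the Dynamic Attention Reallocation (DARA) abstract above can be sketched as boosting the pre-softmax attention logits of visual tokens relative to textual ones. The scaling factor `alpha`, the mask layout, and the function names below are illustrative assumptions, not the paper's actual parameterization:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reallocate_attention(scores, visual_mask, alpha):
    # scores: (num_queries, num_keys) pre-softmax attention logits.
    # visual_mask: boolean (num_keys,), True where the key is a visual token.
    # alpha > 1 boosts attention mass on visual tokens; this is a
    # hypothetical illustration, not DARA's exact mechanism.
    adjusted = scores + np.where(visual_mask, np.log(alpha), 0.0)
    return softmax(adjusted, axis=-1)

# With uniform logits, attention is initially split evenly; after
# reallocation the two visual tokens receive more mass than the textual ones.
scores = np.array([[1.0, 1.0, 1.0, 1.0]])
mask = np.array([True, True, False, False])
base = softmax(scores)                                  # 0.25 each
boosted = reallocate_attention(scores, mask, alpha=3.0)  # visual > textual
```

Adding `log(alpha)` to a logit is equivalent to multiplying its softmax weight by `alpha` before renormalization, so the rows still sum to one.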
DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs
Fang, Haishuo, Zhu, Xiaodan, Gurevych, Iryna
Answering Questions over Knowledge Graphs (KGQA) is key to well-functioning autonomous language agents in various real-life applications. To improve the neural-symbolic reasoning capabilities of language agents powered by Large Language Models (LLMs) in KGQA, we propose the Decomposition-Alignment-Reasoning Agent (DARA) framework. DARA effectively parses questions into formal queries through a dual mechanism: high-level iterative task decomposition and low-level task grounding. Importantly, DARA can be efficiently trained with a small number of high-quality reasoning trajectories. Our experimental results demonstrate that DARA fine-tuned on LLMs (e.g., Llama-2-7B, Mistral) outperforms both in-context-learning-based agents using GPT-4 and alternative fine-tuned agents across different benchmarks in zero-shot evaluation, making such models more accessible for real-life applications. We also show that DARA attains performance comparable to state-of-the-art enumerating-and-ranking-based methods for KGQA.
- Asia > Indonesia > Sulawesi > North Sulawesi > Manado (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Japan (0.04)
- (12 more...)
- Leisure & Entertainment > Games (0.94)
- Government (0.94)
- Leisure & Entertainment > Sports > Football (0.67)
- Leisure & Entertainment > Sports > Hockey (0.51)
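The dual mechanism in the DARA abstract above (high-level iterative decomposition, low-level grounding into formal query fragments) can be sketched as a toy loop. The `decompose` and `ground` helpers and the query syntax are hypothetical stand-ins, not the paper's actual components:

```python
def decompose(state):
    # Toy high-level decomposer: peel off the next subtask; returns
    # (subtask, remaining state), with None when nothing remains.
    parts = state.split(" and ", 1)
    return (parts[0], parts[1] if len(parts) > 1 else None)

def ground(subtask):
    # Toy low-level grounding: map a natural-language subtask to a
    # formal query fragment (invented syntax, for illustration only).
    return f"FIND({subtask})"

def run_agent(question):
    # Iterate: decompose, then ground each subtask, until no state remains.
    state, fragments = question, []
    while state is not None:
        subtask, state = decompose(state)
        fragments.append(ground(subtask))
    return " THEN ".join(fragments)

run_agent("who directed Inception and when was it released")
# -> "FIND(who directed Inception) THEN FIND(when was it released)"
```

The interleaving matters: each grounding step can inform the next decomposition, which is what distinguishes this style of agent from one-shot semantic parsing.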
On the Analysis of Computational Delays in Reinforcement Learning-based Rate Adaptation Algorithms
Trancoso, Ricardo, Queiros, Ruben, Fontes, Helder, Campos, Rui
Several research works have applied Reinforcement Learning (RL) algorithms to solve the Rate Adaptation (RA) problem in Wi-Fi networks. The dynamic nature of the radio link requires the algorithms to be responsive to changes in link quality. Delays in the execution of the algorithm may be detrimental to its performance, which in turn may decrease network performance. This aspect has been overlooked in the state of the art. In this paper, we present an analysis of common computational delays in RL-based RA algorithms and propose a methodology that may be applied to reduce these computational delays and increase the efficiency of this type of algorithm. We apply the proposed methodology to an existing RL-based RA algorithm. The obtained experimental results indicate a reduction of one order of magnitude in the execution time of the algorithm, improving its responsiveness to changes in link quality.
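A minimal sketch of the kind of delay analysis the abstract above describes: timing each decision step of a rate-adaptation policy. The policy interface and observation format are assumptions for illustration, not the authors' algorithm or measurement setup:

```python
import time

def measure_decision_delay(policy, observations):
    # Time each call to the policy (one rate-selection decision per
    # observation) and return the mean per-step delay in seconds.
    # `policy` and `observations` are illustrative stand-ins.
    delays = []
    for obs in observations:
        t0 = time.perf_counter()
        policy(obs)
        delays.append(time.perf_counter() - t0)
    return sum(delays) / len(delays)

# Example: a trivial threshold policy over SNR readings (dB).
mean_delay = measure_decision_delay(
    lambda snr: 0 if snr < 10 else 7,
    [5, 15, 25],
)
```

On real hardware, comparing such per-step timings against the coherence time of the radio link shows whether the algorithm can react to link-quality changes before they become stale.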
Text To Image AI Has Created Its Own Secret Language, Researcher Claims
Here's something reassuring to think about: researchers using machine-learning artificial intelligence (AI) often don't know precisely how their algorithms solve the problems they are tasked with. Take, for instance, the AI that can identify race from X-rays even though no human can see how, or the Facebook AI that began to develop its own language. Joining these may be everyone's favorite text-to-image generator, DALL·E 2. Computer Science PhD student Giannis Daras noticed that the DALL·E 2 system, which creates images based on a text input prompt, would return nonsense words as text under certain circumstances. "A known limitation of DALLE-2 is that it struggles with text," he wrote in a paper published on the preprint server arXiv.
Artificial Intelligence Caught Writing Its Own Creepy Language By Researchers
Something creepy recently happened in the world of technology after an artificial intelligence programme reached the pinnacle of independence by writing its own language. Nobody is capable of fully understanding the language coined by OpenAI's DALL·E 2 artificial intelligence system. Its job is to generate realistic and/or artistic images based on text descriptions entered by users. OpenAI claims that DALL·E 2 is groundbreaking, for it effectively "learned the relationship between images and the text used to describe them." While all this sounds riveting, DALL·E 2 appears to be on a secret mission.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)
Artificial intelligence spotted inventing its own creepy language - S.G.E
An artificial intelligence program has developed its own language, and no one can understand it. OpenAI is an artificial intelligence systems developer; its programs are impressive examples of large-scale computing, but there are quirks. DALL·E 2 is OpenAI's latest AI system; it can generate realistic or artistic images from user-entered text descriptions. DALL·E 2 represents a milestone in machine learning; OpenAI's site says the program "learned the relationship between images and the text used to describe them." A DALL·E 2 demonstration includes interactive keywords for visiting users to play with and generate images; toggling different keywords results in different images, styles, and subjects.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.72)
DALL·E 2 Artificial Intelligence System Has Created its Own Secret Language Nobody Understands
Researcher and Computer Science PhD student Giannis Daras has uncovered a secret language that DALL·E 2, a cutting-edge text-to-image artificial intelligence generator, has created. It's believed that DALL·E 2 makes up its own words to make sense of the images it generates. Daras then fed these words back to the system, and apparently the AI understood exactly what it was reading. Daras thinks this is a big security hole for the text-to-image generator, as it could enable backdoor adversarial attacks or provide ways to circumvent filters. As of now, Natural Language Processing systems filter text prompts that violate the policy rules, and gibberish prompts may eventually be used by attackers to bypass these filters.