Praia
Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering
Zhou, Wei, Mesgar, Mohsen, Friedrich, Annemarie, Adel, Heike
Complex table question answering (TQA) aims to answer questions that require complex reasoning, such as multi-step or multi-category reasoning, over data represented in tabular form. Previous approaches demonstrated notable performance by leveraging either closed-source large language models (LLMs) or fine-tuned open-weight LLMs. However, fine-tuning LLMs requires high-quality training data, which is costly to obtain, and utilizing closed-source LLMs poses accessibility challenges and leads to reproducibility issues. In this paper, we propose Multi-Agent Collaboration with Tool use (MACT), a framework that requires neither closed-source models nor fine-tuning. In MACT, a planning agent and a coding agent, both of which make use of tools, collaborate to answer questions. Our experiments on four TQA benchmarks show that MACT outperforms previous SoTA systems on three out of four benchmarks and that it performs comparably to the larger and more expensive closed-source model GPT-4 on two benchmarks, even when using only open-weight models without any fine-tuning. We conduct extensive analyses to demonstrate the effectiveness of MACT's multi-agent collaboration in TQA.
Reversed Attention: On The Gradient Descent Of Attention Layers In GPT
The success of Transformer-based Language Models (LMs) stems from their attention mechanism. While this mechanism has been extensively studied in explainability research, particularly through the attention values obtained during the forward pass of LMs, the backward pass of attention has been largely overlooked. In this work, we study the mathematics of the backward pass of attention, revealing that it implicitly calculates an attention matrix we refer to as "Reversed Attention". We examine the properties of Reversed Attention and demonstrate its ability to elucidate the models' behavior and edit dynamics. In an experimental setup, we showcase the ability of Reversed Attention to directly alter the forward pass of attention, without modifying the model's weights, using a novel method called "attention patching". In addition to enhancing the comprehension of how LMs configure attention layers during backpropagation, Reversed Attention maps contribute to a more interpretable backward pass. Our code will be available at: https://github.
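As a rough illustration of why the backward pass of attention yields a matrix with the same shape as the forward attention map, the sketch below computes the gradient of a loss with respect to the pre-softmax attention scores using the standard softmax derivative. It is a minimal NumPy sketch under stated assumptions (single head, no masking, a random stand-in for the upstream gradient); the function names are hypothetical, and it is not claimed to reproduce the paper's exact definition of Reversed Attention.

```python
import numpy as np

def softmax(x):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_forward(q, k, v):
    # Single-head scaled dot-product attention without masking (a simplification).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores)
    return attn @ v, attn

def score_gradient(attn, v, grad_out):
    # Gradient of the loss w.r.t. the pre-softmax scores.
    grad_attn = grad_out @ v.T  # dL/dA, same shape as the attention map
    inner = (grad_attn * attn).sum(axis=-1, keepdims=True)
    return attn * (grad_attn - inner)  # softmax Jacobian applied row by row

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
out, attn = attention_forward(q, k, v)
grad_out = rng.normal(size=out.shape)  # stand-in for an upstream gradient (assumption)
grad_scores = score_gradient(attn, v, grad_out)
```

The resulting `grad_scores` has one entry per query-key pair, like the forward attention map, and each of its rows sums to zero because softmax outputs are constrained to sum to one.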
Análise de ambiguidade linguística em modelos de linguagem de grande escala (LLMs)
Moraes, Lavínia de Carvalho, Silvério, Irene Cristina, Marques, Rafael Alexandre Sousa, Anaia, Bianca de Castro, de Paula, Dandara Freitas, de Faria, Maria Carolina Schincariol, Cleveston, Iury, Correia, Alana de Santana, Freitag, Raquel Meister Ko
Linguistic ambiguity continues to represent a significant challenge for natural language processing (NLP) systems, notwithstanding the advancements in architectures such as Transformers and BERT. Inspired by the recent success of instructional models like ChatGPT and Gemini (called Bard in 2023), this study aims to analyze and discuss linguistic ambiguity within these models, focusing on three types prevalent in Brazilian Portuguese: semantic, syntactic, and lexical ambiguity. We create a corpus comprising 120 sentences, both ambiguous and unambiguous, for classification, explanation, and disambiguation. The models' capability to generate ambiguous sentences was also explored by soliciting sets of sentences for each type of ambiguity. The results underwent qualitative analysis, drawing on recognized linguistic references, and quantitative assessment based on the accuracy of the responses obtained. The analysis showed that even the most sophisticated models, such as ChatGPT and Gemini, exhibit errors and deficiencies in their responses, with explanations often proving inconsistent. Furthermore, the accuracy peaked at 49.58 percent, indicating the need for descriptive studies for supervised learning.
US's Blinken begins four-nation Africa tour amid Sahel worries
United States Secretary of State Antony Blinken on Monday said the US is committed to deeper relations with Africa despite global crises as he opened a four-country tour of the continent. Blinken is touring four democracies on the Atlantic Coast – Cape Verde, Ivory Coast, Nigeria and Angola – as security deteriorates in the Sahel and doubts grow about a key US base in neighbouring coup-hit Niger. US President Joe Biden welcomed leaders from Africa in 2022 in a show of newfound attention to the continent. But he did not visit Africa last year as promised. Blinken nonetheless quoted Biden as he vowed, "We are all in when it comes to Africa."
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
Zhang, Zilun, Zhao, Tiancheng, Guo, Yulong, Yin, Jianwei
Pre-trained Vision-Language Models (VLMs) utilizing extensive image-text paired data have demonstrated unprecedented image-text association capabilities, achieving remarkable results across various downstream tasks. A critical challenge is how to make use of existing large-scale pre-trained VLMs, which are trained on common objects, to perform the domain-specific transfer for accomplishing domain-related downstream tasks. In this paper, we propose a new framework that includes the Domain pre-trained Vision-Language Model (DVLM), bridging the gap between the General Vision-Language Model (GVLM) and domain-specific downstream tasks. Moreover, we present an image-text paired dataset in the field of remote sensing (RS), RS5M, which has 5 million RS images with English descriptions. The dataset is obtained by filtering publicly available image-text paired datasets and captioning label-only RS datasets with a pre-trained VLM. These constitute the first large-scale RS image-text paired dataset. Additionally, we fine-tuned the CLIP model and tried several Parameter-Efficient Fine-Tuning methods on RS5M to implement the DVLM. Experimental results show that our proposed dataset is highly effective for various tasks, and our model GeoRSCLIP improves upon the baseline or previous state-of-the-art model by 3% to 20% in Zero-shot Classification (ZSC), 3% to 6% in Remote Sensing Cross-Modal Text-Image Retrieval (RSCTIR) and 4% to 5% in Semantic Localization (SeLo) tasks. Dataset and models have been released at: https://github.com/om-ai-lab/RS5M.
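For readers unfamiliar with zero-shot classification in CLIP-style models, the sketch below shows the general idea: an image embedding is compared against one text embedding per class label by cosine similarity, and the most similar label is chosen. It is a minimal NumPy sketch, not the GeoRSCLIP code; the embeddings are synthetic stand-ins, and the function name, the label set, and the temperature value are all assumptions made for illustration.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=0.01):
    # L2-normalise so that dot products equal cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = text_embs @ image_emb / temperature
    logits = logits - logits.max()  # numerical stability before exponentiating
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(probs.argmax()), probs

# Hypothetical labels; a real encoder would embed prompts such as
# "a satellite photo of an airport".
labels = ["airport", "farmland", "harbor"]
rng = np.random.default_rng(0)
text_embs = np.eye(len(labels), 16)  # synthetic, mutually orthogonal text embeddings
image_emb = text_embs[1] + 0.05 * rng.normal(size=16)  # synthetic image close to "farmland"

pred, probs = zero_shot_classify(image_emb, text_embs)
predicted_label = labels[pred]
```

No classifier is trained here: the class set is defined purely by the text prompts, which is why a domain-adapted encoder can be evaluated on new label sets without further fine-tuning.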
Experiential-Informed Data Reconstruction for Fishery Sustainability and Policies in the Azores
Nogueira, Brenda, Menezes, Gui M., Moniz, Nuno
Fishery analysis is critical in maintaining the long-term sustainability of species and the livelihoods of millions of people who depend on fishing for food and income. The fishing gear, or metier, is a key factor significantly impacting marine habitats, selectively targeting species and fish sizes. Analysis of commercial catches or landings by metier in fishery stock assessment and management is crucial, providing robust estimates of fishing efforts and their impact on marine ecosystems. In this paper, we focus on a unique data set from the Azores' fishing data collection programs between 2010 and 2017, in which information on metiers is scarce and sparsely distributed across the timeline. Our main objective is to tackle the task of data set reconstruction, leveraging domain knowledge and machine learning methods to retrieve or associate metier-related information to each fish landing. We empirically validate the feasibility of this task using a diverse set of modeling approaches and demonstrate how it provides new insights into different fisheries' behavior and the impact of metiers over time, which are essential for future fish population assessments, management, and conservation efforts.
Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data
Naggita, Keziah, LaChance, Julienne, Xiang, Alice
Biases in large-scale image datasets are known to influence the performance of computer vision models as a function of geographic context. To investigate the limitations of standard Internet data collection methods in low- and middle-income countries, we analyze human-centric image geo-diversity on a massive scale using geotagged Flickr images associated with each nation in Africa. We report the quantity and content of available data with comparisons to population-matched nations in Europe as well as the distribution of data according to fine-grained intra-national wealth estimates. Temporal analyses are performed at two-year intervals to expose emerging data trends. Furthermore, we present findings for an "othering" phenomenon as evidenced by a substantial number of images from Africa being taken by non-local photographers. The results of our study suggest that further work is required to capture image data representative of African people and their environments and, ultimately, to improve the applicability of computer vision models in a global context.
CDJUR-BR -- A Golden Collection of Legal Document from Brazilian Justice with Fine-Grained Named Entities
Mauricio, Antonio, Pinheiro, Vladia, Furtado, Vasco, Neto, João Araújo Monteiro, Bomfim, Francisco das Chagas Jucá, da Costa, André Câmara Ferreira, Silveira, Raquel, Aragão, Nilsiton
A basic task for most Legal Artificial Intelligence (Legal AI) applications is Named Entity Recognition (NER). However, texts produced in the context of legal practice make references to entities that are not trivially recognized by currently available NER systems. There is a lack of categorization of legislation, jurisprudence, evidence, penalties, the roles of people in a legal process (judge, lawyer, victim, defendant, witness), types of locations (crime location, defendant's address), etc. There is therefore still a need for a robust golden collection, annotated with fine-grained entities of the legal domain, that covers various documents of a legal process, such as petitions, inquiries, complaints, decisions and sentences. In this article, we describe the development of the Golden Collection of the Brazilian Judiciary (CDJUR-BR), covering a set of fine-grained named entities that have been annotated by experts in legal documents. The creation of CDJUR-BR followed a dedicated methodology designed to make the collection comprehensive and robust. Together with the CDJUR-BR repository, we provide an NER model based on BERT and trained on CDJUR-BR, whose results indicated the prevalence of the CDJUR-BR.