South America
Goal-Driven Query Answering over First- and Second-Order Dependencies with Equality
Tsamoura, Efthymia, Motik, Boris
Query answering over data with dependencies plays a central role in most applications of dependencies. The problem is commonly solved by using a suitable variant of the chase algorithm to compute a universal model of the dependencies and the data and thus explicate all knowledge implicit in the dependencies. After this preprocessing step, an arbitrary conjunctive query over the dependencies and the data can be answered by evaluating it the computed universal model. If, however, the query to be answered is fixed and known in advance, computing the universal model is often inefficient as many inferences made during this process can be irrelevant to a given query. In such cases, a goal-driven approach, which avoids drawing unnecessary inferences, promises to be more efficient and thus preferable in practice. In this paper we present what we believe to be the first technique for goal-driven query answering over first- and second-order dependencies with equality reasoning. Our technique transforms the input dependencies so that applying the chase to the output avoids many inferences that are irrelevant to the query. The transformation proceeds in several steps, which comprise the following three novel techniques. First, we present a variant of the singularisation technique by Marnette [60] that is applicable to second-order dependencies and that corrects an incompleteness of a related formulation by ten Cate et al. [74]. Second, we present a relevance analysis technique that can eliminate from the input dependencies that provably do not contribute to query answers. Third, we present a variant of the magic sets algorithm [19] that can handle second-order dependencies with equality reasoning. We also present the results of an extensive empirical evaluation, which show that goal-driven query answering can be orders of magnitude faster than computing the full universal model.
CrossVIT-augmented Geospatial-Intelligence Visualization System for Tracking Economic Development Dynamics
Bai, Yanbing, Su, Jinhua, Qiao, Bin, Ma, Xiaoran
Timely and accurate economic data is crucial for effective policymaking. Current challenges in data timeliness and spatial resolution can be addressed with advancements in multimodal sensing and distributed computing. We introduce Senseconomic, a scalable system for tracking economic dynamics via multimodal imagery and deep learning. Built on the Transformer framework, it integrates remote sensing and street view images using cross-attention, with nighttime light data as weak supervision. The system achieved an R-squared value of 0.8363 in county-level economic predictions and halved processing time to 23 minutes using distributed computing. Its user-friendly design includes a Vue3-based front end with Baidu maps for visualization and a Python-based back end automating tasks like image downloads and preprocessing. Senseconomic empowers policymakers and researchers with efficient tools for resource allocation and economic planning.
LCFO: Long Context and Long Form Output Dataset and Benchmarking
Costa-jussà, Marta R., Andrews, Pierre, Meglioli, Mariano Coria, Chen, Joy, Chuang, Joe, Dale, David, Ropers, Christophe, Mourachko, Alexandre, Sánchez, Eduardo, Schwenk, Holger, Tran, Tuan, Turkatenko, Arina, Wood, Carleigh
This paper presents the Long Context and Form Output (LCFO) benchmark, a novel evaluation framework for assessing gradual summarization and summary expansion capabilities across diverse domains. LCFO consists of long input documents (5k words average length), each of which comes with three summaries of different lengths (20%, 10%, and 5% of the input text), as well as approximately 15 questions and answers (QA) related to the input content. Notably, LCFO also provides alignments between specific QA pairs and corresponding summaries in 7 domains. The primary motivation behind providing summaries of different lengths is to establish a controllable framework for generating long texts from shorter inputs, i.e. summary expansion. To establish an evaluation metric framework for summarization and summary expansion, we provide human evaluation scores for human-generated outputs, as well as results from various state-of-the-art large language models (LLMs). GPT-4o-mini achieves best human scores among automatic systems in both summarization and summary expansion tasks (~ +10% and +20%, respectively). It even surpasses human output quality in the case of short summaries (~ +7%). Overall automatic metrics achieve low correlations with human evaluation scores (~ 0.4) but moderate correlation on specific evaluation aspects such as fluency and attribution (~ 0.6). The LCFO benchmark offers a standardized platform for evaluating summarization and summary expansion performance, as well as corresponding automatic metrics, thereby providing an important evaluation framework to advance generative AI.
Piecing It All Together: Verifying Multi-Hop Multimodal Claims
Wang, Haoran, Rangapur, Aman, Xu, Xiongxiao, Liang, Yueqing, Gharwi, Haroon, Yang, Carl, Shu, Kai
Existing claim verification datasets often do not require systems to perform complex reasoning or effectively interpret multimodal evidence. To address this, we introduce a new task: multi-hop multimodal claim verification. This task challenges models to reason over multiple pieces of evidence from diverse sources, including text, images, and tables, and determine whether the combined multimodal evidence supports or refutes a given claim. To study this task, we construct MMCV, a large-scale dataset comprising 15k multi-hop claims paired with multimodal evidence, generated and refined using large language models, with additional input from human feedback. We show that MMCV is challenging even for the latest state-of-the-art multimodal large language models, especially as the number of reasoning hops increases. Additionally, we establish a human performance benchmark on a subset of MMCV. We hope this dataset and its evaluation task will encourage future research in multimodal multi-hop claim verification.
Assessing the Robustness of Retrieval-Augmented Generation Systems in K-12 Educational Question Answering with Knowledge Discrepancies
Zheng, Tianshi, Li, Weihan, Bai, Jiaxin, Wang, Weiqi, Song, Yangqiu
Retrieval-Augmented Generation (RAG) systems have demonstrated remarkable potential as question answering systems in the K-12 Education domain, where knowledge is typically queried within the restricted scope of authoritative textbooks. However, the discrepancy between textbooks and the parametric knowledge in Large Language Models (LLMs) could undermine the effectiveness of RAG systems. To systematically investigate the robustness of RAG systems under such knowledge discrepancies, we present EduKDQA, a question answering dataset that simulates knowledge discrepancies in real applications by applying hypothetical knowledge updates in answers and source documents. EduKDQA includes 3,005 questions covering five subjects, under a comprehensive question typology from the perspective of context utilization and knowledge integration. We conducted extensive experiments on retrieval and question answering performance. We find that most RAG systems suffer from a substantial performance drop in question answering with knowledge discrepancies, while questions that require integration of contextual knowledge and parametric knowledge pose a challenge to LLMs.
DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction
Feng, Yu, Htut, Phu Mon, Qi, Zheng, Xiao, Wei, Mager, Manuel, Pappas, Nikolaos, Halder, Kishaloy, Li, Yang, Benajiba, Yassine, Roth, Dan
Quantifying the uncertainty in the factual parametric knowledge of Large Language Models (LLMs), especially in a black-box setting, poses a significant challenge. Existing methods, which gauge a model's uncertainty through evaluating self-consistency in responses to the original query, do not always capture true uncertainty. Models might respond consistently to the origin query with a wrong answer, yet respond correctly to varied questions from different perspectives about the same query, and vice versa. In this paper, we propose a novel method, DiverseAgentEntropy, for evaluating a model's uncertainty using multi-agent interaction under the assumption that if a model is certain, it should consistently recall the answer to the original query across a diverse collection of questions about the same original query. We further implement an abstention policy to withhold responses when uncertainty is high. Our method offers a more accurate prediction of the model's reliability and further detects hallucinations, outperforming other self-consistency-based methods. Additionally, it demonstrates that existing models often fail to consistently retrieve the correct answer to the same query under diverse varied questions even when knowing the correct answer.
Align, Generate, Learn: A Novel Closed-Loop Framework for Cross-Lingual In-Context Learning
Rojas, Mateo Alejandro, Carranza, Rafael
Cross-lingual in-context learning (XICL) has emerged as a transformative paradigm for leveraging large language models (LLMs) to tackle multilingual tasks, especially for low-resource languages. However, existing approaches often rely on external retrievers or task-specific fine-tuning, limiting their scalability and generalizability. In this paper, we propose a novel self-supervised framework that harnesses the generative capabilities of LLMs to internally select and utilize task-relevant examples. Our method introduces two key objectives: a retrieval-generation alignment loss to optimize the quality of selected examples and a semantic coherence loss to ensure cross-lingual consistency. Through extensive experiments on multilingual benchmarks, our approach achieves state-of-the-art performance, significantly outperforming existing baselines. Further analysis highlights its robustness across diverse language families and its ability to generalize to unseen tasks. Human evaluations confirm the superior fluency, relevance, and semantic correctness of outputs generated by our method. This work provides a scalable, effective, and generalizable solution for cross-lingual in-context learning.
Full Magnetometer and Gyroscope Bias Estimation using Angular Rates: Theory and Experimental Evaluation of a Factor Graph-Based Approach
Rodríguez-Martínez, Sebastián, Troni, Giancarlo
Despite their widespread use in determining system attitude, Micro-Electro-Mechanical Systems (MEMS) Attitude and Heading Reference Systems (AHRS) are limited by sensor measurement biases. This paper introduces a method called MAgnetometer and GYroscope Calibration (MAGYC), leveraging three-axis angular rate measurements from an angular rate gyroscope to estimate both the hard- and soft-iron biases of magnetometers as well as the bias of gyroscopes. We present two implementation methods of this approach based on batch and online incremental factor graphs. Our method imposes fewer restrictions on instrument movements required for calibration, eliminates the need for knowledge of the local magnetic field magnitude or instrument's attitude, and facilitates integration into factor graph algorithms for Smoothing and Mapping frameworks. We validate the proposed methods through numerical simulations and in-field experimental evaluations with a sensor onboard an underwater vehicle. By implementing the proposed method in field data of a seafloor mapping dive, the dead reckoning-based position estimation error of the underwater vehicle was reduced from 10% to 0.5% of the distance traveled.
Missing Melodies: AI Music Generation and its "Nearly" Complete Omission of the Global South
Mehta, Atharva, Chauhan, Shivam, Choudhury, Monojit
Recent advances in generative AI have sparked renewed interest and expanded possibilities for music generation. However, the performance and versatility of these systems across musical genres are heavily influenced by the availability of training data. We conducted an extensive analysis of over one million hours of audio datasets used in AI music generation research and manually reviewed more than 200 papers from eleven prominent AI and music conferences and organizations (AAAI, ACM, EUSIPCO, EURASIP, ICASSP, ICML, IJCAI, ISMIR, NeurIPS, NIME, SMC) to identify a critical gap in the fair representation and inclusion of the musical genres of the Global South in AI research. Our findings reveal a stark imbalance: approximately 86% of the total dataset hours and over 93% of researchers focus primarily on music from the Global North. However, around 40% of these datasets include some form of non-Western music, genres from the Global South account for only 14.6% of the data. Furthermore, approximately 51% of the papers surveyed concentrate on symbolic music generation, a method that often fails to capture the cultural nuances inherent in music from regions such as South Asia, the Middle East, and Africa. As AI increasingly shapes the creation and dissemination of music, the significant underrepresentation of music genres in datasets and research presents a serious threat to global musical diversity. We also propose some important steps to mitigate these risks and foster a more inclusive future for AI-driven music generation.
Congruence-based Learning of Probabilistic Deterministic Finite Automata
Carrasco, Matías, Mayr, Franz, Yovine, Sergio
This work studies the question of learning probabilistic deterministic automata from language models. For this purpose, it focuses on analyzing the relations defined on algebraic structures over strings by equivalences and similarities on probability distributions. We introduce a congruence that extends the classical Myhill-Nerode congruence for formal languages. This new congruence is the basis for defining regularity over language models. We present an active learning algorithm that computes the quotient with respect to this congruence whenever the language model is regular. The paper also defines the notion of recognizability for language models and shows that it coincides with regularity for congruences. For relations which are not congruences, it shows that this is not the case. Finally, it discusses the impact of this result on learning in the context of language models.