South America
The Fundamental Rights Impact Assessment (FRIA) in the AI Act: Roots, legal obligations and key elements for a model template
What is the context which gave rise to the obligation to carry out a Fundamental Rights Impact Assessment (FRIA) in the AI Act? How has assessment of the impact on fundamental rights been framed by the EU legislator in the AI Act? What methodological criteria should be followed in developing the FRIA? These are the three main research questions that this article aims to address, through both legal analysis of the relevant provisions of the AI Act and discussion of various possible models for assessment of the impact of AI on fundamental rights. The overall objective of this article is to fill existing gaps in the theoretical and methodological elaboration of the FRIA, as outlined in the AI Act. In order to facilitate the future work of EU and national bodies and AI operators in placing this key tool for human-centric and trustworthy AI at the heart of the EU approach to AI design and development, this article outlines the main building blocks of a model template for the FRIA. While this proposal is consistent with the rationale and scope of the AI Act, it is also applicable beyond the cases listed in Article 27 and can serve as a blueprint for other national and international regulatory initiatives to ensure that AI is fully consistent with human rights.
A Guide to Misinformation Detection Datasets
Thibault, Camille, Peloquin-Skulski, Gabrielle, Tian, Jacob-Junqi, Laflamme, Florence, Guan, Yuxiang, Rabbany, Reihaneh, Godbout, Jean-François, Pelrine, Kellin
Misinformation is a complex societal issue, and mitigating solutions are difficult to create due to data deficiencies. To address this problem, we have curated the largest collection of (mis)information datasets in the literature, totaling 75. From these, we evaluated the quality of all of the 36 datasets that consist of statements or claims. We assess these datasets to identify those with solid foundations for empirical work and those with flaws that could result in misleading and non-generalizable results, such as insufficient label quality, spurious correlations, or political bias. We further provide state-of-the-art baselines on all these datasets, but show that regardless of label quality, categorical labels may no longer give an accurate evaluation of detection model performance. We discuss alternatives to mitigate this problem. Overall, this guide aims to provide a roadmap for obtaining higher quality data and conducting more effective evaluations, ultimately improving research in misinformation detection. All datasets and other artifacts are available at https://misinfo-datasets.complexdatalab.com/.
EPIC: Enhancing Privacy through Iterative Collaboration
Chourasia, Prakash, Lonkar, Heramb, Ali, Sarwan, Patterson, Murray
Advancements in genomics technology lead to a rising volume of viral (e.g., SARS-CoV-2) sequence data, resulting in increased usage of machine learning (ML) in bioinformatics. Traditional ML techniques require centralized data collection and processing, posing challenges in realistic healthcare scenarios. Additionally, privacy, ownership, and stringent regulation issues exist when pooling medical data into centralized storage to train a powerful deep learning (DL) model. The Federated learning (FL) approach overcomes such issues by setting up a central aggregator server and a shared global model. It also facilitates data privacy by extracting knowledge while keeping the actual data private. This work proposes a cutting-edge Privacy enhancement through Iterative Collaboration (EPIC) architecture. The network is divided and distributed between local and centralized servers. We demonstrate the EPIC approach to resolve a supervised classification problem to estimate SARS-CoV-2 genomic sequence data lineage without explicitly transferring raw sequence data. We aim to create a universal decentralized optimization framework that allows various data holders to work together and converge to a single predictive model. The findings demonstrate that privacy-preserving strategies can be successfully used with aggregation approaches without materially altering the degree of learning convergence. Finally, we highlight a few potential issues and prospects for study in FL-based approaches to healthcare applications.
On the cohesion and separability of average-link for hierarchical agglomerative clustering
Laber, Eduardo Sany, Bastista, Miguel
Average-link is widely recognized as one of the most popular and effective methods for building hierarchical agglomerative clustering. The available theoretical analyses show that this method has a much better approximation than other popular heuristics, as single-linkage and complete-linkage, regarding variants of Dasgupta's cost function [STOC 2016]. However, these analyses do not separate average-link from a random hierarchy and they are not appealing for metric spaces since every hierarchical clustering has a 1/2 approximation with regard to the variant of Dasgupta's function that is employed for dissimilarity measures [Moseley and Yang 2020]. In this paper, we present a comprehensive study of the performance of average-link in metric spaces, regarding several natural criteria that capture separability and cohesion, and are more interpretable than Dasgupta's cost function and its variants. We also present experimental results with real datasets that, together with our theoretical analyses, suggest that average-link is a better choice than other related methods when both cohesion and separability are important goals.
Findings of the IWSLT 2024 Evaluation Campaign
Ahmad, Ibrahim Said, Anastasopoulos, Antonios, Bojar, Ondřej, Borg, Claudia, Carpuat, Marine, Cattoni, Roldano, Cettolo, Mauro, Chen, William, Dong, Qianqian, Federico, Marcello, Haddow, Barry, Javorský, Dávid, Krubiński, Mateusz, Lam, Tsz Kin, Ma, Xutai, Mathur, Prashant, Matusov, Evgeny, Maurya, Chandresh, McCrae, John, Murray, Kenton, Nakamura, Satoshi, Negri, Matteo, Niehues, Jan, Niu, Xing, Ojha, Atul Kr., Ortega, John, Papi, Sara, Polák, Peter, Pospíšil, Adam, Pecina, Pavel, Salesky, Elizabeth, Sethiya, Nivedita, Sarkar, Balaram, Shi, Jiatong, Sikasote, Claytone, Sperber, Matthias, Stüker, Sebastian, Sudoh, Katsuhito, Thompson, Brian, Turchi, Marco, Waibel, Alex, Watanabe, Shinji, Wilken, Patrick, Zemánek, Petr, Zevallos, Rodolfo
This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks attracted 18 teams whose submissions are documented in 26 system papers. The growing interest towards spoken language translation is also witnessed by the constantly increasing number of shared task organizers and contributors to the overview paper, almost evenly distributed across industry and academia.
Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation
Zaranis, Emmanouil, Attanasio, Giuseppe, Agrawal, Sweta, Martins, André F. T.
The automatic assessment of translation quality has recently become crucial across several stages of the translation pipeline, from data curation to training and decoding. Although quality estimation (QE) metrics have been optimized to align with human judgments, no attention has been given to these metrics' potential biases, particularly in reinforcing visibility and usability for some demographic groups over others. This study is the first to investigate gender bias in QE metrics and its downstream impact on machine translation (MT). Focusing on out-of-English translations into languages with grammatical gender, we ask: Do contemporary QE metrics exhibit gender bias? Can the use of contextual information mitigate this bias? How does QE influence gender bias in MT outputs? Experiments with state-of-the-art QE metrics across multiple domains, datasets, and languages reveal significant bias. Masculine-inflected translations score higher than feminine-inflected ones, and gender-neutral translations are penalized. Moreover, context-aware QE metrics reduce errors for masculine-inflected references but fail to address feminine referents, exacerbating gender disparities. Additionally, QE metrics can perpetuate gender bias in MT systems when used in quality-aware decoding. Our findings underscore the need to address gender bias in QE metrics to ensure equitable and unbiased MT systems.
Private Algorithms for Stochastic Saddle Points and Variational Inequalities: Beyond Euclidean Geometry
Bassily, Raef, Guzmán, Cristóbal, Menart, Michael
In this work, we conduct a systematic study of stochastic saddle point problems (SSP) and stochastic variational inequalities (SVI) under the constraint of $(\epsilon,\delta)$-differential privacy (DP) in both Euclidean and non-Euclidean setups. We first consider Lipschitz convex-concave SSPs in the $\ell_p/\ell_q$ setup, $p,q\in[1,2]$. Here, we obtain a bound of $\tilde{O}\big(\frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{n\epsilon}\big)$ on the strong SP-gap, where $n$ is the number of samples and $d$ is the dimension. This rate is nearly optimal for any $p,q\in[1,2]$. Without additional assumptions, such as smoothness or linearity requirements, prior work under DP has only obtained this rate when $p=q=2$ (i.e., only in the Euclidean setup). Further, existing algorithms have each only been shown to work for specific settings of $p$ and $q$ and under certain assumptions on the loss and the feasible set, whereas we provide a general algorithm for DP SSPs whenever $p,q\in[1,2]$. Our result is obtained via a novel analysis of the recursive regularization algorithm. In particular, we develop new tools for analyzing generalization, which may be of independent interest. Next, we turn our attention towards SVIs with a monotone, bounded and Lipschitz operator and consider $\ell_p$-setups, $p\in[1,2]$. Here, we provide the first analysis which obtains a bound on the strong VI-gap of $\tilde{O}\big(\frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{n\epsilon}\big)$. For $p-1=\Omega(1)$, this rate is near optimal due to existing lower bounds. To obtain this result, we develop a modified version of recursive regularization. Our analysis builds on the techniques we develop for SSPs as well as employing additional novel components which handle difficulties arising from adapting the recursive regularization framework to SVIs.
Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities
Li, Shengzhi, Kampa, Kittipat, Lin, Rongyu, Li, Bohang, Pei, Shichao
Large language models (LLMs) have shown remarkable performance across various tasks, yet their ability to handle long-context reading remains challenging. This study explores the effectiveness of leveraging high-quality academic peer review data for fine-tuning LLMs to enhance their long-context capabilities. We compare the Direct Preference Optimization (DPO) method with the Supervised Fine-Tuning (SFT) method, demonstrating DPO's superiority and data efficiency. Our experiments show that the fine-tuned model achieves a 4.04-point improvement over phi-3 and a 2.6\% increase on the Qasper benchmark using only 2000 samples. Despite facing limitations in data scale and processing costs, this study underscores the potential of DPO and high-quality data in advancing LLM performance. Additionally, the zero-shot benchmark results indicate that aggregated high-quality human reviews are overwhelmingly preferred over LLM-generated responses, even for the most capable models like GPT-4o. This suggests that high-quality human reviews are extremely rich in information, reasoning, and long-context retrieval, capabilities that even the most advanced models have not fully captured. These findings highlight the high utility of leveraging human reviews to further advance the field.
Public Procurement for Responsible AI? Understanding U.S. Cities' Practices, Challenges, and Needs
Johnson, Nari, Silva, Elise, Leon, Harrison, Eslami, Motahhare, Schwanke, Beth, Dotan, Ravit, Heidari, Hoda
Thus, most public-sector AI systems used today are developed by and acquired from private vendors. A growing number of academic and advocacy efforts have pointed out how AI systems procured in the public sector have predominantly targeted narrowly defined notions of efficiency and performance enhancements, resulting in adverse effects that disparately impact marginalized communities[18, 37, 46, 50, 86, 96]. While such incidents have exposed flaws in individual AI systems, they highlight deeper issues in how AI is acquired, used, and governed in the public sector. The AI procurement process encompasses decisions of which AI tools to ask for, adopt or reject, and the manner in which they are developed and deployed: decisions of critical importance for communities who may be harmed by AI. Such decisions not only influence the performance and risks posed by AI systems, but also play a significant role in shaping broader governance practices and ethical standards by which AI operates in the public sector. Interestingly, there is a long history of governments adapting their public procurement practices to enact social change, e.g., by creating processes that prioritize minority-owned businesses [62],
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Cho, Jaemin, Mahata, Debanjan, Irsoy, Ozan, He, Yujie, Bansal, Mohit
Document visual question answering (DocVQA) pipelines that answer questions from documents have broad applications. Existing methods focus on handling single-page documents with multi-modal language models (MLMs), or rely on text-based retrieval-augmented generation (RAG) that uses text extraction tools such as optical character recognition (OCR). However, there are difficulties in applying these methods in real-world scenarios: (a) questions often require information across different pages or documents, where MLMs cannot handle many long documents; (b) documents often have important information in visual elements such as figures, but text extraction tools ignore them. We introduce M3DocRAG, a novel multi-modal RAG framework that flexibly accommodates various document contexts (closed-domain and open-domain), question hops (single-hop and multi-hop), and evidence modalities (text, chart, figure, etc.). M3DocRAG finds relevant documents and answers questions using a multi-modal retriever and an MLM, so that it can efficiently handle single or many documents while preserving visual information. Since previous DocVQA datasets ask questions in the context of a specific document, we also present M3DocVQA, a new benchmark for evaluating open-domain DocVQA over 3,000+ PDF documents with 40,000+ pages. In three benchmarks (M3DocVQA/MMLongBench-Doc/MP-DocVQA), empirical results show that M3DocRAG with ColPali and Qwen2-VL 7B achieves superior performance than many strong baselines, including state-of-the-art performance in MP-DocVQA. We provide comprehensive analyses of different indexing, MLMs, and retrieval models. Lastly, we qualitatively show that M3DocRAG can successfully handle various scenarios, such as when relevant information exists across multiple pages and when answer evidence only exists in images.