
Advancing AI Challenges for the United States Department of the Air Force

Prothmann, Christian, Gadepally, Vijay, Kepner, Jeremy, Borchard, Koley, Carlone, Luca, Folcik, Zachary, Griffith, J. Daniel, Houle, Michael, How, Jonathan P., Hughes, Nathan, Igbinedion, Ifueko, Jananthan, Hayden, Jayashankar, Tejas, Jones, Michael, Karaman, Sertac, Kurien, Binoy G., Lancho, Alejandro, Lavezzi, Giovanni, Lee, Gary C. F., Leiserson, Charles E., Linares, Richard, McEvoy, Lindsey, Michaleas, Peter, Milner, Chasen, Pentland, Alex, Polyanskiy, Yury, Popovich, Jovan, Price, Jeffrey, Reid, Tim W., Riley, Stephanie, Samsi, Siddharth, Saunders, Peter, Simek, Olga, Veillette, Mark S., Weiss, Amir, Wornell, Gregory W., Rus, Daniela, Ruppel, Scott T.

arXiv.org Artificial Intelligence

The DAF-MIT AI Accelerator is a collaboration between the United States Department of the Air Force (DAF) and the Massachusetts Institute of Technology (MIT). This program pioneers fundamental advances in artificial intelligence (AI) to expand the competitive advantage of the United States in the defense and civilian sectors. In recent years, AI Accelerator projects have developed and launched public challenge problems aimed at advancing AI research in priority areas. Hallmarks of AI Accelerator challenges include large, publicly available, and AI-ready datasets to stimulate open-source solutions and engage the wider academic and private sector AI ecosystem. This article supplements our previous publication, which introduced AI Accelerator challenges. We provide an update on how ongoing and new challenges have successfully contributed to AI research and applications of AI technologies.


ATLANTIS at SemEval-2025 Task 3: Detecting Hallucinated Text Spans in Question Answering

Kobus, Catherine, Lancelot, François, Martin, Marion-Cécile, Amer, Nawal Ould

arXiv.org Artificial Intelligence

This paper presents the contributions of the ATLANTIS team to SemEval-2025 Task 3, focusing on detecting hallucinated text spans in question answering systems. Large Language Models (LLMs) have significantly advanced Natural Language Generation (NLG) but remain susceptible to hallucinations, generating incorrect or misleading content. To address this, we explored methods both with and without external context, utilizing few-shot prompting with an LLM, token-level classification, or an LLM fine-tuned on synthetic data. Notably, our approaches achieved top rankings in Spanish and competitive placements in English and German. This work highlights the importance of integrating relevant context to mitigate hallucinations and demonstrates the potential of fine-tuned models and prompt engineering.
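
One of the approaches above, token-level classification, can be sketched as a standard token-classification head applied to the answer tokens. This is a minimal illustration, not the ATLANTIS team's released system: the xlm-roberta-base backbone (a multilingual model covering English, Spanish, and German) and the two-label scheme are assumptions, and the classification head would need fine-tuning on span-annotated data before its predictions are meaningful.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL = "xlm-roberta-base"  # assumed backbone; labels: 0 = supported, 1 = hallucinated
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(MODEL, num_labels=2)
model.eval()  # head is randomly initialized until fine-tuned on span labels

def hallucinated_spans(question: str, answer: str) -> list[tuple[int, int]]:
    """Return character spans of the answer predicted as hallucinated."""
    enc = tokenizer(question, answer, return_tensors="pt",
                    return_offsets_mapping=True, truncation=True)
    offsets = enc.pop("offset_mapping")[0].tolist()
    seq_ids = enc.sequence_ids(0)  # None/0 = special/question tokens, 1 = answer
    with torch.no_grad():
        labels = model(**enc).logits[0].argmax(-1).tolist()
    return [(s, e) for (s, e), sid, y in zip(offsets, seq_ids, labels)
            if sid == 1 and y == 1 and e > s]
```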


Clinical trial cohort selection using Large Language Models on n2c2 Challenges

Tai, Chi-en Amy, Tannier, Xavier

arXiv.org Artificial Intelligence

Clinical trials are a critical process in the medical field for introducing new treatments and innovations. However, cohort selection for clinical trials is a time-consuming process that often requires manual review of patient text records for specific keywords. Although there have been studies on standardizing information across the various platforms, Natural Language Processing (NLP) tools remain crucial for spotting eligibility criteria in textual reports. Recently, pre-trained large language models (LLMs) have gained popularity for various NLP tasks due to their ability to acquire a nuanced understanding of text. In this paper, we study the performance of large language models on clinical trial cohort selection, using the n2c2 challenges as a benchmark. Our results are promising with regard to the incorporation of LLMs for simple cohort selection tasks, but also highlight the difficulties encountered by these models as soon as fine-grained knowledge and reasoning are required.
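
A minimal zero-shot sketch of the kind of LLM-based eligibility screening studied above. The prompt wording, model name, and MET/NOT MET output format are illustrative assumptions rather than the authors' exact setup; the two criteria shown are drawn from the n2c2 2018 cohort selection task.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Two example criteria from the n2c2 2018 cohort selection task.
CRITERIA = {
    "DRUG-ABUSE": "Current or past history of drug abuse.",
    "MI-6MOS": "Myocardial infarction within the past 6 months.",
}

def screen(record: str, criterion: str, definition: str) -> str:
    """Ask the model whether a patient record meets one eligibility criterion."""
    prompt = (
        "You are reviewing a patient record for clinical trial eligibility.\n"
        f"Criterion {criterion}: {definition}\n"
        "Answer with exactly MET or NOT MET.\n\n"
        f"Patient record:\n{record}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

# for name, definition in CRITERIA.items():
#     print(name, screen(patient_record_text, name, definition))
```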


Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model

Niu, Fuqiang, Cheng, Zebang, Fu, Xianghua, Peng, Xiaojiang, Dai, Genan, Chen, Yin, Huang, Hu, Zhang, Bowen

arXiv.org Artificial Intelligence

Stance detection, which aims to identify public opinion towards specific targets using social media data, is an important yet challenging task. With the proliferation of diverse multimodal social media content, including text and images, multimodal stance detection (MSD) has become a crucial research area. However, existing MSD studies have focused on modeling stance within individual text-image pairs, overlooking the multi-party conversational contexts that naturally occur on social media. This limitation stems from a lack of datasets that authentically capture such conversational scenarios, hindering progress in conversational MSD. To address this, we introduce a new multimodal multi-turn conversational stance detection dataset (called MmMtCSD). To derive stances from this challenging dataset, we propose a novel multimodal large language model stance detection framework (MLLM-SD) that learns joint stance representations from textual and visual modalities. Experiments on MmMtCSD show state-of-the-art performance of our proposed MLLM-SD approach for multimodal stance detection. We believe that MmMtCSD will contribute to advancing real-world applications of stance detection research.
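
MLLM-SD itself is not reproduced here, but its basic ingredient, a joint representation over the textual and visual modalities, can be sketched as a simple late-fusion baseline: encode each modality with CLIP, concatenate, and classify. The model names and the three-way label set are assumptions, and unlike MLLM-SD this baseline ignores the multi-turn conversational context that motivates the dataset.

```python
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

class StanceHead(nn.Module):
    """Linear classifier over concatenated text and image embeddings."""
    def __init__(self, dim: int = 512, n_classes: int = 3):  # favor/against/none
        super().__init__()
        self.fc = nn.Linear(2 * dim, n_classes)

    def forward(self, text_emb, img_emb):
        return self.fc(torch.cat([text_emb, img_emb], dim=-1))

head = StanceHead()  # to be trained on MmMtCSD-style stance labels

def stance_logits(text: str, image: Image.Image) -> torch.Tensor:
    inputs = proc(text=[text], images=[image], return_tensors="pt",
                  padding=True, truncation=True)
    with torch.no_grad():
        t = clip.get_text_features(input_ids=inputs["input_ids"],
                                   attention_mask=inputs["attention_mask"])
        v = clip.get_image_features(pixel_values=inputs["pixel_values"])
    return head(t, v)  # shape (1, 3)
```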


What is to be gained by ensemble models in analysis of spectroscopic data?

Domijan, Katarina

arXiv.org Artificial Intelligence

Vibrational spectroscopic techniques, including near-infrared (NIR), mid-infrared (MIR), and Raman, use the interaction of light with a sample to provide information about its constituents. These low-cost, rapid, and noninvasive techniques are widely and routinely used in many application domains. Prediction from spectroscopic data is a topic of major interest in the chemometric literature; see, for example, Frizzarin et al. (2021c,b) and Singh and Domijan (2019). Numerous advances in statistical machine learning methodology over the past few decades offer the potential to improve prediction performance over the well-established partial least squares (PLS) approach. Comparative analyses of algorithm prediction ability for spectroscopic data have shown that PLS variants perform strongly (Frizzarin et al., 2021b; Singh and Domijan, 2019), but that no single model outperforms all others in every setting.
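
The PLS-versus-ensemble comparison at the heart of the question in the title can be sketched with scikit-learn. The synthetic "spectra" below (smooth, highly correlated random walks across wavelengths) merely stand in for real NIR/MIR/Raman data and are an assumption for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 200, 500                              # few samples, many wavelengths
X = rng.normal(size=(n, p)).cumsum(axis=1)   # smooth, correlated "spectra"
y = X[:, 100] - 0.5 * X[:, 300] + rng.normal(scale=0.1, size=n)

for name, model in [("PLS (10 components)", PLSRegression(n_components=10)),
                    ("Random forest", RandomForestRegressor(n_estimators=200,
                                                            random_state=0))]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean CV R^2 = {r2.mean():.3f}")
```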


Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages

Arora, Akshit, Badlani, Rohan, Kim, Sungwon, Valle, Rafael, Catanzaro, Bryan

arXiv.org Artificial Intelligence

In this paper, we describe the TTS models developed by NVIDIA for the MMITS-VC (Multi-speaker, Multi-lingual Indic TTS with Voice Cloning) 2024 Challenge. In Tracks 1 and 2, we utilize RAD-MMM to perform few-shot TTS by additionally training on 5 minutes of target speaker data. In Track 3, we utilize P-Flow to perform zero-shot TTS by training on the challenge dataset as well as external datasets. We use HiFi-GAN vocoders for all submissions. RAD-MMM performs competitively on Tracks 1 and 2, while P-Flow ranks first on Track 3, with a mean opinion score (MOS) of 4.4 and a speaker similarity score (SMOS) of 3.62.


Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices

Zi, Bojia, Qi, Xianbiao, Wang, Lingzhi, Wang, Jianan, Wong, Kam-Fai, Zhang, Lei

arXiv.org Artificial Intelligence

In this paper, we present Delta-LoRA, a novel parameter-efficient approach to fine-tuning large language models (LLMs). In addition to updating the low-rank matrices A and B, Delta-LoRA propagates learning to the pre-trained weights W through updates that use the delta of the product of the two low-rank matrices across consecutive steps, A^(t+1)B^(t+1) - A^(t)B^(t). This strategy effectively addresses the limitation that the incremental update of the low-rank matrices alone is inadequate for learning representations suited to downstream tasks. Moreover, since the update of W requires neither computing the gradients of W nor storing their momentums, Delta-LoRA has memory requirements and computational costs comparable to LoRA. Extensive experiments show that Delta-LoRA significantly outperforms existing low-rank adaptation methods. We further support these results with comprehensive analyses that underscore the effectiveness of Delta-LoRA. Large Language Models (LLMs) have recently attracted considerable attention due to their remarkable performance across a broad spectrum of downstream tasks. Diverging from conventional Transformers with scales of millions of parameters, modern LLMs typically scale up to billions of parameters, endowing them with notable advantages such as emergent capabilities and robust generalization, as detailed in Bubeck et al. (2023). However, fine-tuning an LLM with all of its learnable parameters (full fine-tuning) requires multiple GPUs with high memory capacity (Dettmers et al., 2023; Hu et al., 2022), which is unattainable for many companies and research institutions.
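
A minimal PyTorch sketch of the update rule described above, applied to a single weight matrix: after each optimizer step on the low-rank factors, the frozen pre-trained weight absorbs the change in their product, so gradients and optimizer state for W are never stored. The dimensions, the update ratio lambda, and the toy regression loss are illustrative assumptions, and details of the full method (e.g., warm-up of the delta updates) are omitted.

```python
import torch
import torch.nn.functional as F

d, r = 768, 8    # weight size and LoRA rank (assumed)
lam = 2.0        # update ratio for the delta, a hyperparameter of the method
W = torch.randn(d, d)                            # frozen pre-trained weight
A = (0.01 * torch.randn(r, d)).requires_grad_()  # LoRA down-projection
B = torch.zeros(d, r, requires_grad=True)        # LoRA up-projection, zero init
opt = torch.optim.AdamW([A, B], lr=1e-4)

def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    prev = (B @ A).detach()            # low-rank product before the step
    pred = x @ (W + B @ A).T           # forward pass uses W + BA, as in LoRA
    loss = F.mse_loss(pred, y)         # toy regression loss for illustration
    opt.zero_grad()
    loss.backward()                    # gradients flow only to A and B
    opt.step()
    with torch.no_grad():
        W += lam * (B @ A - prev)      # Delta-LoRA: W absorbs the delta of BA
    return loss.item()

# loss = train_step(torch.randn(4, d), torch.randn(4, d))
```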


Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams

Nunes, Desnes, Primi, Ricardo, Pires, Ramon, Lotufo, Roberto, Nogueira, Rodrigo

arXiv.org Artificial Intelligence

The present study aims to explore the capabilities of Language Models (LMs) in tackling high-stakes multiple-choice tests, represented here by the Exame Nacional do Ensino Médio (ENEM), a multidisciplinary entrance examination widely adopted by Brazilian universities. This exam poses challenging tasks for LMs, since its questions may span multiple fields of knowledge, requiring an understanding of information from diverse domains. For instance, a question may require comprehension of both statistics and biology to be solved. This work analyzed responses generated by GPT-3.5 and GPT-4 models for questions presented in the 2009-2017 exams, as well as for questions from the 2022 exam, which were made public after the training of the models was completed. Furthermore, different prompt strategies were tested, including the use of Chain-of-Thought (CoT) prompts to generate explanations for answers. On the 2022 edition, the best-performing model, GPT-4 with CoT, achieved an accuracy of 87%, surpassing GPT-3.5 by 11 points. The code and data used in the experiments are available at https://github.com/piresramon/gpt-4-enem.
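
A rough sketch of the chain-of-thought prompting strategy evaluated above, using the OpenAI chat API. The prompt template, the answer-extraction heuristic, and the model identifier are illustrative assumptions; the paper's exact prompts may differ (see the linked repository).

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def answer_enem(question: str, options: dict[str, str]) -> str:
    """Answer one multiple-choice question with zero-shot chain-of-thought."""
    lettered = "\n".join(f"{k}) {v}" for k, v in options.items())
    prompt = (
        "Answer the following ENEM multiple-choice question.\n\n"
        f"{question}\n{lettered}\n\n"
        "Let's think step by step, then finish with 'Answer: <letter>'."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    text = resp.choices[0].message.content
    return text.rsplit("Answer:", 1)[-1].strip()[:1]  # crude letter extraction

# answer_enem("Which organ produces insulin?",
#             {"A": "Liver", "B": "Pancreas", "C": "Kidney",
#              "D": "Spleen", "E": "Stomach"})
```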


Atrial Fibrillation Detection Using RR-Intervals for Application in Photoplethysmographs

Smith, Georgia, Wang, Yishi

arXiv.org Artificial Intelligence

Atrial Fibrillation is a common form of irregular heart rhythm that can be very dangerous. Our primary goal is to analyze Atrial Fibrillation data within ECGs and to develop a real-time classification model, based only on RR-intervals (the lengths between heartbeats), that can be implemented in the common heart-rate monitors on the market today. PhysioNet's MIT-BIH Atrial Fibrillation Database (Goldberger et al., 2000) and the 2017 Challenge Database (Clifford et al., 2017) were used to identify patterns of Atrial Fibrillation and to test classification models. These two datasets are very different. The MIT-BIH database contains long samples taken with a medical-grade device, which is not useful for simulating a consumer device but is useful for Atrial Fibrillation pattern detection. The 2017 Challenge database includes short (<60 s) samples taken with a portable device and reveals many of the challenges of Atrial Fibrillation classification in a real-time device. We developed multiple SVM models with three sets of extracted features as predictor variables, which gave us moderately high accuracies with low computational intensity. With the robust filtering techniques already applied in many photoplethysmograph-based consumer heart-rate monitors, this method can be used to develop a reliable real-time model for Atrial Fibrillation detection in consumer-grade heart-rate monitors.
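
A minimal sketch of the RR-interval pipeline described above: compute simple heart-rate-variability features over a window of RR intervals and feed them to an SVM. The four features below are standard HRV measures chosen for illustration and do not reproduce the paper's three extracted feature sets.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def rr_features(rr: np.ndarray) -> np.ndarray:
    """rr: RR intervals in seconds for one window of heartbeats."""
    diff = np.diff(rr)
    return np.array([
        rr.mean(),                       # mean RR interval
        rr.std(ddof=1),                  # SDNN
        np.sqrt(np.mean(diff ** 2)),     # RMSSD
        np.mean(np.abs(diff) > 0.05),    # pNN50 (fraction of diffs > 50 ms)
    ])

# rr_windows: list of RR-interval arrays; y: 1 = AF, 0 = normal (user-provided)
# X = np.vstack([rr_features(w) for w in rr_windows])
# clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
# clf.fit(X, y)
```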


Detecting Dementia from Speech and Transcripts using Transformers

Ilias, Loukas, Askounis, Dimitris, Psarras, John

arXiv.org Artificial Intelligence

Alzheimer's disease (AD) is a neurodegenerative disease with serious consequences for people's everyday lives if it is not diagnosed early, since there is no available cure. Alzheimer's is the most common cause of dementia, a general term for loss of memory. Because dementia affects speech, existing research initiatives focus on detecting dementia from spontaneous speech. However, little work has been done on converting speech data to Log-Mel spectrograms and Mel-frequency cepstral coefficients (MFCCs) and on using pretrained models. Likewise, little work has been done on using transformer networks or on how the two modalities, i.e., speech and transcripts, are combined in a single neural network. To address these limitations, we first represent the speech signal as an image and employ several pretrained models, with the Vision Transformer (ViT) achieving the highest evaluation results. Second, we propose multimodal models. More specifically, our models include a Gated Multimodal Unit to control the influence of each modality on the final classification, and crossmodal attention to effectively capture the relationships between the two modalities. Extensive experiments conducted on the ADReSS Challenge dataset demonstrate the effectiveness of the proposed models and their superiority over state-of-the-art approaches.
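
A small PyTorch sketch of a Gated Multimodal Unit (Arevalo et al., 2017), the fusion mechanism referenced above, which learns a per-dimension gate deciding how much each modality contributes to the fused representation. The embedding dimensions and the choice of speech/transcript encoders in the usage comment are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GatedMultimodalUnit(nn.Module):
    def __init__(self, d_speech: int, d_text: int, d_out: int):
        super().__init__()
        self.proj_s = nn.Linear(d_speech, d_out)        # speech branch
        self.proj_t = nn.Linear(d_text, d_out)          # transcript branch
        self.gate = nn.Linear(d_speech + d_text, d_out) # per-dimension gate

    def forward(self, s: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        h_s = torch.tanh(self.proj_s(s))
        h_t = torch.tanh(self.proj_t(t))
        z = torch.sigmoid(self.gate(torch.cat([s, t], dim=-1)))
        return z * h_s + (1.0 - z) * h_t  # gate mixes the two modalities

# e.g. fuse a ViT embedding of the Log-Mel spectrogram with a text embedding:
# gmu = GatedMultimodalUnit(d_speech=768, d_text=768, d_out=256)
# fused = gmu(vit_emb, text_emb)  # -> (batch, 256), fed to a classifier
```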