Overview
PROMPTHEUS: A Human-Centered Pipeline to Streamline SLRs with LLMs
Torres, João Pedro Fernandes, Mulligan, Catherine, Jorge, Joaquim, Moreira, Catarina
The growing volume of academic publications poses significant challenges for researchers conducting timely and accurate Systematic Literature Reviews, particularly in fast-evolving fields like artificial intelligence. This growth of academic literature also makes it increasingly difficult for lay people to access scientific knowledge effectively, meaning academic literature is often misrepresented in the popular press and, more broadly, in society. Traditional SLR methods are labor-intensive and error-prone, and they struggle to keep up with the rapid pace of new research. To address these issues, we developed \textit{PROMPTHEUS}: an AI-driven pipeline solution that automates the SLR process using Large Language Models. We aimed to enhance efficiency by reducing the manual workload while maintaining the precision and coherence required for comprehensive literature synthesis. PROMPTHEUS automates key stages of the SLR process, including systematic search, data extraction, topic modeling using BERTopic, and summarization with transformer models. Evaluations conducted across five research domains demonstrate that PROMPTHEUS reduces review time, achieves high precision, and provides coherent topic organization, offering a scalable and effective solution for conducting literature reviews in an increasingly crowded research landscape. In addition, such tools may reduce the increasing mistrust in science by making summarization more accessible to laypeople. The code for this project can be found on the GitHub repository at https://github.com/joaopftorres/PROMPTHEUS.git
LLM Gesticulator: Leveraging Large Language Models for Scalable and Controllable Co-Speech Gesture Synthesis
Pang, Haozhou, Ding, Tianwei, He, Lanshan, Tao, Ming, Zhang, Lu, Gan, Qi
In this work, we present LLM Gesticulator, an LLM-based audio-driven co-speech gesture generation framework that synthesizes full-body animations that are rhythmically aligned with the input audio while exhibiting natural movements and editability. Compared to previous work, our model demonstrates substantial scalability. As the size of the backbone LLM model increases, our framework shows proportional improvements in evaluation metrics (a.k.a. scaling law). Our method also exhibits strong controllability where the content, style of the generated gestures can be controlled by text prompt. To the best of our knowledge, LLM gesticulator is the first work that use LLM on the co-speech generation task. Evaluation with existing objective metrics and user studies indicate that our framework outperforms prior works.
AI-Driven Approaches for Glaucoma Detection -- A Comprehensive Review
Hagiwara, Yuki, Ciora, Octavia-Andreea, Monnet, Maureen, Lancho, Gino, Lorenz, Jeanette Miriam
The diagnosis of glaucoma plays a critical role in the management and treatment of this vision-threatening disease. Glaucoma is a group of eye diseases that cause blindness by damaging the optic nerve at the back of the eye. Often called "silent thief of sight", it exhibits no symptoms during the early stages. Therefore, early detection is crucial to prevent vision loss. With the rise of Artificial Intelligence (AI), particularly Deep Learning (DL) techniques, Computer-Aided Diagnosis (CADx) systems have emerged as promising tools to assist clinicians in accurately diagnosing glaucoma early. This paper aims to provide a comprehensive overview of AI techniques utilized in CADx systems for glaucoma diagnosis. Through a detailed analysis of current literature, we identify key gaps and challenges in these systems, emphasizing the need for improved safety, reliability, interpretability, and explainability. By identifying research gaps, we aim to advance the field of CADx systems especially for the early diagnosis of glaucoma, in order to prevent any potential loss of vision.
Large Language Model-based Augmentation for Imbalanced Node Classification on Text-Attributed Graphs
Wang, Leyao, Wang, Yu, Ni, Bo, Zhao, Yuying, Derr, Tyler
Node classification on graphs frequently encounters the challenge of class imbalance, leading to biased performance and posing significant risks in real-world applications. Although several data-centric solutions have been proposed, none of them focus on Text-Attributed Graphs (TAGs), and therefore overlook the potential of leveraging the rich semantics encoded in textual features for boosting the classification of minority nodes. Given this crucial gap, we investigate the possibility of augmenting graph data in the text space, leveraging the textual generation power of Large Language Models (LLMs) to handle imbalanced node classification on TAGs. Specifically, we propose a novel approach called LA-TAG (LLM-based Augmentation on Text-Attributed Graphs), which prompts LLMs to generate synthetic texts based on existing node texts in the graph. Furthermore, to integrate these synthetic text-attributed nodes into the graph, we introduce a text-based link predictor to connect the synthesized nodes with the existing nodes. Our experiments across multiple datasets and evaluation metrics show that our framework significantly outperforms traditional non-textual-based data augmentation strategies and specific node imbalance solutions. This highlights the promise of using LLMs to resolve imbalance issues on TAGs.
Development of CODO: A Comprehensive Tool for COVID-19 Data Representation, Analysis, and Visualization
Dutta, Biswanath, Bain, Debanjali
Artificial intelligence (AI) has become indispensable for managing and processing the vast amounts of data generated during the COVID-19 pandemic. Ontology, which formalizes knowledge within a domain using standardized vocabularies and relationships, plays a crucial role in AI by enabling automated reasoning, data integration, semantic interoperability, and extracting meaningful insights from extensive datasets. The diversity of COVID-19 datasets poses challenges in comprehending this information for both human and machines. Existing COVID-19 ontologies are designed to address specific aspects of the pandemic but lack comprehensive coverage across all essential dimensions. To address this gap, CODO, an integrated ontological model has been developed encompassing critical facets of COVID-19 information such as aetiology, epidemiology, transmission, pathogenesis, diagnosis, prevention, genomics, therapeutic safety, and more. This paper reviews CODO since its inception in 2020, detailing its developments and highlighting CODO as a tool for the aggregation, representation, analysis, and visualization of diverse COVID-19 data. The major contribution of this paper is to provide a summary of the development of CODO, and outline the overall development and evaluation approach. By adhering to best practices and leveraging W3C standards, CODO ensures data integration and semantic interoperability, supporting effective navigation of COVID-19 complexities across various domains.
Trustworthy XAI and Application
Nasim, MD Abdullah Al, Biswas, Parag, Rashid, Abdur, Biswas, Angona, Gupta, Kishor Datta
One of today's most significant and transformative technologies is the rapidly developing field of artificial intelligence (AI). Deined as a computer system that simulates human cognitive processes, AI is present in many aspects of our daily lives, from the self-driving cars on the road to the intelligence (AI) because some AI systems are so complex and opaque. With millions of parameters and layers, these system-deep neural networks in particular-make it difficult for humans to comprehend accountability, prejudice, and justice are raised by the opaqueness of its decision-making process. AI has a lot of potential, but it also comes with a lot of difficulties and moral dilemmas. In the context of explainable artificial intelligence (XAI), trust is crucial as it ensures that AI systems behave consistently, fairly, and ethically. In the present article, we explore XAI, reliable XAI, and several practical uses for reliable XAI. Once more, we go over the three main components-transparency, explainability, and trustworthiness of XAI-that we determined are pertinent in this situation. We present an overview of recent scientific studies that employ trustworthy XAI in various application fields. In the end, trustworthiness is crucial for establishing and maintaining trust between humans and AI systems, facilitating the integration of AI systems into various applications and domains for the benefit of society.
Assessment of Transformer-Based Encoder-Decoder Model for Human-Like Summarization
Nair, Sindhu, Rao, Y. S., Shankarmani, Radha
In recent times, extracting valuable information from large text is making significant progress. Especially in the current era of social media, people expect quick bites of information. Automatic text summarization seeks to tackle this by slimming large texts down into more manageable summaries. This important research area can aid in decision-making by digging out salient content from large text. With the progress in deep learning models, significant work in language models has emerged. The encoder-decoder framework in deep learning has become the central approach for automatic text summarization. This work leverages transformer-based BART model for human-like summarization which is an open-ended problem with many challenges. On training and fine-tuning the encoder-decoder model, it is tested with diverse sample articles and the quality of summaries of diverse samples is assessed based on human evaluation parameters. Further, the finetuned model performance is compared with the baseline pretrained model based on evaluation metrics like ROUGE score and BERTScore. Additionally, domain adaptation of the model is required for improved performance of abstractive summarization of dialogues between interlocutors. On investigating, the above popular evaluation metrics are found to be insensitive to factual errors. Further investigation of the summaries generated by finetuned model is done using the contemporary evaluation metrics of factual consistency like WeCheck and SummaC. Empirical results on BBC News articles highlight that the gold standard summaries written by humans are more factually consistent by 17% than the abstractive summaries generated by finetuned model.
From PINNs to PIKANs: Recent Advances in Physics-Informed Machine Learning
Toscano, Juan Diego, Oommen, Vivek, Varghese, Alan John, Zou, Zongren, Daryakenari, Nazanin Ahmadi, Wu, Chenxi, Karniadakis, George Em
Physics-Informed Neural Networks (PINNs) have emerged as a key tool in Scientific Machine Learning since their introduction in 2017, enabling the efficient solution of ordinary and partial differential equations using sparse measurements. Over the past few years, significant advancements have been made in the training and optimization of PINNs, covering aspects such as network architectures, adaptive refinement, domain decomposition, and the use of adaptive weights and activation functions. A notable recent development is the Physics-Informed Kolmogorov-Arnold Networks (PIKANS), which leverage a representation model originally proposed by Kolmogorov in 1957, offering a promising alternative to traditional PINNs. In this review, we provide a comprehensive overview of the latest advancements in PINNs, focusing on improvements in network design, feature expansion, optimization techniques, uncertainty quantification, and theoretical insights. We also survey key applications across a range of fields, including biomedicine, fluid and solid mechanics, geophysics, dynamical systems, heat transfer, chemical engineering, and beyond. Finally, we review computational frameworks and software tools developed by both academia and industry to support PINN research and applications.
Procedural Content Generation in Games: A Survey with Insights on Emerging LLM Integration
Maleki, Mahdi Farrokhi, Zhao, Richard
Procedural Content Generation (PCG) is defined as the automatic creation of game content using algorithms. PCG has a long history in both the game industry and the academic world. It can increase player engagement and ease the work of game designers. While recent advances in deep learning approaches in PCG have enabled researchers and practitioners to create more sophisticated content, it is the arrival of Large Language Models (LLMs) that truly disrupted the trajectory of PCG advancement. This survey explores the differences between various algorithms used for PCG, including search-based methods, machine learning-based methods, other frequently used methods (e.g., noise functions), and the newcomer, LLMs. We also provide a detailed discussion on combined methods. Furthermore, we compare these methods based on the type of content they generate and the publication dates of their respective papers. Finally, we identify gaps in the existing academic work and suggest possible directions for future research.
Visual Question Answering in Ophthalmology: A Progressive and Practical Perspective
Chen, Xiaolan, Chen, Ruoyu, Xu, Pusheng, Zhang, Weiyi, Shang, Xianwen, He, Mingguang, Shi, Danli
Accurate diagnosis of ophthalmic diseases relies heavily on the interpretation of multimodal ophthalmic images, a process often time-consuming and expertise-dependent. Visual Question Answering (VQA) presents a potential interdisciplinary solution by merging computer vision and natural language processing to comprehend and respond to queries about medical images. This review article explores the recent advancements and future prospects of VQA in ophthalmology from both theoretical and practical perspectives, aiming to provide eye care professionals with a deeper understanding and tools for leveraging the underlying models. Additionally, we discuss the promising trend of large language models (LLM) in enhancing various components of the VQA framework to adapt to multimodal ophthalmic tasks. Despite the promising outlook, ophthalmic VQA still faces several challenges, including the scarcity of annotated multimodal image datasets, the necessity of comprehensive and unified evaluation methods, and the obstacles to achieving effective real-world applications. This article highlights these challenges and clarifies future directions for advancing ophthalmic VQA with LLMs. The development of LLM-based ophthalmic VQA systems calls for collaborative efforts between medical professionals and AI experts to overcome existing obstacles and advance the diagnosis and care of eye diseases. Keywords: Ophthalmic Visual Question Answering, Large Language Models, Multimodal Image Interpretation, Report Generation, Generative Artificial Intelligence Introduction Accurate diagnosis of ophthalmic diseases often relies on the comprehensive analysis of multimodal ophthalmic images, including color fundus photographs (CFP), optical coherence tomography (OCT), fundus fluorescein angiography (FFA), scanning laser ophthalmoscopy (SLO), anterior segment photographs and corneal topography, etc.