Overview
User Simulation in the Era of Generative AI: User Modeling, Synthetic Data Generation, and System Evaluation
Balog, Krisztian, Zhai, ChengXiang
User simulation is an emerging interdisciplinary topic with multiple critical applications in the era of Generative AI. It involves creating an intelligent agent that mimics the actions of a human user interacting with an AI system, enabling researchers to model and analyze user behaviour, generate synthetic data for training, and evaluate interactive AI systems in a controlled and reproducible manner. User simulation has profound implications for diverse fields and plays a vital role in the pursuit of Artificial General Intelligence. This paper provides an overview of user simulation, highlighting its key applications, connections to various disciplines, and outlining future research directions to advance this increasingly important technology.
A Survey on Algorithmic Developments in Optimal Transport Problem with Applications
Optimal Transport (OT) has established itself as a robust framework for quantifying differences between distributions, with applications that span fields such as machine learning, data science, and computer vision. This paper offers a detailed examination of the OT problem, beginning with its theoretical foundations, including the classical formulations of Monge and Kantorovich and their extensions to modern computational techniques. It explores cutting-edge algorithms, including Sinkhorn iterations, primal-dual strategies, and reduction-based approaches, emphasizing their efficiency and scalability in addressing high-dimensional problems. The paper also highlights emerging trends, such as integrating OT into machine learning frameworks, the development of novel problem variants, and ongoing theoretical advancements. Applications of OT are presented across a range of domains, with particular attention to its innovative application in time series data analysis via Optimal Transport Warping (OTW), a robust alternative to methods like Dynamic Time Warping. Despite the significant progress made, challenges related to scalability, robustness, and ethical considerations remain, necessitating further research. The paper underscores OT's potential to bridge theoretical depth and practical utility, fostering impactful advancements across diverse disciplines.
HypeRL: Parameter-Informed Reinforcement Learning for Parametric PDEs
Botteghi, Nicolò, Fresca, Stefania, Guo, Mengwu, Manzoni, Andrea
In this work, we devise a new, general-purpose reinforcement learning strategy for the optimal control of parametric partial differential equations (PDEs). Such problems frequently arise in applied sciences and engineering and entail a significant complexity when control and/or state variables are distributed in high-dimensional space or depend on varying parameters. Traditional numerical methods, relying on either iterative minimization algorithms or dynamic programming, while reliable, often become computationally infeasible. Indeed, in either way, the optimal control problem must be solved for each instance of the parameters, and this is out of reach when dealing with high-dimensional time-dependent and parametric PDEs. In this paper, we propose HypeRL, a deep reinforcement learning (DRL) framework to overcome the limitations shown by traditional methods. HypeRL aims at approximating the optimal control policy directly. Specifically, we employ an actor-critic DRL approach to learn an optimal feedback control strategy that can generalize across the range of variation of the parameters. To effectively learn such optimal control laws, encoding the parameter information into the DRL policy and value function neural networks (NNs) is essential. To do so, HypeRL uses two additional NNs, often called hypernetworks, to learn the weights and biases of the value function and the policy NNs. We validate the proposed approach on two PDE-constrained optimal control benchmarks, namely a 1D Kuramoto-Sivashinsky equation and a 2D Navier-Stokes equations, by showing that the knowledge of the PDE parameters and how this information is encoded, i.e., via a hypernetwork, is an essential ingredient for learning parameter-dependent control policies that can generalize effectively to unseen scenarios and for improving the sample efficiency of such policies.
The Role of Machine Learning in Congenital Heart Disease Diagnosis: Datasets, Algorithms, and Insights
Khan, Khalil, Ullah, Farhan, Syed, Ikram, Ullah, Irfan
Congenital heart disease is among the most common fetal abnormalities and birth defects. Despite identifying numerous risk factors influencing its onset, a comprehensive understanding of its genesis and management across diverse populations remains limited. Recent advancements in machine learning have demonstrated the potential for leveraging patient data to enable early congenital heart disease detection. Over the past seven years, researchers have proposed various data-driven and algorithmic solutions to address this challenge. This paper presents a systematic review of congential heart disease recognition using machine learning, conducting a meta-analysis of 432 references from leading journals published between 2018 and 2024. A detailed investigation of 74 scholarly works highlights key factors, including databases, algorithms, applications, and solutions. Additionally, the survey outlines reported datasets used by machine learning experts for congenital heart disease recognition. Using a systematic literature review methodology, this study identifies critical challenges and opportunities in applying machine learning to congenital heart disease.
Ethical Concerns of Generative AI and Mitigation Strategies: A Systematic Mapping Study
Huang, Yutan, Arora, Chetan, Houng, Wen Cheng, Kanij, Tanjila, Madulgalla, Anuradha, Grundy, John
The evolution of Generative AI, particularly Large Language Models (LLMs), has seen remarkable advancements since 2020 with the introduction of models like Chat-GPT and Bard. LLMs have revolutionized tasks, such as writing assistance, code generation, and customer support automation, by leveraging vast amounts of data to generate coherent and contextually relevant natural language (NL) responses [1, 2]. As a subset of Generative AI--systems designed to create new content--LLMs go beyond traditional AI techniques, which focus primarily on analyzing existing data. LLMs, in contrast, are capable of generating text, images, and music that mimic human creativity [3]. This capability is powered by advancements in neural network architectures, especially transformers, which enable LLMs to learn the nuances of human language and produce semantically accurate content [4].
The State of Post-Hoc Local XAI Techniques for Image Processing: Challenges and Motivations
Poh, Rech Leong Tian, Keoh, Sye Loong, Li, Liying
As complex AI systems further prove to be an integral part of our lives, a persistent and critical problem is the underlying black-box nature of such products and systems. In pursuit of productivity enhancements, one must not forget the need for various technology to boost the overall trustworthiness of such AI systems. One example, which is studied extensively in this work, is the domain of Explainable Artificial Intelligence (XAI). Research works in this scope are centred around the objective of making AI systems more transparent and interpretable, to further boost reliability and trust in using them. In this work, we discuss the various motivation for XAI and its approaches, the underlying challenges that XAI faces, and some open problems that we believe deserve further efforts to look into. We also provide a brief discussion of various XAI approaches for image processing, and finally discuss some future directions, to hopefully express and motivate the positive development of the XAI research space.
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
Gu, Zekai, Yan, Rui, Lu, Jiahao, Li, Peng, Dou, Zhiyang, Si, Chenyang, Dong, Zhen, Liu, Qifeng, Lin, Cheng, Liu, Ziwei, Wang, Wenping, Liu, Yuan
Diffusion models have demonstrated impressive performance in generating high-quality videos from text prompts or images. However, precise control over the video generation process, such as camera manipulation or content editing, remains a significant challenge. Existing methods for controlled video generation are typically limited to a single control type, lacking the flexibility to handle diverse control demands. In this paper, we introduce Diffusion as Shader (DaS), a novel approach that supports multiple video control tasks within a unified architecture. Our key insight is that achieving versatile video control necessitates leveraging 3D control signals, as videos are fundamentally 2D renderings of dynamic 3D content. Unlike prior methods limited to 2D control signals, DaS leverages 3D tracking videos as control inputs, making the video diffusion process inherently 3D-aware. This innovation allows DaS to achieve a wide range of video controls by simply manipulating the 3D tracking videos. A further advantage of using 3D tracking videos is their ability to effectively link frames, significantly enhancing the temporal consistency of the generated videos. With just 3 days of fine-tuning on 8 H800 GPUs using less than 10k videos, DaS demonstrates strong control capabilities across diverse tasks, including mesh-to-video generation, camera control, motion transfer, and object manipulation.
Explainability in Neural Networks for Natural Language Processing Tasks
Mersha, Melkamu, Bitewa, Mingiziem, Abay, Tsion, Kalita, Jugal
Neural networks are widely regarded as black-box models, creating significant challenges in understanding their inner workings, especially in natural language processing (NLP) applications. To address this opacity, model explanation techniques like Local Interpretable Model-Agnostic Explanations (LIME) have emerged as essential tools for providing insights into the behavior of these complex systems. This study leverages LIME to interpret a multi-layer perceptron (MLP) neural network trained on a text classification task. By analyzing the contribution of individual features to model predictions, the LIME approach enhances interpretability and supports informed decision-making. Despite its effectiveness in offering localized explanations, LIME has limitations in capturing global patterns and feature interactions. This research highlights the strengths and shortcomings of LIME and proposes directions for future work to achieve more comprehensive interpretability in neural NLP models.
ViLBias: A Comprehensive Framework for Bias Detection through Linguistic and Visual Cues , presenting Annotation Strategies, Evaluation, and Key Challenges
Raza, Shaina, Saleh, Caesar, Hasan, Emrul, Ogidi, Franklin, Powers, Maximus, Chatrath, Veronica, Lotif, Marcelo, Javadi, Roya, Zahid, Anam, Khazaie, Vahid Reza
The integration of Large Language Models (LLMs) and Vision-Language Models (VLMs) opens new avenues for addressing complex challenges in multimodal content analysis, particularly in biased news detection. This study introduces VLBias, a framework that leverages state-of-the-art LLMs and VLMs to detect linguistic and visual biases in news content. We present a multimodal dataset comprising textual content and corresponding images from diverse news sources. We propose a hybrid annotation framework that combines LLM-based annotations with human review to ensure high-quality labeling while reducing costs and enhancing scalability. Our evaluation compares the performance of state-of-the-art SLMs and LLMs for both modalities (text and images) and the results reveal that while SLMs are computationally efficient, LLMs demonstrate superior accuracy in identifying subtle framing and text-visual inconsistencies. Furthermore, empirical analysis shows that incorporating visual cues alongside textual data improves bias detection accuracy by 3 to 5%. This study provides a comprehensive exploration of LLMs, SLMs, and VLMs as tools for detecting multimodal biases in news content and highlights their respective strengths, limitations, and potential for future applications
Incentivized Symbiosis: A Paradigm for Human-Agent Coevolution
Chaffer, Tomer Jordi, Goldston, Justin, A., Gemach D. A. T. I
Cooperation is vital to our survival and progress. Evolutionary game theory offers a lens to understand the structures and incentives that enable cooperation to be a successful strategy. As artificial intelligence agents become integral to human systems, the dynamics of cooperation take on unprecedented significance. The convergence of human-agent teaming, contract theory, and decentralized frameworks like Web3, grounded in transparency, accountability, and trust, offers a foundation for fostering cooperation by establishing enforceable rules and incentives for humans and AI agents. We conceptualize Incentivized Symbiosis as a social contract between humans and AI, inspired by Web3 principles and encoded in blockchain technology, to define and enforce rules, incentives, and consequences for both parties. By exploring this paradigm, we aim to catalyze new research at the intersection of systems thinking in AI, Web3, and society, fostering innovative pathways for cooperative human-agent coevolution.