Generative AI
Diffusion Models in Low-Level Vision: A Survey
He, Chunming, Shen, Yuqi, Fang, Chengyu, Xiao, Fengyang, Tang, Longxiang, Zhang, Yulun, Zuo, Wangmeng, Guo, Zhenhua, Li, Xiu
Deep generative models have garnered significant attention in low-level vision tasks due to their generative capabilities. Among them, diffusion model-based solutions, characterized by a forward diffusion process and a reverse denoising process, have become widely acclaimed for their ability to produce samples of superior quality and diversity. This enables the generation of visually compelling results with intricate texture information. Despite their remarkable success, there is still no comprehensive survey that consolidates these pioneering diffusion model-based works and organizes the corresponding threads. This paper presents a comprehensive review of diffusion model-based techniques. We present three generic diffusion modeling frameworks and explore their correlations with other deep generative models, establishing the theoretical foundation. Following this, we introduce a multi-perspective categorization of diffusion models, considering both the underlying framework and the target task. Additionally, we summarize extended diffusion models applied in other tasks, including medical, remote sensing, and video scenarios. Moreover, we provide an overview of commonly used benchmarks and evaluation metrics. We conduct a thorough evaluation, encompassing both performance and efficiency, of diffusion model-based techniques in three prominent tasks. Finally, we elucidate the limitations of current diffusion models and propose seven intriguing directions for future research. This comprehensive examination aims to facilitate a profound understanding of the landscape surrounding denoising diffusion models in the context of low-level vision tasks. A curated list of diffusion model-based techniques in over 20 low-level vision tasks can be found at https://github.com/ChunmingHe/awesome-diffusion-models-in-low-level-vision.
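The forward diffusion process mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a DDPM-style formulation, in which a noisy sample is drawn in closed form as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, and a linear noise schedule; the abstract specifies neither, and the function names, schedule values, and three-pixel toy input are hypothetical. The reverse step is only imitated here by reusing the true noise in place of a trained network's prediction.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; the endpoint values are illustrative."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) up to each step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + noise_scale * e for x, e in zip(x0, noise)]
    return xt, noise

def estimate_x0(xt, predicted_noise, t, alpha_bars):
    """Invert the forward step given a noise estimate (a trained model would supply it)."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [(x - noise_scale * e) / signal for x, e in zip(xt, predicted_noise)]

rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
x0 = [0.5, -0.25, 1.0]  # toy "image" of three pixels (assumption)
xt, noise = forward_diffuse(x0, 999, alpha_bars, rng)
x0_hat = estimate_x0(xt, noise, 999, alpha_bars)  # uses the true noise, not a model
```

At the final step alpha_bar_t is close to zero, so x_t is almost pure Gaussian noise; recovering x_0 then depends entirely on how well the noise is predicted, which is the job of the learned reverse denoising process.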
garak: A Framework for Security Probing Large Language Models
Derczynski, Leon, Galinkin, Erick, Martin, Jeffrey, Majumdar, Subho, Inie, Nanna
As Large Language Models (LLMs) are deployed and integrated into thousands of applications, the need for scalable evaluation of how models respond to adversarial attacks grows rapidly. However, LLM security is a moving target: models produce unpredictable output and are constantly updated, and the potential adversary is highly diverse: anyone with access to the internet and a decent command of natural language. Further, what constitutes a security weakness in one context may not be an issue in a different context; one-size-fits-all guardrails remain theoretical. In this paper, we argue that it is time to rethink what constitutes "LLM security", and pursue a holistic approach to LLM security evaluation, where exploration and discovery of issues are central. To this end, this paper introduces garak (Generative AI Red-teaming and Assessment Kit), a framework which can be used to discover and identify vulnerabilities in a target LLM or dialog system. garak probes an LLM in a structured fashion to discover potential vulnerabilities. The outputs of the framework describe a target model's weaknesses, contribute to an informed discussion of what composes vulnerabilities in unique contexts, and can inform alignment and policy discussions for LLM deployment.
Effective Generative AI: The Human-Algorithm Centaur
Saghafian, Soroush, Idan, Lihi
Advanced analytics science methods have enabled combining the power of artificial and human intelligence, creating centaurs that allow superior decision-making. Centaurs are hybrid human-algorithm AI models that combine both formal analytics and human intuition in a symbiotic manner within their learning and reasoning process. We argue that the future of AI development and use in many domains needs to focus on centaurs as opposed to traditional AI approaches. This paradigm shift from traditional AI methods to centaur-based AI methods raises some fundamental questions: How are centaurs different from traditional human-in-the-loop methods? What are the most effective methods for creating centaurs? When should centaurs be used, and when should the lead be given to traditional AI models? Doesn't the incorporation of human intuition, which at times can be misleading, in centaurs' decision-making process degrade their performance compared to traditional AI methods? This work aims to address these fundamental questions, focusing on recent advancements in generative AI, and especially in Large Language Models (LLMs), as a main case study to illustrate the critical importance of centaurs to future AI endeavors.
Scorecards for Synthetic Medical Data Evaluation and Reporting
Zamzmi, Ghada, Subbaswamy, Adarsh, Sizikova, Elena, Margerrison, Edward, Delfino, Jana, Badano, Aldo
A key challenge for the safe and effective development and evaluation of medical AI devices is the limited availability of high-quality patient data [1] and the limitations to data sharing due to well-founded privacy concerns. Further, data collection is time-consuming, costly, and sometimes unfeasible for rare and underrepresented populations. Synthetic medical data (SMD), artificial data partially or fully generated using computational techniques to mimic the properties and relationships seen in patient data [2], holds promise for addressing these emerging challenges. SMD has gained attention due to recent advances in generative deep learning techniques [3]. Methods such as Generative Adversarial Networks (GANs) and Denoising Diffusion Probabilistic Models have the capacity to approximate the complex distributions of medical data and create SMD distributions that align with patient data. Generative AI models hold promise for producing large quantities of medical data at scale, which could supplement the scarce patient data currently available for medical AI development and evaluation.
Enhancing Supermarket Robot Interaction: A Multi-Level LLM Conversational Interface for Handling Diverse Customer Intents
Nandkumar, Chandran, Peternel, Luka
This paper presents the design and evaluation of a novel multi-level LLM interface for supermarket robots to assist customers. The proposed interface allows customers to convey their needs through both generic and specific queries. While state-of-the-art systems like OpenAI's GPTs are highly adaptable and easy to build and deploy, they still face challenges such as increased response times and limitations in strategic control of the underlying model for tailored use-case and cost optimisation. Driven by the goal of developing faster and more efficient conversational agents, this paper advocates for using multiple smaller, specialised LLMs fine-tuned to handle different user queries based on their specificity and user intent. We compare this approach to a specialised GPT model powered by GPT-4 Turbo, using the Artificial Social Agent Questionnaire (ASAQ) and qualitative participant feedback in a counterbalanced within-subjects experiment. Our findings show that our multi-LLM chatbot architecture outperformed the benchmarked GPT model across all 13 measured criteria, with statistically significant improvements in four key areas: performance, user satisfaction, user-agent partnership, and self-image enhancement. The paper also presents a method for supermarket robot navigation by mapping the final chatbot response to the correct shelf numbers, enabling the robot to sequentially navigate towards the respective products, after which lower-level robot perception, control, and planning can be used for automated object retrieval. We hope this work encourages more effort toward using multiple, specialised smaller models instead of relying on a single powerful, but more expensive and slower, model.
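The idea of mapping a final chatbot response to shelf numbers for sequential navigation, as described in the abstract above, might look roughly like the following sketch. It is not the authors' method: the product-to-shelf table, the substring matching, and the function name `shelves_to_visit` are all hypothetical, chosen only to make the mapping step concrete.

```python
# Hypothetical product-to-shelf lookup; the real store layout is not given in the abstract.
SHELF_BY_PRODUCT = {"pasta": 4, "tomato sauce": 4, "parmesan": 9, "basil": 2}

def shelves_to_visit(chatbot_response, shelf_by_product=SHELF_BY_PRODUCT):
    """Return shelf numbers in order of first mention in the response, without repeats."""
    text = chatbot_response.lower()
    mentions = []
    for product, shelf in shelf_by_product.items():
        position = text.find(product)  # simple substring match (assumption)
        if position != -1:
            mentions.append((position, shelf))
    route = []
    for _, shelf in sorted(mentions):
        if shelf not in route:  # visit each shelf only once
            route.append(shelf)
    return route

response = "For spaghetti you will need pasta, parmesan and some tomato sauce."
route = shelves_to_visit(response)
print(route)
```

In this toy example pasta and tomato sauce share a shelf, so the robot would visit two shelves rather than three. A real system would need more robust product matching than substring search.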
Visual Hallucinations of Multi-modal Large Language Models
Huang, Wen, Liu, Hongbin, Guo, Minxin, Gong, Neil Zhenqiang
Visual hallucination (VH) means that a multi-modal LLM (MLLM) imagines incorrect details about an image in visual question answering. Existing studies find VH instances only in existing image datasets, which results in a biased understanding of MLLMs' performance under VH due to the limited diversity of such VH instances. In this work, we propose a tool called VHTest to generate a diverse set of VH instances. Specifically, VHTest finds some initial VH instances in existing image datasets (e.g., COCO), generates a text description for each VH mode, and uses a text-to-image generative model (e.g., DALL-E-3) to generate VH images based on the text descriptions. We collect a benchmark dataset with 1,200 VH instances in 8 VH modes using VHTest. We find that existing MLLMs such as GPT-4V, LLaVA-1.5, and MiniGPT-v2 hallucinate for a large fraction of the instances in our benchmark. Moreover, we find that fine-tuning an MLLM using our benchmark dataset reduces its likelihood of hallucinating without sacrificing its performance on other benchmarks. Our benchmarks are publicly available: https://github.com/wenhuang2000/VHTest.
An investigation into the scientific landscape of the conversational and generative artificial intelligence, and human-chatbot interaction in education and research
Akpan, Ikpe Justice, Kobara, Yawo M., Owolabi, Josiah, Akpam, Asuama, Offodile, Onyebuchi Felix
Artificial intelligence (AI) as a disruptive technology is not new. However, its recent evolution, engineered by technological transformation, big data analytics, and quantum computing, produces conversational and generative AI (CGAI/GenAI) and human-like chatbots that disrupt conventional operations and methods in different fields. This study investigates the scientific landscape of CGAI and human-chatbot interaction/collaboration and evaluates use cases, benefits, challenges, and policy implications for multidisciplinary education and allied industry operations. The publication trend shows that just 4% (n=75) of publications appeared during 2006-2018, while 2019-2023 experienced astronomical growth (n=1763 or 96%). The prominent use cases of CGAI (e.g., ChatGPT) for teaching, learning, and research activities occurred in computer science [multidisciplinary and AI] (32%), medical/healthcare (17%), engineering (7%), and business fields (6%). The intellectual structure shows strong collaboration among eminent multidisciplinary sources in business, Information Systems, and other areas. The thematic structure of SLP highlights prominent CGAI use cases, including improved user experience in human-computer interaction, computer programs/code generation, and systems creation. Widespread CGAI usefulness for teachers, researchers, and learners includes syllabi/course content generation, testing aids, and academic writing. Concerns about abuse and misuse (plagiarism, academic integrity, privacy violations) and issues of misinformation, the danger of self-diagnosis, and patient privacy in medical/healthcare applications are prominent. Formulating strategies and policies to address potential CGAI challenges in teaching/learning and practice is a priority. Developing discipline-based automatic detection of GenAI content to check abuse is proposed.
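The publication shares quoted in the abstract above can be checked with a few lines of arithmetic. This sketch uses only the two counts given there (n=75 and n=1763) and assumes that together they make up the full set of publications analysed; the variable names are illustrative.

```python
early = 75      # publications in 2006-2018, as reported in the abstract
recent = 1763   # publications in 2019-2023, as reported in the abstract
total = early + recent  # assumes the two periods cover the whole corpus

early_share = round(100 * early / total)
recent_share = round(100 * recent / total)
print(f"total={total}, 2006-2018: {early_share}%, 2019-2023: {recent_share}%")
```

Under that assumption the rounded shares come out to 4% and 96%, which matches the figures stated in the abstract.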
What's in an embedding? Would a rose by any embedding smell as sweet?
Large Language Models (LLMs) are often criticized for lacking true "understanding" and the ability to "reason" with their knowledge, being seen merely as autocomplete systems. We believe that this assessment might be missing a nuanced insight. We suggest that LLMs do develop a kind of empirical "understanding" that is "geometry"-like, which seems adequate for a range of applications in NLP, computer vision, coding assistance, etc. However, this "geometric" understanding, built from incomplete and noisy data, makes them unreliable, difficult to generalize, and lacking in inference capabilities and explanations, similar to the challenges faced by heuristics-based expert systems decades ago. To overcome these limitations, we suggest that LLMs should be integrated with an "algebraic" representation of knowledge that includes symbolic AI elements used in expert systems. This integration aims to create large knowledge models (LKMs) that not only possess "deep" knowledge grounded in first principles, but also have the ability to reason and explain, mimicking human expert capabilities. To harness the full potential of generative AI safely and effectively, a paradigm shift is needed from LLM to more comprehensive LKM.
Applications of Generative AI in Healthcare: algorithmic, ethical, legal and societal considerations
Okonji, Onyekachukwu R., Yunusov, Kamol, Gordon, Bonnie
Generative AI is rapidly transforming medical imaging and text analysis, offering immense potential for enhanced diagnosis and personalized care. However, this transformative technology raises crucial ethical, societal, and legal questions. This paper delves into these complexities, examining issues of accuracy, informed consent, data privacy, and algorithmic limitations in the context of generative AI's application to medical imaging and text. We explore the legal landscape surrounding liability and accountability, emphasizing the need for robust regulatory frameworks. Furthermore, we dissect the algorithmic challenges, including data biases, model limitations, and workflow integration. By critically analyzing these challenges and proposing responsible solutions, we aim to foster a roadmap for ethical and responsible implementation of generative AI in healthcare, ensuring its transformative potential serves humanity with utmost care and precision.
Explain the Black Box for the Sake of Science: Revisiting the Scientific Method in the Era of Generative Artificial Intelligence
The scientific method is the cornerstone of human progress across all branches of the natural and applied sciences, from understanding the human body to explaining how the universe works. The scientific method is based on identifying systematic rules or principles that describe the phenomenon of interest in a reproducible way that can be validated through experimental evidence. In the era of artificial intelligence (AI), there are discussions on how AI systems may discover new knowledge. We argue that, before the advent of artificial general intelligence, complex human reasoning for scientific discovery remains of vital importance. Yet, AI can be leveraged for scientific discovery via explainable AI. More specifically, knowing what data AI systems used to make decisions can be a point of contact with domain experts and scientists that can lead to divergent or convergent views on a given scientific problem. Divergent views may spark further scientific investigations leading to new scientific knowledge. Convergent views may instead provide reassurance that the AI system is operating within bounds deemed reasonable by humans. The latter point addresses the trustworthiness requirement that is indispensable for critical applications in the applied sciences, such as medicine.