Generative AI
Generative AI for Education (GAIED): Advances, Opportunities, and Challenges
Denny, Paul, Gulwani, Sumit, Heffernan, Neil T., Käser, Tanja, Moore, Steven, Rafferty, Anna N., Singla, Adish
This survey article has grown out of the GAIED (pronounced "guide") workshop organized by the authors at the NeurIPS 2023 conference. We organized the GAIED workshop as part of a community-building effort to bring together researchers, educators, and practitioners to explore the potential of generative AI for enhancing education. This article aims to provide an overview of the workshop activities and highlight several future research directions in the area of GAIED.
Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs
Balloccu, Simone, Schmidtová, Patrícia, Lango, Mateusz, Dušek, Ondřej
Natural Language Processing (NLP) research is increasingly focusing on the use of Large Language Models (LLMs), with some of the most popular ones being either fully or partially closed-source. The lack of access to model details, especially regarding training data, has repeatedly raised concerns about data contamination among researchers. Several attempts have been made to address this issue, but they are limited to anecdotal evidence and trial and error. Additionally, they overlook the problem of indirect data leaking, where models are iteratively improved by using data coming from users. In this work, we conduct the first systematic analysis of work using OpenAI's GPT-3.5 and GPT-4, the most prominently used LLMs today, in the context of data contamination. By analysing 255 papers and considering OpenAI's data usage policy, we extensively document the amount of data leaked to these models during the first year after the model's release. We report that these models have been globally exposed to ~4.7M samples from 263 benchmarks. At the same time, we document a number of evaluation malpractices emerging in the reviewed papers, such as unfair or missing baseline comparisons and reproducibility issues. We release our results as a collaborative project on https://leak-llm.github.io/, where other researchers can contribute to our efforts.
Compressed Context Memory For Online Language Model Interaction
Kim, Jang-Hyun, Yeom, Junyoung, Yun, Sangdoo, Song, Hyun Oh
This paper presents a context key/value compression method for Transformer language models in online scenarios, where the context continually expands. As the context lengthens, the attention process demands increasing memory and computation, which in turn reduces the throughput of the language model. To address this challenge, we propose a compressed context memory system that continually compresses the accumulating attention key/value pairs into a compact memory space, facilitating language model inference in memory-constrained computing environments. Our compression process involves integrating a lightweight conditional LoRA into the language model's forward pass during inference, without the need for fine-tuning the model's entire set of weights. We achieve efficient training by modeling the recursive compression process as a single parallelized forward computation. Through evaluations on conversation, personalization, and multi-task learning, we demonstrate that our approach achieves the performance level of a full context model with a 5× smaller context memory size. We further demonstrate the applicability of our approach in a streaming setting with an unlimited context length, outperforming the sliding window approach.
Transformer language models have exhibited exceptional language processing capabilities, achieving remarkable results in various applications (Vaswani et al., 2017). In particular, the attention mechanism, which encompasses the entire context window, enables the language models to respond with a nuanced understanding of context. With this contextual understanding, services like ChatGPT or Bard can generate responses customized to individual users through online interactions (OpenAI, 2023; Manyika, 2023). In this online scenario, the context used for language model inference accumulates over time, raising an important challenge in efficiently handling this growing context.
A straightforward approach is to treat previous contexts as a prompt, which leads to a continual increase in inference time and memory usage due to the growing length of contexts. Alternatively, caching the attention hidden states of the Transformer (Dai et al., 2019) is impractical, as the caching capacity and attention costs grow as contexts accumulate. Recent studies propose compressing contextual information into concise sequences of token embeddings or attention keys/values (denoted as KV) (Chevalier et al., 2023; Mu et al., 2023). However, those methods primarily focus on fixed-context scenarios and are not designed for dynamically changing contexts.
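The entry contrasts three ways of handling a growing context: keeping every key/value pair, keeping a sliding window, and keeping a compressed memory. The toy below compares only their memory footprint, under stated assumptions. Each turn contributes a list of plain-float "KV" vectors. Mean pooling, plus merging of the two oldest slots, is a hypothetical stand-in for the paper's learned compression with a conditional LoRA. None of the class names come from the paper.

```python
from typing import List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


class FullContextCache:
    """Keeps every KV vector: memory grows linearly with the interaction."""

    def __init__(self) -> None:
        self.slots: List[Vector] = []

    def update(self, chunk_kv: List[Vector]) -> None:
        self.slots.extend(chunk_kv)


class SlidingWindowCache:
    """Keeps only the most recent `capacity` vectors; older context is dropped."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.slots: List[Vector] = []

    def update(self, chunk_kv: List[Vector]) -> None:
        self.slots = (self.slots + chunk_kv)[-self.capacity:]


class CompressedContextMemory:
    """Bounded memory that folds each new chunk into one summary slot.

    Hypothetical stand-in: mean pooling replaces the learned compression
    described in the paper.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.slots: List[Vector] = []

    def update(self, chunk_kv: List[Vector]) -> None:
        # Compress the incoming chunk into a single summary vector.
        self.slots.append(mean_pool(chunk_kv))
        # Over capacity: merge the two oldest summaries instead of discarding them.
        if len(self.slots) > self.capacity:
            self.slots = [mean_pool(self.slots[:2])] + self.slots[2:]


full = FullContextCache()
window = SlidingWindowCache(3)
compressed = CompressedContextMemory(3)
for t in range(10):
    chunk = [[float(t), float(t + 1)] for _ in range(4)]  # 4 KV vectors per turn
    for memory in (full, window, compressed):
        memory.update(chunk)
print(len(full.slots), len(window.slots), len(compressed.slots))  # 40 3 3
```

The compressed memory stays at `capacity` slots, just as the sliding window does. The difference is that the oldest turns survive as lossy averages instead of being dropped. This mirrors the contrast the abstract draws with the sliding window approach.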
Diffusion Models: A Comprehensive Survey of Methods and Applications
Yang, Ling, Zhang, Zhilong, Song, Yang, Hong, Shenda, Xu, Runsheng, Zhao, Yue, Zhang, Wentao, Cui, Bin, Yang, Ming-Hsuan
Diffusion models have emerged as a powerful new family of deep generative models with record-breaking performance in many applications, including image synthesis, video generation, and molecule design. In this survey, we provide an overview of the rapidly expanding body of work on diffusion models, categorizing the research into three key areas: efficient sampling, improved likelihood estimation, and handling data with special structures. We also discuss the potential for combining diffusion models with other generative models for enhanced results. We further review the wide-ranging applications of diffusion models in fields spanning from computer vision, natural language generation, temporal data modeling, to interdisciplinary applications in other scientific disciplines. This survey aims to provide a contextualized, in-depth look at the state of diffusion models, identifying the key areas of focus and pointing to potential areas for further exploration. Github: https://github.com/YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy.
Inside OpenAI's Plan to Make AI More 'Democratic'
Megill was surrounded by seven staff from the world's leading artificial intelligence lab, which had launched ChatGPT a few months earlier. One of them was Wojciech Zaremba, an OpenAI co-founder. For over a decade, Megill had been toiling in relative obscurity as the co-founder of Polis, a nonprofit open-source tech platform for carrying out public deliberations. Democracy, in Megill's view, had barely evolved in hundreds of years even as the world around it had transformed unrecognizably. Each voter has a multitude of beliefs they must distill down into a single signal: one vote, every few years. The heterogeneity of every individual gets lost and distorted, with the result that democratic systems often barely reflect the will of the people and tend toward polarization.
GenLens: A Systematic Evaluation of Visual GenAI Model Outputs
Lin, Tica, Pfister, Hanspeter, Wang, Jui-Hsien
The rapid development of generative AI (GenAI) models in computer vision necessitates effective evaluation methods to ensure their quality and fairness. Existing tools primarily focus on dataset quality assurance and model explainability, leaving a significant gap in GenAI output evaluation during model development. Current practices often depend on developers' subjective visual assessments, which may lack scalability and generalizability. This paper bridges this gap by conducting a formative study with GenAI model developers in an industrial setting. Our findings led to the development of GenLens, a visual analytic interface designed for the systematic evaluation of GenAI model outputs during the early stages of model development. GenLens offers a quantifiable approach for overviewing and annotating failure cases, customizing issue tags and classifications, and aggregating annotations from multiple users to enhance collaboration. A user study with model developers reveals that GenLens effectively enhances their workflow, evidenced by high satisfaction rates and a strong intent to integrate it into their practices. This research underscores the importance of robust early-stage evaluation tools in GenAI development, contributing to the advancement of fair and high-quality GenAI models.
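GenLens is described only at the interface level. What follows is therefore a hypothetical sketch of one step in the workflow: aggregating failure-case annotations from multiple users. Several details are assumptions, not GenLens's implementation: the `(annotator, output_id, tag)` tuple layout, the tag names, and the `min_votes` agreement threshold.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (annotator, output_id, issue_tag).
Annotation = Tuple[str, str, str]


def aggregate_tags(annotations: Iterable[Annotation], min_votes: int = 2) -> Dict[str, List[str]]:
    """Return, per output, the issue tags applied by at least `min_votes` annotators."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    # set() drops exact duplicates, so one annotator cannot vote twice for the same tag.
    for _annotator, output_id, tag in set(annotations):
        votes[output_id][tag] += 1
    return {
        output_id: sorted(tag for tag, n in counts.items() if n >= min_votes)
        for output_id, counts in sorted(votes.items())
    }


def failure_overview(agreed: Dict[str, List[str]]) -> Counter:
    """Count how many outputs carry each agreed-upon issue tag."""
    return Counter(tag for tags in agreed.values() for tag in tags)


annotations = [
    ("ana", "img1", "extra_fingers"),
    ("ben", "img1", "extra_fingers"),
    ("ben", "img1", "blurry"),
    ("ana", "img2", "blurry"),
    ("ben", "img2", "blurry"),
    ("ana", "img2", "blurry"),  # duplicate click, ignored
]
agreed = aggregate_tags(annotations)
print(agreed)  # {'img1': ['extra_fingers'], 'img2': ['blurry']}
print(failure_overview(agreed))
```

Raising `min_votes` trades coverage for agreement. Setting it to 1 reproduces a plain union of every user's tags.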
Toward Human-AI Alignment in Large-Scale Multi-Player Games
Sharma, Sugandha, Davidson, Guy, Khetarpal, Khimya, Kanervisto, Anssi, Arora, Udit, Hofmann, Katja, Momennejad, Ida
Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space. This task space serves as a basis set for a behavior manifold capturing interpretable axes: fight-flight, explore-exploit, and solo-multi-agent. Second, we train an AI agent to play Bleeding Edge using a Generative Pretrained Causal Transformer and measure its behavior. Third, we project human and AI gameplay to the proposed behavior manifold to compare and contrast. This allows us to interpret differences in policy as higher-level behavioral concepts, e.g., we find that while human players exhibit variability in fight-flight and explore-exploit behavior, AI players tend towards uniformity. Furthermore, AI agents predominantly engage in solo play, while humans often engage in cooperative and competitive multi-agent patterns. These stark differences underscore the need for interpretable evaluation, design, and integration of AI in human-aligned applications. Our study advances the alignment discussion in AI and especially generative AI research, offering a measurable framework for interpretable human-agent alignment in multiplayer gaming.
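The third component can be illustrated with a minimal sketch: project each game onto interpretable axes, then compare how much humans and agents vary along them. Everything concrete here is an assumption. The task names, the definition of each axis as a normalized contrast between two task counts, and the use of the standard deviation as the variability measure are all hypothetical. None of them is the paper's construction of the behavior manifold.

```python
from statistics import pstdev
from typing import Dict, List

# Hypothetical axes: each one contrasts the counts of two tasks.
AXES = {
    "fight_flight": ("attack", "retreat"),
    "explore_exploit": ("visit_new_area", "revisit_known_area"),
    "solo_multi": ("act_alone", "act_with_team"),
}


def project(task_counts: Dict[str, int]) -> Dict[str, float]:
    """Map one game's task counts to a coordinate in [-1, 1] on each axis."""
    coords = {}
    for axis, (positive, negative) in AXES.items():
        a, b = task_counts.get(positive, 0), task_counts.get(negative, 0)
        coords[axis] = (a - b) / (a + b) if a + b else 0.0
    return coords


def axis_spread(games: List[Dict[str, int]]) -> Dict[str, float]:
    """Population standard deviation of the projected games along each axis."""
    points = [project(game) for game in games]
    return {axis: pstdev([p[axis] for p in points]) for axis in AXES}


humans = [
    {"attack": 9, "retreat": 1, "act_alone": 2, "act_with_team": 8},
    {"attack": 2, "retreat": 8, "act_alone": 5, "act_with_team": 5},
]
agents = [
    {"attack": 5, "retreat": 5, "act_alone": 9, "act_with_team": 1},
    {"attack": 5, "retreat": 5, "act_alone": 9, "act_with_team": 1},
]
print(axis_spread(humans)["fight_flight"], axis_spread(agents)["fight_flight"])
```

In this toy data the humans spread out along fight-flight, while the agents sit at one point on that axis and lean toward solo play. Those are the kinds of contrasts the abstract reports, here produced only by construction.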
An Inpainting-Infused Pipeline for Attire and Background Replacement
Perche-Mahlow, Felipe Rodrigues, Felipe-Zanella, André, Cruz-Castañeda, William Alberto, Amadeus, Marcellus
The extraordinary advancement in Generative Artificial Intelligence (GenAI) has caused a transformative shift in our approach to complex tasks incorporating various modalities such as text, audio, video, and image generation. GenAI, as a broad category, excels at creating synthetic data that can closely mimic real-world phenomena, showcasing its prowess in diverse creative applications. In text generation, models like OpenAI's GPT (Generative Pre-trained Transformer) [OpenAI, 2023] are revolutionizing how society writes. These models, trained on massive corpora of text data, exhibit an impressive ability to understand context, generate coherent paragraphs, and even complete sentences in a very consistent way [Roumeliotis and Tselikas, 2023]. The ability to produce fluent and relevant textual content has established applications in natural language processing, content creation, and even automated writing [Huang and Tan, 2023]. Audio generation models, exemplified by technologies such as Tacotron [Wang et al., 2017] and WaveNet [Oord et al., 2016], have significantly advanced our ability to synthesize realistic speech patterns. These models take advantage of deep neural networks to capture the intricacies of human speech, producing natural-sounding voices and musical compositions with nuanced variations in tone, pitch, and rhythm [Ning et al., 2019]. Image generation, a focal point of our discussion, has witnessed the evolution of models such as DALL-E [Betker et al., 2023, Ramesh et al., 2021], MidJourney [mid, 2022], and Stable Diffusion [Rombach et al., 2022], which can generate diverse and intricate images from textual prompts.
Multi-Lingual Malaysian Embedding: Leveraging Large Language Models for Semantic Representations
Zolkepli, Husein, Razak, Aisyah, Adha, Kamarul, Nazhan, Ariff
In this work, we present a comprehensive exploration of finetuning Malaysian language models, specifically Llama2 and Mistral, on embedding tasks involving negative and positive pairs. We release two distinct models tailored for Semantic Similarity and Retrieval-Augmented Generation (RAG). For Semantic Similarity, our 600 million parameter Llama2 model outperforms OpenAI text-embedding-ada-002 across all recall@k metrics for the b.cari.com.my, c.cari.com.my, Malay news, and Malaysian Twitter test sets. In the realm of RAG models, our approach proves competitive with OpenAI text-embedding-ada-002 in the Malaysian context. Notably, our 2 billion parameter Llama2 model achieves higher Recall@5 and Recall@10 on the "Melayu" keyword research papers dataset and excels in Recall@3, Recall@5, and Recall@10 on the lom.agc.gov.my dataset. These findings underscore the effectiveness of our finetuning strategy and highlight the performance gains in both Semantic Similarity and RAG tasks. All models are released at https://huggingface.co/collections/mesolitica/malaysian-embedding-6523612bfe5881ad35f81b99
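Recall@k, the metric behind these comparisons, measures the fraction of a query's relevant documents that appear among the top k retrieved results. The sketch below is minimal and rests on two assumptions: retrieval ranks documents by cosine similarity over embedding vectors, and toy 2-D vectors stand in for real model outputs. The function names are illustrative, not taken from the released models.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def rank(query: Sequence[float], docs: Dict[str, Sequence[float]]) -> List[str]:
    """Return document ids sorted by descending cosine similarity to the query."""
    return sorted(docs, key=lambda doc_id: cosine(query, docs[doc_id]), reverse=True)


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        raise ValueError("recall@k is undefined without relevant documents")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
ranking = rank([1.0, 0.0], docs)  # ['a', 'c', 'b']
print([recall_at_k(ranking, {"c"}, k) for k in (1, 2, 3)])  # [0.0, 1.0, 1.0]
```

On a full test set, the per-query scores are typically averaged. The entry does not state how the authors aggregated them.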
DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models
Sui, Yang, Phan, Huy, Xiao, Jinqi, Zhang, Tianfang, Tang, Zijie, Shi, Cong, Wang, Yan, Chen, Yingying, Yuan, Bo
In the exciting generative AI era, the diffusion model has emerged as a very powerful and widely adopted content generation and editing tool for various data modalities, making the study of its potential security risks necessary and critical. Very recently, some pioneering works have shown the vulnerability of the diffusion model to backdoor attacks, calling for in-depth analysis and investigation of the security challenges of this popular and fundamental AI technique. In this paper, for the first time, we systematically explore the detectability of the poisoned noise input for backdoored diffusion models, an important performance metric that has been little explored in existing works. Starting from the perspective of a defender, we first analyze the properties of the trigger pattern in existing diffusion backdoor attacks, discovering the important role of distribution discrepancy in Trojan detection. Based on this finding, we propose a low-cost trigger detection mechanism that can effectively identify the poisoned input noise. We then take a further step to study the same problem from the attack side, proposing a backdoor attack strategy that learns an unnoticeable trigger to evade our proposed detection scheme. Empirical evaluations across various diffusion models and datasets demonstrate the effectiveness of the proposed trigger detection and detection-evading attack strategy. For trigger detection, our distribution discrepancy-based solution achieves a 100% detection rate for the Trojan triggers used in existing works. For evading trigger detection, our proposed stealthy trigger design approach performs end-to-end learning to make the distribution of the poisoned noise input approach that of benign noise, enabling a nearly 100% detection pass rate with very high attack and benign performance for the backdoored diffusion models.
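The defender-side observation is that a trigger stamped onto the input noise shifts its distribution away from the standard Gaussian a diffusion model expects. A crude moment check can illustrate this. It is a hypothetical stand-in, not the paper's detection mechanism. It assumes the trigger is an additive patch on flattened noise, and it compares only the sample mean and variance against N(0, 1), using a z-score threshold chosen purely for illustration.

```python
import math
import random
from typing import List, Tuple


def moments(x: List[float]) -> Tuple[float, float]:
    """Sample mean and population variance of a flattened noise tensor."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return mean, var


def looks_poisoned(x: List[float], z: float = 5.0) -> bool:
    """Flag input noise whose first two moments deviate from N(0, 1).

    For n i.i.d. standard-normal samples the sample mean has standard error
    1/sqrt(n) and the sample variance roughly sqrt(2/n); a deviation beyond
    z standard errors is treated as a distribution discrepancy.
    """
    n = len(x)
    mean, var = moments(x)
    return abs(mean) > z / math.sqrt(n) or abs(var - 1.0) > z * math.sqrt(2.0 / n)


rng = random.Random(0)
benign = [rng.gauss(0.0, 1.0) for _ in range(64 * 64)]
# Hypothetical trigger: a constant +1.0 patch over the first quarter of the noise.
poisoned = [v + 1.0 if i < 1024 else v for i, v in enumerate(benign)]
print(looks_poisoned(benign), looks_poisoned(poisoned))
```

A trigger learned end-to-end so that the poisoned noise matches the benign distribution, as in the paper's attack-side strategy, is exactly the kind of input such a moment check would fail to flag.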