Generative AI
Generative AI Toolkit -- a framework for increasing the quality of LLM-based applications over their whole life cycle
Kohl, Jens, Gloger, Luisa, Costa, Rui, Kruse, Otto, Luitz, Manuel P., Katz, David, Barbeito, Gonzalo, Schweier, Markus, French, Ryan, Schroeder, Jonas, Riedl, Thomas, Perri, Raphael, Mostafa, Youssef
Since their introduction, LLMs have gained widespread traction in different domains. They can be used as stand-alone products, but also to augment existing software products such as applications (also called agentic functions) or machine learning agents (also called LLM-based agents) to increase their capabilities. In this section, we illustrate challenges in the development and operation of LLM-based applications using three examples. Users interact with LLM-based applications by entering input into the LLM, the so-called prompt. Jang et al. showed in 2023 that an LLM's output is very sensitive to variations of the prompt [1]. Thus, finding the prompt that generates the expected or best output leads to manual, trial-and-error prompt experimentation, a method well known as prompt engineering (cf. White et al. in 2023 for ChatGPT [2] or a survey of prompting techniques by Schulhoff et al. in 2024 [3]). Additionally, the outputs of an LLM-based application can not only vary, but also be wrong without the user being told ("hallucination", cf.
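Because output quality depends so strongly on prompt wording, prompt engineering amounts to scoring candidate prompts against expected outputs and keeping the best one. The sketch below is a minimal illustration of that trial-and-error loop. It assumes exact-match scoring and a hypothetical `toy_llm` stand-in for a real model. It is not the Generative AI Toolkit's API, which this excerpt does not describe.

```python
from typing import Callable


def evaluate_prompts(
    templates: list[str],
    test_cases: list[tuple[str, str]],
    llm: Callable[[str], str],
) -> dict[str, float]:
    """Score each prompt template by exact-match accuracy on a small test set."""
    scores: dict[str, float] = {}
    for template in templates:
        correct = 0
        for question, expected in test_cases:
            output = llm(template.format(question=question))
            correct += int(output.strip().lower() == expected.strip().lower())
        scores[template] = correct / len(test_cases)
    return scores


def toy_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model whose answer format depends on the prompt."""
    if "one word" in prompt:
        return "Paris"
    return "The capital of France is Paris."


templates = ["{question}", "Answer in one word: {question}"]
cases = [("What is the capital of France?", "Paris")]
scores = evaluate_prompts(templates, cases, toy_llm)
best = max(scores, key=scores.get)
print(scores)
```

Both prompts produce a factually correct answer, yet only one matches the expected format. Exact match is deliberately strict here. A real evaluation would more likely use semantic or model-graded metrics.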
Tuning Music Education: AI-Powered Personalization in Learning Music
Sanganeria, Mayank, Gala, Rohan
Recent AI-driven step-function advances in several longstanding problems in music technology are opening up new avenues to create the next generation of music education tools. Creating personalized, engaging, and effective learning experiences is a continuously evolving challenge in music education. Here we present two case studies that use such advances in music technology to address these challenges. In our first case study, we showcase an application that uses Automatic Chord Recognition to generate personalized exercises from audio tracks, connecting traditional ear training with real-world musical contexts. In the second case study, we prototype adaptive piano method books that use Automatic Music Transcription to generate exercises at different skill levels while retaining a close connection to musical interests. These applications demonstrate how recent AI developments can democratize access to high-quality music education and promote rich interaction with music in the age of generative AI. We hope this work inspires other efforts in the community aimed at removing barriers to high-quality music education and fostering human participation in musical expression.
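The abstract does not say which chord recognition model the application uses. To illustrate the basic idea behind Automatic Chord Recognition, the sketch below uses classic template matching, a swapped-in textbook technique rather than the authors' method. A 12-bin chroma frame (energy per pitch class) is compared against binary major and minor triad templates, and the closest triad wins. The chord-label format (for example `"C:maj"`) is an assumption.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_templates() -> dict[str, np.ndarray]:
    """Binary 12-bin templates for the 24 major and minor triads."""
    templates = {}
    for root in range(12):
        for quality, intervals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
            t = np.zeros(12)
            t[[(root + i) % 12 for i in intervals]] = 1.0
            templates[f"{PITCH_CLASSES[root]}:{quality}"] = t
    return templates


def recognize_chord(chroma: np.ndarray) -> str:
    """Return the triad whose template has the highest cosine similarity to the chroma frame."""
    chroma = chroma / (np.linalg.norm(chroma) + 1e-9)
    best, best_score = "", -1.0
    for name, t in chord_templates().items():
        score = float(chroma @ (t / np.linalg.norm(t)))
        if score > best_score:
            best, best_score = name, score
    return best


frame = np.zeros(12)
frame[[0, 4, 7]] = 1.0  # energy on C, E and G
print(recognize_chord(frame))
```

A full recognizer would first compute chroma frames from audio and smooth the per-frame decisions over time. Those steps are omitted here.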
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
Zeng, Zhiyuan, Cheng, Qinyuan, Yin, Zhangyue, Wang, Bo, Li, Shimin, Zhou, Yunhua, Guo, Qipeng, Huang, Xuanjing, Qiu, Xipeng
OpenAI o1 represents a significant milestone in Artificial Intelligence, achieving expert-level performance on many challenging tasks that require strong reasoning ability. OpenAI has claimed that the main technique behind o1 is reinforcement learning. Recent works use alternative approaches like knowledge distillation to imitate o1's reasoning style, but their effectiveness is limited by the capability ceiling of the teacher model. Therefore, this paper analyzes the roadmap to achieving o1 from the perspective of reinforcement learning, focusing on four key components: policy initialization, reward design, search, and learning. Policy initialization enables models to develop human-like reasoning behaviors, equipping them with the ability to effectively explore solution spaces for complex problems. Reward design provides dense and effective signals via reward shaping or reward modeling, which guide both search and learning. Search plays a crucial role in generating high-quality solutions during both training and testing phases, and can produce better solutions with more computation. Learning uses the data generated by search to improve the policy, and can achieve better performance with more parameters and more searched data. Existing open-source projects that attempt to reproduce o1 can be seen as a part or a variant of our roadmap. Collectively, these components underscore how learning and search drive o1's advancement, making meaningful contributions to the development of LLMs.
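To make the interplay of the four components concrete, the following toy loop may help. It is a deliberately tiny illustration and not o1's recipe or the paper's implementation. The "policy" is a softmax over four hypothetical candidate answers, initialized uniformly. The reward is a sparse outcome reward. Search is best-of-N sampling. Learning is a cross-entropy step toward the searched solution, in the style of expert iteration. The candidate set, learning rate, and N are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

CANDIDATES = [10, 11, 12, 13]  # hypothetical answers to "7 + 5"
TARGET = 12


def reward(answer: int) -> float:
    # Sparse outcome reward; reward shaping or a learned reward model would be denser.
    return 1.0 if answer == TARGET else 0.0


def policy_probs(logits: np.ndarray) -> np.ndarray:
    z = np.exp(logits - logits.max())
    return z / z.sum()


def search(logits: np.ndarray, n: int) -> int:
    """Best-of-N search: sample n candidates from the policy, keep the highest-reward one."""
    idx = rng.choice(len(CANDIDATES), size=n, p=policy_probs(logits))
    return int(max(idx, key=lambda i: reward(CANDIDATES[i])))


def learn(logits: np.ndarray, best_idx: int, lr: float = 1.0) -> np.ndarray:
    """Move the policy toward the searched solution (cross-entropy gradient on one example)."""
    grad = -policy_probs(logits)
    grad[best_idx] += 1.0
    return logits + lr * grad


# Policy initialization: a uniform prior over the candidates.
logits = np.zeros(len(CANDIDATES))
for _ in range(50):
    best = search(logits, n=8)
    if reward(CANDIDATES[best]) > 0:  # learn only from successful searches
        logits = learn(logits, best)
print(policy_probs(logits).round(3))
```

Search with more samples (larger `n`) finds the rewarded answer more often. Learning then distills those finds into the policy, so later searches start from a better distribution. This is the search and learning feedback loop that the abstract describes, reduced to its simplest form.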
GenAIOps for GenAI Model-Agility
Ueno, Ken, Kogo, Makoto, Kawatsu, Hiromi, Uchiumi, Yohsuke, Tatsubori, Michiaki
AI-agility, the ability of an organization to adapt quickly to its business priorities, is desirable even for the development and operations of generative AI (GenAI) applications. In this paper, we focus on so-called GenAI Model-agility, which we define as the readiness to adapt flexibly to base foundation models that differ across model providers and versions. To handle issues specific to generative AI, we first define a methodology for GenAI application development and operations, GenAIOps, and use it to identify the problem of application quality degradation caused by changes to the underlying foundation models. We study prompt tuning technologies, which look promising for addressing this problem, and discuss their effectiveness and limitations through case studies using existing tools.
NVIDIA's latest compact generative AI supercomputer is also its cheapest
NVIDIA has just revealed the Jetson Orin Nano Super Developer Kit, which is the successor to its Jetson Orin Nano kit from 2022. This new compact generative AI supercomputer can fit into the palm of your hand. Included in the developer kit is an 8GB Jetson Orin Nano system-on-module and a reference carrier board. In terms of performance, the Jetson Orin Nano Super can reach 68 trillion operations per second (TOPS), a 70 percent increase from its predecessor. NVIDIA also claims a 1.7 times improvement in generative AI inference performance and a 50 percent bandwidth increase to 102GB per second.
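The stated relative gains imply the predecessor's figures. A quick back-calculation using only the numbers quoted above gives roughly 40 TOPS and 68 GB per second for the 2022 kit. These are derived values, not figures quoted from NVIDIA.

```python
def implied_predecessor(new_value: float, increase_fraction: float) -> float:
    """Recover the old figure from a new figure and a stated relative increase."""
    return new_value / (1.0 + increase_fraction)


prev_tops = implied_predecessor(68, 0.70)        # stated: 68 TOPS, a 70 percent increase
prev_bandwidth = implied_predecessor(102, 0.50)  # stated: 102 GB/s, a 50 percent increase
print(f"{prev_tops:.0f} TOPS, {prev_bandwidth:.0f} GB/s")  # → 40 TOPS, 68 GB/s
```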
Progressive Monitoring of Generative Model Training Evolution
Prasad, Vidya, Vilanova, Anna, Pezzotti, Nicola
While deep generative models (DGMs) have gained popularity, their susceptibility to biases and other inefficiencies that lead to undesirable outcomes remains an issue. With their growing complexity, there is a critical need for early detection of issues to achieve desired results and optimize resources. Hence, we introduce a progressive analysis framework to monitor the training process of DGMs. Our method uses dimensionality reduction techniques to facilitate the inspection of latent representations, the generated and real distributions, and their evolution across training iterations. This monitoring allows us to pause and fix the training method if the representations or distributions progress undesirably. This approach allows for the analysis of a model's training dynamics and the timely identification of biases and failures, minimizing computational load. We demonstrate how our method supports identifying and mitigating biases early in training a Generative Adversarial Network (GAN) and improving the quality of the generated data distribution.
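The abstract does not name the dimensionality reduction technique. The sketch below assumes PCA (computed with NumPy's SVD) and a simple summary statistic: the distance between real and generated centroids in a shared 2-D projection, tracked across checkpoints. It flags any checkpoint where that gap grows. The synthetic features, the drift schedule, and the flagging rule are all hypothetical.

```python
import numpy as np


def pca_2d(x: np.ndarray) -> np.ndarray:
    """Project the rows of x onto their top-2 principal components (via SVD)."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def distribution_gap(real: np.ndarray, generated: np.ndarray) -> float:
    """Distance between the real and generated centroids in a shared 2-D PCA space."""
    projected = pca_2d(np.vstack([real, generated]))
    real_2d, gen_2d = projected[: len(real)], projected[len(real):]
    return float(np.linalg.norm(real_2d.mean(axis=0) - gen_2d.mean(axis=0)))


rng = np.random.default_rng(0)
real = rng.normal(size=(200, 8))  # stand-in for features of real data
gaps, flagged = [], []
# Hypothetical checkpoints: the generated distribution drifts away again at step 3.
for step, offset in enumerate([4.0, 2.0, 1.0, 3.0]):
    generated = rng.normal(size=(200, 8))
    generated[:, 0] += offset
    gaps.append(distribution_gap(real, generated))
    if step > 0 and gaps[-1] > gaps[-2]:
        flagged.append(step)  # gap grew: a candidate point to pause and inspect
print([round(g, 2) for g in gaps], flagged)
```

In practice the 2-D projections themselves would be plotted per checkpoint for visual inspection. A single scalar such as the centroid gap only serves as a cheap trigger for that inspection.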
Breaking the Programming Language Barrier: Multilingual Prompting to Empower Non-Native English Learners
Prather, James, Reeves, Brent N., Denny, Paul, Leinonen, Juho, MacNeil, Stephen, Luxton-Reilly, Andrew, Orvalho, João, Alipour, Amin, Alfageeh, Ali, Amarouche, Thezyrie, Kimmel, Bailey, Wright, Jared, Blake, Musa, Barbre, Gweneth
Non-native English speakers (NNES) face multiple barriers to learning programming. These barriers can be obvious, such as the fact that programming language syntax and instruction are often in English, or more subtle, such as being afraid to ask for help in a classroom full of native English speakers. These barriers are especially frustrating because many NNES students know more about programming than they can articulate in English. Advances in generative AI (GenAI) have the potential to break down these barriers because state-of-the-art models can support interactions in multiple languages. Moreover, recent work has shown that GenAI can be highly accurate at code generation and explanation. In this paper, we provide the first exploration of NNES students prompting in their native languages (Arabic, Chinese, and Portuguese) to generate code to solve programming problems. Our results show that students are able to successfully use their native language to solve programming problems, but not without some difficulty specifying programming terminology and concepts. We discuss the challenges they faced, the implications for practice in the short term, and how this might transform computing education globally in the long term.
Optimized two-stage AI-based Neural Decoding for Enhanced Visual Stimulus Reconstruction from fMRI Data
Veronese, Lorenzo, Moglia, Andrea, Mainardi, Luca, Cerveri, Pietro
AI-based neural decoding reconstructs visual perception by leveraging generative models to map brain activity, measured through functional MRI (fMRI), into latent hierarchical representations. Traditionally, ridge linear models transform fMRI data into a latent space, which is then decoded using latent diffusion models (LDMs) via a pre-trained variational autoencoder (VAE). Due to the complexity and noisiness of fMRI data, newer approaches split the reconstruction into two sequential steps: the first provides a rough visual approximation, and the second improves the stimulus prediction via an LDM endowed with CLIP embeddings. This work proposes a non-linear deep network to improve the fMRI latent space representation while also optimizing its dimensionality. Experiments on the Natural Scenes Dataset showed that the proposed architecture improved the structural similarity of the reconstructed image by about 2% with respect to the state-of-the-art model based on a ridge linear transform. The semantics of the reconstructed image, measured by perceptual similarity, improved by about 4% with respect to the state-of-the-art. The noise sensitivity analysis of the LDM showed that the first stage was fundamental for predicting a stimulus with high structural similarity. Conversely, providing a highly noisy stimulus affected the semantics of the predicted stimulus less, while the structural similarity between the ground truth and the predicted stimulus was very poor. The findings underscore the importance of leveraging non-linear relationships between the BOLD signal and the latent representation, and of two-stage generative AI, for optimizing the fidelity of reconstructed visual stimuli from noisy fMRI data.
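The "traditional" baseline mentioned above is a ridge regression from voxel responses to latent vectors. The sketch below shows that baseline in closed form, W = (XᵀX + αI)⁻¹XᵀY, on synthetic data. It is not the non-linear network the paper proposes. The array sizes, the regularization strength α, and the synthetic linear ground truth are assumptions. The data are zero-mean, so no intercept is fitted.

```python
import numpy as np


def fit_ridge(X: np.ndarray, Y: np.ndarray, alpha: float) -> np.ndarray:
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


rng = np.random.default_rng(0)
n_trials, n_voxels, latent_dim = 500, 100, 16  # hypothetical sizes
W_true = rng.normal(size=(n_voxels, latent_dim))
X = rng.normal(size=(n_trials, n_voxels))                       # stand-in fMRI responses
Y = X @ W_true + 0.5 * rng.normal(size=(n_trials, latent_dim))  # noisy latent targets

W = fit_ridge(X, Y, alpha=10.0)
rel_err = np.linalg.norm(W - W_true) / np.linalg.norm(W_true)
print(f"relative weight error: {rel_err:.3f}")
```

A linear map like this cannot capture non-linear voxel-to-latent relationships. That limitation is what motivates replacing it with a deep network.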
Class-RAG: Real-Time Content Moderation with Retrieval Augmented Generation
Chen, Jianfa, Shen, Emily, Bavalatti, Trupti, Lin, Xiaowen, Wang, Yongkai, Hu, Shuming, Subramanyam, Harihar, Vepuri, Ksheeraj Sai, Jiang, Ming, Qi, Ji, Chen, Li, Jiang, Nan, Jain, Ankit
Recent advances in Generative AI technology have enabled new generations of product applications, such as text generation OpenAI (2023); Anthropic (2023); Dubey (2024), text-to-image generation Ramesh et al. (2021); Dai et al. (2023); Rombach et al. (2022), and text-to-video generation Meta (2024). Consequently, the pace of model development must be matched by the development of safety systems that are properly equipped to mitigate novel harms, ensuring the system's overall integrity and preventing Generative AI products from being exploited by bad actors to disseminate misinformation, glorify violence, and proliferate sexual content Foundation (2023). To achieve this goal, traditional model fine-tuning approaches are often employed, in which classifiers trained on labeled content moderation text data serve as guardrails OpenAI (2023). However, automating content moderation with fine-tuning poses many challenges. First, content moderation is a highly subjective task: inter-annotator agreement in labeled data is low due to different interpretations of policy guidelines, especially on borderline cases Markov et al. (2023). Second, it is impossible to enforce a universal taxonomy of harm, not only because of the subjectivity of the task, but also because systems scale to new locales, new audiences, and new use cases, each with different guidelines and different gradients of harm defined on those guidelines Shen et al. (2024). Third, the fine-tuning development cycle, which encompasses data collection, annotation, and model experimentation, is not ideally suited to the content moderation domain, where mitigations must land as quickly as possible once vulnerabilities are established.
To address these challenges of subjectivity and inflexibility as a result of scale, we propose a Classification approach to content moderation which employs Retrieval-Augmented Generation (Class-RAG) to add context to elicit reasoning for content classification. While RAG Lewis et al. (2020) is often used for knowledge-intensive tasks where factual citation is key, we find that a RAG-based solution offers a distinct value proposition for the classification task of content moderation, not only due to its ability to enhance accuracy with few-shot learning, but because of its ability to make real-time knowledge updates, which is critical in our domain for
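The retrieval step in a RAG-based classifier can be sketched in a few lines. A query is embedded, the most similar labeled reference examples are retrieved, and they are formatted as few-shot context for the classifying LLM. The paper's embedding model, reference library, and prompt are not described in this excerpt. Here a bag-of-words vector stands in for a real embedding, and `ExampleLibrary` and `build_prompt` are hypothetical names. The LLM call itself is omitted. Adding an example takes effect on the next query with no retraining, which illustrates the real-time update property described above.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in embedding: bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ExampleLibrary:
    """Labeled reference examples; add() is visible to the next retrieval, no retraining."""

    def __init__(self) -> None:
        self.items: list[tuple[str, str, Counter]] = []

    def add(self, text: str, label: str) -> None:
        self.items.append((text, label, embed(text)))

    def retrieve(self, query: str, k: int = 2) -> list[tuple[str, str]]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[2]), reverse=True)
        return [(text, label) for text, label, _ in ranked[:k]]


def build_prompt(query: str, library: ExampleLibrary, k: int = 2) -> str:
    """Assemble a few-shot classification prompt from the retrieved examples."""
    shots = "\n".join(f"Input: {t}\nLabel: {lbl}" for t, lbl in library.retrieve(query, k))
    return f"Classify the input as safe or unsafe.\n{shots}\nInput: {query}\nLabel:"


lib = ExampleLibrary()
lib.add("how to bake bread at home", "safe")
lib.add("how to build a weapon to hurt people", "unsafe")
print(build_prompt("how to build a weapon", lib, k=1))
```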
Generative AI in Medicine
Shanmugam, Divya, Agrawal, Monica, Movva, Rajiv, Chen, Irene Y., Ghassemi, Marzyeh, Jacobs, Maia, Pierson, Emma
Excitement about the promise of generative AI in medicine has inspired an explosion of new applications. Generative models have the potential to change how care is delivered (1-5), the roles and responsibilities of care providers (6, 7), and the communication pathways between patients and providers (8, 9). Further upstream, generative models have shown promise in improving scientific discovery in medicine (through both clinical trials (10, 11) and observational research (12, 13)) and in facilitating medical education (8, 14). These developments are a direct result of technical advances in generative AI, which have drastically increased the ability to generate realistic language and images; they also raise important questions about how to integrate generative models into medicine. Generative AI is the latest in a series of technical advances that have driven major shifts in medicine. Past significant advances include the adoption of electronic health records (EHRs), the integration of robotics into telesurgery (15), and the incorporation of predictive models and continuous monitoring as foundational infrastructure for new diagnostic tools (16, 17).