Generative AI
Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc
Generative AI, the most popular current approach to AI, consists of large language models (LLMs) that are trained to produce outputs that are plausible, but not necessarily correct. Although their abilities are often uncanny, they are lacking in aspects of reasoning, leading LLMs to be less than completely trustworthy. Furthermore, their results tend to be both unpredictable and uninterpretable. We lay out 16 desiderata for future AI, and discuss an alternative approach to AI which could theoretically address many of the limitations associated with current approaches: AI educated with curated pieces of explicit knowledge and rules of thumb, enabling an inference engine to automatically deduce the logical entailments of all that knowledge. Even long arguments produced this way can be both trustworthy and interpretable, since the full step-by-step line of reasoning is always available, and for each step the provenance of the knowledge used can be documented and audited. There is, however, a catch: if the logical language is expressive enough to fully represent the meaning of anything we can say in English, then the inference engine runs much too slowly. That's why symbolic AI systems typically settle for some fast but much less expressive logic, such as knowledge graphs. We describe how one AI system, Cyc, has developed ways to overcome that tradeoff and is able to reason in higher order logic in real time. We suggest that any trustworthy general AI will need to hybridize the two approaches, the LLM approach and a more formal approach, and lay out a path to realizing that dream.
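The idea of an inference engine whose every conclusion carries an auditable line of reasoning can be illustrated with a toy example. The sketch below is a minimal propositional forward chainer, not Cyc's method and nowhere near higher order logic; the `Rule` class, the rule names, and the facts about "tweety" are all hypothetical and exist only to show how provenance can be recorded for each derived fact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str         # provenance label for the rule (hypothetical)
    premises: tuple   # facts that must all hold
    conclusion: str   # fact derived when they do


def forward_chain(facts, rules):
    """Derive every entailed fact and record how each one was obtained."""
    derived = {fact: ("asserted", ()) for fact in facts}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            ready = all(p in derived for p in rule.premises)
            if ready and rule.conclusion not in derived:
                derived[rule.conclusion] = (rule.name, rule.premises)
                changed = True
    return derived


def explain(fact, derived, depth=0):
    """Return the step-by-step justification of a fact as indented lines."""
    source, premises = derived[fact]
    lines = ["  " * depth + f"{fact}  [{source}]"]
    for premise in premises:
        lines.extend(explain(premise, derived, depth + 1))
    return lines


# Hypothetical knowledge base, for illustration only.
facts = {"bird(tweety)", "healthy(tweety)"}
rules = [
    Rule("rule-1: healthy birds can fly",
         ("bird(tweety)", "healthy(tweety)"), "can_fly(tweety)"),
    Rule("rule-2: flyers can leave the ground",
         ("can_fly(tweety)",), "can_leave_ground(tweety)"),
]

derived = forward_chain(facts, rules)
proof = explain("can_leave_ground(tweety)", derived)
print("\n".join(proof))
```

Each printed line names the fact and the rule or assertion it came from, which is the kind of trace that makes a long argument auditable. A more expressive logic would need variables and unification, which is where the speed tradeoff mentioned in the abstract begins to bite.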
Experiments on Generative AI-Powered Parametric Modeling and BIM for Architectural Design
Ko, Jaechang, Ajibefun, John, Yan, Wei
With the rapid advancement of technology, artificial intelligence (AI) and machine learning (ML) have been integrated into the design process, presenting new opportunities and challenges for architects and designers. However, the potential for AI, particularly language models like ChatGPT - a conversational AI model developed by OpenAI (Radford et al. 2021) - to transform the architectural design process has yet to be fully explored. This paper presents a new framework for architectural design that uses ChatGPT and an AI-based ideation and visualization tool, Veras ("VERAS" 2023), to make the design process easier and create 3D geometric models, parametric models, and Building Information Models using natural language input. The proposed framework combines ChatGPT and Veras to generate and explore design ideas rapidly. Using natural language input, architects can communicate their design intentions more intuitively, allowing quicker iterations and reducing barriers associated with traditional design tools (Hsu, Yang, and Buehler 2022). Moreover, ChatGPT's ability to understand human design intentions helps to translate the input into Building Information Modeling (BIM) and parametric models, highlighting the potential of the architectural design process.
Exploring how a Generative AI interprets music
Barenboim, Gabriela, Del Debbio, Luigi, Hirn, Johannes, Sanz, Veronica
We use Google's MusicVAE, a Variational Auto-Encoder with a 512-dimensional latent space to represent a few bars of music, and organize the latent dimensions according to their relevance in describing music. We find that, on average, most latent neurons remain silent when fed real music tracks: we call these "noise" neurons. The remaining few dozen latent neurons that do fire are called "music neurons". We ask which neurons carry the musical information and what kind of musical information they encode, namely something that can be identified as pitch, rhythm or melody. We find that most of the information about pitch and rhythm is encoded in the first few music neurons: the neural network has thus constructed a couple of variables that non-linearly encode many human-defined variables used to describe pitch and rhythm. The concept of melody only seems to show up in independent neurons for longer sequences of music.
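The abstract does not say how "relevance" is measured. One common proxy for whether a VAE latent dimension is active is its average KL divergence from the standard normal prior: dimensions that stay at the prior carry no information about the input. The sketch below applies that proxy to synthetic encoder outputs; the data, the count of 30 active dimensions, and the 0.1 threshold are assumptions made for illustration and are not taken from MusicVAE or from the paper.

```python
import math
import random


def kl_per_dimension(mus, log_vars):
    """Average KL( N(mu, sigma^2) || N(0, 1) ) over tracks, per latent dimension."""
    n_tracks, n_latent = len(mus), len(mus[0])
    totals = [0.0] * n_latent
    for mu_row, lv_row in zip(mus, log_vars):
        for j in range(n_latent):
            m, lv = mu_row[j], lv_row[j]
            totals[j] += 0.5 * (m * m + math.exp(lv) - 1.0 - lv)
    return [t / n_tracks for t in totals]


# Synthetic stand-in for encoder outputs (assumption: no real model is loaded).
rng = random.Random(0)
N_TRACKS, N_LATENT, N_ACTIVE = 100, 512, 30

mus, log_vars = [], []
for _ in range(N_TRACKS):
    mu_row, lv_row = [], []
    for j in range(N_LATENT):
        if j < N_ACTIVE:
            # "Music" dimension: mean varies with the input, variance is small.
            mu_row.append(rng.gauss(0.0, 2.0))
            lv_row.append(-3.0)
        else:
            # "Noise" dimension: posterior stays close to the prior.
            mu_row.append(rng.gauss(0.0, 0.01))
            lv_row.append(rng.gauss(0.0, 0.01))
    mus.append(mu_row)
    log_vars.append(lv_row)

kl = kl_per_dimension(mus, log_vars)
ranking = sorted(range(N_LATENT), key=lambda j: kl[j], reverse=True)
active = [j for j in range(N_LATENT) if kl[j] > 0.1]  # hypothetical threshold

print(len(active), "active dimensions out of", N_LATENT)
```

Sorting by this score gives an ordering of latent dimensions from most to least informative, which is one way to separate a few dozen active dimensions from the silent majority.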
Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors
Phung, Tung, Pădurean, Victor-Alexandru, Cambronero, José, Gulwani, Sumit, Kohn, Tobias, Majumdar, Rupak, Singla, Adish, Soares, Gustavo
Generative AI and large language models hold great promise in enhancing computing education by powering next-generation educational technologies for introductory programming. Recent works have studied these models for different scenarios relevant to programming education; however, these works are limited for several reasons, as they typically consider already outdated models or only specific scenario(s). Consequently, there is a lack of a systematic study that benchmarks state-of-the-art models for a comprehensive set of programming education scenarios. In our work, we systematically evaluate two models, ChatGPT (based on GPT-3.5) and GPT-4, and compare their performance with human tutors for a variety of scenarios. We evaluate using five introductory Python programming problems and real-world buggy programs from an online platform, and assess performance using expert-based annotations. Our results show that GPT-4 drastically outperforms ChatGPT (based on GPT-3.5) and comes close to human tutors' performance for several scenarios. These results also highlight settings where GPT-4 still struggles, providing exciting future directions on developing techniques to improve the performance of these models.
ChatGPT for Shaping the Future of Dentistry: The Potential of Multi-Modal Large Language Model
Huang, Hanyao, Zheng, Ou, Wang, Dongdong, Yin, Jiayi, Wang, Zijin, Ding, Shengxuan, Yin, Heng, Xu, Chuan, Yang, Renjie, Zheng, Qian, Shi, Bing
ChatGPT, a lite and conversational variant of Generative Pretrained Transformer 4 (GPT-4) developed by OpenAI, is one of the milestone Large Language Models (LLMs) with billions of parameters. LLMs have stirred up much interest among researchers and practitioners with their impressive skills in natural language processing tasks, which profoundly impact various fields. This paper mainly discusses the future applications of LLMs in dentistry. We introduce two primary LLM deployment methods in dentistry, automated dental diagnosis and cross-modal dental diagnosis, and examine their potential applications. In particular, equipped with a cross-modal encoder, a single LLM can manage multi-source data and conduct advanced natural language reasoning to perform complex clinical operations. We also present cases to demonstrate the potential of a fully automatic Multi-Modal LLM AI system for clinical application in dentistry. While LLMs offer significant potential benefits, challenges such as data privacy, data quality, and model bias need further study. Overall, LLMs have the potential to revolutionize dental diagnosis and treatment, which indicates a promising avenue for clinical application and research in dentistry.
Evaluating ChatGPT and GPT-4 for Visual Programming
Generative AI and large language models have the potential to drastically improve the landscape of computing education by automatically generating personalized feedback and content. Recent works have studied the capabilities of these models for different programming education scenarios; however, these works considered only text-based programming, in particular, Python programming. Consequently, they leave open the question of how well these models would perform in visual programming domains popularly used for K-8 programming education. The main research question we study is: Do state-of-the-art generative models show advanced capabilities in visual programming on par with their capabilities in text-based Python programming? In our work, we evaluate two models, ChatGPT (based on GPT-3.5) and GPT-4, in visual programming domains for various scenarios and assess performance using expert-based annotations. In particular, we base our evaluation on reference tasks from the domains of Hour of Code: Maze Challenge by Code-dot-org and Karel. Our results show that these models perform poorly and struggle to combine spatial, logical, and programming skills crucial for visual programming. These results also provide exciting directions for future work on developing techniques to improve the performance of generative models in visual programming.
Does fine-tuning GPT-3 with the OpenAI API leak personally-identifiable information?
Sun, Albert Yu, Zemour, Eliott, Saxena, Arushi, Vaidyanathan, Udith, Lin, Eric, Lau, Christian, Mugunthan, Vaikkunth
Machine learning practitioners often fine-tune generative pre-trained models like GPT-3 to improve model performance at specific tasks. Previous works, however, suggest that fine-tuned machine learning models memorize and emit sensitive information from the original fine-tuning dataset. Companies such as OpenAI offer fine-tuning services for their models, but no prior work has conducted a memorization attack on any closed-source models. In this work, we simulate a privacy attack on GPT-3 using OpenAI's fine-tuning API. Our objective is to determine if personally identifiable information (PII) can be extracted from this model. We (1) explore the use of naive prompting methods on a GPT-3 fine-tuned classification model, and (2) design a practical word generation task called Autocomplete to investigate the extent of PII memorization in fine-tuned GPT-3 within a real-world context. Our findings reveal that fine-tuning GPT-3 for both tasks led to the model memorizing and disclosing critical PII obtained from the underlying fine-tuning dataset. To encourage further research, we have made our code and datasets publicly available on GitHub at: https://github.com/albertsun1/gpt3-pii-attacks
When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities
Chen, Jin, Liu, Zheng, Huang, Xu, Wu, Chenwang, Liu, Qi, Jiang, Gangwei, Pu, Yuanhao, Lei, Yuxuan, Chen, Xiaolong, Wang, Xingmei, Lian, Defu, Chen, Enhong
The advent of large language models marks a revolutionary breakthrough in artificial intelligence. With the unprecedented scale of training and model parameters, the capability of large language models has been dramatically improved, leading to human-like performance in understanding, language synthesis, and common-sense reasoning, among other abilities. Such a major leap forward in general AI capacity will change the pattern of how personalization is conducted. For one thing, it will reform the way humans and personalization systems interact. Instead of being a passive medium of information filtering, large language models present the foundation for active user engagement. On top of such a new foundation, user requests can be proactively explored, and the information users require can be delivered in a natural and explainable way. For another thing, it will also considerably expand the scope of personalization, making it grow from the sole function of collecting personalized information to the compound function of providing personalized services. By leveraging large language models as a general-purpose interface, personalization systems may compile user requests into plans, call the functions of external tools to execute the plans, and integrate the tools' outputs to complete end-to-end personalization tasks. Today, large language models are still being developed, whereas their application in personalization is largely unexplored. Therefore, we consider it to be the right time to review the challenges in personalization and the opportunities to address them with LLMs. In particular, we dedicate this perspective paper to the discussion of the following aspects: the development and challenges of existing personalization systems, the newly emerged capabilities of large language models, and the potential ways of making use of large language models for personalization.
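The pipeline described above (compile a request into a plan, call external tools to execute it, integrate the outputs) can be sketched in a few lines. This is only an illustration under stated assumptions: the keyword-based `plan` function stands in for a language model, and the tools `recommend_events` and `check_availability` are hypothetical stubs that return canned strings rather than calling any real service.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical external tools; real ones would query a catalog or a calendar.
def recommend_events(topic: str) -> str:
    return f"3 upcoming {topic} events found"


def check_availability(day: str) -> str:
    return f"user is free on {day}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "recommend_events": recommend_events,
    "check_availability": check_availability,
}


def plan(request: str) -> List[Tuple[str, str]]:
    """Stand-in for an LLM planner: map a request to (tool name, argument) steps."""
    text = request.lower()
    steps: List[Tuple[str, str]] = []
    if "jazz" in text:
        steps.append(("recommend_events", "jazz"))
    if "saturday" in text:
        steps.append(("check_availability", "Saturday"))
    return steps


def execute(steps: List[Tuple[str, str]]) -> List[str]:
    """Call each planned tool in order and collect its output."""
    return [TOOLS[name](arg) for name, arg in steps]


def integrate(request: str, outputs: List[str]) -> str:
    """Combine tool outputs into a single response for the user."""
    if not outputs:
        return f"No tool could help with: {request}"
    return f"Request: {request} | " + "; ".join(outputs)


request = "Find me a jazz night for Saturday"
steps = plan(request)
answer = integrate(request, execute(steps))
print(answer)
```

Keeping the planner, the tool registry, and the integration step separate means the planner can be swapped for a model call without touching the tools, and each tool's output stays traceable in the final answer.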
I used a 'jailbreak' to unlock ChatGPT's 'dark side' - here's what happened
Ever since AI chatbot ChatGPT launched last year, people have tried to 'jailbreak' the chatbot to make it answer 'banned' questions or generate controversial content. 'Jailbreaking' large language models (such as ChatGPT) usually involves a confusing prompt which makes the bot roleplay as someone else - someone without boundaries, who ignores the 'rules' built into bots such as ChatGPT. OpenAI has since blocked several 'jailbreak' prompts. But there are still several 'jailbreaks' which do work, and which can unlock a weirder, wilder side of ChatGPT: DailyMail.com Sam Altman of OpenAI has discussed 'jailbreaking', saying that he understood why there is a community of jailbreakers (he admitted to 'jailbreaking' an iPhone himself as a younger man, a hack which allowed installation of non-Apple apps among other things). Altman said: 'We want users to have a lot of control and get the models to behave in the way they want.
AI prompt engineering: learn how not to ask a chatbot a silly question
After all the initial excitement over ChatGPT, the language-processing tool driven by artificial intelligence (AI), the use of chatbots is becoming more commonplace. So how do you train your AI for work and home? We answer a few simple questions. Systems such as ChatGPT, Bard and Dall-E will produce text, images and snippets of music when fed an input – called a prompt – that instructs them what to generate. But the phrasing of a prompt can drastically alter the returned output.