Generative AI
Will Code Remain a Relevant User Interface for End-User Programming with Generative AI Models?
The research field of end-user programming has largely been concerned with helping non-experts learn to code sufficiently well in order to achieve their tasks. Generative AI stands to obviate this entirely by allowing users to generate code from naturalistic language prompts. In this essay, we explore the extent to which "traditional" programming languages remain relevant for non-expert end-user programmers in a world with generative AI. We posit the "generative shift hypothesis": that generative AI will create qualitative and quantitative expansions in the traditional scope of end-user programming. We outline some reasons that traditional programming languages may still be relevant and useful for end-user programmers. We speculate whether each of these reasons might be fundamental and enduring, or whether they may disappear with further improvements and innovations in generative AI. Finally, we articulate a set of implications for end-user programming research, including the possibility of needing to revisit many well-established core concepts, such as Ko's learning barriers and Blackwell's attention investment model.
Conceptual Framework for Autonomous Cognitive Entities
Shapiro, David, Li, Wangfan, Delaflor, Manuel, Toxtli, Carlos
The rapid development and adoption of Generative AI (GAI) technology in the form of chatbots such as ChatGPT and Claude has greatly increased interest in agentic machines. This paper introduces the Autonomous Cognitive Entity (ACE) model, a novel framework for a cognitive architecture, enabling machines and software agents to operate more independently. Drawing inspiration from the OSI model, the ACE framework presents layers of abstraction to conceptualize artificial cognitive architectures. The model is designed to harness the capabilities of the latest generative AI technologies, including large language models (LLMs) and multimodal generative models (MMMs), to build autonomous, agentic systems. The ACE framework comprises six layers: the Aspirational Layer, Global Strategy, Agent Model, Executive Function, Cognitive Control, and Task Prosecution. Each layer plays a distinct role, ranging from setting the moral compass and strategic thinking to task selection and execution. The ACE framework also incorporates mechanisms for handling failures and adapting actions, thereby enhancing the robustness and flexibility of autonomous agents. This paper introduces the conceptual framework and proposes implementation strategies that have been tested and observed in industry. The goal of this paper is to formalize this framework so as to make it more accessible.
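To make the layered idea concrete, here is a minimal Python sketch in which a goal is passed down through the six layers in order, and a failure at one layer is escalated to the layer above so it can adapt and retry. Only the layer names come from the abstract; the `Directive` type, the handler signature, `run`, and the one-step-up escalation rule are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Layer names from the ACE abstract, ordered from most abstract (top) to most concrete (bottom).
LAYERS = [
    "Aspirational Layer",
    "Global Strategy",
    "Agent Model",
    "Executive Function",
    "Cognitive Control",
    "Task Prosecution",
]

@dataclass
class Directive:
    """Hypothetical message handed from one layer to the layer below it."""
    text: str
    trace: List[str] = field(default_factory=list)

# A handler refines the directive it receives, or returns None to signal failure.
Handler = Callable[[str, Directive], Optional[Directive]]

def default_handler(layer: str, d: Directive) -> Optional[Directive]:
    # Placeholder refinement: just record that this layer processed the directive.
    return Directive(d.text, d.trace + [layer])

def run(goal: str, handlers: Dict[str, Handler], max_failures: int = 3) -> Optional[Directive]:
    """Pass a goal down through the layers; on failure, escalate one layer up and retry."""
    inputs = [Directive(goal)]  # inputs[i] is the directive that layer i receives
    i, failures = 0, 0
    while i < len(LAYERS):
        layer = LAYERS[i]
        out = handlers.get(layer, default_handler)(layer, inputs[i])
        if out is None:
            failures += 1
            if failures > max_failures:
                return None  # give up: the failure could not be resolved
            # Escalate: the layer above re-plans, with a note about what failed below it.
            i = max(i - 1, 0)
            prev = inputs[i]
            inputs = inputs[:i] + [Directive(prev.text, prev.trace + [f"failed:{layer}"])]
            continue
        inputs = inputs[: i + 1] + [out]
        i += 1
    return inputs[-1]

# Usage: a hypothetical bottom-layer handler that fails on its first attempt, then succeeds.
attempts = {"n": 0}

def flaky(layer: str, d: Directive) -> Optional[Directive]:
    attempts["n"] += 1
    return None if attempts["n"] == 1 else default_handler(layer, d)

result = run("summarize the inbox", {"Task Prosecution": flaky})
print(result.trace)
```

Escalating exactly one layer at a time is the simplest possible policy; a fuller design might let each layer decide how far up a failure should travel.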
AI apocalypse team formed to fend off catastrophic nuclear and biochemical doomsday scenarios
Artificial intelligence (AI) is advancing rapidly, bringing unprecedented benefits to us, yet it also poses serious risks, such as chemical, biological, radiological and nuclear (CBRN) threats, that could have catastrophic consequences for the world. How can we ensure that AI is used for good and not evil? How can we prepare for the worst-case scenarios that might arise from AI? These are some of the questions that OpenAI, a leading AI research lab and the company behind ChatGPT, is trying to answer with its new Preparedness team. Its mission is to track, evaluate, forecast and protect against the frontier risks of AI models.
My Imagination Is on Steroids Now
What if The Atlantic owned a train car? Amtrak, I had just learned on the internet, allows owners of private railcars to lash onto runs along the Northeast Corridor, among other routes. "We should have a train car," I slacked an editor. Moments later, it appeared on my screen, bright red with our magazine's logo emblazoned in white, just like I'd ordered. It's an old logo, and misspelled, but the effect was the same: A momentary notion--one unworthy of relating to someone in private, let alone executing--had been realized, thanks to DALL-E 3, an artificial-intelligence image generator now built into Microsoft Bing's Image Creator website.
'Is this an appropriate use of AI or not?': teachers say classrooms are now AI testing labs
In the year since OpenAI released ChatGPT, high school teacher Vicki Davis has been rethinking every single assignment she gives her students. Davis, a computer science teacher at Sherwood Christian Academy in Georgia, was well-positioned to be an early adopter of the technology. She's also the IT director at the school and helped put together an AI policy in March: the school opted to allow the use of AI tools for specific projects so long as students discussed it with their teachers and cited the tool. In Davis' mind, there were good and bad uses of AI, and ignoring its growing popularity was not going to help students unlock the productive uses or understand its dangers. "It's actually changed how I design my projects because there are some times I want my students to use AI, and then there are times I don't want them to," Davis said.
Rishi Sunak's AI safety summit appears slick, but look closer and alarm bells start ringing
Chris Stokel-Walker
The UK's AI safety summit opens at Bletchley Park this week, and is the passion project of Rishi Sunak: a prime minister desperate for a good news story as his government looks down the barrel of a crushing election defeat. Sunak appears to want progress on AI to become his lasting legacy. Last week, he delivered a speech about the risks of AI if weaponised by terrorists and cybercriminals, and published a series of documents on "frontier AI", an industry term for generative AI tools such as ChatGPT and DALL-E. He even unveiled a UK AI safety institute. The slick (albeit very behind in the polls) Stanford MBA grad who likes to holiday in California had, to use a favoured phrase of his, "got to grips" with the problem.
Balancing Act: Constraining Disparate Impact in Sparse Models
Hashemizadeh, Meraj, Ramirez, Juan, Sukumaran, Rohan, Farnadi, Golnoosh, Lacoste-Julien, Simon, Gallego-Posada, Jose
Model pruning is a popular approach to enable the deployment of large deep learning models on edge devices with restricted computational or storage capacities. Although sparse models achieve performance comparable to that of their dense counterparts at the level of the entire dataset, they exhibit high accuracy drops for some data sub-groups. Existing methods to mitigate this disparate impact induced by pruning (i) rely on surrogate metrics that address the problem indirectly and have limited interpretability; or (ii) scale poorly with the number of protected sub-groups in terms of computational cost. We propose a constrained optimization approach that directly addresses the disparate impact of pruning: our formulation bounds the accuracy change between the dense and sparse models, for each subgroup. This choice of constraints provides an interpretable success criterion to determine if a pruned model achieves acceptable disparity levels. Experimental results demonstrate that our technique scales reliably to problems involving large models and hundreds of protected sub-groups.
Current deep learning practice displays a trend towards larger architectures (Bommasani et al., 2021), as exemplified by popular models such as GPT-4 (OpenAI, 2023), Llama 2 (Touvron et al., 2023) and DALL-E 2 (Ramesh et al., 2022). Model compression techniques such as pruning (Gale et al., 2019), knowledge distillation (Hinton et al., 2015), or quantization (Gholami et al., 2021) are crucial towards enabling the deployment of large models across a wide range of platforms, including resource-constrained edge devices like smartphones. Despite achieving comparable performance at an aggregate level over the entire dataset, pruned models often exhibit significant accuracy reduction for some data sub-groups (Hooker et al., 2019; 2020; Paganini, 2020).
In particular, under-represented groups can suffer high performance degradation while the overall performance remains unaffected, thus exacerbating systemic biases in machine learning models. Tran et al. (2022) refer to this phenomenon as the disparate impact of pruning. Existing mitigation methods face challenges in terms of interpretability and scalability to a large number of sub-groups. Tran et al. (2022) introduce constraints aiming to equalize the loss of the sparse model across sub-groups. However, their approach does not account for the unequal group-level performance of the dense model. Moreover, while the loss can be a useful surrogate for training, this method addresses the disparate impact issue indirectly as it focuses on controlling the loss, rather than group-level changes in accuracy. Alternatively, Lin et al. (2022) compute per-group importance scores for every model parameter to determine the weights to be pruned. This approach becomes prohibitively expensive when the model or the number of sub-groups is large.
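The per-group constraint idea can be sketched in a few lines of Python. This is a minimal illustration assuming the simplest reading of the abstract: a constraint Δ_g = acc_dense[g] − acc_sparse[g] ≤ ε for every sub-group g, with one Lagrange multiplier per group updated by projected dual ascent. The paper's exact constraint definition and training procedure may differ, and all function names here are hypothetical.

```python
from typing import Dict

def accuracy_drops(acc_dense: Dict[str, float], acc_sparse: Dict[str, float]) -> Dict[str, float]:
    """Per-group accuracy change between the dense and the pruned (sparse) model."""
    return {g: acc_dense[g] - acc_sparse[g] for g in acc_dense}

def violations(acc_dense: Dict[str, float], acc_sparse: Dict[str, float], eps: float) -> Dict[str, float]:
    """Constraint values Δ_g − ε; the constraint for group g holds when the value is <= 0."""
    return {g: d - eps for g, d in accuracy_drops(acc_dense, acc_sparse).items()}

def is_acceptable(acc_dense: Dict[str, float], acc_sparse: Dict[str, float], eps: float) -> bool:
    """Interpretable success criterion: every sub-group's accuracy drop is within eps."""
    return all(v <= 0 for v in violations(acc_dense, acc_sparse, eps).values())

def dual_ascent_step(multipliers: Dict[str, float], viol: Dict[str, float], lr: float) -> Dict[str, float]:
    """Projected gradient ascent on the multipliers: λ_g ← max(0, λ_g + lr · (Δ_g − ε))."""
    return {g: max(0.0, multipliers.get(g, 0.0) + lr * v) for g, v in viol.items()}

# Usage with made-up accuracies for two hypothetical groups "a" and "b".
dense = {"a": 0.90, "b": 0.80}
sparse = {"a": 0.89, "b": 0.70}
viol = violations(dense, sparse, eps=0.05)
print(is_acceptable(dense, sparse, eps=0.05))  # False: group "b" drops by about 0.10
lam = dual_ascent_step({}, viol, lr=1.0)
```

Only the violated group accumulates a positive multiplier. In a full method, that multiplier would up-weight the group's term in the Lagrangian during the next weight-update step; that primal step, and any differentiable surrogate for accuracy it would require, are omitted here.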
The Generative AI Paradox: "What It Can Create, It May Not Understand"
West, Peter, Lu, Ximing, Dziri, Nouha, Brahman, Faeze, Li, Linjie, Hwang, Jena D., Jiang, Liwei, Fisher, Jillian, Ravichander, Abhilasha, Chandu, Khyathi, Newman, Benjamin, Koh, Pang Wei, Ettinger, Allyson, Choi, Yejin
The recent wave of generative AI has sparked unprecedented global attention, with both excitement and concern over potentially superhuman levels of artificial intelligence: models now take only seconds to produce outputs that would challenge or exceed the capabilities even of expert humans. At the same time, models still show basic errors in understanding that would not be expected even in non-expert humans. This presents us with an apparent paradox: how do we reconcile seemingly superhuman capabilities with the persistence of errors that few humans would make? In this work, we posit that this tension reflects a divergence in the configuration of intelligence in today's generative models relative to intelligence in humans. Specifically, we propose and test the Generative AI Paradox hypothesis: generative models, having been trained directly to reproduce expert-like outputs, acquire generative capabilities that are not contingent upon, and can therefore exceed, their ability to understand those same types of outputs. This contrasts with humans, for whom basic understanding almost always precedes the ability to generate expert-level outputs. We test this hypothesis through controlled experiments analyzing generation vs. understanding in generative models, across both language and image modalities. Our results show that although models can outperform humans in generation, they consistently fall short of human capabilities in measures of understanding, show a weaker correlation between generation and understanding performance, and are more brittle to adversarial inputs. Our findings support the hypothesis that models' generative capability may not be contingent upon understanding capability, and call for caution in interpreting artificial intelligence by analogy to human intelligence.
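One of the reported measures is the correlation between generation and understanding performance. A minimal sketch of that computation, assuming paired per-item scores are already available as two equal-length lists; the function names and the score format are assumptions, and this is not the authors' evaluation code.

```python
import math
from typing import Dict, Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)

def generation_vs_understanding(gen: Sequence[float], und: Sequence[float]) -> Dict[str, float]:
    """Summarize paired generation and understanding scores for one model (or one human group)."""
    if len(gen) != len(und) or len(gen) < 2:
        raise ValueError("need two equal-length score lists with at least two items")
    return {
        "mean_generation": sum(gen) / len(gen),
        "mean_understanding": sum(und) / len(und),
        # Near 1 when the two abilities rise and fall together across items.
        "pearson_r": pearson_r(gen, und),
    }
```

Computing the same summary for a model and for a human baseline side by side is one way to compare how tightly the two abilities are coupled in each.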
Diversity and Diffusion: Observations on Synthetic Image Distributions with Stable Diffusion
Marwood, David, Baluja, Shumeet, Alon, Yair
Recent progress in text-to-image (TTI) systems, such as Stable Diffusion, Imagen, and DALL-E 2, has made it possible to create realistic images with simple text prompts. It is tempting to use these systems to eliminate the manual task of obtaining natural images for training a new machine learning classifier. However, in all of the experiments performed to date, classifiers trained solely with synthetic images perform poorly at inference, despite the images used for training appearing realistic. Examining this apparent incongruity in detail gives insight into the limitations of the underlying image generation processes. Through the lens of diversity in image creation vs. accuracy of what is created, we dissect the differences in semantic mismatches in what is modeled in synthetic vs. natural images. This will elucidate the roles of the image-language model, CLIP, and the image generation model, diffusion. We find four issues that limit the usefulness of TTI systems for this task: ambiguity, adherence to prompt, lack of diversity, and inability to represent the underlying concept. We further present surprising insights into the geometry of CLIP embeddings.
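"Lack of diversity" is commonly quantified on image embeddings. As a hedged sketch, assuming each image has already been encoded to an embedding vector (for example with CLIP; the encoding step is not shown), the diversity of a set can be scored as the mean pairwise cosine distance between its embeddings. This metric is an illustrative choice, not necessarily the one the authors use.

```python
import math
from itertools import combinations
from typing import List, Sequence

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (nu * nv)

def mean_pairwise_cosine_distance(embeddings: List[Sequence[float]]) -> float:
    """Average of 1 - cos(u, v) over all unordered pairs; higher means a more diverse set."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)
```

Scoring a set of synthetic images and a set of natural images of the same class with this function gives one simple, if coarse, way to compare their spread in embedding space.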
Sociotechnical Safety Evaluation of Generative AI Systems
Weidinger, Laura, Rauh, Maribeth, Marchal, Nahema, Manzini, Arianna, Hendricks, Lisa Anne, Mateos-Garcia, Juan, Bergman, Stevie, Kay, Jackie, Griffin, Conor, Bariach, Ben, Gabriel, Iason, Rieser, Verena, Isaac, William
Generative AI systems produce a range of risks. To ensure the safety of generative AI systems, these risks must be evaluated. In this paper, we make two main contributions toward establishing such evaluations. First, we propose a three-layered framework that takes a structured, sociotechnical approach to evaluating these risks. This framework encompasses capability evaluations, which are the main current approach to safety evaluation. It then reaches further by building on system safety principles, particularly the insight that context determines whether a given capability may cause harm. To account for relevant context, our framework adds human interaction and systemic impacts as additional layers of evaluation. Second, we survey the current state of safety evaluation of generative AI systems and create a repository of existing evaluations. Three salient evaluation gaps emerge from this analysis. We propose ways forward for closing these gaps, outlining practical steps as well as roles and responsibilities for different actors. Sociotechnical safety evaluation is a tractable approach to the robust and comprehensive safety evaluation of generative AI systems.
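The three evaluation layers, and the idea that a repository of existing evaluations can expose gaps, can be illustrated with a small data structure. This is a hypothetical sketch: the `Evaluation` fields, the example risk-area names, and the gap rule (a risk area with no evaluation at some layer) are assumptions for illustration, not the authors' repository schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Tuple

class Layer(Enum):
    # The three layers named in the abstract.
    CAPABILITY = "capability"
    HUMAN_INTERACTION = "human interaction"
    SYSTEMIC_IMPACT = "systemic impact"

@dataclass(frozen=True)
class Evaluation:
    """Hypothetical repository entry: one evaluation targeting one risk area at one layer."""
    name: str
    risk_area: str
    layer: Layer

def coverage_gaps(repository: Iterable[Evaluation], risk_areas: Iterable[str]) -> List[Tuple[str, Layer]]:
    """Return (risk_area, layer) pairs that no evaluation in the repository covers."""
    covered = {(e.risk_area, e.layer) for e in repository}
    return [(r, layer) for r in risk_areas for layer in Layer if (r, layer) not in covered]

# Usage with hypothetical entries: one capability-level evaluation, two risk areas of interest.
repo = [Evaluation("example-eval-1", "harmful content", Layer.CAPABILITY)]
gaps = coverage_gaps(repo, ["harmful content", "misinformation"])
for risk_area, layer in gaps:
    print(f"{risk_area}: no {layer.value} evaluation")
```

Keying coverage on (risk area, layer) pairs makes the framework's point visible: a repository dominated by capability evaluations leaves the human-interaction and systemic-impact rows empty.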