Generative AI


AI Is Becoming More Powerful--but Also More Secretive

WIRED

When OpenAI published details of the stunningly capable AI language model GPT-4, which powers ChatGPT, in March, its researchers filled 100 pages. They also left out a few important details--like anything substantial about how it was actually built or how it works. That was no accidental oversight, of course. OpenAI and other big companies are keen to keep the workings of their most prized algorithms shrouded in mystery, in part out of fear the technology might be misused but also from worries about giving competitors a leg up. A study released by researchers at Stanford University this week shows just how deep--and potentially dangerous--the secrecy is around GPT-4 and other cutting-edge AI systems.


The Foundation Model Transparency Index

arXiv.org Artificial Intelligence

Foundation models have rapidly permeated society, catalyzing a wave of generative AI applications spanning enterprise and consumer-facing contexts. While the societal impact of foundation models is growing, transparency is on the decline, mirroring the opacity that has plagued past digital technologies (e.g. social media). Reversing this trend is essential: transparency is a vital precondition for public accountability, scientific innovation, and effective governance. To assess the transparency of the foundation model ecosystem and help improve transparency over time, we introduce the Foundation Model Transparency Index. The Foundation Model Transparency Index specifies 100 fine-grained indicators that comprehensively codify transparency for foundation models, spanning the upstream resources used to build a foundation model (e.g. data, labor, compute), details about the model itself (e.g. size, capabilities, risks), and the downstream use (e.g. distribution channels, usage policies, affected geographies). We score 10 major foundation model developers (e.g. OpenAI, Google, Meta) against the 100 indicators to assess their transparency. To facilitate and standardize assessment, we score developers in relation to their practices for their flagship foundation model (e.g. GPT-4 for OpenAI, PaLM 2 for Google, Llama 2 for Meta). We present 10 top-level findings about the foundation model ecosystem: for example, no developer currently discloses significant information about the downstream impact of its flagship model, such as the number of users, affected market sectors, or how users can seek redress for harm. Overall, the Foundation Model Transparency Index establishes the level of transparency today to drive progress on foundation model governance via industry standards and regulatory intervention.


Conditional Generative Modeling for Images, 3D Animations, and Video

arXiv.org Artificial Intelligence

This dissertation attempts to drive innovation in the field of generative modeling for computer vision, by exploring novel formulations of conditional generative models, and innovative applications in images, 3D animations, and video. Our research focuses on architectures that offer reversible transformations of noise and visual data, and the application of encoder-decoder architectures for generative tasks and 3D content manipulation. In all instances, we incorporate conditional information to enhance the synthesis of visual data, improving the efficiency of the generation process as well as the generated content. We introduce the use of Neural ODEs to model video dynamics using an encoder-decoder architecture, demonstrating their ability to predict future video frames despite being trained solely to reconstruct current frames. Next, we propose a conditional variant of continuous normalizing flows that enables higher-resolution image generation based on lower-resolution input, achieving comparable image quality while reducing parameters and training time. Our next contribution presents a pipeline that takes human images as input, automatically aligns a user-specified 3D character with the pose of the human, and facilitates pose editing based on partial inputs. Next, we derive the relevant mathematical details for denoising diffusion models that use non-isotropic Gaussian processes, and show comparable generation quality. Finally, we devise a novel denoising diffusion framework capable of solving all three video tasks of prediction, generation, and interpolation. We perform ablation studies, and show SOTA results on multiple datasets. Our contributions are published articles at peer-reviewed venues. Overall, our research aims to make a meaningful contribution to the pursuit of more efficient and flexible generative models, with the potential to shape the future of computer vision.


Fine-Tuning Generative Models as an Inference Method for Robotic Tasks

arXiv.org Artificial Intelligence

Adaptable models could greatly benefit robotic agents operating in the real world, allowing them to deal with novel and varying conditions. While approaches such as Bayesian inference are well-studied frameworks for adapting models to evidence, we build on recent advances in deep generative models which have greatly affected many areas of robotics. Harnessing modern GPU acceleration, we investigate how to quickly adapt the sample generation of neural network models to observations in robotic tasks. We propose a simple and general method that is applicable to various deep generative models and robotic environments. The key idea is to quickly fine-tune the model by fitting it to generated samples matching the observed evidence, using the cross-entropy method. We show that our method can be applied to both autoregressive models and variational autoencoders, and demonstrate its usability in object shape inference from grasping, inverse kinematics calculation, and point cloud completion.
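The adaptation loop described in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: a one-dimensional Gaussian stands in for the deep generative model, and the names `cem_adapt` and `mismatch` and the linear "sensor" 2x + 1 are hypothetical choices made only for this example. Each round draws samples from the current model, keeps those that best match the observed evidence, and refits the model to them, which is the cross-entropy method in its simplest form.

```python
import random
import statistics


def cem_adapt(mean, std, mismatch, n_samples=200, elite_frac=0.1,
              n_iters=10, seed=0):
    """Adapt a 1-D Gaussian sampler to evidence with the cross-entropy method.

    The Gaussian is a stand-in (assumption) for a deep generative model;
    `mismatch(x)` scores how badly a sample x disagrees with the evidence.
    """
    rng = random.Random(seed)
    n_elite = max(2, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # 1. Generate samples from the current model.
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        # 2. Keep the samples that best match the observed evidence.
        elites = sorted(samples, key=mismatch)[:n_elite]
        # 3. "Fine-tune": refit the model to the elite samples
        #    (maximum likelihood for a Gaussian is just mean and std).
        mean = statistics.mean(elites)
        std = max(statistics.pstdev(elites), 1e-6)  # avoid a zero-width model
    return mean, std


# Hypothetical evidence: a sensor reads 2*x + 1 and we observed 7.0,
# so the latent value consistent with the observation is x = 3.
observed = 7.0
adapted_mean, adapted_std = cem_adapt(
    mean=0.0, std=2.0, mismatch=lambda x: abs(2 * x + 1 - observed)
)
print(adapted_mean, adapted_std)
```

After a few rounds the sampler should concentrate near x = 3, the value consistent with the observation. For a neural model, step 3 would presumably be replaced by gradient steps on the elite samples.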


ChatGPT in Drug Discovery: A Case Study on Anti-Cocaine Addiction Drug Development with Chatbots

arXiv.org Artificial Intelligence

The birth of ChatGPT, a cutting-edge language model-based chatbot developed by OpenAI, ushered in a new era in AI. However, due to potential pitfalls, its role in rigorous scientific research is not clear yet. This paper vividly showcases its innovative application within the field of drug discovery. Focused specifically on developing anti-cocaine addiction drugs, the study employs GPT-4 as a virtual guide, offering strategic and methodological insights to researchers working on generative models for drug candidates. The primary objective is to generate optimal drug-like molecules with desired properties. By leveraging the capabilities of ChatGPT, the study introduces a novel approach to the drug discovery process. This symbiotic partnership between AI and researchers transforms how drug development is approached. Chatbots become facilitators, steering researchers towards innovative methodologies and productive paths for creating effective drug candidates. This research sheds light on the collaborative synergy between human expertise and AI assistance, wherein ChatGPT's cognitive abilities enhance the design and development of potential pharmaceutical solutions. This paper not only explores the integration of advanced AI in drug discovery but also reimagines the landscape by advocating for AI-powered chatbots as trailblazers in revolutionizing therapeutic innovation.


ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations

arXiv.org Artificial Intelligence

As generative AI becomes more prevalent, it is important to study how human users interact with such models. In this work, we investigate how people use text-to-image models to generate desired target images. To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that generates an image similar to the target. Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. People submit diverse prompts and are able to discover a variety of text descriptions that generate similar images. Interestingly, prompt diversity does not decrease as users find better prompts. We further propose a new metric to quantify the steerability of AI using our dataset. We define steerability as the expected number of interactions required to adequately complete a task. We estimate this value by fitting a Markov chain for each target task and calculating the expected time to reach an adequate score in the Markov chain. We quantify and compare AI steerability across different types of target images and two different models, finding that images of cities and natural world images are more steerable than artistic and fantasy images. These findings provide insights into human-AI interaction behavior, present a concrete method of assessing AI steerability, and demonstrate the general utility of the ArtWhisperer dataset.
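The steerability estimate described in this abstract rests on a standard calculation: the expected number of steps for a Markov chain to first reach a target state. The sketch below shows that calculation under stated assumptions. The three score buckets and the transition probabilities are invented for illustration and are not taken from the ArtWhisperer data, and `expected_hitting_times` is a hypothetical helper name. For each non-target state i the expected time satisfies t_i = 1 + sum_j P[i][j] * t_j, with t_target = 0, which is a small linear system.

```python
def expected_hitting_times(P, target):
    """Expected number of steps to first reach `target` from each state.

    P is a row-stochastic transition matrix given as a list of lists.
    Solves (I - Q) t = 1, where Q is P restricted to non-target states.
    """
    n = len(P)
    others = [s for s in range(n) if s != target]
    m = len(others)
    # Augmented matrix [I - Q | 1].
    A = [[(1.0 if i == j else 0.0) - P[i][j] for j in others] + [1.0]
         for i in others]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: target may be unreachable")
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    times = [0.0] * n  # the target itself needs zero further steps
    for k, s in enumerate(others):
        times[s] = A[k][m]
    return times


# Illustrative (made-up) chain over score buckets:
# 0 = poor match, 1 = close match, 2 = adequate score (absorbing).
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.00, 1.00],
]
times = expected_hitting_times(P, target=2)
print(times)  # expected prompts needed from each starting bucket
```

With these made-up probabilities, a user starting from a poor match needs 8 prompts in expectation and one starting from a close match needs 6. A lower expected count corresponds to a more steerable task in the sense the abstract defines.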


ChatGPT live web browsing exits beta, DALL-E 3 enters beta

Engadget

OpenAI has brought live web browsing out of beta. The company launched the feature earlier this year before pulling it after the plugin kept gleaning data from paywalled content. In addition, the next-generation image generation tool DALL-E 3, which integrates with ChatGPT for easier prompting, is now available in beta for ChatGPT Plus and Enterprise subscribers. Browse with Bing, as live web browsing is formally called, no longer requires subscribers to switch a beta toggle under the chatbot's settings. The feature matters since, by default, the popular chatbot has a knowledge cutoff date of September 2021, leaving it clueless about current events.


ChatGPT Creator Partners With Abu Dhabi's G42 in Middle East AI Push

TIME - Tech

OpenAI, the creator of ChatGPT, is teaming up with Abu Dhabi's leading artificial intelligence firm as part of an expansion within the United Arab Emirates and the broader region. The partnership with G42, which is chaired by the UAE's influential national security adviser Sheikh Tahnoon bin Zayed Al Nahyan, will focus on delivering OpenAI's generative AI models across sectors spanning financial services to energy and healthcare. "Leveraging G42's industry expertise, we aim to empower businesses and communities with effective solutions that resonate with the nuances of the region," said Sam Altman, co-founder and chief executive officer of San Francisco-based OpenAI. The partnership is a "convergence of value and vision," G42 CEO Peng Xiao said. The companies didn't disclose financial details of their collaboration. G42 is also partnering with Cerebras Systems Inc., which recently built the first of nine AI supercomputers as an alternative to systems using Nvidia Corp. technology.


China has a new plan for judging the safety of generative AI--and it's packed with details

MIT Technology Review

Last week we got some clarity about what all this may look like in practice. On October 11, a Chinese government organization called the National Information Security Standardization Technical Committee released a draft document that proposed detailed rules for how to determine whether a generative AI model is problematic. Often abbreviated as TC260, the committee consults corporate representatives, academics, and regulators to set up tech industry rules on issues ranging from cybersecurity to privacy to IT infrastructure. Unlike many manifestos you may have seen about how to regulate AI, this standards document is very detailed: it sets clear criteria for when a data source should be banned from training generative AI, and it gives metrics on the exact number of keywords and sample questions that should be prepared to test out a model. Matt Sheehan, a global technology fellow at the Carnegie Endowment for International Peace who flagged the document for me, said that when he first read it, he "felt like it was the most grounded and specific document related to the generative AI regulation."


Interview with Aylin Caliskan: AI ethics

AIHub

In 2023, Aylin Caliskan was recognized as one of the 100 Brilliant Women in AI Ethics. At this year's International Joint Conference on Artificial Intelligence (IJCAI 2023) she gave an IJCAI Early Career Spotlight talk about her work. I met with Aylin at the conference and chatted to her about AI ethics. We spoke about bias in generative AI tools and the associated research and societal challenges. Andrea Rafai: We've seen generative AI tools become mainstream recently.