Generative AI
This self-driving startup is using generative AI to predict traffic
While autonomous driving has long relied on machine learning to plan routes and detect objects, some companies and researchers are now betting that generative AI -- models that take in data of their surroundings and generate predictions -- will help bring autonomy to the next stage. Wayve, a Waabi competitor, released a comparable model last year that is trained on the video that its vehicles collect. Waabi's model works in a similar way to image or video generators like OpenAI's DALL-E and Sora. It takes point clouds of lidar data, which visualize a 3D map of the car's surroundings, and breaks them into chunks, similar to how image generators break photos into pixels. Based on its training data, Copilot4D then predicts how all points of lidar data will move.
Can an A.I. Make Plans?
Last summer, AdamYedidia, a user on a Web forum called LessWrong, published a post titled "Chess as a Case Study in Hidden Capabilities in ChatGPT." He started by noting that the Internet is filled with funny videos of ChatGPT playing bad chess: in one popular clip, the A.I. confidently and illegally moves a pawn backward. But many of these videos were made using the original version of OpenAI's chatbot, which was released to the public in late November, 2022, and was based on the GPT-3.5 large language model. Last March, OpenAI introduced an enhanced version of ChatGPT based on the more powerful GPT-4. As the post demonstrated, this new model, if prompted correctly, could play a surprisingly decent game of chess, achieving something like an Elo rating of 1000--better than roughly fifty per cent of ranked players.
Autonomous Monitoring of Pharmaceutical R&D Laboratories with 6 Axis Arm Equipped Quadruped Robot and Generative AI: A Preliminary Study
This paper presents a proof-of-concept study that examines the utilization of generative AI and mobile robotics for autonomous laboratory monitoring in the pharmaceutical R&D laboratory. The study investigates the potential advantages of anomaly detection and automated reporting by multi-modal model and Vision Foundation Model (VFM), which have the potential to enhance compliance and safety in laboratory environments. Additionally, the paper discusses the current limitations of the generative AI approach and proposes future directions for its application in lab monitoring.
Finetuning Text-to-Image Diffusion Models for Fairness
Shen, Xudong, Du, Chao, Pang, Tianyu, Lin, Min, Wong, Yongkang, Kankanhalli, Mohan
The rapid adoption of text-to-image diffusion models in society underscores an urgent need to address their biases. Without interventions, these biases could propagate a skewed worldview and restrict opportunities for minority groups. In this work, we frame fairness as a distributional alignment problem. Our solution consists of two main technical contributions: (1) a distributional alignment loss that steers specific characteristics of the generated images towards a user-defined target distribution, and (2) adjusted direct finetuning of diffusion model's sampling process (adjusted DFT), which leverages an adjusted gradient to directly optimize losses defined on the generated images. Empirically, our method markedly reduces gender, racial, and their intersectional biases for occupational prompts. Gender bias is significantly reduced even when finetuning just five soft tokens. Crucially, our method supports diverse perspectives of fairness beyond absolute equality, which is demonstrated by controlling age to a $75\%$ young and $25\%$ old distribution while simultaneously debiasing gender and race. Finally, our method is scalable: it can debias multiple concepts at once by simply including these prompts in the finetuning data. We share code and various fair diffusion model adaptors at https://sail-sg.github.io/finetune-fair-diffusion/.
From Melting Pots to Misrepresentations: Exploring Harms in Generative AI
Gautam, Sanjana, Venkit, Pranav Narayanan, Ghosh, Sourojit
With the widespread adoption of advanced generative models such as Gemini and GPT, there has been a notable increase in the incorporation of such models into sociotechnical systems, categorized under AI-as-a-Service (AIaaS). Despite their versatility across diverse sectors, concerns persist regarding discriminatory tendencies within these models, particularly favoring selected `majority' demographics across various sociodemographic dimensions. Despite widespread calls for diversification of media representations, marginalized racial and ethnic groups continue to face persistent distortion, stereotyping, and neglect within the AIaaS context. In this work, we provide a critical summary of the state of research in the context of social harms to lead the conversation to focus on their implications. We also present open-ended research questions, guided by our discussion, to help define future research pathways.
SocialGenPod: Privacy-Friendly Generative AI Social Web Applications with Decentralised Personal Data Stores
Vizgirda, Vidminas, Zhao, Rui, Goel, Naman
We present SocialGenPod, a decentralised and privacy-friendly way of deploying generative AI Web applications. Unlike centralised Web and data architectures that keep user data tied to application and service providers, we show how one can use Solid -- a decentralised Web specification -- to decouple user data from generative AI applications. We demonstrate SocialGenPod using a prototype that allows users to converse with different Large Language Models, optionally leveraging Retrieval Augmented Generation to generate answers grounded in private documents stored in any Solid Pod that the user is allowed to access, directly or indirectly. SocialGenPod makes use of Solid access control mechanisms to give users full control of determining who has access to data stored in their Pods. SocialGenPod keeps all user data (chat history, app configuration, personal documents, etc) securely in the user's personal Pod; separate from specific model or application providers. Besides better privacy controls, this approach also enables portability across different services and applications. Finally, we discuss challenges, posed by the large compute requirements of state-of-the-art models, that future research in this area should address. Our prototype is open-source and available at: https://github.com/Vidminas/socialgenpod/.
Generative Models and Connected and Automated Vehicles: A Survey in Exploring the Intersection of Transportation and AI
This report investigates the history and impact of Generative Models and Connected and Automated Vehicles (CAVs), two groundbreaking forces pushing progress in technology and transportation. By focusing on the application of generative models within the context of CAVs, the study aims to unravel how this integration could enhance predictive modeling, simulation accuracy, and decision-making processes in autonomous vehicles. This thesis discusses the benefits and challenges of integrating generative models and CAV technology in transportation. It aims to highlight the progress made, the remaining obstacles, and the potential for advancements in safety and innovation.
What Was Your Prompt? A Remote Keylogging Attack on AI Assistants
Weiss, Roy, Ayzenshteyn, Daniel, Amit, Guy, Mirsky, Yisroel
AI assistants are becoming an integral part of society, used for asking advice or help in personal and confidential issues. In this paper, we unveil a novel side-channel that can be used to read encrypted responses from AI Assistants over the web: the token-length side-channel. We found that many vendors, including OpenAI and Microsoft, have this side-channel. However, inferring the content of a response from a token-length sequence alone proves challenging. This is because tokens are akin to words, and responses can be several sentences long leading to millions of grammatically correct sentences. In this paper, we show how this can be overcome by (1) utilizing the power of a large language model (LLM) to translate these sequences, (2) providing the LLM with inter-sentence context to narrow the search space and (3) performing a known-plaintext attack by fine-tuning the model on the target model's writing style. Using these methods, we were able to accurately reconstruct 29\% of an AI assistant's responses and successfully infer the topic from 55\% of them. To demonstrate the threat, we performed the attack on OpenAI's ChatGPT-4 and Microsoft's Copilot on both browser and API traffic.
Logits of API-Protected LLMs Leak Proprietary Information
Finlayson, Matthew, Ren, Xiang, Swayamdipta, Swabha
The commercialization of large language models (LLMs) has led to the common practice of high-level API-only access to proprietary models. In this work, we show that even with a conservative assumption about the model architecture, it is possible to learn a surprisingly large amount of non-public information about an API-protected LLM from a relatively small number of API queries (e.g., costing under $1,000 for OpenAI's gpt-3.5-turbo). Our findings are centered on one key observation: most modern LLMs suffer from a softmax bottleneck, which restricts the model outputs to a linear subspace of the full output space. We show that this lends itself to a model image or a model signature which unlocks several capabilities with affordable cost: efficiently discovering the LLM's hidden size, obtaining full-vocabulary outputs, detecting and disambiguating different model updates, identifying the source LLM given a single full LLM output, and even estimating the output layer parameters. Our empirical investigations show the effectiveness of our methods, which allow us to estimate the embedding size of OpenAI's gpt-3.5-turbo to be about 4,096. Lastly, we discuss ways that LLM providers can guard against these attacks, as well as how these capabilities can be viewed as a feature (rather than a bug) by allowing for greater transparency and accountability.
Cooling-Guide Diffusion Model for Battery Cell Arrangement
Sung, Nicholas, Zheng, Liu, Wang, Pingfeng, Ahmed, Faez
Our study introduces a Generative AI method that employs a cooling-guided diffusion model to optimize the layout of battery cells, a crucial step for enhancing the cooling performance and efficiency of battery thermal management systems. Traditional design processes, which rely heavily on iterative optimization and extensive guesswork, are notoriously slow and inefficient, often leading to suboptimal solutions. In contrast, our innovative method uses a parametric denoising diffusion probabilistic model (DDPM) with classifier and cooling guidance to generate optimized cell layouts with enhanced cooling paths, significantly lowering the maximum temperature of the cells. By incorporating position-based classifier guidance, we ensure the feasibility of generated layouts. Meanwhile, cooling guidance directly optimizes cooling-efficiency, making our approach uniquely effective. When compared to two advanced models, the Tabular Denoising Diffusion Probabilistic Model (TabDDPM) and the Conditional Tabular GAN (CTGAN), our cooling-guided diffusion model notably outperforms both. It is five times more effective than TabDDPM and sixty-six times better than CTGAN across key metrics such as feasibility, diversity, and cooling efficiency. This research marks a significant leap forward in the field, aiming to optimize battery cell layouts for superior cooling efficiency, thus setting the stage for the development of more effective and dependable battery thermal management systems.