Generative AI
Look Within, Why LLMs Hallucinate: A Causal Perspective
Li, He, Chi, Haoang, Liu, Mingyu, Yang, Wenjing
The emergence of large language models (LLMs) is a milestone in generative artificial intelligence, achieving significant success in text comprehension and generation tasks. Despite this success in many downstream tasks, LLMs suffer from severe hallucination problems, which pose significant challenges to their practical application. Most work on LLM hallucination focuses on data quality. Self-attention is a core module in transformer-based LLMs, yet its potential relationship with hallucination has hardly been investigated. To fill this gap, we study the problem from a causal perspective. We propose a method that intervenes in LLMs' self-attention layers while keeping their structures and sizes intact. Specifically, we disable different self-attention layers in several popular open-source LLMs and then compare their degrees of hallucination with those of the original models. We evaluate the intervened LLMs on hallucination assessment benchmarks and conclude that disabling some specific self-attention layers at the front or tail of the LLMs can alleviate hallucination issues. The study paves a new way for understanding and mitigating LLMs' hallucinations.
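The abstract's intervention (disabling a self-attention layer while leaving the model's structure and size intact) can be illustrated with a toy residual stack. This is only a sketch under stated assumptions, not the authors' implementation: `ToyBlock`, `ToyModel`, the mean-mixing "attention" function, and the `disabled` argument are all hypothetical names invented for illustration, and the sketch assumes that a disabled layer contributes nothing to the residual stream.

```python
from typing import Callable, FrozenSet, List

Vector = List[float]


def make_attention(scale: float) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for a self-attention sublayer.

    It pulls every position toward the sequence mean; real attention is
    far richer, but this is enough to show the intervention.
    """
    def attention(x: Vector) -> Vector:
        mean = sum(x) / len(x)
        return [scale * (mean - v) for v in x]
    return attention


class ToyBlock:
    def __init__(self, scale: float) -> None:
        self.attention = make_attention(scale)
        self.enabled = True  # intervention flag; parameters are never changed

    def forward(self, x: Vector) -> Vector:
        # Residual connection: a disabled sublayer adds a zero update,
        # so its input passes through unchanged (assumption of this sketch).
        update = self.attention(x) if self.enabled else [0.0] * len(x)
        return [v + u for v, u in zip(x, update)]


class ToyModel:
    def __init__(self, scales: List[float]) -> None:
        self.blocks = [ToyBlock(s) for s in scales]

    def forward(self, x: Vector, disabled: FrozenSet[int] = frozenset()) -> Vector:
        for index, block in enumerate(self.blocks):
            block.enabled = index not in disabled
            x = block.forward(x)
        return x


model = ToyModel([0.5, 0.25, 0.1])
inputs = [1.0, 2.0, 3.0]
baseline = model.forward(inputs)
front_disabled = model.forward(inputs, disabled=frozenset({0}))
```

Comparing `baseline` with `front_disabled` mirrors the comparison described in the abstract: the same model, with the same number of blocks, evaluated with and without one layer's contribution.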
OpenAI is reportedly working on more advanced AI models capable of reasoning and 'deep research'
A new report from Reuters claims OpenAI is developing technology to bring advanced reasoning capabilities to its AI models under a secret project code-named "Strawberry." Among the project's goals is to enable the company's AI models to autonomously scour the internet in order to "plan ahead" for more complex tasks, according to an internal document seen by Reuters. The project previously went by the name of Q* (pronounced "Q star"), demos of which showed earlier this year that it could answer "tricky science and math questions," Reuters reports, citing unnamed sources who witnessed the demonstrations. At this stage, much remains unknown about Strawberry -- including how far along in development it is, and whether it's the same system with "human-like reasoning" skills that OpenAI reportedly demonstrated at an employee all-hands meeting earlier this week, per Bloomberg. But the ability for the company's AI to conduct "deep research," as is said to be the aim of Strawberry, would mark a huge leap forward from what's available today.
OpenAI whistleblowers call for SEC probe into NDAs that kept employees from speaking out on safety risks
OpenAI's NDAs are once again under scrutiny after whistleblowers penned a letter to the SEC alleging that employees were made to sign "illegally restrictive" agreements preventing them from speaking out on the potential harms of the company's technology. The letter, which was obtained and published online by The Washington Post, accuses OpenAI of violating SEC rules meant to protect employees' rights to report their concerns to federal authorities and prevent retaliation. It follows an official complaint that was filed with the SEC in June. In the letter, the whistleblowers ask the SEC to "take swift and aggressive steps" to enforce the rules they say OpenAI has violated. The alleged violations include making employees sign agreements "that failed to exempt disclosures of securities violations to the SEC" and requiring employees to obtain consent from the company before disclosing confidential information to the authorities.
AI makes writing easier, but stories sound alike, study says
Books and movies of the future could all start to feel the same if creative industries embrace artificial intelligence to help write stories, a study published on Friday warned. The research, which drew on hundreds of volunteers and was published in Science Advances, comes amid rising fears over the impact of widely available AI tools that turn simple text prompts into relatively sophisticated music, art and writing. "Our goal was to study to what extent and how generative AI might help humans with creativity," co-author Anil Doshi of University College London said.
AI can make you more creative--but it has limits
That's what two researchers set out to explore in new research published today in Science Advances, studying how people used OpenAI's large language model GPT-4 to write short stories. The model was helpful--but only to an extent. They found that while AI improved the output of less creative writers, it made little difference to the quality of the stories produced by writers who were already creative. The stories in which AI had played a part were also more similar to each other than those dreamed up entirely by humans. The research adds to the growing body of work investigating how generative AI affects human creativity, suggesting that although access to AI can offer a creative boost to an individual, it reduces creativity in the aggregate.
FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3
Makridis, Georgios, Oikonomou, Athanasios, Koukos, Vasileios
In the diverse world of AI-driven storytelling, there is a unique opportunity to engage young audiences with customized and personalized narratives. This paper introduces FairyLandAI, an innovative Large Language Model (LLM) developed through OpenAI's API and specifically crafted to create personalized fairytales for children. The distinctive feature of FairyLandAI is its dual capability: it not only generates stories that are engaging, age-appropriate, and reflective of various traditions but also autonomously produces imaginative prompts suitable for advanced image generation tools like GenAI and Dalle-3, thereby enriching the storytelling experience. FairyLandAI is tailored to resonate with the imaginative worlds of children, providing narratives that are both educational and entertaining and that align with the moral values appropriate to different ages. Its unique strength lies in customizing stories to match individual children's preferences and cultural backgrounds, heralding a new era in personalized storytelling. Further, its integration with image generation technology offers a comprehensive narrative experience that stimulates both verbal and visual creativity. Empirical evaluations of FairyLandAI demonstrate its effectiveness in crafting captivating stories for children, which not only entertain but also embody the values and teachings of diverse traditions. This model serves as an invaluable tool for parents and educators, supporting them in imparting meaningful moral lessons through engaging narratives. FairyLandAI represents a pioneering step in using LLMs, particularly through OpenAI's API, for educational and cultural enrichment, making complex moral narratives accessible and enjoyable for young, imaginative minds.
Machine Apophenia: The Kaleidoscopic Generation of Architectural Images
Tikhonov, Alexey, Sinyavin, Dmitry
This study investigates the application of generative artificial intelligence in architectural design. We present a novel methodology that combines multiple neural networks to create an unsupervised and unmoderated stream of unique architectural images. Our approach is grounded in a conceptual framework called machine apophenia. We hypothesize that neural networks, trained on diverse human-generated data, internalize aesthetic preferences and tend to produce coherent designs even from random inputs. The methodology involves an iterative process of image generation, description, and refinement, resulting in captioned architectural postcards automatically shared on several social media platforms. Evaluation and ablation studies show improvement in both the technical and aesthetic metrics of the resulting images at each step.
Procedural Content Generation via Generative Artificial Intelligence
Mao, Xinyu, Yu, Wanli, Yamada, Kazunori D, Zielewski, Michael R.
Attempts to utilize machine learning in PCG have been made in the past. In this survey paper, we investigate how generative artificial intelligence (AI), which saw a significant increase in interest in the mid-2010s, is being used for PCG. We review applications of generative AI for the creation of various types of content, including terrains, items, and even storylines. While generative AI is effective for PCG, one significant issue it faces is that building high-performance generative AI requires vast amounts of training data. Because content is generally highly customized, domain-specific training data is scarce, and straightforward approaches to generative AI models may not work well. For PCG research to advance further, issues related to limited training data must be overcome. Thus, we also give special consideration to research that addresses the challenges posed by limited training data.
Refusing Safe Prompts for Multi-modal Large Language Models
Shao, Zedian, Liu, Hongbin, Hu, Yuepeng, Gong, Neil Zhenqiang
Multimodal large language models (MLLMs) have become the cornerstone of today's generative AI ecosystem, sparking intense competition among tech giants and startups. In particular, an MLLM generates a text response given a prompt consisting of an image and a question. While state-of-the-art MLLMs use safety filters and alignment techniques to refuse unsafe prompts, in this work, we introduce MLLM-Refusal, the first method that induces refusals for safe prompts. In particular, our MLLM-Refusal optimizes a nearly-imperceptible refusal perturbation and adds it to an image, causing target MLLMs to likely refuse a safe prompt containing the perturbed image and a safe question. Specifically, we formulate MLLM-Refusal as a constrained optimization problem and propose an algorithm to solve it. Our method offers competitive advantages for MLLM model providers by potentially disrupting user experiences of competing MLLMs, since competing MLLMs' users will receive unexpected refusals when they unwittingly use these perturbed images in their prompts. We evaluate MLLM-Refusal on four MLLMs across four datasets, demonstrating its effectiveness in causing competing MLLMs to refuse safe prompts while not affecting non-competing MLLMs. Furthermore, we explore three potential countermeasures -- adding Gaussian noise, DiffPure, and adversarial training. Our results show that they are insufficient: though they can mitigate MLLM-Refusal's effectiveness, they also sacrifice the accuracy and/or efficiency of the competing MLLM. The code is available at https://github.com/Sadcardation/MLLM-Refusal.
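The abstract describes the perturbation search as a constrained optimization problem but does not state the constraint or the solver. As a generic illustration only, the sketch below shows projected sign-gradient descent under an L-infinity bound, a common way to keep a perturbation small. Everything here is an assumption rather than the paper's algorithm: the bound `eps`, the step size, the function names, and the toy quadratic loss that stands in for a model-dependent objective.

```python
from typing import Callable, List

Vector = List[float]


def project_linf(delta: Vector, eps: float) -> Vector:
    """Clip each coordinate into [-eps, eps] (projection onto an L-infinity ball)."""
    return [max(-eps, min(eps, d)) for d in delta]


def sign(value: float) -> float:
    return 1.0 if value > 0 else -1.0 if value < 0 else 0.0


def optimize_perturbation(
    grad_fn: Callable[[Vector], Vector],
    dim: int,
    eps: float,
    step: float,
    iters: int,
) -> Vector:
    """Minimize a loss over delta subject to max|delta_i| <= eps."""
    delta = [0.0] * dim
    for _ in range(iters):
        grad = grad_fn(delta)
        # Sign-gradient descent step, then projection back into the feasible set.
        delta = [d - step * sign(g) for d, g in zip(delta, grad)]
        delta = project_linf(delta, eps)
    return delta


# Toy objective (hypothetical): squared distance to a fixed target vector.
target = [0.5, -0.02, 0.0]


def toy_grad(delta: Vector) -> Vector:
    return [2.0 * (d - t) for d, t in zip(delta, target)]


delta = optimize_perturbation(toy_grad, dim=3, eps=0.1, step=0.01, iters=50)
```

In this toy run the first coordinate is driven toward a target outside the feasible set and ends up pinned at the bound, while the other two settle at their unconstrained optima. That is the qualitative behavior a norm constraint is meant to enforce.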
Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation
Chang, Kaiyan, Chen, Zhirong, Zhou, Yunhao, Zhu, Wenlong, Wang, Kun, Xu, Haobo, Li, Cangyuan, Wang, Mengdi, Liang, Shengwen, Li, Huawei, Han, Yinhe, Wang, Ying
Natural language interfaces have exhibited considerable potential in the automation of Verilog generation derived from high-level specifications through the utilization of large language models, garnering significant attention. Nevertheless, this paper elucidates that visual representations contribute essential contextual information critical to design intent for hardware architectures possessing spatial complexity, potentially surpassing the efficacy of natural-language-only inputs. Expanding upon this premise, our paper introduces an open-source benchmark for multi-modal generative models tailored for Verilog synthesis from visual-linguistic inputs, addressing both singular and complex modules. Additionally, we introduce an open-source visual and natural language Verilog query language framework to facilitate efficient and user-friendly multi-modal queries. To evaluate the performance of the proposed multi-modal hardware generative AI in Verilog generation tasks, we compare it with a popular method that relies solely on natural language. Our results demonstrate a significant accuracy improvement in the multi-modal generated Verilog compared to queries based solely on natural language. We hope to reveal a new approach to hardware design in the large-hardware-design-model era, thereby fostering a more diversified and productive approach to hardware design.