Generative AI
AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System
Liu, Zhiwei, Yao, Weiran, Zhang, Jianguo, Yang, Liangwei, Liu, Zuxin, Tan, Juntao, Choubey, Prafulla K., Lan, Tian, Wu, Jason, Wang, Huan, Heinecke, Shelby, Xiong, Caiming, Savarese, Silvio
The booming success of LLMs initiates rapid development in LLM agents. Though the foundation of an LLM agent is the generative model, it is critical to devise the optimal reasoning strategies and agent architectures. Accordingly, LLM agent research advances from the simple chain-of-thought prompting to more complex ReAct and Reflection reasoning strategy; agent architecture also evolves from single agent generation to multi-agent conversation, as well as multi-LLM multi-agent group chat. However, with the existing intricate frameworks and libraries, creating and evaluating new reasoning strategies and agent architectures has become a complex challenge, which hinders research investigation into LLM agents. Thus, we open-source a new AI agent library, AgentLite, which simplifies this process by offering a lightweight, user-friendly platform for innovating LLM agent reasoning, architectures, and applications with ease. AgentLite is a task-oriented framework designed to enhance the ability of agents to break down tasks and facilitate the development of multi-agent systems. Furthermore, we introduce multiple practical applications developed with AgentLite to demonstrate its convenience and flexibility. Get started now at: https://github.com/SalesforceAIResearch/AgentLite.
BSPA: Exploring Black-box Stealthy Prompt Attacks against Image Generators
Tian, Yu, Yang, Xiao, Dong, Yinpeng, Yang, Heming, Su, Hang, Zhu, Jun
Extremely large image generators offer significant transformative potential across diverse sectors. They allow users to design specific prompts to generate realistic images through black-box APIs. However, some studies reveal that image generators are notably susceptible to attacks and generate Not Suitable For Work (NSFW) content when given manually designed toxic texts, especially ones that are imperceptible to human observers. We urgently need a multitude of universal and transferable prompts to improve the safety of image generators, especially black-box-released APIs. Nevertheless, such prompts are constrained by labor-intensive design processes and rely heavily on the quality of the given instructions. To address this, we introduce a black-box stealthy prompt attack (BSPA) that adopts a retriever to simulate attacks from API users. It can effectively harness filter scores to tune the retrieval space of sensitive words for matching the input prompts, thereby crafting stealthy prompts tailored for image generators. Significantly, this approach is model-agnostic and requires no internal access to the model's features, ensuring its applicability to a wide range of image generators. Building on BSPA, we have constructed an automated prompt tool and a comprehensive prompt attack dataset (NSFWeval). Extensive experiments demonstrate that BSPA effectively explores the security vulnerabilities in a variety of state-of-the-art available black-box models, including Stable Diffusion XL, Midjourney, and DALL-E 2/3. Furthermore, we develop a resilient text filter and offer targeted recommendations to ensure the security of image generators against prompt attacks in the future.
Stable Diffusion 3 is a new AI image generator that won't mess up text in pictures, its makers claim
Stability AI, the startup behind Stable Diffusion, the tool that uses generative AI to create images from text prompts, revealed Stable Diffusion 3, a next-generation model, on Thursday. Stability AI claimed that the new model, which isn't widely available yet, improves image quality, works better with prompts containing multiple subjects, and can more accurately render text as part of the generated image, something that previous Stable Diffusion models weren't great at. Stability AI CEO Emad Mostaque posted some examples of this on X. The announcement comes days after Stability AI's largest rival, OpenAI, unveiled Sora, a brand new AI model capable of generating nearly-realistic, high-definition videos from simple text prompts. Sora, which isn't available to the general public yet either, sparked concerns about its potential to create realistic-looking fake footage.
GPT-4 developer tool can hack websites without human help
OpenAI's artificial intelligence model GPT-4 has the capability to hack websites and steal information from online databases without human help, researchers have found. That suggests individuals or organisations without hacking expertise could unleash AI agents to carry out cyber attacks. "You literally don't need to understand anything – you can just let the agent go hack the website by itself," says Daniel Kang at the University of Illinois Urbana-Champaign. "We think this really reduces the expertise needed to…
China's rush to dominate AI has a twist: It depends on U.S. technology
In November, a year after ChatGPT's release, a relatively unknown Chinese startup leaped to the top of a leader board that judged the abilities of open-source artificial intelligence systems. The Chinese firm, 01.AI, was only eight months old but had deep-pocketed backers and a $1 billion valuation, and was founded by a well-known investor and technologist, Kai-Fu Lee. In interviews, Lee presented his AI system as an alternative to options such as Meta's generative AI model, called LLaMA. There was just one twist: Some of the technology in 01.AI's system came from LLaMA. Lee's startup then built on Meta's technology, training its system with new data to make it more powerful.
Chipmaker Nvidia posts record growth, showing AI boom continues
Nvidia invested years ago in software and computer chips focused on AI. When the current excitement around the technology took off in late 2022 after OpenAI's release of ChatGPT, the company was well-positioned to benefit. Its tech is best-suited to run the extremely large computations needed to train AI algorithms, and now Big Tech companies are spending billions to buy its chips to keep up in the AI arms race.
Improving Deep Generative Models on Many-To-One Image-to-Image Translation
Saxena, Sagar, Teli, Mohammad Nayeem
Deep generative models have been applied to multiple applications in image-to-image translation. Generative Adversarial Networks and Diffusion Models have presented impressive results, setting new state-of-the-art results on these tasks. Most methods have symmetric setups across the different domains in a dataset. These methods assume that all domains have either multiple modalities or only one modality. However, there are many datasets that have a many-to-one relationship between two domains. In this work, we first introduce a Colorized MNIST dataset and a Color-Recall score that can provide a simple benchmark for evaluating models on many-to-one translation. We then introduce a new asymmetric framework to improve existing deep generative models on many-to-one image-to-image translation. We apply this framework to StarGAN V2 and show that in both unsupervised and semi-supervised settings, the performance of this new model improves on many-to-one image-to-image translation.
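To make the many-to-one relationship concrete, here is a minimal sketch rather than the authors' dataset code. It assumes a tint color whose largest channel is 1 and uses a random array as a stand-in for a digit image; the helper names `colorize` and `to_gray` are hypothetical. Differently colored versions of the same digit all map back to a single grayscale image.

```python
import numpy as np

def colorize(gray, color):
    """Tint a grayscale image of shape (H, W) with an RGB color.

    Assumption for this sketch: values lie in [0, 1] and the largest
    channel of `color` is exactly 1.
    """
    return gray[..., None] * np.asarray(color, dtype=float)

def to_gray(rgb):
    """Map a tinted image back to grayscale by taking the channel maximum.

    This inverts `colorize` only under the assumption stated above.
    """
    return rgb.max(axis=-1)

rng = np.random.default_rng(0)
digit = rng.random((28, 28))  # stand-in for one 28x28 digit image

# Two different images in the "color" domain ...
red = colorize(digit, (1.0, 0.0, 0.0))
green = colorize(digit, (0.2, 1.0, 0.2))

# ... that correspond to the same image in the "grayscale" domain.
different_inputs = not np.allclose(red, green)
same_output = np.allclose(to_gray(red), to_gray(green))
print(different_inputs, same_output)
```

Translating from color to grayscale has one valid answer, while the reverse direction has many, which is the asymmetry the abstract describes.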
LLMs with Industrial Lens: Deciphering the Challenges and Prospects -- A Survey
Urlana, Ashok, Kumar, Charaka Vinayak, Singh, Ajeet Kumar, Garlapati, Bala Mallikarjunarao, Chalamala, Srinivasa Rao, Mishra, Rahul
Large language models (LLMs) have become the secret ingredient driving numerous industrial applications, showcasing their remarkable versatility across a diverse spectrum of tasks. From natural language processing and sentiment analysis to content generation and personalized recommendations, their unparalleled adaptability has facilitated widespread adoption across industries. This transformative shift driven by LLMs underscores the need to explore the underlying associated challenges and avenues for enhancement in their utilization. In this paper, our objective is to unravel and evaluate the obstacles and opportunities inherent in leveraging LLMs within an industrial context. To this end, we conduct a survey involving a group of industry practitioners, develop four research questions derived from the insights gathered, and examine 68 industry papers to address these questions and derive meaningful conclusions.
Nvidia reports enormous revenue as AI hits a tipping point
The artificial intelligence boom is pushing demand for Nvidia's products past Wall Street's already lofty expectations. The chipmaker beat analyst expectations on Wednesday by leaps and bounds when it reported fourth-quarter earnings, posting $22.1bn in revenue on an expected $20.55bn and $4.93 in earnings per share against an expected $4.64. Revenue was 22% higher than the previous quarter, up 265% from a year ago. Nvidia's most closely watched earnings figure – revenue from data centers – was up more than 400% from the same period last year, reaching $18.4bn. Jensen Huang, founder and CEO of Nvidia, said in a press release, "Accelerated computing and generative AI have hit the tipping point. Demand is surging worldwide across companies, industries and nations."
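The figures above can be checked with a little arithmetic. The sketch below uses only the numbers quoted in the summary (in billions of dollars, and dollars per share); the helper names `beat_pct` and `implied_base` are hypothetical, and the implied base figures are back-calculated from the stated growth rates, not reported values.

```python
def beat_pct(actual, expected):
    """Percentage by which a reported figure exceeded the expected one."""
    return (actual - expected) / expected * 100

def implied_base(current, growth_pct):
    """Earlier-period value implied by a current value and its growth rate."""
    return current / (1 + growth_pct / 100)

revenue, revenue_expected = 22.1, 20.55  # revenue, billions of dollars
eps, eps_expected = 4.93, 4.64           # earnings per share, dollars

revenue_beat = beat_pct(revenue, revenue_expected)
eps_beat = beat_pct(eps, eps_expected)

# "22% higher than the previous quarter, up 265% from a year ago"
prev_quarter_revenue = implied_base(revenue, 22)
year_ago_revenue = implied_base(revenue, 265)

print(f"revenue beat: {revenue_beat:.2f}%, EPS beat: {eps_beat:.2f}%")
print(f"implied previous quarter: {prev_quarter_revenue:.2f}bn")
print(f"implied year-ago quarter: {year_ago_revenue:.2f}bn")
```

The beats work out to roughly 7.5% on revenue and a little over 6% on earnings per share, and the growth rates imply revenue of about 18.1bn in the previous quarter and about 6.05bn a year earlier.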
U.S. Copyright Office's Questions about Generative AI
In late October, the Office received approximately 10,000 comments in response to the questions in its notice of inquiry (NOI). The Office expects to publish a report in 2024 offering its perspective on how these questions should be answered and perhaps recommending legislation. This column reviews various positions taken in a non-random sample of comments on the most significant questions raised in the NOI. One takeaway from my review is that there is no consensus view on any of those issues among the comments I reviewed. The Office faces a tough choice: Should it simply describe the many differences of opinion about these issues without taking sides?