Goto

Collaborating Authors

 Generative AI


AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production

arXiv.org Artificial Intelligence

The Agent and AIGC (Artificial Intelligence Generated Content) technologies have recently made significant progress. We propose AesopAgent, an Agent-driven Evolutionary System on Story-to-Video Production. AesopAgent is a practical application of agent technology for multimodal content generation. The system integrates multiple generative capabilities within a unified framework, so that individual users can leverage these modules easily. This innovative system would convert user story proposals into scripts, images, and audio, and then integrate these multimodal contents into videos. Additionally, the animating units (e.g., Gen-2 and Sora) could make the videos more infectious. The AesopAgent system could orchestrate task workflow for video generation, ensuring that the generated video is both rich in content and coherent. This system mainly contains two layers, i.e., the Horizontal Layer and the Utility Layer. In the Horizontal Layer, we introduce a novel RAG-based evolutionary system that optimizes the whole video generation workflow and the steps within the workflow. It continuously evolves and iteratively optimizes workflow by accumulating expert experience and professional knowledge, including optimizing the LLM prompts and utilities usage. The Utility Layer provides multiple utilities, leading to consistent image generation that is visually coherent in terms of composition, characters, and style. Meanwhile, it provides audio and special effects, integrating them into expressive and logically arranged videos. Overall, our AesopAgent achieves state-of-the-art performance compared with many previous works in visual storytelling. Our AesopAgent is designed for convenient service for individual users, which is available on the following page: https://aesopai.github.io/.


SMART: Automatically Scaling Down Language Models with Accuracy Guarantees for Reduced Processing Fees

arXiv.org Artificial Intelligence

The advancement of Large Language Models (LLMs) has significantly boosted performance in natural language processing (NLP) tasks. However, the deployment of high-performance LLMs incurs substantial costs, primarily due to the increased number of parameters aimed at enhancing model performance. This has made the use of state-of-the-art LLMs more expensive for end-users. AI service providers, such as OpenAI and Anthropic, often offer multiple versions of LLMs with varying prices and performance. However, end-users still face challenges in choosing the appropriate LLM for their tasks that balance result quality with cost. We introduce SMART, Scaling Models Adaptively for Reduced Token Fees, a novel LLM framework designed to minimize the inference costs of NLP tasks while ensuring sufficient result quality. It enables users to specify an accuracy constraint in terms of the equivalence of outputs to those of the most powerful LLM. SMART then generates results that deviate from the outputs of this LLM only with a probability below a user-defined threshold. SMART employs a profiling phase that evaluates the performance of multiple LLMs to identify those that meet the user-defined accuracy level. SMART optimizes the tradeoff between profiling overheads and the anticipated cost savings resulting from profiling. Moreover, our approach significantly reduces inference costs by strategically leveraging a mix of LLMs. Our experiments on three real-world datasets show that, based on OpenAI models, SMART achieves significant cost savings, up to 25.6x in comparison to GPT-4.


Active Generation for Image Classification

arXiv.org Artificial Intelligence

Recently, the growing capabilities of deep generative models have underscored their potential in enhancing image classification accuracy. However, existing methods often demand the generation of a disproportionately large number of images compared to the original dataset, while having only marginal improvements in accuracy. This computationally expensive and time-consuming process hampers the practicality of such approaches. In this paper, we propose to address the efficiency of image generation by focusing on the specific needs and characteristics of the model. With a central tenet of active learning, our method, named ActGen, takes a training-aware approach to image generation. It aims to create images akin to the challenging or misclassified samples encountered by the current model and incorporates these generated images into the training set to augment model performance. ActGen introduces an attentive image guidance technique, using real images as guides during the denoising process of a diffusion model. The model's attention on class prompt is leveraged to ensure the preservation of similar foreground object while diversifying the background. Furthermore, we introduce a gradient-based generation guidance method, which employs two losses to generate more challenging samples and prevent the generated images from being too similar to previously generated ones. Experimental results on the CIFAR and ImageNet datasets demonstrate that our method achieves better performance with a significantly reduced number of generated images.


What makes an image realistic?

arXiv.org Machine Learning

The last decade has seen tremendous progress in our ability to generate realistic-looking data, be it images, text, audio, or video. Here, we discuss the closely related problem of quantifying realism, that is, designing functions that can reliably tell realistic data from unrealistic data. This problem turns out to be significantly harder to solve and remains poorly understood, despite its prevalence in machine learning and recent breakthroughs in generative AI. Drawing on insights from algorithmic information theory, we discuss why this problem is challenging, why a good generative model alone is insufficient to solve it, and what a good solution would look like. In particular, we introduce the notion of a universal critic, which unlike adversarial critics does not require adversarial training. While universal critics are not immediately practical, they can serve both as a North Star for guiding practical implementations and as a tool for analyzing existing attempts to capture realism.


Silicon Valley is pricing academics out of AI research

Washington Post - Technology News

Tech giants procure huge amounts of computing power through data centers and have access to GPUs -- specialized computer chips that are necessary for running the gargantuan calculations needed for AI. These resources are expensive: A recent report from Stanford University researchers estimated Google DeepMind's large language model, Chinchilla, cost 2.1 million to develop. More than 100 top artificial intelligence researchers on Tuesday urged generative AI companies to offer a legal and technical safe harbor to researchers so they can scrutinize their products without the fear that internet platforms will suspend their accounts or threaten legal action.


Warning over use in UK of unregulated AI chatbots to create social care plans

The Guardian

Britain's hard-pressed carers need all the help they can get. But that should not include using unregulated AI bots, according to researchers who say the AI revolution in social care needs a hard ethical edge. A pilot study by academics at the University of Oxford found some care providers had been using generative AI chatbots such as ChatGPT and Bard to create care plans for people receiving care. That presents a potential risk to patient confidentiality, according to Dr Caroline Green, an early career research fellow at the Institute for Ethics in AI at Oxford, who surveyed care organisations for the study. "If you put any type of personal data into [a generative AI chatbot], that data is used to train the language model," Green said.


Elon Musk v OpenAI: tech giants are inciting existential fears to evade scrutiny Kenan Malik

The Guardian

In 1914, on the eve of the First World War, HG Wells published a novel about the possibilities of an even greater conflagration. The World Set Free imagines, 30 years before the Manhattan Project, the creation of atomic weapons that allow "a man [to] carry about in a handbag an amount of latent energy sufficient to wreck half a city". It takes the "establishment of a world government" to bring about peace. What concerned Wells was not simply the perils of a new technology, it was also the dangers of democracy. Wells' world government was not created through democratic will but imposed as a benign dictatorship.


A Zero Trust Framework for Realization and Defense Against Generative AI Attacks in Power Grid

arXiv.org Artificial Intelligence

Understanding the potential of generative AI (GenAI)-based attacks on the power grid is a fundamental challenge that must be addressed in order to protect the power grid by realizing and validating risk in new attack vectors. In this paper, a novel zero trust framework for a power grid supply chain (PGSC) is proposed. This framework facilitates early detection of potential GenAI-driven attack vectors (e.g., replay and protocol-type attacks), assessment of tail risk-based stability measures, and mitigation of such threats. First, a new zero trust system model of PGSC is designed and formulated as a zero-trust problem that seeks to guarantee for a stable PGSC by realizing and defending against GenAI-driven cyber attacks. Second, in which a domain-specific generative adversarial networks (GAN)-based attack generation mechanism is developed to create a new vulnerability cyberspace for further understanding that threat. Third, tail-based risk realization metrics are developed and implemented for quantifying the extreme risk of a potential attack while leveraging a trust measurement approach for continuous validation. Fourth, an ensemble learning-based bootstrap aggregation scheme is devised to detect the attacks that are generating synthetic identities with convincing user and distributed energy resources device profiles. Experimental results show the efficacy of the proposed zero trust framework that achieves an accuracy of 95.7% on attack vector generation, a risk measure of 9.61% for a 95% stable PGSC, and a 99% confidence in defense against GenAI-driven attack.


The feud between Elon Musk and Sam Altman – explained

The Guardian

The day after OpenAI launched in December 2015, its co-founder Sam Altman sat down with Vanity Fair to discuss what the magazine described as "a non-profit company to save the world from a dystopian future". Altman talked up his vision for keeping artificial intelligence safe and distributing it widely, as well as his good working relationship with his co-chair – Tesla CEO Elon Musk. "I really trust him, which is obviously important to everyone involved," Altman said. Almost a decade later, Musk and Altman are locked in a public spat and looming legal battle that revolves around the end of their previous partnership and OpenAI's creation of a for-profit subsidiary now valued at 80bn. Musk filed a suit against OpenAI in a California court last week, alleging that Altman and other executives had "breached the founding agreement" of the company by pursuing private commercial success instead of working to benefit humanity.


Why Google's AI tool was slammed for showing images of people of colour

Al Jazeera

America's founding fathers depicted as Black women and Ancient Greek warriors as Asian women and men – this was the world reimagined by Google's generative AI tool, Gemini, in late February. The launch of the new image generation feature sent social media platforms into a flurry of intrigue and confusion. When users entered any prompts to create AI-generated images of people, Gemini was largely showing them results featuring people of colour – whether appropriate or not. X users shared laughs while repeatedly trying to generate images of white people on Gemini and failing to do so. While some instances were deemed humorous online, others, such as images of brown people wearing World War II Nazi uniforms with swastikas on them, prompted outrage, prompting Google to temporarily disable the tool.