Generative AI
Can LLMs Generate Visualizations with Dataless Prompts?
Coelho, Darius, Barot, Harshit, Rathod, Naitik, Mueller, Klaus
Recent advancements in large language models have revolutionized information access, as these models harness data available on the web to address complex queries, becoming the preferred information source for many users. In certain cases, queries are about publicly available data, which can be effectively answered with data visualizations. In this paper, we investigate the ability of large language models to provide accurate data and relevant visualizations in response to such queries. Specifically, we investigate the ability of GPT-3 and GPT-4 to generate visualizations with dataless prompts, where no data accompanies the query. We evaluate the results of the models by comparing them to visualization cheat sheets created by visualization experts.
Human-AI Safety: A Descendant of Generative AI and Control Systems Safety
Bajcsy, Andrea, Fisac, Jaime F.
Artificial intelligence (AI) is interacting with people at an unprecedented scale, offering new avenues for immense positive impact, but also raising widespread concerns around the potential for individual and societal harm. Today, the predominant paradigm for human--AI safety focuses on fine-tuning the generative model's outputs to better agree with human-provided examples or feedback. In reality, however, the consequences of an AI model's outputs cannot be determined in isolation: they are tightly entangled with the responses and behavior of human users over time. In this paper, we distill key complementary lessons from AI safety and control systems safety, highlighting open challenges as well as key synergies between both fields. We then argue that meaningful safety assurances for advanced AI technologies require reasoning about how the feedback loop formed by AI outputs and human behavior may drive the interaction towards different outcomes. To this end, we introduce a unifying formalism to capture dynamic, safety-critical human--AI interactions and propose a concrete technical roadmap towards next-generation human-centered AI safety.
OpticGAI: Generative AI-aided Deep Reinforcement Learning for Optical Networks Optimization
Li, Siyuan, Lin, Xi, Liu, Yaju, Li, Gaolei, Li, Jianhua
Deep Reinforcement Learning (DRL) is regarded as a promising tool for optical network optimization. However, the flexibility and efficiency of current DRL-based solutions for optical network optimization require further improvement. Currently, generative models have showcased their significant performance advantages across various domains. In this paper, we introduce OpticGAI, the AI-generated policy design paradigm for optical networks. In detail, it is implemented as a novel DRL framework that utilizes generative models to learn the optimal policy network. Furthermore, we assess the performance of OpticGAI on two NP-hard optical network problems, Routing and Wavelength Assignment (RWA) and dynamic Routing, Modulation, and Spectrum Allocation (RMSA), to show the feasibility of the AI-generated policy paradigm. Simulation results have shown that OpticGAI achieves the highest reward and the lowest blocking rate of both RWA and RMSA problems. OpticGAI poses a promising direction for future research on generative AI-enhanced flexible optical network optimization.
As Employers Embrace AI, Workers Fret--and Seek Input
The Swedish buy-now-pay-later company Klarna has become something of a poster child for the potential benefits of generative artificial intelligence. The company relies on AI to create and tailor promotional images and to draft marketing copy, saving millions of dollars. Earlier this year it said an AI chatbot assistant was doing the work of 700 human customer-service agents, which it forecast would boost profits by 40 million this year. Klarna's approach highlights generative AI's promise for powering businesswide systems, like customer service. U.S. businesses are investing in AI, and they're eager to see such gains.
Is Alexa about to get smarter? Amazon will copy Apple by giving its smart assistant a powerful AI revamp, report claims
Just a few weeks after Apple laid out its grand new AI project, it seems Amazon is getting in on the act too. The tech giant is about to fit its smart assistant Alexa with powerful generative AI capabilities that make it much smarter, according to a report. Alexa will be fitted with a'conversational generative AI', it says โ although it's unclear what AI model this will actually be. It means she will be able to respond faster and in more human-like language in response to complicated prompts or queries. Amazon's big rivals in tech already have their own AI chatbots, including Google (Gemini) and X (Grok) while Microsoft and Apple have integrations with ChatGPT.
Beyond Efficiency: Scaling AI Sustainably
Wu, Carole-Jean, Acun, Bilge, Raghavendra, Ramya, Hazelwood, Kim
Barroso's seminal contributions in energy-proportional warehouse-scale computing launched an era where modern datacenters have become more energy efficient and cost effective than ever before. At the same time, modern AI applications have driven ever-increasing demands in computing, highlighting the importance of optimizing efficiency across the entire deep learning model development cycle. This paper characterizes the carbon impact of AI, including both operational carbon emissions from training and inference as well as embodied carbon emissions from datacenter construction and hardware manufacturing. We highlight key efficiency optimization opportunities for cutting-edge AI technologies, from deep learning recommendation models to multi-modal generative AI tasks. To scale AI sustainably, we must also go beyond efficiency and optimize across the life cycle of computing infrastructures, from hardware manufacturing to datacenter operations and end-of-life processing for the hardware.
KI-Bilder und die Widerst\"andigkeit der Medienkonvergenz: Von prim\"arer zu sekund\"arer Intermedialit\"at?
The article presents some current observations (as of April 10, 2024) on the integration of AI-generated images within processes of media convergence. It draws on two different concepts of intermediality. Primary intermediality concepts are motivated by the object when a new type of technology develops the potential to become socially relevant as a media form and thus a socially, politically, or culturally important communicative factor. Due to their uncertain 'measurements' within the wider media ecology, however, the new, still potential media form appears hybrid. The "inter-" or "between-" of this initial intermediality moment thus refers to the questionable "site" and the questionable description of the potential media form between already existing technologies and cultural forms and their conceptual measurements. For secondary concepts of intermediality, in contrast, it can be assumed that the boundaries of media forms and their application have already been drawn and are reasonably undisputed. This then raises the question of intentional and staged references to AI imagery within other media forms and pictures. The article discusses indicators of both intermediality moments using current examples and controversies surrounding AI images. The thesis is that there can be no talk of a seamless 'integration' of AI images into the wider media landscape at the moment (within films, comic books, or video games, for example) - as one of countless other image production techniques - and that the medial 'site' of AI image circulation - at least where it is not a matter of deception, but rather their conscious use as AI images - especially in social media communication and in fan cultures, but with repercussions for the more general media ecology and image interpretation, insofar as the suspicion that an image could be AI-generated is now increasingly present as a "hermeneutics of suspicion".
Image Conductor: Precision Control for Interactive Video Synthesis
Li, Yaowei, Wang, Xintao, Zhang, Zhaoyang, Wang, Zhouxia, Yuan, Ziyang, Xie, Liangbin, Zou, Yuexian, Shan, Ying
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements, typically involving labor-intensive real-world capturing. Despite advancements in generative AI for video creation, achieving precise control over motion for interactive video asset generation remains challenging. To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image. An well-cultivated training strategy is proposed to separate distinct camera and object motion by camera LoRA weights and object LoRA weights. To further address cinematographic variations from ill-posed trajectories, we introduce a camera-free guidance technique during inference, enhancing object movements while eliminating camera transitions. Additionally, we develop a trajectory-oriented video motion data curation pipeline for training. Quantitative and qualitative experiments demonstrate our method's precision and fine-grained control in generating motion-controllable videos from images, advancing the practical application of interactive video synthesis. Project webpage available at https://liyaowei-stu.github.io/project/ImageConductor/
V-RECS, a Low-Cost LLM4VIS Recommender with Explanations, Captioning and Suggestions
Podo, Luca, Angelini, Marco, Velardi, Paola
NL2VIS (natural language to visualization) is a promising and recent research area that involves interpreting natural language queries and translating them into visualizations that accurately represent the underlying data. As we navigate the era of big data, NL2VIS holds considerable application potential since it greatly facilitates data exploration by non-expert users. Following the increasingly widespread usage of generative AI in NL2VIS applications, in this paper we present V-RECS, the first LLM-based Visual Recommender augmented with explanations(E), captioning(C), and suggestions(S) for further data exploration. V-RECS' visualization narratives facilitate both response verification and data exploration by non-expert users. Furthermore, our proposed solution mitigates computational, controllability, and cost issues associated with using powerful LLMs by leveraging a methodology to effectively fine-tune small models. To generate insightful visualization narratives, we use Chain-of-Thoughts (CoT), a prompt engineering technique to help LLM identify and generate the logical steps to produce a correct answer. Since CoT is reported to perform poorly with small LLMs, we adopted a strategy in which a large LLM (GPT-4), acting as a Teacher, generates CoT-based instructions to fine-tune a small model, Llama-2-7B, which plays the role of a Student. Extensive experiments-based on a framework for the quantitative evaluation of AI-based visualizations and on manual assessment by a group of participants-show that V-RECS achieves performance scores comparable to GPT-4, at a much lower cost. The efficacy of the V-RECS teacher-student paradigm is also demonstrated by the fact that the un-tuned Llama fails to perform the task in the vast majority of test cases. We release V-RECS for the visualization community to assist visualization designers throughout the entire visualization generation process.
Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors
Naseh, Ali, Roh, Jaechul, Bagdasaryan, Eugene, Houmansadr, Amir
Recent advances in large text-conditional image generative models such as Stable Diffusion, Midjourney, and DALL-E 3 have revolutionized the field of image generation, allowing users to produce high-quality, realistic images from textual prompts. While these developments have enhanced artistic creation and visual communication, they also present an underexplored attack opportunity: the possibility of inducing biases by an adversary into the generated images for malicious intentions, e.g., to influence society and spread propaganda. In this paper, we demonstrate the possibility of such a bias injection threat by an adversary who backdoors such models with a small number of malicious data samples; the implemented backdoor is activated when special triggers exist in the input prompt of the backdoored models. On the other hand, the model's utility is preserved in the absence of the triggers, making the attack highly undetectable. We present a novel framework that enables efficient generation of poisoning samples with composite (multi-word) triggers for such an attack. Our extensive experiments using over 1 million generated images and against hundreds of fine-tuned models demonstrate the feasibility of the presented backdoor attack. We illustrate how these biases can bypass conventional detection mechanisms, highlighting the challenges in proving the existence of biases within operational constraints. Our cost analysis confirms the low financial barrier to executing such attacks, underscoring the need for robust defensive strategies against such vulnerabilities in text-to-image generation models.