Generative AI
Google's generative AI video model is available in private preview
Google has begun rolling out private access to its Veo and Imagen 3 generative AI models. Starting today, customers of the company's Vertex AI Google Cloud package can begin using Veo to generate videos from text prompts and images. Then, as of next week, Google will make Imagen 3, its latest text-to-image framework, available to those same users. By comparison, OpenAI's Sora model is still only available to select artists, academics and researchers, though that could change quickly with the company teasing 12 days of product demos starting December 5. Of Veo, Google says the model creates 1080p footage "that's consistent and coherent" and can run "beyond a minute."
OpenAI Poaches 3 Top Engineers From DeepMind
OpenAI announced today it has hired three senior computer vision and machine learning engineers from rival Google DeepMind, all of whom will work in a newly opened OpenAI office in Zurich, Switzerland. OpenAI executives told staff in an internal memo on Tuesday that Lucas Beyer, Alexander Kolesnikov, and Xiaohua Zhai will be joining the company to work on multimodal AI, artificial intelligence models capable of performing tasks in different mediums ranging from images to audio. OpenAI has long been at the forefront of multimodal AI and released the first version of its text-to-image platform Dall-E in 2021. Its flagship chatbot ChatGPT, however, was initially only capable of interacting with text inputs. The company later added voice and image features as multimodal functionality became an increasingly important part of its product line and AI research.
Mira Murati Quit OpenAI. She's as Optimistic as Ever About AGI
Former OpenAI executive Mira Murati says it could take decades, but AI systems eventually will perform a wide range of cognitive tasks as well as humans do--a prospective technological milestone widely known as artificial general intelligence, or AGI. "Right now, it feels quite achievable," Murati said at WIRED's The Big Interview event in San Francisco on Tuesday. In her first interview since resigning as OpenAI's chief technology officer in September, Murati told WIRED's Steven Levy that she's not overly concerned about recent chatter in the AI industry that developing more powerful generative AI models is proving challenging. "Current evidence shows that progress will likely continue," Murati said. Whether we need new ideas to get to AGI-level systems, that's uncertain.
Seamless Optical Cloud Computing across Edge-Metro Network for Generative AI
Xing, Sizhe, Sun, Aolong, Wang, Chengxi, Wang, Yizhi, Dong, Boyu, Hu, Junhui, Deng, Xuyu, Yan, An, Liu, Yingjun, Hu, Fangchen, Li, Zhongya, Huang, Ouhan, Zhao, Junhao, Zhou, Yingjun, Li, Ziwei, Shi, Jianyang, Xiao, Xi, Penty, Richard, Cheng, Qixiang, Chi, Nan, Zhang, Junwen
The rapid advancement of generative artificial intelligence (AI) in recent years has profoundly reshaped modern lifestyles, necessitating a revolutionary architecture to support the growing demands for computational power. Cloud computing has become the driving force behind this transformation. However, it consumes significant power and faces computation security risks due to the reliance on extensive data centers and servers in the cloud. Reducing power consumption while enhancing computational scale remains a persistent challenge in cloud computing. Here, we propose and experimentally demonstrate an optical cloud computing system that can be seamlessly deployed across an edge-metro network. By modulating inputs and models into light, a wide range of edge nodes can directly access the optical computing center via the edge-metro network. The experimental validations show an energy efficiency of 118.6 mW/TOPs (tera operations per second), reducing energy consumption by two orders of magnitude compared to traditional electronic-based cloud computing solutions. Furthermore, it is experimentally validated that this architecture can run various complex generative AI models through parallel computing to achieve image generation tasks.
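As a reading aid for the reported efficiency figure, the unit conversion can be sketched as follows. This is plain arithmetic on the 118.6 mW/TOPs number quoted in the abstract; the variable and function names are illustrative and do not come from the paper.

```python
# Convert the reported 118.6 mW per TOPS into energy per operation.
# 1 TOPS = 1e12 operations per second, and 1 W = 1 J/s, so
# (W per TOPS) / 1e12 gives joules per operation.

MILLIWATTS_PER_TOPS = 118.6  # figure quoted in the abstract
OPS_PER_SECOND_PER_TOPS = 1e12


def joules_per_operation(milliwatts_per_tops: float) -> float:
    """Energy per single operation, in joules."""
    watts_per_tops = milliwatts_per_tops / 1000.0
    return watts_per_tops / OPS_PER_SECOND_PER_TOPS


def tops_per_watt(milliwatts_per_tops: float) -> float:
    """Inverse view: how many TOPS one watt sustains."""
    return 1000.0 / milliwatts_per_tops


energy_j = joules_per_operation(MILLIWATTS_PER_TOPS)
print(f"{energy_j * 1e12:.4f} pJ per operation")
print(f"{tops_per_watt(MILLIWATTS_PER_TOPS):.2f} TOPS per watt")
```

In other words, the quoted figure corresponds to roughly 0.12 picojoules per operation, or a little over 8 TOPS per watt.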
From Words to Workflows: Automating Business Processes
Minkova, Laura, Espejel, Jessica López, Djaidja, Taki Eddine Toufik, Dahhane, Walid, Ettifouri, El Hassane
As businesses increasingly rely on automation to streamline operations, the limitations of Robotic Process Automation (RPA) have become apparent, particularly its dependence on expert knowledge and inability to handle complex decision-making tasks. Recent advancements in Artificial Intelligence (AI), particularly Generative AI (GenAI) and Large Language Models (LLMs), have paved the way for Intelligent Automation (IA), which integrates cognitive capabilities to overcome the shortcomings of RPA. This paper introduces Text2Workflow, a novel method that automatically generates workflows from natural language user requests. Unlike traditional automation approaches, Text2Workflow offers a generalized solution for automating any business process, translating user inputs into a sequence of executable steps represented in JavaScript Object Notation (JSON) format. Leveraging the decision-making and instruction-following capabilities of LLMs, this method provides a scalable, adaptable framework that enables users to visualize and execute workflows with minimal manual intervention. This research outlines the Text2Workflow methodology and its broader implications for automating complex business processes.
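To make the idea of a workflow as "a sequence of executable steps represented in JSON" concrete, here is a minimal sketch. The JSON schema, the action names, and the `run_workflow` helper are all hypothetical and chosen only for illustration; the abstract does not describe the format Text2Workflow actually uses, and the step in which an LLM produces the JSON is omitted.

```python
import json

# Hypothetical workflow, as an LLM might emit it for the request
# "apply a 10% discount to the invoice total and tell me the result".
# The field names ("id", "action", "params") are assumptions.
WORKFLOW_JSON = """
[
  {"id": 1, "action": "read_input", "params": {"key": "invoice_total"}},
  {"id": 2, "action": "apply_discount", "params": {"percent": 10}},
  {"id": 3, "action": "format_message", "params": {"template": "Total due: {value:.2f}"}}
]
"""


def read_input(state, key):
    state["value"] = state["inputs"][key]
    return state


def apply_discount(state, percent):
    state["value"] = state["value"] * (1 - percent / 100)
    return state


def format_message(state, template):
    state["message"] = template.format(value=state["value"])
    return state


# Registry mapping action names in the JSON to Python callables.
ACTIONS = {
    "read_input": read_input,
    "apply_discount": apply_discount,
    "format_message": format_message,
}


def run_workflow(workflow_json, inputs):
    """Parse the JSON steps and execute them in order of their id."""
    steps = json.loads(workflow_json)
    state = {"inputs": inputs}
    for step in sorted(steps, key=lambda s: s["id"]):
        handler = ACTIONS.get(step["action"])
        if handler is None:
            raise ValueError(f"Unknown action: {step['action']}")
        state = handler(state, **step["params"])
    return state


result = run_workflow(WORKFLOW_JSON, {"invoice_total": 200.0})
print(result["message"])
```

Keeping the workflow as data rather than code means it can be validated, displayed to the user, or edited before any step is executed.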
Movie Gen: SWOT Analysis of Meta's Generative AI Foundation Model for Transforming Media Generation, Advertising, and Entertainment Industries
Ehtesham, Abul, Kumar, Saket, Singh, Aditi, Khoei, Tala Talaei
Generative AI is reshaping the media landscape, enabling unprecedented capabilities in video creation, personalization, and scalability. This paper presents a comprehensive SWOT analysis of Meta's Movie Gen, a cutting-edge generative AI foundation model designed to produce 1080p HD videos with synchronized audio from simple text prompts. We explore its strengths, including high-resolution video generation, precise editing, and seamless audio integration, which make it a transformative tool across industries such as filmmaking, advertising, and education. However, the analysis also addresses limitations, such as constraints on video length and potential biases in generated content, which pose challenges for broader adoption. In addition, we examine the evolving regulatory and ethical considerations surrounding generative AI, focusing on issues like content authenticity, cultural representation, and responsible use. Through comparative insights with leading models like DALL-E and Google Imagen, this paper highlights Movie Gen's unique features, such as video personalization and multimodal synthesis, while identifying opportunities for innovation and areas requiring further research. Our findings provide actionable insights for stakeholders, emphasizing both the opportunities and challenges of deploying generative AI in media production. This work aims to guide future advancements in generative AI, ensuring scalability, quality, and ethical integrity in this rapidly evolving field.
Enhancing Supply Chain Visibility with Generative AI: An Exploratory Case Study on Relationship Prediction in Knowledge Graphs
Zheng, Ge, Brintrup, Alexandra
A key stumbling block in effective supply chain risk management for companies and policymakers is a lack of visibility on interdependent supply network relationships. Relationship prediction, also called link prediction, is an emergent area of supply chain surveillance research that aims to increase the visibility of supply chains using data-driven techniques. Existing methods have been successful in predicting relationships but struggle to extract the context in which these relationships are embedded, such as the products being supplied or the locations they are supplied from. Lack of context prevents practitioners from distinguishing transactional relations from established supply chain relations, hindering accurate estimations of risk. In this work, we develop a new Generative Artificial Intelligence (GenAI) enhanced machine learning framework that leverages pre-trained language models as embedding models combined with machine learning models to predict supply chain relationships within knowledge graphs. By integrating generative AI techniques, our approach captures the nuanced semantic relationships between entities, thereby improving supply chain visibility and facilitating more precise risk management. Using data from a real case study, we show that GenAI-enhanced link prediction surpasses all benchmarks, and demonstrate how GenAI models can be explored and effectively used in supply chain risk management.
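The general shape of embedding-based link scoring can be sketched as below. This is a deliberately simplified stand-in, not the authors' framework: a toy bag-of-words vector replaces the pre-trained language model embedding, cosine similarity replaces the trained machine learning model, and the entity names and descriptions are invented for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a language model embedding: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_links(entities, candidate_pairs):
    """Rank candidate (entity, entity) pairs by embedding similarity."""
    vectors = {name: embed(desc) for name, desc in entities.items()}
    scored = [(cosine(vectors[a], vectors[b]), a, b) for a, b in candidate_pairs]
    return sorted(scored, reverse=True)


# Hypothetical knowledge-graph nodes with short textual descriptions.
ENTITIES = {
    "SupplierA": "lithium battery cells manufacturer",
    "CarMakerB": "electric vehicle manufacturer using lithium battery packs",
    "BakeryC": "bread and pastry producer",
}
PAIRS = [("SupplierA", "CarMakerB"), ("SupplierA", "BakeryC"), ("CarMakerB", "BakeryC")]

for score, a, b in score_links(ENTITIES, PAIRS):
    print(f"{a} -> {b}: {score:.3f}")
```

A real pipeline would replace `embed` with a pre-trained model and learn the scoring function from known links, but the flow of describing entities, embedding them, and scoring candidate pairs is the same.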
Integrating Generative AI into Art Therapy: A Technical Showcase
Schmutz, Yannis Valentin, Kravchenko, Tetiana, Souissi, Souhir Ben, Kurpicz-Briki, Mascha
This paper explores the integration of generative AI into the field of art therapy. Leveraging proven text-to-image models, we introduce a novel technical design to complement art therapy. The resulting AI-based tools shall enable patients to refine and customize their creative work, opening up new avenues of expression and accessibility. Using three illustrative examples, we demonstrate potential outputs of our solution and evaluate them qualitatively. Furthermore, we discuss the current limitations and ethical considerations associated with this integration and provide an outlook into future research efforts. Our implementations are publicly available at https://github.com/BFH-AMI/sds24.
Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment
Li, Yanshi, Xiong, Shaopan, Chen, Gengru, Li, Xiaoyang, Luo, Yijia, Zhang, Xingyao, Huang, Yanhui, Bu, Xingyuan, Tan, Yingshui, Yuan, Chun, Wang, Jiamang, Su, Wenbo, Zheng, Bo
Reinforcement Learning from Human Feedback (RLHF) has proven highly effective in aligning Large Language Models (LLMs) with human preferences. However, the original RLHF typically optimizes under an overall reward, which can lead to a suboptimal learning process. This limitation stems from RLHF's lack of awareness regarding which specific tokens should be reinforced or suppressed. Moreover, conflicts in supervision can arise, for instance, when a chosen response includes erroneous tokens while a rejected response contains accurate elements. To rectify these shortcomings, a growing number of dense reward methods, such as step-wise and token-wise RLHF, have been proposed. However, these existing methods are limited to specific tasks (like mathematics). In this paper, we propose the "Adaptive Message-wise RLHF" method, which robustly applies to various tasks. By defining pivot tokens as key indicators, our approach adaptively identifies essential information and converts sequence-level supervision into fine-grained, subsequence-level supervision. Experiments demonstrate that our method can be integrated into various training methods, significantly mitigating hallucinations and catastrophic forgetting problems, while outperforming other methods on multiple evaluation metrics. Our method improves the success rate on adversarial samples by 10% compared to the sample-wise approach, and achieves a 1.3% improvement on evaluation benchmarks such as MMLU, GSM8K, HumanEval, etc. In recent years, generative AI models have made significant achievements, with preference alignment by reinforcement learning playing an essential role in this progress (Ouyang et al., 2022; Touvron et al., 2023; Rafailov et al., 2024; Dubey et al., 2024; Yang et al., 2024a; OpenAI et al., 2024).
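The contrast between a single sequence-level reward and finer-grained, subsequence-level supervision can be illustrated with a small sketch. It is not the paper's algorithm: the abstract does not say how pivot tokens are detected or how reward is assigned, so the pivot mask here is supplied by hand and the equal-split rule is an assumption made purely for illustration.

```python
def sparse_reward(num_tokens, seq_reward):
    """Sequence-level supervision: the whole reward lands on the last token."""
    rewards = [0.0] * num_tokens
    if num_tokens:
        rewards[-1] = seq_reward
    return rewards


def dense_reward(pivot_mask, seq_reward):
    """Illustrative densification: split the reward equally over pivot tokens.

    pivot_mask[i] is truthy when token i is treated as a pivot (assumed given).
    Falls back to the sparse scheme when no pivot is marked.
    """
    num_pivots = sum(1 for m in pivot_mask if m)
    if num_pivots == 0:
        return sparse_reward(len(pivot_mask), seq_reward)
    return [seq_reward / num_pivots if m else 0.0 for m in pivot_mask]


tokens = ["The", "capital", "is", "Paris"]
pivot_mask = [0, 1, 0, 1]  # hypothetical: hand-marked informative tokens

print(sparse_reward(len(tokens), 1.0))
print(dense_reward(pivot_mask, 1.0))
```

Both schemes hand out the same total reward; the dense variant only changes which tokens receive the learning signal.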
Amazon Is Building a Mega AI Supercomputer With Anthropic
Amazon is building one of the world's most powerful artificial intelligence supercomputers in collaboration with Anthropic, an OpenAI rival that is working to push the frontier of what is possible with artificial intelligence. When completed, it will be five times larger than the cluster used to build Anthropic's current most powerful model. Amazon says it expects the supercomputer, which will feature hundreds of thousands of Amazon's latest AI training chips, Trainium 2, to be the largest reported AI machine in the world when finished. Matt Garman, the CEO of Amazon Web Services, revealed the supercomputer plans, dubbed Project Rainier, at the company's Re:Invent conference in Las Vegas today, along with a host of other announcements cementing Amazon's rising dark-horse status in the world of generative AI. Garman also announced that Trainium 2 will be made generally available in so-called Trn2 UltraServer clusters specialized for training frontier AI.