Generative AI
Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences
One-step text-to-image generator models offer advantages such as swift inference efficiency, flexible architectures, and state-of-the-art generation performance. In this paper, we study the problem of aligning one-step generator models with human preferences for the first time. Inspired by the success of reinforcement learning using human feedback (RLHF), we formulate the alignment problem as maximizing expected human reward functions while adding an Integral Kullback-Leibler divergence term to prevent the generator from diverging. By overcoming technical challenges, we introduce Diff-Instruct++ (DI++), the first, fast-converging and image data-free human preference alignment method for one-step text-to-image generators. We also introduce novel theoretical insights, showing that using CFG for diffusion distillation is secretly doing RLHF with DI++. Such an interesting finding brings understanding and potential contributions to future research involving CFG. In the experiment sections, we align both UNet-based and DiT-based one-step generators using DI++, which use the Stable Diffusion 1.5 and the PixelArt-$\alpha$ as the reference diffusion processes. The resulting DiT-based one-step text-to-image model achieves a strong Aesthetic Score of 6.19 and an Image Reward of 1.24 on the COCO validation prompt dataset. It also achieves a leading Human preference Score (HPSv2.0) of 28.48, outperforming other open-sourced models such as Stable Diffusion XL, DMD2, SD-Turbo, as well as PixelArt-$\alpha$. Both theoretical contributions and empirical evidence indicate that DI++ is a strong human-preference alignment approach for one-step text-to-image models.
The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI
In this paper, we give an in-depth analysis on the mathematical problem formulations and the probabilistic optimization explorations for some of the key components in Transformer model [33] in the field of generative AI. We explore and discuss some potential further enhancement for current state of the art methods for some key underlying technologies of generative AI models from algorithmic and probabilistic optimization perspective. In particular, we present an optimal solution for sub-word encoding (SWE) based on similar initial settings as that of byte-pair encoding (BPE) algorithm in [9] with similar objectives as that of WordPiece approach in [28, 31] to maximize the likelihood of the training data. We also present cross entropy optimization method to optimize hyperparameters for word2vec model [17]. In addition, we propose a factored combination of rotary positional encoding (RoPE) [32] and attention with linear biases (ALiBi) [23] with a harmonic series. We also present a probabilistic FlashAttention [6, 7] (PrFlashAttention) method with a probability distribution over block distances in the matrix to decide which block is likely to participate in a given round of attention computation while maintaining the lower triangle shape of the tensor for autoregressive language models by re-shaping the tensors. Finally, we present staircase adaptive quantization (SAQ) of key-value (KV) cache for multi-query attention (MQA) based on the framework presented in [16] to have gradual quantization degradation while achieving reasonable model quality and cost savings.
Google tool makes AI-generated writing easily detectable
Google has been using artificial intelligence watermarking to automatically identify text generated by the company's Gemini chatbot, making it easier to distinguish AI-generated content from human-written posts. That watermark system could help prevent misuse of the AI chatbots for misinformation and disinformation – not to mention cheating in school and business settings. Now, the tech company is making an open-source version of its technique available so that other generative AI developers can similarly watermark the output from their own large language models, says Pushmeet Kohli at Google DeepMind, the company's AI research team, which combines the former Google Brain and DeepMind labs. "While SynthID isn't a silver bullet for identifying AI-generated content, it is an important building block for developing more reliable AI identification tools," he says. Google creates self-replicating life from digital'primordial soup' Independent researchers voiced similar optimism.
Structure Language Models for Protein Conformation Generation
Lu, Jiarui, Chen, Xiaoyin, Lu, Stephen Zhewen, Shi, Chence, Guo, Hongyu, Bengio, Yoshua, Tang, Jian
Proteins adopt multiple structural conformations to perform their diverse biological functions, and understanding these conformations is crucial for advancing drug discovery. Traditional physics-based simulation methods often struggle with sampling equilibrium conformations and are computationally expensive. Recently, deep generative models have shown promise in generating protein conformations as a more efficient alternative. However, these methods predominantly rely on the diffusion process within a 3D geometric space, which typically centers around the vicinity of metastable states and is often inefficient in terms of runtime. In this paper, we introduce Structure Language Modeling (SLM) as a novel framework for efficient protein conformation generation. Specifically, the protein structures are first encoded into a compact latent space using a discrete variational auto-encoder, followed by conditional language modeling that effectively captures sequencespecific conformation distributions. This enables a more efficient and interpretable exploration of diverse ensemble modes compared to existing methods. Based on this general framework, we instantiate SLM with various popular LM architectures as well as proposing the ESMDiff, a novel BERT-like structure language model fine-tuned from ESM3 with masked diffusion. We verify our approach in various scenarios, including the equilibrium dynamics of BPTI, conformational change pairs, and intrinsically disordered proteins. SLM provides a highly efficient solution, offering a 20-100x speedup than existing methods in generating diverse conformations, shedding light on promising avenues for future research. Protein structure dynamics are fundamental to understanding the biological functions of proteins. The ability of proteins to adopt multiple conformations is crucial for their function in influencing interactions with other biomolecules and the environment. Traditional computational methods, such as molecular dynamics (MD) simulations, have long been used to explore these dynamics. However, these methods are computationally expensive and time-consuming.
Aggregated Knowledge Model: Enhancing Domain-Specific QA with Fine-Tuned and Retrieval-Augmented Generation Models
Liu, Fengchen, Jung, Jordan, Feinstein, Wei, DAmbrogia, Jeff, Jung, Gary
This paper introduces a novel approach to enhancing closed-domain Question Answering (QA) systems, focusing on the specific needs of the Lawrence Berkeley National Laboratory (LBL) Science Information Technology (ScienceIT) domain. Utilizing a rich dataset derived from the ScienceIT documentation, our study embarks on a detailed comparison of two fine-tuned large language models and five retrieval-augmented generation (RAG) models. Through data processing techniques, we transform the documentation into structured context-question-answer triples, leveraging the latest Large Language Models (AWS Bedrock, GCP PaLM2, Meta LLaMA2, OpenAI GPT-4, Google Gemini-Pro) for data-driven insights. Additionally, we introduce the Aggregated Knowledge Model (AKM), which synthesizes responses from the seven models mentioned above using K-means clustering to select the most representative answers. The evaluation of these models across multiple metrics offers a comprehensive look into their effectiveness and suitability for the LBL ScienceIT environment. The results demonstrate the potential benefits of integrating fine-tuning and retrieval-augmented strategies, highlighting significant performance improvements achieved with the AKM. The insights gained from this study can be applied to develop specialized QA systems tailored to specific domains.
Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling
Bhan, Nirav, Gupta, Shival, Manaswini, Sai, Baba, Ritik, Yadav, Narun, Desai, Hillori, Choudhary, Yash, Pawar, Aman, Shrivastava, Sarthak, Biswas, Sudipta
Large Language Models (LLMs) have shown remarkable capabilities in various domains, yet their economic impact has been limited by challenges in tool use and function calling. This paper introduces ThorV2, a novel architecture that significantly enhances LLMs' function calling abilities. We develop a comprehensive benchmark focused on HubSpot CRM operations to evaluate ThorV2 against leading models from OpenAI and Anthropic. Our results demonstrate that ThorV2 outperforms existing models in accuracy, reliability, latency, and cost efficiency for both single and multi-API calling tasks. We also show that ThorV2 is far more reliable and scales better to multistep tasks compared to traditional models. Our work offers the tantalizing possibility of more accurate function-calling compared to today's best-performing models using significantly smaller LLMs. These advancements have significant implications for the development of more capable AI assistants and the broader application of LLMs in real-world scenarios.
Countering Autonomous Cyber Threats
Heckel, Kade M., Weller, Adrian
With the capability to write convincing and fluent natural language and generate code, Foundation Models present dual-use concerns broadly and within the cyber domain specifically. Generative AI has already begun to impact cyberspace through a broad illicit marketplace for assisting malware development and social engineering attacks through hundreds of malicious-AI-as-a-services tools. More alarming is that recent research has shown the potential for these advanced models to inform or independently execute offensive cyberspace operations. However, these previous investigations primarily focused on the threats posed by proprietary models due to the until recent lack of strong open-weight model and additionally leave the impacts of network defenses or potential countermeasures unexplored. Critically, understanding the aptitude of downloadable models to function as offensive cyber agents is vital given that they are far more difficult to govern and prevent their misuse. As such, this work evaluates several state-of-the-art FMs on their ability to compromise machines in an isolated network and investigates defensive mechanisms to defeat such AI-powered attacks. Using target machines from a commercial provider, the most recently released downloadable models are found to be on par with a leading proprietary model at conducting simple cyber attacks with common hacking tools against known vulnerabilities. To mitigate such LLM-powered threats, defensive prompt injection (DPI) payloads for disrupting the malicious cyber agent's workflow are demonstrated to be effective. From these results, the implications for AI safety and governance with respect to cybersecurity is analyzed.
Collaborative AI in Sentiment Analysis: System Architecture, Data Prediction and Deployment Strategies
Zhang, Chaofeng, Hou, Jia, Tan, Xueting, Li, Gaolei, Chen, Caijuan
The advancement of large language model (LLM) based artificial intelligence technologies has been a game-changer, particularly in sentiment analysis. This progress has enabled a shift from highly specialized research environments to practical, widespread applications within the industry. However, integrating diverse AI models for processing complex multimodal data and the associated high costs of feature extraction presents significant challenges. Motivated by the marketing oriented software development +needs, our study introduces a collaborative AI framework designed to efficiently distribute and resolve tasks across various AI systems to address these issues. Initially, we elucidate the key solutions derived from our development process, highlighting the role of generative AI models like \emph{chatgpt}, \emph{google gemini} in simplifying intricate sentiment analysis tasks into manageable, phased objectives. Furthermore, we present a detailed case study utilizing our collaborative AI system in edge and cloud, showcasing its effectiveness in analyzing sentiments across diverse online media channels.
OpenAI and Microsoft are funding 10 million in grants for AI-powered journalism
OpenAI and Microsoft are funding projects to bring more AI tools into the newsroom. The duo will give grants of up to 10 million to Chicago Public Media, the Minnesota Star Tribune, Newsday (in Long Island, NY), The Philadelphia Inquirer and The Seattle Times. Each of the publications will hire a two-year AI fellow to develop projects for implementing the technology and improving business sustainability. Three more outlets are expected to receive fellowship grants in a second round. OpenAI and Microsoft are each contributing 2.5 million in direct funding as well as 2.5 million in software and enterprise credits.
Thom Yorke and Julianne Moore join thousands of creatives in AI warning
Abba's Björn Ulvaeus, the actor Julianne Moore, the Radiohead singer Thom Yorke are among 10,500 signatories of a statement from the creative industries warning artificial intelligence companies that unlicensed use of their work is a "major, unjust threat" to artists' livelihoods. "The unlicensed use of creative works for training generative AI is a major, unjust threat to the livelihoods of the people behind those works, and must not be permitted," reads the statement. Thousands of creative professionals from the worlds of literature, music, film, theatre and television have given their backing to the statement, with authors including Kazuo Ishiguro, Ann Patchett, and Kate Mosse, musicians including the Cure's Robert Smith as well as the composer Max Richter and actors including Kevin Bacon, Rosario Dawson and F Murray Abraham. The organiser of the letter, the British composer and former AI executive Ed Newton-Rex, said people who make a living from creative work are "very worried" about the situation. "There are three key resources that generative AI companies need to build AI models: people, compute, and data. They spend vast sums on the first two – sometimes a million dollars per engineer, and up to a billion dollars per model. But they expect to take the third – training data – for free," he said.