Generative AI
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
Yuan, Huizhuo, Chen, Zixiang, Ji, Kaixuan, Gu, Quanquan
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs). While cutting-edge diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised fine-tuning, their performance inevitably plateaus after seeing a certain volume of data. Recently, reinforcement learning (RL) has been employed to fine-tune diffusion models with human preference data, but it requires at least two images ("winner" and "loser" images) for each text prompt. In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion), where the diffusion model engages in competition with its earlier versions, facilitating an iterative self-improvement process. Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment. Our experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms the existing supervised fine-tuning method in aspects of human preference alignment and visual appeal right from its first iteration. By the second iteration, it exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.
Generative AI in the Construction Industry: A State-of-the-art Analysis
Taiwo, Ridwan, Bello, Idris Temitope, Abdulai, Sulemana Fatoama, Yussif, Abdul-Mugis, Salami, Babatunde Abiodun, Saka, Abdullahi, Zayed, Tarek
The construction industry is a vital sector of the global economy, but it faces many productivity challenges in various processes, such as design, planning, procurement, inspection, and maintenance. Generative artificial intelligence (AI), which can create novel and realistic data or content, such as text, image, video, or code, based on some input or prior knowledge, offers innovative and disruptive solutions to address these challenges. However, there is a gap in the literature on the current state, opportunities, and challenges of generative AI in the construction industry. This study aims to fill this gap by providing a state-of-the-art analysis of generative AI in construction, with three objectives: (1) to review and categorize the existing and emerging generative AI opportunities and challenges in the construction industry; (2) to propose a framework for construction firms to build customized generative AI solutions using their own data, comprising steps such as data collection, dataset curation, training custom large language model (LLM), model evaluation, and deployment; and (3) to demonstrate the framework via a case study of developing a generative model for querying contract documents. The results show that retrieval augmented generation (RAG) improves the baseline LLM by 5.2, 9.4, and 4.8% in terms of quality, relevance, and reproducibility. This study provides academics and construction professionals with a comprehensive analysis and practical framework to guide the adoption of generative AI techniques to enhance productivity, quality, safety, and sustainability across the construction industry.
Not Just Novelty: A Longitudinal Study on Utility and Customization of AI Workflows
Long, Tao, Gero, Katy Ilonka, Chilton, Lydia B.
Generative AI brings novel and impressive abilities to help people in everyday tasks. There are many AI workflows that solve real and complex problems by chaining AI outputs together with human interaction. Although there is an undeniable lure of AI, it's uncertain how useful generative AI workflows are after the novelty wears off. Additionally, tools built with generative AI have the potential to be personalized and adapted quickly and easily, but do users take advantage of the potential to customize? We conducted a three-week longitudinal study with 12 users to understand the familiarization and customization of generative AI tools for science communication. Our study revealed that the familiarization phase lasts for 4.3 sessions, where users explore the capabilities of the workflow and which aspects they find useful. After familiarization, the perceived utility of the system is rated higher than before, indicating that the perceived utility of AI is not just a novelty effect. The increase in benefits mainly comes from end-users' ability to customize prompts, and thus appropriate the system to their own needs. This points to a future where generative AI systems can allow us to design for appropriation.
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence
McIntosh, Timothy R., Susnjak, Teo, Liu, Tong, Watters, Paul, Halgamuge, Malka N.
The rapid rise in popularity of Large Language Models (LLMs) with emerging capabilities has spurred public curiosity to evaluate and compare different LLMs, leading many researchers to propose their LLM benchmarks. Noticing preliminary inadequacies in those benchmarks, we embarked on a study to critically assess 23 state-of-the-art LLM benchmarks, using our novel unified evaluation framework through the lenses of people, process, and technology, under the pillars of functionality and security. Our research uncovered significant limitations, including biases, difficulties in measuring genuine reasoning, adaptability, implementation inconsistencies, prompt engineering complexity, evaluator diversity, and the overlooking of cultural and ideological norms in one comprehensive assessment. Our discussions emphasized the urgent need for standardized methodologies, regulatory certainties, and ethical guidelines in light of Artificial Intelligence (AI) advancements, including advocating for an evolution from static benchmarks to dynamic behavioral profiling to accurately capture LLMs' complex behaviors and potential risks. Our study highlighted the necessity for a paradigm shift in LLM evaluation methodologies, underlining the importance of collaborative efforts for the development of universally accepted benchmarks and the enhancement of AI systems' integration into society.
AbuseGPT: Abuse of Generative AI ChatBots to Create Smishing Campaigns
Shibli, Ashfak Md, Pritom, Mir Mehedi A., Gupta, Maanak
SMS phishing, also known as "smishing", is a growing threat that tricks users into disclosing private information or clicking into URLs with malicious content through fraudulent mobile text messages. In recent past, we have also observed a rapid advancement of conversational generative AI chatbot services (e.g., OpenAI's ChatGPT, Google's BARD), which are powered by pre-trained large language models (LLMs). These AI chatbots certainly have a lot of utilities but it is not systematically understood how they can play a role in creating threats and attacks. In this paper, we propose AbuseGPT method to show how the existing generative AI-based chatbot services can be exploited by attackers in real world to create smishing texts and eventually lead to craftier smishing campaigns. To the best of our knowledge, there is no pre-existing work that evidently shows the impacts of these generative text-based models on creating SMS phishing. Thus, we believe this study is the first of its kind to shed light on this emerging cybersecurity threat. We have found strong empirical evidences to show that attackers can exploit ethical standards in the existing generative AI-based chatbot services by crafting prompt injection attacks to create newer smishing campaigns. We also discuss some future research directions and guidelines to protect the abuse of generative AI-based services and safeguard users from smishing attacks.
North Korea and Iran using AI for hacking, Microsoft says
US adversaries โ chiefly Iran and North Korea, and to a lesser extent Russia and China โ are beginning to use generative artificial intelligence to mount or organize offensive cyber operations, Microsoft said on Wednesday. Microsoft said it detected and disrupted, in collaboration with business partner OpenAI, many threats that used or attempted to exploit AI technology they had developed. In a blogpost, the company said the techniques were "early-stage" and neither "particularly novel or unique" but that it was important to expose them publicly as US rivals leveraging large-language models to expand their ability to breach networks and conduct influence operations. Cybersecurity firms have long used machine-learning on defense, principally to detect anomalous behavior in networks. But criminals and offensive hackers use it as well, and the introduction of large-language models led by OpenAI's ChatGPT upped that game of cat-and-mouse.
Russian and North Korean hackers used OpenAI tools to hone cyberattacks
Microsoft and OpenAI say that several state-backed hacking groups are using the latter's generative AI (GAI) tools to bolster cyberattacks. The pair suggests that new research details for the first time how hackers linked to foreign governments are making use of GAI. The groups in question have ties to China, Russia, North Korea and Iran. According to the companies, the state actors are using GAI for code debugging, looking up open-source information to research targets, developing social engineering techniques, drafting phishing emails and translating text. OpenAI (which powers Microsoft GAI products such as Copilot) says it shut down the groups' access to its GAI systems after finding out they were using its tools.
Slack's new generative AI features include thread summaries and conversational search
Slack has finally unleashed its generative AI toolset on the world, after teasing it last year. The vast majority of these features look to simplify your day-to-day life when using the work-focused chat platform. First up, the AI will auto-generate channel recaps to give you key highlights of anything you missed while away from the keyboard or smartphone. Slack says the algorithm that generates these recaps is smart enough to separate the content from the various topics discussed. In other words, if your co-workers launched into a debate about coffee beans and also talked about third-quarter earnings or whatever, you should get a paragraph on both.
Exploring a Behavioral Model of "Positive Friction" in Human-AI Interaction
Designing seamless, frictionless user experiences has long been a dominant trend in both applied behavioral science and artificial intelligence (AI), in which the goal of making desirable actions easy and efficient informs efforts to minimize friction in user experiences. However, in some settings, friction can be genuinely beneficial, such as the insertion of deliberate delays to increase reflection, preventing individuals from resorting to automatic or biased behaviors, and enhancing opportunities for unexpected discoveries. More recently, the popularization and availability of AI on a widespread scale has only increased the need to examine how friction can help or hinder users of AI; it also suggests a need to consider how positive friction can benefit AI practitioners, both during development processes (e.g., working with diverse teams) and to inform how AI is designed into offerings. This paper first proposes a "positive friction" model that can help characterize how friction is currently beneficial in user and developer experiences with AI, diagnose the potential need for friction where it may not yet exist in these contexts, and inform how positive friction can be used to generate solutions, especially as advances in AI continue to be progress and new opportunities emerge. It then explores this model in the context of AI users and developers by proposing the value of taking a hybrid "AI+human" lens, and concludes by suggesting questions for further exploration.
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects
Yamada, Yutaro, Chandu, Khyathi, Lin, Yuchen, Hessel, Jack, Yildirim, Ilker, Choi, Yejin
Diffusion-based image generation models such as DALL-E 3 and Stable Diffusion-XL demonstrate remarkable capabilities in generating images with realistic and unique compositions. Yet, these models are not robust in precisely reasoning about physical and spatial configurations of objects, especially when instructed with unconventional, thereby out-of-distribution descriptions, such as "a chair with five legs". In this paper, we propose a language agent with chain-of-3D-thoughts (L3GO), an inference-time approach that can reason about part-based 3D mesh generation of unconventional objects that current data-driven diffusion models struggle with. More concretely, we use large language models as agents to compose a desired object via trial-and-error within the 3D simulation environment. To facilitate our investigation, we develop a new benchmark, Unconventionally Feasible Objects (UFO), as well as SimpleBlenv, a wrapper environment built on top of Blender where language agents can build and compose atomic building blocks via API calls. Human and automatic GPT-4V evaluations show that our approach surpasses the standard GPT-4 and other language agents (e.g., ReAct and Reflexion) for 3D mesh generation on ShapeNet. Moreover, when tested on our UFO benchmark, our approach outperforms other state-of-the-art text-to-2D image and text-to-3D models based on human evaluation.