 Generative AI


LLM2FEA: Discover Novel Designs with Generative Evolutionary Multitasking

arXiv.org Artificial Intelligence

The rapid research and development of generative artificial intelligence has enabled the generation of high-quality images, text, and 3D models from text prompts. This advancement impels an inquiry into whether these models can be leveraged to create digital artifacts for both creative and engineering applications. Drawing on innovative designs from other domains may be one answer to this question, much like the historical practice of "bionics", where humans have sought inspiration from nature's exemplary designs. This raises the intriguing possibility of using generative models to simultaneously tackle design tasks across multiple domains, facilitating cross-domain learning and resulting in a series of innovative design solutions. In this paper, we propose LLM2FEA as the first attempt to discover novel designs in generative models by transferring knowledge across multiple domains. By utilizing a multi-factorial evolutionary algorithm (MFEA) to drive a large language model, LLM2FEA integrates knowledge from various fields to generate prompts that guide the generative model in discovering novel and practical objects. Experimental results in the context of 3D aerodynamic design verify the discovery capabilities of the proposed LLM2FEA. The designs generated by LLM2FEA not only satisfy practicality requirements to a certain degree but also feature novel and aesthetically pleasing shapes, demonstrating the potential applications of LLM2FEA in discovery tasks.
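As a rough illustration of the multi-factorial idea mentioned in the abstract, the sketch below evolves one shared population for two tasks at once. It is a simplified toy under stated assumptions, not the LLM2FEA implementation: genomes are plain numeric vectors rather than prompts, the two task functions are hypothetical stand-ins for design objectives, and crossover and mutation are ordinary numeric operators rather than LLM calls. Each individual carries a skill factor (the task it is evaluated on), parents assigned to different tasks recombine only with probability `rmp`, and a child inherits its skill factor from one parent. Survivor selection keeps the best individuals per task, which is a simplification of rank-based scalar fitness.

```python
import random

# Hypothetical toy tasks: each scores a genome in [0, 1]^dim, lower is better.
def task_a(x):
    return sum((v - 0.2) ** 2 for v in x)

def task_b(x):
    return sum((v - 0.8) ** 2 for v in x)

TASKS = [task_a, task_b]

def evaluate(genome, skill):
    # An individual is (genome, skill factor, cost on its own task only).
    return (genome, skill, TASKS[skill](genome))

def best_per_task(pop):
    return [min(ind[2] for ind in pop if ind[1] == t) for t in range(len(TASKS))]

def run_mfea(dim=5, pop_size=20, generations=30, rmp=0.3, seed=0):
    rng = random.Random(seed)
    per_task = pop_size // len(TASKS)
    # Shared population; skill factors are assigned round-robin.
    pop = [evaluate([rng.random() for _ in range(dim)], i % len(TASKS))
           for i in range(pop_size)]
    history = [best_per_task(pop)]
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(pop, 2)
            if a[1] == b[1] or rng.random() < rmp:
                # Assortative mating: uniform crossover, possibly across tasks.
                genome = [rng.choice(pair) for pair in zip(a[0], b[0])]
                # The child imitates the skill factor of one parent.
                skill = rng.choice([a[1], b[1]])
            else:
                # No cross-task mating this time: mutate one parent instead.
                genome = [min(1.0, max(0.0, v + rng.gauss(0.0, 0.1))) for v in a[0]]
                skill = a[1]
            children.append(evaluate(genome, skill))
        # Elitist survival: keep the best individuals for each task.
        merged = pop + children
        pop = []
        for t in range(len(TASKS)):
            group = sorted((ind for ind in merged if ind[1] == t),
                           key=lambda ind: ind[2])
            pop.extend(group[:per_task])
        history.append(best_per_task(pop))
    return history

history = run_mfea()
print(history[0], history[-1])  # best cost per task, before and after
```

Because survivors are chosen elitistically per task, the best cost for each task can never get worse from one generation to the next; any benefit from cross-task mating depends on how related the tasks are.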


GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation

arXiv.org Artificial Intelligence

While text-to-visual models now produce photo-realistic images and videos, they struggle with compositional text prompts involving attributes, relationships, and higher-order reasoning such as logic and comparison. In this work, we conduct an extensive human study on GenAI-Bench to evaluate the performance of leading image and video generation models in various aspects of compositional text-to-visual generation. We also compare automated evaluation metrics against our collected human ratings and find that VQAScore -- a metric measuring the likelihood that a VQA model views an image as accurately depicting the prompt -- significantly outperforms previous metrics such as CLIPScore. In addition, VQAScore can improve generation in a black-box manner (without finetuning) via simply ranking a few (3 to 9) candidate images. Ranking by VQAScore is 2x to 3x more effective than other scoring methods like PickScore, HPSv2, and ImageReward at improving human alignment ratings for DALL-E 3 and Stable Diffusion, especially on compositional prompts that require advanced visio-linguistic reasoning. We will release a new GenAI-Rank benchmark with over 40,000 human ratings to evaluate scoring metrics on ranking images generated from the same prompt. Lastly, we discuss promising areas for improvement in VQAScore, such as addressing fine-grained visual details. We will release all human ratings (over 80,000) to facilitate scientific benchmarking of both generative models and automated metrics.
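The black-box improvement described above amounts to best-of-n selection: generate a few candidates, score each against the prompt, and keep the top one. A minimal sketch follows, assuming a scoring function with the signature `score_fn(prompt, candidate) -> float`. The names `rank_candidates` and `toy_score` are hypothetical, and `toy_score` is only a word-overlap stand-in operating on text descriptions; it is not VQAScore and does not call any real model.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def rank_candidates(prompt: str,
                    candidates: Sequence[T],
                    score_fn: Callable[[str, T], float]) -> List[T]:
    """Return candidates ordered from highest to lowest score for the prompt."""
    return sorted(candidates, key=lambda c: score_fn(prompt, c), reverse=True)

def toy_score(prompt: str, description: str) -> float:
    # Hypothetical stand-in metric: fraction of prompt words that also
    # appear in a text description of the candidate image.
    words = prompt.lower().split()
    seen = set(description.lower().split())
    return sum(w in seen for w in words) / len(words)

prompt = "a red cube on a blue sphere"
candidates = ["a blue cube", "a red cube on a blue sphere", "a red sphere"]
ranked = rank_candidates(prompt, candidates, toy_score)
best = ranked[0]
print(best)
```

In a real pipeline the candidates would be generated images and `score_fn` would wrap whichever alignment metric is being evaluated; the generator itself is never modified.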


Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data

arXiv.org Artificial Intelligence

Generative, multimodal artificial intelligence (GenAI) offers transformative potential across industries, but its misuse poses significant risks. Prior research has shed light on the potential of advanced AI systems to be exploited for malicious purposes. However, we still lack a concrete understanding of how GenAI models are specifically exploited or abused in practice, including the tactics employed to inflict harm. In this paper, we present a taxonomy of GenAI misuse tactics, informed by existing academic literature and a qualitative analysis of approximately 200 observed incidents of misuse reported between January 2023 and March 2024. Through this analysis, we illuminate key and novel patterns in misuse during this time period, including potential motivations, strategies, and how attackers leverage and abuse system capabilities across modalities (e.g. image, text, audio, video) in the wild.


Battling Botpoop using GenAI for Higher Education: A Study of a Retrieval Augmented Generation Chatbots Impact on Learning

arXiv.org Artificial Intelligence

Generative artificial intelligence (GenAI) and large language models (LLMs) have simultaneously opened new avenues for enhancing human learning and increased the prevalence of poor-quality information in student responses, termed 'Botpoop'. This study introduces Professor Leodar, a custom-built, Singlish-speaking Retrieval Augmented Generation (RAG) chatbot designed to enhance education while reducing Botpoop. Deployed at Nanyang Technological University, Singapore, Professor Leodar offers a glimpse into the future of AI-assisted learning, offering personalized guidance, 24/7 availability, and contextually relevant information. Through a mixed-methods approach, we examine the impact of Professor Leodar on learning, engagement, and exam preparedness, with 97.1% of participants reporting positive experiences. These findings help define possible roles of AI in education and highlight the potential of custom GenAI chatbots. Our combination of chatbot development, in-class deployment and outcomes study offers a benchmark for GenAI educational tools and is a stepping stone for redefining the interplay between AI and human learning.


Anthropic's newest Claude chatbot beats OpenAI's GPT-4o in some benchmarks

Engadget

Anthropic rolled out its newest AI language model on Thursday, Claude 3.5 Sonnet. The updated chatbot outperforms the company's previous top-tier model, Claude 3 Opus, while working at twice the speed. Claude users (including those on free accounts) can check it out beginning today. Sonnet, which tends to be Anthropic's most balanced model, is the first release in the Claude 3.5 family. The company says Claude 3.5 Haiku (the fastest in each generation) and Claude 3.5 Opus (the most powerful) will arrive later this year.


We're Still Waiting for the Next Big Leap in AI

WIRED

When OpenAI announced GPT-4, its latest large language model, last March, it sent shockwaves through the tech world. It was clearly more capable than anything seen before at chatting, coding, and solving all sorts of thorny problems--including school homework. Anthropic, a rival to OpenAI, announced today that it has made its own AI advance that will upgrade chatbots and other use cases. But although the new model is the world's best by some measures, it's more of a step forward than a big leap. Anthropic's new model, called Claude 3.5 Sonnet, is an upgrade to its existing Claude 3 family of AI models.


How generative AI could reinvent what it means to play

MIT Technology Review

After a while, however, the repetitive chitchat (or threats) of a passing stranger forces you to bump up against the truth: This is just a game. It's still fun--I had a whale of a time, honestly, looting stagecoaches, fighting in bar brawls, and stalking deer through rainy woods--but the illusion starts to weaken when you poke at it. Video games are carefully crafted objects, part of a multibillion-dollar industry, that are designed to be consumed. You play them, you loot a few stagecoaches, you finish, you move on. It may not always be like that.


CollaFuse: Collaborative Diffusion Models

arXiv.org Artificial Intelligence

In the landscape of generative artificial intelligence, diffusion-based models have emerged as a promising method for generating synthetic images. However, the application of diffusion models poses numerous challenges, particularly concerning data availability, computational requirements, and privacy. Traditional approaches to address these shortcomings, like federated learning, often impose significant computational burdens on individual clients, especially those with constrained resources. In response to these challenges, we introduce a novel approach for distributed collaborative diffusion models inspired by split learning. Our approach facilitates collaborative training of diffusion models while alleviating client computational burdens during image synthesis. This reduced computational burden is achieved by retaining data and computationally inexpensive processes locally at each client while outsourcing the computationally expensive processes to shared, more efficient server resources. Through experiments on the common CelebA dataset, our approach demonstrates enhanced privacy by reducing the necessity for sharing raw data. These capabilities hold significant potential across various application areas, including the design of edge computing solutions. Thus, our work advances distributed machine learning by contributing to the evolution of collaborative diffusion models.


How critically can an AI think? A framework for evaluating the quality of thinking of generative artificial intelligence

arXiv.org Artificial Intelligence

Generative AI tools, such as those built on large language models, have created opportunities for innovative assessment design practices. Due to recent technological developments, there is a need to know the limits and capabilities of generative AI in terms of simulating cognitive skills. Assessing student critical thinking skills has been a feature of assessment since time immemorial, but the demands of digital assessment create unique challenges for equity, academic integrity and assessment authorship. Educators need a framework for determining their assessments' vulnerability to generative AI to inform assessment design practices. This paper presents a framework that explores the capabilities of the LLM ChatGPT4 application, which is the current industry benchmark. This paper presents the Mapping of questions, AI vulnerability testing, Grading, Evaluation (MAGE) framework, which educators can use to methodically critique their assessments within their own disciplinary contexts. This critique will provide specific and targeted indications of their questions' vulnerabilities in terms of critical thinking skills. This can go on to form the basis of assessment design for their tasks.


A Large Language Model Outperforms Other Computational Approaches to the High-Throughput Phenotyping of Physician Notes

arXiv.org Artificial Intelligence

High-throughput phenotyping, the automated mapping of patient signs and symptoms to standardized ontology concepts, is essential to gaining value from electronic health records (EHR) in the support of precision medicine. Despite technological advances, high-throughput phenotyping remains a challenge. This study compares three computational approaches to high-throughput phenotyping: a Large Language Model (LLM) incorporating generative AI, a Natural Language Processing (NLP) approach utilizing deep learning for span categorization, and a hybrid approach combining word vectors with machine learning. The approach that implemented GPT-4 (a Large Language Model) demonstrated superior performance, suggesting that Large Language Models are poised to be the preferred method for high-throughput phenotyping of physician notes.