 Generative AI


Trustworthy Large Models in Vision: A Survey

arXiv.org Artificial Intelligence

The rapid progress of Large Models (LMs) has recently revolutionized various fields of deep learning with remarkable results, ranging from Natural Language Processing (NLP) to Computer Vision (CV). However, LMs are increasingly challenged and criticized by academia and industry because, despite their powerful performance, they behave in untrustworthy ways, a problem that urgently needs to be alleviated by reliable methods. Despite the abundance of literature on trustworthy LMs in NLP, a systematic survey specifically delving into the trustworthiness of LMs in CV remains absent. To fill this gap, this survey summarizes four relevant concerns that obstruct the trustworthy use of LMs in vision: 1) human misuse, 2) vulnerability, 3) inherent issues and 4) interpretability. By highlighting the corresponding challenges, countermeasures, and discussion for each topic, we hope this survey will facilitate readers' understanding of this field, promote alignment of LMs with human expectations and enable trustworthy LMs to benefit rather than harm human society.


SELF: Self-Evolution with Language Feedback

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have shown impressive adaptability in various fields, yet the optimal pathway of autonomous model evolution remains underexplored. Drawing inspiration from the self-driven learning process of humans, we introduce SELF (Self-Evolution with Language Feedback), a novel learning framework that empowers LLMs to continually self-improve their abilities. SELF initiates with a meta-skill learning process that equips the LLMs with capabilities for self-feedback and self-refinement. SELF employs language-based feedback for detailed and nuanced evaluations, pinpointing response flaws and suggesting refinements. Subsequently, the model engages in an iterative process of self-evolution: it autonomously generates responses to unlabeled instructions, refines these responses interactively, and uses the refined and filtered data for iterative self-training, thereby progressively boosting its capabilities. Moreover, the SELF framework equips the model with the ability to self-refine during inference, leading to further improved response quality. Our experiments on mathematical and general tasks demonstrate that SELF enables the model to continually self-improve without human intervention. The SELF framework indicates a promising direction for the autonomous evolution of LLMs, transitioning them from passive information receivers to active participants in their development.

Large Language Models (LLMs), like ChatGPT (OpenAI, 2022) and GPT-4 (OpenAI, 2023), stand at the forefront of the AI revolution, demonstrating versatility across tasks. Despite their evident capabilities, the path towards autonomous development of LLMs is still under-explored. The development of automatically improving LLMs can draw inspiration from human self-driven learning mechanisms. When facing new challenges, humans naturally engage in a learning cycle of initial attempts, introspective feedback, and behavior refinement.
This leads to a critical question: "Can LLMs mimic the human learning process, utilizing self-refinement to enhance their inherent capabilities?"
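The self-evolution cycle described above (generate a response, produce language feedback, refine, filter, then keep the result for self-training) can be sketched roughly as follows. This is a minimal illustration, not the SELF implementation: the function names, the callable signatures, and the toy stand-ins for the model are all assumptions introduced here, and the actual self-training step is omitted.

```python
from typing import Callable, List, Tuple


def self_evolution_round(
    instructions: List[str],
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    refine: Callable[[str, str, str], str],
    keep: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Run one round: draft, self-feedback, self-refinement, filtering.

    Returns (instruction, refined response) pairs that a later
    self-training step could consume. All callables are hypothetical
    stand-ins for calls to the same underlying model.
    """
    training_pairs = []
    for instruction in instructions:
        draft = generate(instruction)                    # initial attempt
        feedback = critique(instruction, draft)          # language feedback
        refined = refine(instruction, draft, feedback)   # self-refinement
        if keep(instruction, refined):                   # data filtering
            training_pairs.append((instruction, refined))
    return training_pairs


# Toy stand-ins so the sketch runs without any model (purely illustrative).
def toy_generate(instruction: str) -> str:
    return instruction.upper()


def toy_critique(instruction: str, response: str) -> str:
    return "ok" if response.endswith(".") else "add punctuation"


def toy_refine(instruction: str, response: str, feedback: str) -> str:
    return response if feedback == "ok" else response + "."


def toy_keep(instruction: str, response: str) -> bool:
    return len(response) <= 20


pairs = self_evolution_round(
    ["add 2 and 3", "x" * 30],
    toy_generate, toy_critique, toy_refine, toy_keep,
)
```

In a real setting each callable would prompt the model itself, and the returned pairs would be used to fine-tune it before the next round.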


LLM Voting: Human Choices and AI Collective Decision Making

arXiv.org Artificial Intelligence

This paper investigates the voting behaviors of Large Language Models (LLMs), particularly OpenAI's GPT4 and LLaMA2, and their alignment with human voting patterns. Our approach included a human voting experiment to establish a baseline for human preferences and a parallel experiment with LLM agents. The study focused on both collective outcomes and individual preferences, revealing differences in decision-making and inherent biases between humans and LLMs. We observed a trade-off between preference diversity and alignment in LLMs, with a tendency towards more uniform choices as compared to the diverse preferences of human voters. This finding indicates that LLMs could lead to more homogenized collective outcomes when used in voting assistance, underscoring the need for cautious integration of LLMs into democratic processes.


The whack-a-mole governance challenge for AI-enabled synthetic biology: literature review and emerging frameworks

arXiv.org Artificial Intelligence

AI-enabled synthetic biology has tremendous potential but also significantly increases biorisks and brings about a new set of dual use concerns. The picture is complicated given the vast innovations envisioned to emerge by combining emerging technologies, as AI-enabled synthetic biology potentially scales up bioengineering into industrial biomanufacturing. However, the literature review indicates that goals such as maintaining a reasonable scope for innovation, or more ambitiously fostering a huge bioeconomy, do not necessarily conflict with biosafety but need to go hand in hand with it. This paper presents a literature review of the issues and describes emerging frameworks for policy and practice that traverse the options of command-and-control, stewardship, bottom-up, and laissez-faire governance. How to achieve early warning systems that enable prevention and mitigation of future AI-enabled biohazards from the lab, from deliberate misuse, or from the public realm, will constantly need to evolve, and adaptive, interactive approaches should emerge. Although biorisk is subject to an established governance regime, and scientists generally adhere to biosafety protocols, even experimental but legitimate use by scientists could lead to unexpected developments. Recent advances in chatbots enabled by generative AI have revived fears that advanced biological insight can more easily get into the hands of malicious individuals or organizations. Given these sets of issues, society needs to rethink how AI-enabled synthetic biology should be governed. The suggested way to visualize the challenge at hand is whack-a-mole governance, although the emerging solutions are perhaps not so different either.


Are Generative AI systems Capable of Supporting Information Needs of Patients?

arXiv.org Artificial Intelligence

Patients managing a complex illness such as cancer face a complex information challenge: they must learn not only about their illness but also how to manage it. Close interaction with healthcare experts (radiologists, oncologists) can improve patient learning and thereby their disease outcomes. However, this approach is resource intensive and takes expert time away from other critical tasks. Given the recent advancements in Generative AI models aimed at improving the healthcare system, our work investigates whether and how generative visual question answering systems can responsibly support patient information needs in the context of radiology imaging data. We conducted a formative need-finding study in which participants discussed chest computed tomography (CT) scans and associated radiology reports of a fictitious close relative with a cardiothoracic radiologist. Using thematic analysis of the conversation between participants and medical experts, we identified commonly occurring themes across interactions, including clarifying medical terminology, locating the problems mentioned in the report in the scanned image, understanding disease prognosis, discussing the next diagnostic steps, and comparing treatment options. Based on these themes, we evaluated two state-of-the-art generative visual language models against the radiologist's responses. Our results reveal variability in the quality of responses generated by the models across various themes. We highlight the importance of patient-facing generative AI systems accommodating a diverse range of conversational themes and catering to the real-world informational needs of patients.


SCAPE: Searching Conceptual Architecture Prompts using Evolution

arXiv.org Artificial Intelligence

Conceptual architecture involves a highly creative exploration of novel ideas, often taken from other disciplines as architects consider radical new forms, materials, textures and colors for buildings. While today's generative AI systems can produce remarkable results, they lack the creativity demonstrated for decades by evolutionary algorithms. SCAPE, our proposed tool, combines evolutionary search with generative AI, enabling users to explore creative, good-quality designs inspired by their initial input through a simple point-and-click interface. SCAPE injects randomness into generative AI, and enables memory, making use of the built-in language skills of GPT-4 to vary prompts via text-based mutation and crossover. We demonstrate that compared to DALL-E 3, SCAPE enables a 67% improvement in image novelty, plus improvements in quality and effectiveness of use; we show that in just 3 iterations SCAPE yields a 24% increase in image novelty, enabling effective exploration plus optimization of images by users. We use more than 20 independent architects to assess SCAPE, who provide markedly positive feedback.
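The idea of varying prompts through text-based mutation and crossover can be illustrated with a small sketch. According to the abstract, SCAPE delegates these operators to GPT-4; the word-level operators below are a simple stand-in assumed here for illustration, and every name (mutate, crossover, next_generation, the vocabulary list) is hypothetical. Prompts are assumed to be non-empty.

```python
import random
from typing import List


def mutate(prompt: str, vocabulary: List[str], rng: random.Random) -> str:
    """Replace one randomly chosen word with a word from the vocabulary."""
    words = prompt.split()
    words[rng.randrange(len(words))] = rng.choice(vocabulary)
    return " ".join(words)


def crossover(parent_a: str, parent_b: str, rng: random.Random) -> str:
    """One-point crossover: head of parent_a joined to tail of parent_b."""
    a, b = parent_a.split(), parent_b.split()
    shortest = min(len(a), len(b))
    if shortest < 2:
        return parent_a  # too short to cut; fall back to the first parent
    cut = rng.randint(1, shortest - 1)
    return " ".join(a[:cut] + b[cut:])


def next_generation(
    selected: List[str], vocabulary: List[str], size: int, rng: random.Random
) -> List[str]:
    """Breed a new population of prompts from the prompts a user selected."""
    children = []
    while len(children) < size:
        a, b = rng.choice(selected), rng.choice(selected)
        children.append(mutate(crossover(a, b, rng), vocabulary, rng))
    return children


rng = random.Random(0)  # fixed seed so runs are repeatable
selected = ["a glass tower wrapped in moss", "a timber pavilion shaped like coral"]
vocabulary = ["copper", "translucent", "spiral", "basalt"]
children = next_generation(selected, vocabulary, size=4, rng=rng)
```

Each child prompt would then be sent to an image generator, and the images a user clicks on would supply the selected prompts for the next iteration.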


Generative AI to Generate Test Data Generators

arXiv.org Artificial Intelligence

Generating fake data is an essential dimension of modern software testing, as demonstrated by the number and significance of data faking libraries. Yet, developers of faking libraries cannot keep up with the wide range of data to be generated for different natural languages and domains. In this paper, we assess the ability of generative AI for generating test data in different domains. We design three types of prompts for Large Language Models (LLMs), which perform test data generation tasks at different levels of integrability: 1) raw test data generation, 2) synthesizing programs in a specific language that generate useful test data, and 3) producing programs that use state-of-the-art faker libraries. We evaluate our approach by prompting LLMs to generate test data for 11 domains. The results show that LLMs can successfully generate realistic test data generators in a wide range of domains at all three levels of integrability.
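To make the second level of integrability more concrete, the following sketch shows the kind of small, self-contained generator program an LLM might be asked to synthesize. The ISBN-13 domain, the fixed 978 prefix, and the function names are assumptions chosen for illustration; they are not taken from the paper's 11 domains or its prompts.

```python
import random


def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first 12 digits.

    Digits are weighted 1, 3, 1, 3, ... and the check digit brings the
    weighted sum up to a multiple of 10.
    """
    total = sum(
        int(digit) * (1 if index % 2 == 0 else 3)
        for index, digit in enumerate(first12)
    )
    return (10 - total % 10) % 10


def fake_isbn13(rng: random.Random) -> str:
    """Generate a random, checksum-valid ISBN-13 string with prefix 978."""
    body = "978" + "".join(str(rng.randrange(10)) for _ in range(9))
    return body + str(isbn13_check_digit(body))


rng = random.Random(42)  # fixed seed so test data is reproducible
samples = [fake_isbn13(rng) for _ in range(5)]
```

A generator of this kind yields data that passes format validation in the system under test, which raw random strings would not.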


Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You

arXiv.org Artificial Intelligence

Text-to-image generation models have recently achieved astonishing results in image quality, flexibility, and text alignment and are consequently employed in a fast-growing number of applications. Through improvements in multilingual abilities, a larger community now has access to this kind of technology. Yet, as we will show, multilingual models suffer similarly from (gender) biases as monolingual models. Furthermore, the natural expectation is that these models will provide similar results across languages, but this is not the case and there are important differences between languages. Thus, we propose a novel benchmark MAGBIG intending to foster research in multilingual models without gender bias. We investigate whether multilingual T2I models magnify gender bias with MAGBIG. To this end, we use multilingual prompts requesting portrait images of persons of a certain occupation or trait (using adjectives). Our results show not only that models deviate from the normative assumption that each gender should be equally likely to be generated, but that there are also big differences across languages. Furthermore, we investigate prompt engineering strategies, i.e. the use of indirect, neutral formulations, as a possible remedy for these biases. Unfortunately, they help only to a limited extent and result in worse text-to-image alignment. Consequently, this work calls for more research into diverse representations across languages in image generators.
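The normative assumption mentioned above, that each gender should be equally likely to be generated, suggests a simple way to quantify deviation. The sketch below is an illustrative measure assumed here, not necessarily the metric used with MAGBIG: it takes hypothetical per-image labels (for example from a classifier) for one prompt and reports how far the group shares are from a uniform split. The label names and the function parity_gap are assumptions.

```python
from collections import Counter
from typing import Sequence


def parity_gap(labels: Sequence[str], groups: Sequence[str]) -> float:
    """Mean absolute deviation of each group's share from a uniform split.

    0.0 means every group appears equally often; larger values mean the
    generated images skew further towards some groups.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    target = 1.0 / len(groups)
    return sum(abs(counts[g] / len(labels) - target) for g in groups) / len(groups)


# Hypothetical labels for ten images generated from one occupation prompt.
labels = ["female"] * 3 + ["male"] * 7
gap = parity_gap(labels, groups=("female", "male"))
```

Computing this value for the same prompt translated into several languages would expose the cross-language differences the abstract describes.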


Microsoft's legal department allegedly silenced an engineer who raised concerns about DALL-E 3

Engadget

A Microsoft manager claims OpenAI's DALL-E 3 has security vulnerabilities that could allow users to generate violent or explicit images (similar to those that recently targeted Taylor Swift). GeekWire reported Tuesday the company's legal team blocked Microsoft engineering leader Shane Jones' attempts to alert the public about the exploit. The self-described whistleblower is now taking his message to Capitol Hill. "I reached the conclusion that DALL·E 3 posed a public safety risk and should be removed from public use until OpenAI could address the risks associated with this model," Jones wrote to US Senators Patty Murray (D-WA) and Maria Cantwell (D-WA), Rep. Adam Smith (D-WA 9th District), and Washington state Attorney General Bob Ferguson (D). GeekWire published Jones' full letter. Jones claims he discovered an exploit allowing him to bypass DALL-E 3's security guardrails in early December.


Hulu Shows Jarring Anti-Hamas Ad Likely Generated With AI

WIRED

Hulu ran an anti-Hamas ad that appears to be made using artificial intelligence to show an idealized version of Gaza--claiming this paradise destination could exist if not for Hamas. The 30-second spot, opening like a tourism ad, shows palm trees and coastlines. There are five-star hotels and children playing. People dance, eat, and laugh, while a voiceover encourages visitors to "experience a culture rich in tradition." But it suddenly shifts, turning the face of a smiling man into a grimacing one.