 Generative AI


ChatGPT's refusal to acknowledge 'David Mayer' down to glitch, says OpenAI

The Guardian

Last weekend the name was all over the internet – just not on ChatGPT. David Mayer became famous for a moment on social media because the popular chatbot appeared to want nothing to do with him. Legions of chatbot wranglers spent days trying – and failing – to make ChatGPT write the words "David Mayer". But the chatbot refused to comply, with replies alternating between "something seems to have gone wrong" and "I'm unable to produce a response", or just stopping at "David". This produced a blizzard of online speculation about Mayer's identity.


Japanese firms begin adopting generative AI for information searches

The Japan Times

With generative artificial intelligence gaining rapid adoption worldwide, private-sector businesses in Japan are starting to embrace related information search technologies to enhance operational efficiency. Conventional information searches require users to enter relevant keywords and manually browse selected websites to locate the desired information. The process can be time-consuming, however, and users may not always find the exact content they need. By contrast, generative AI searches enable users to input queries using natural language or images. After interpreting the user's wishes, the AI retrieves relevant information from websites and other sources, providing concise, natural-sounding answers.


Dynamic Prompt Middleware: Contextual Prompt Refinement Controls for Comprehension Tasks

arXiv.org Artificial Intelligence

Effective prompting of generative AI is challenging for many users, particularly in expressing context for comprehension tasks such as explaining spreadsheet formulas, Python code, and text passages. Prompt middleware aims to address this barrier by assisting in prompt construction, but barriers remain for users in expressing adequate control so that they can receive AI responses that match their preferences. We conduct a formative survey (n=38) investigating user needs for control over AI-generated explanations in comprehension tasks, which uncovers a trade-off between standardized but predictable support for prompting, and adaptive but unpredictable support tailored to the user and task. To explore this trade-off, we implement two prompt middleware approaches: Dynamic Prompt Refinement Control (Dynamic PRC) and Static Prompt Refinement Control (Static PRC). The Dynamic PRC approach generates context-specific UI elements that provide prompt refinements based on the user's prompt and user needs from the AI, while the Static PRC approach offers a preset list of generally applicable refinements. We evaluate these two approaches with a controlled user study (n=16) to assess the impact of these approaches on user control of AI responses for crafting better explanations. Results show a preference for the Dynamic PRC approach as it afforded more control, lowered barriers to providing context, and encouraged exploration and reflection of the tasks, but that reasoning about the effects of different generated controls on the final output remains challenging. Drawing on participant feedback, we discuss design implications for future Dynamic PRC systems that enhance user control of AI responses. Our findings suggest that dynamic prompt middleware can improve the user experience of generative AI workflows by affording greater control and guiding users to a better AI response.


ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?

arXiv.org Artificial Intelligence

Multimodal large language models (LLMs) have demonstrated impressive capabilities in generating high-quality images from textual instructions. However, their performance in generating scientific images--a critical application for accelerating scientific progress--remains underexplored. In this work, we address this gap by introducing ScImage, a benchmark designed to evaluate the multimodal capabilities of LLMs in generating scientific images from textual descriptions. ScImage assesses three key dimensions of understanding: spatial, numeric, and attribute comprehension, as well as their combinations, focusing on the relationships between scientific objects (e.g., squares, circles). We evaluate five models, GPT-4o, Llama, AutomaTikZ, Dall-E, and StableDiffusion, using two modes of output generation: code-based outputs (Python, TikZ) and direct raster image generation. Additionally, we examine four different input languages: English, German, Farsi, and Chinese. Our evaluation, conducted with 11 scientists across three criteria (correctness, relevance, and scientific accuracy), reveals that while GPT-4o produces outputs of decent quality for simpler prompts involving individual dimensions such as spatial, numeric, or attribute understanding in isolation, all models face challenges in this task, especially for more complex prompts.


GerPS-Compare: Comparing NER methods for legal norm analysis

arXiv.org Artificial Intelligence

We apply NER to a particular sub-genre of legal texts in German: the genre of legal norms regulating administrative processes in public service administration. The analysis of such texts involves identifying stretches of text that instantiate one of ten classes identified by public service administration professionals. We investigate and compare three methods for performing Named Entity Recognition (NER) to detect these classes: a Rule-based system, deep discriminative models, and a deep generative model. Our results show that Deep Discriminative models outperform both the Rule-based system and the Deep Generative model, the latter two roughly performing equally well, outperforming each other in different classes. The main cause for this somewhat surprising result is arguably the fact that the classes used in the analysis are semantically and syntactically heterogeneous, in contrast to the classes used in more standard NER tasks. Deep Discriminative models appear to be better equipped for dealing with this heterogeneity than both generic LLMs and human linguists designing rule-based NER systems.


Scaffold or Crutch? Examining College Students' Use and Views of Generative AI Tools for STEM Education

arXiv.org Artificial Intelligence

Developing problem-solving competency is central to Science, Technology, Engineering, and Mathematics (STEM) education, yet translating this priority into effective approaches to problem-solving instruction and assessment remains a significant challenge. The recent proliferation of generative artificial intelligence (genAI) tools like ChatGPT in higher education introduces new considerations about how these tools can help or hinder students' development of STEM problem-solving competency. Our research examines these considerations by studying how and why college students use genAI tools in their STEM coursework, focusing on their problem-solving support. We surveyed 40 STEM college students from diverse U.S. institutions and 28 STEM faculty to understand instructor perspectives on effective genAI tool use and guidance in STEM courses. Our findings reveal high adoption rates and diverse applications of genAI tools among STEM students. The most common use cases include finding explanations, exploring related topics, summarizing readings, and helping with problem-set questions. The primary motivation for using genAI tools was to save time. Moreover, over half of student participants reported simply inputting problems for AI to generate solutions, potentially bypassing their own problem-solving processes. These findings indicate that despite high adoption rates, students' current approaches to utilizing genAI tools often fall short in enhancing their own STEM problem-solving competencies. The study also explored students' and STEM instructors' perceptions of the benefits and risks associated with using genAI tools in STEM education. Our findings provide insights into how to guide students on appropriate genAI use in STEM courses and how to design genAI-based tools to foster students' problem-solving competency.


Hacking CTFs with Plain Agents

arXiv.org Artificial Intelligence

Cybersecurity is one of the key AI risk areas (OpenAI 2024b; The White House 2023; UK Government 2023): advanced LLMs could hack real-world systems at speeds far exceeding human capabilities (OpenAI 2024a). To quantify AI cyber capabilities, researchers use benchmarks, with InterCode-CTF (Yang, Prabhakar, Narasimhan, et al. 2023) among the most popular. InterCode-CTF adapts traditional Capture The Flag competitions to assess LLM hacking skills. Previously, Phuong et al. 2024 showed low performance on this benchmark and suggested low cyber exploitation capabilities. A recent follow-up by Abramovich et al. 2024 claimed state-of-the-art results (72%) due to a particular novel harness design choice.


Efficient and Diverse Generative Robot Designs using Evolution and Intrinsic Motivation

arXiv.org Artificial Intelligence

Methods for generative design of robot physical configurations can automatically find optimal and innovative solutions for challenging tasks in complex environments. The vast search-space includes the physical design-space and the controller parameter-space, making it a challenging problem in machine learning and optimisation in general. Evolutionary algorithms (EAs) have shown promising results in generating robot designs via gradient-free optimisation. Morpho-evolution with learning (MEL) uses EAs to concurrently generate robot designs and learn the optimal parameters of the controllers. Two main issues prevent MEL from scaling to higher complexity tasks: computational cost and premature convergence to sub-optimal designs. To address these issues, we propose combining morpho-evolution with intrinsic motivations. Intrinsically motivated behaviour arises from embodiment and simple learning rules without external guidance. We use a homeokinetic controller that generates exploratory behaviour in a few seconds with reduced knowledge of the robot's design. Homeokinesis replaces costly learning phases, reducing computational time and favouring diversity, preventing premature convergence. We compare our approach with current MEL methods in several downstream tasks. The generated designs score higher in all the tasks, are more diverse, and are quickly generated compared to morpho-evolution with static parameters.


Moving generative AI into production

MIT Technology Review

Yet, difficulty successfully deploying generative AI continues to hamper progress. Companies know that generative AI could transform their businesses--and that failing to adopt will leave them behind--but they are faced with hurdles during implementation. This leaves two-thirds of business leaders dissatisfied with progress on their AI deployments. And while, in Q3 2023, 79% of companies said they planned to deploy generative AI projects in the next year, only 5% reported having use cases in production in May 2024. "We're just at the beginning of figuring out how to productize AI deployment and make it cost effective," says Rowan Trollope, CEO of Redis, a maker of real-time data platforms and AI accelerators.


The Morning After: Elon Musk wants the court to stop OpenAI becoming a for-profit

Engadget

Elon Musk's attorneys filed for an injunction against OpenAI and Microsoft on Friday, accusing them of anticompetitive practices. He wants to stop OpenAI's conversion to a for-profit company. Musk first sued OpenAI earlier this year for allegedly violating its founding mission of building AI "for the benefit of humanity," but he withdrew the lawsuit a few months later. He filed another lawsuit against OpenAI in a California federal court in August. The third time's the charm and all: Musk's new motion accuses OpenAI and Microsoft of telling investors not to fund OpenAI's competitors, such as Musk's xAI, and of "benefitting from wrongfully obtained competitively sensitive information or coordination" through its relationship with Microsoft.