Generative AI
Reflexive Prompt Engineering: A Framework for Responsible Prompt Engineering and Interaction Design
Responsible prompt engineering has emerged as a critical framework for ensuring that generative artificial intelligence (AI) systems serve society's needs while minimizing potential harms. As generative AI applications become increasingly powerful and ubiquitous, the way we instruct and interact with them through prompts has profound implications for fairness, accountability, and transparency. This article examines how strategic prompt engineering can embed ethical and legal considerations and societal values directly into AI interactions, moving beyond mere technical optimization for functionality. It proposes a comprehensive framework for responsible prompt engineering that encompasses five interconnected components: prompt design, system selection, system configuration, performance evaluation, and prompt management. Drawing on empirical evidence, the article demonstrates how each component can be leveraged to promote better societal outcomes while mitigating potential risks. The analysis shows that effective prompt engineering requires balancing technical precision with ethical awareness, combining systematic, functionality-focused rigor with a nuanced understanding of social impact. Through an examination of real-world and emerging practices, the article illustrates how responsible prompt engineering serves as a crucial bridge between AI development and deployment, enabling organizations to steer AI outputs without modifying the underlying model architectures. This approach aligns with broader "Responsibility by Design" principles, embedding ethical considerations directly into the implementation process rather than treating them as post-hoc additions. The article concludes by identifying key research directions and practical guidelines for advancing the field of responsible prompt engineering.
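The abstract names prompt management as one of the five components but does not describe any tooling for it. As a minimal sketch, assuming a prompt is stored as a versioned record, the Python below keeps each prompt together with the model it targets, its configuration, its evaluation results, and the rationale for every revision, so that changes remain auditable. The class names, field names, model placeholder, and evaluation keys are all hypothetical and are not taken from the article.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PromptRecord:
    """One version of a prompt, annotated along the framework's five components."""

    # Prompt design: the instruction text, including any fairness or safety constraints.
    template: str
    # System selection: the model the prompt was written and tested against.
    model: str
    # System configuration: generation settings that affect output behaviour.
    temperature: float
    # Performance evaluation: named checks and their outcomes (hypothetical keys).
    evaluations: dict = field(default_factory=dict)
    # Prompt management: version number and the reason for each revision.
    version: int = 1
    rationale: str = ""


class PromptRegistry:
    """Keeps every version of each named prompt so that changes stay auditable."""

    def __init__(self) -> None:
        self._history: dict[str, list[PromptRecord]] = {}

    def register(self, name: str, record: PromptRecord) -> None:
        versions = self._history.get(name, [])
        expected = len(versions) + 1
        if record.version != expected:
            raise ValueError(f"{name}: expected version {expected}, got {record.version}")
        if versions and not record.rationale:
            raise ValueError(f"{name}: a revision must state its rationale")
        # Store only after validation, so a rejected record leaves history untouched.
        self._history[name] = [*versions, record]

    def latest(self, name: str) -> PromptRecord:
        return self._history[name][-1]

    def history(self, name: str) -> list[PromptRecord]:
        return list(self._history[name])


registry = PromptRegistry()
registry.register("summarize", PromptRecord(
    template="Summarize the text neutrally; do not infer personal attributes.",
    model="example-model",  # placeholder, not a real model name
    temperature=0.2,
    evaluations={"bias_review": "pass"},
))
registry.register("summarize", PromptRecord(
    template="Summarize the text neutrally and cite the sentences you rely on.",
    model="example-model",
    temperature=0.2,
    version=2,
    rationale="Added citations to improve traceability.",
))
```

Rejecting unexplained revisions is one simple way to tie prompt changes to accountability; a real deployment would more likely persist this history than keep it in memory.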
aiXamine: Simplified LLM Safety and Security
Deniz, Fatih, Popovic, Dorde, Boshmaf, Yazan, Jeong, Euisuh, Ahmad, Minhaj, Chawla, Sanjay, Khalil, Issa
Evaluating Large Language Models (LLMs) for safety and security remains a complex task, often requiring users to navigate a fragmented landscape of ad hoc benchmarks, datasets, metrics, and reporting formats. To address this challenge, we present aiXamine, a comprehensive black-box evaluation platform for LLM safety and security. aiXamine integrates over 40 tests (i.e., benchmarks) organized into eight key services targeting specific dimensions of safety and security: adversarial robustness, code security, fairness and bias, hallucination, model and data privacy, out-of-distribution (OOD) robustness, over-refusal, and safety alignment. The platform aggregates the evaluation results into a single report per model, providing a detailed breakdown of model performance, test examples, and rich visualizations. We used aiXamine to assess over 50 publicly available and proprietary LLMs, conducting over 2,000 examinations. Our findings reveal notable vulnerabilities in leading models, including susceptibility to adversarial attacks in OpenAI's GPT-4o, biased outputs in xAI's Grok-3, and privacy weaknesses in Google's Gemini 2.0. Additionally, we observe that open-source models can match or exceed proprietary models in specific services such as safety alignment, fairness and bias, and OOD robustness. Finally, we identify trade-offs among distillation strategies, model size, training methods, and architectural choices.
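The abstract says aiXamine aggregates results into one report per model but does not give the scoring rule. A minimal sketch, assuming each test yields a score normalized to [0, 1] and a service score is the unweighted mean of its tests, groups results by the eight services named above and flags services with no coverage. `ServiceResult`, `build_report`, and the benchmark names are hypothetical; this is not aiXamine's API.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# The eight service dimensions listed in the aiXamine abstract.
SERVICES = (
    "adversarial robustness",
    "code security",
    "fairness and bias",
    "hallucination",
    "model and data privacy",
    "out-of-distribution robustness",
    "over-refusal",
    "safety alignment",
)


@dataclass(frozen=True)
class ServiceResult:
    service: str  # one of SERVICES
    test: str     # hypothetical benchmark name
    score: float  # assumed normalized to [0, 1], higher is better


def build_report(model: str, results: list[ServiceResult]) -> dict:
    """Average test scores within each service (an assumed aggregation rule)."""
    by_service: dict[str, list[float]] = defaultdict(list)
    for r in results:
        if r.service not in SERVICES:
            raise ValueError(f"unknown service: {r.service}")
        if not 0.0 <= r.score <= 1.0:
            raise ValueError(f"score out of range: {r.score}")
        by_service[r.service].append(r.score)
    return {
        "model": model,
        "services": {s: mean(scores) for s, scores in by_service.items()},
        # Services with no results are listed explicitly so coverage gaps stay visible.
        "missing": [s for s in SERVICES if s not in by_service],
    }


report = build_report("example-model", [
    ServiceResult("hallucination", "bench-a", 0.8),
    ServiceResult("hallucination", "bench-b", 0.6),
    ServiceResult("over-refusal", "bench-c", 0.9),
])
```

An unweighted mean is only one choice; weighting tests by size or severity would change the per-service numbers, which is why the rule is stated here as an assumption.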
ChatGPT maker OpenAI wants to buy Chrome from Google
Google is having a bit of a moment. It's not quite an Enron- or FTX-style "abandon ship" situation, but between two separate US antitrust rulings on its core search and advertising businesses, it's a five-alarm fire. One of the possible outcomes is Google selling off the Chrome browser… and it looks like one possible buyer is OpenAI, maker of ChatGPT. OpenAI's head of product for ChatGPT, Nick Turley, testified at the remedy phase of the Department of Justice's successful monopoly suit against Google. When asked if OpenAI would be interested in buying the Chrome browser from Google, Turley didn't mince words.
WhatsApp defends 'optional' AI tool that cannot be turned off
When you first use Meta AI in WhatsApp, it states the chatbot "can only read messages people share with it". "Meta can't read any other messages in your personal chats, as your personal messages remain end to end encrypted," it says. Meanwhile the Information Commissioner's Office told the BBC it would "continue to monitor the adoption of Meta AI's technology and use of personal data within WhatsApp". "Personal information fuels much of AI innovation so people need to trust that organisations are using their information responsibly," it said. "Organisations who want to use people's personal details to train or use generative AI models need to comply with all their data protection obligations, and take the necessary extra steps when it comes to processing the data of children."
ChatGPT-maker wants to buy Google Chrome
The current trial is looking at remedies to curtail Google's dominance in online search, as the recent explosion in generative AI services such as ChatGPT has expanded the market. Newer AI models search the internet to improve results and reduce hallucination, which has been a problem for developers since chatbots started to become popular. Last year, OpenAI offered to do a deal with Google which would have integrated Google search results into ChatGPT, according to Mr Turley's testimony. But he says their offer was rejected. "We have no partnership with Google today," Mr Turley said, according to Reuters. OpenAI does however have a partnership with Microsoft, which makes the Bing search engine and Edge browser.
3 Things Caiwei Chen is into right now
I recently saw Doomers, a new play by Matthew Gasda about the aborted 2023 coup at OpenAI, here represented by a fictional company called MindMesh. The action is set almost entirely in a meeting room; the first act follows executives immediately after the firing of company CEO Seth (a stand-in for Sam Altman), and the second re-creates the board negotiations that determined his fate. It's a solid attempt to capture the zeitgeist of Silicon Valley's AI frenzy and the world's moral panic over artificial intelligence, but the rapid-fire, high-stakes exchanges mean it sometimes seems to get lost in its own verbosity.
The vastness of Chinese cuisine defies easy categorization, and even in a city with no shortage of options, I often find myself cooking, not just to recapture something closer to home but to create a home unlike any that ever existed. Recently, I've been experimenting with a Chinese take on the charcuterie board, pairing toasted steamed buns, called mantou, with furu, a fermented tofu spread that is sharp, pungent, and full of umami.
I started sewing three years ago, but only in the past year have I begun making clothes from scratch.
AI floods Amazon with strange political books before Canadian election
Canada has seen a boom in political books created with generative artificial intelligence, adding to concerns about how new technologies are affecting the information voters receive during the election campaign. Canadian Prime Minister Mark Carney was the subject of at least 16 books published in March and listed on Amazon, according to a review of the site on April 16. Five of those were published on a single day. In total, some 30 titles were published about Carney this year and made available on Amazon, but most were taken down from the site after inquiries were made.
Benchmarking LLM for Code Smells Detection: OpenAI GPT-4.0 vs DeepSeek-V3
Sadik, Ahmed R., Govind, Siddhata
Determining the most effective large language model (LLM) for code smell detection is a complex challenge. This study introduces a structured methodology and evaluation matrix to address it, using a curated dataset of code samples consistently annotated with known smells. The dataset spans four prominent programming languages (Java, Python, JavaScript, and C++), allowing cross-language comparison. We benchmark two state-of-the-art LLMs, OpenAI GPT-4.0 and DeepSeek-V3, using precision, recall, and F1 score as evaluation metrics. Our analysis covers three levels of detail: overall performance, category-level performance, and performance on individual code smell types. Additionally, we explore cost-effectiveness by comparing the token-based detection approach of GPT-4.0 with the pattern-matching techniques employed by DeepSeek-V3. The study also includes a cost analysis relative to traditional static analysis tools such as SonarQube. The findings offer practical guidance for practitioners selecting an efficient, cost-effective solution for automated code smell detection.
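Precision, recall, and F1 follow from counts of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The sketch below assumes, since the abstract does not specify it, that a detection counts as correct when the model reports the annotated smell type for the same code sample; the sample IDs are hypothetical. `score_by_type` reproduces the per-smell-type level of the analysis under the same assumption.

```python
from dataclasses import dataclass

# A finding is a (sample_id, smell_type) pair; the identifiers below are hypothetical.
Finding = tuple[str, str]


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def score(annotated: set[Finding], detected: set[Finding]) -> Scores:
    """Compare a model's reported smells against the annotated ground truth."""
    tp = len(annotated & detected)  # correctly reported smells
    fp = len(detected - annotated)  # reported but not annotated
    fn = len(annotated - detected)  # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


def score_by_type(annotated: set[Finding], detected: set[Finding]) -> dict[str, Scores]:
    """Per-smell-type breakdown: score each smell type on its own findings only."""
    types = sorted({smell for _, smell in annotated | detected})
    return {
        t: score({f for f in annotated if f[1] == t}, {f for f in detected if f[1] == t})
        for t in types
    }


annotated = {
    ("Order.java", "Long Method"),
    ("Order.java", "God Class"),
    ("parser.py", "Long Parameter List"),
    ("cart.js", "Duplicate Code"),
}
detected = {
    ("Order.java", "Long Method"),
    ("parser.py", "Long Parameter List"),
    ("parser.py", "Dead Code"),
}
overall = score(annotated, detected)
by_type = score_by_type(annotated, detected)
```

For these example sets, TP = 2, FP = 1, and FN = 2, giving precision 2/3, recall 1/2, and F1 = 4/7 (about 0.571). Undefined ratios (for example, recall for a smell type with no annotations) are reported as 0.0 here, which is a convention rather than the only option.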
Generative AI for Research Data Processing: Lessons Learnt From Three Use Cases
Mitra, Modhurita, de Vos, Martine G., Cortinovis, Nicola, Ometto, Dawa
There has been enormous interest in generative AI since ChatGPT was launched in 2022. However, there are concerns about the accuracy and consistency of the outputs of generative AI. We have carried out an exploratory study on the application of this new technology in research data processing. We identified tasks for which rule-based or traditional machine learning approaches were difficult to apply, and then performed these tasks using generative AI. We demonstrate the feasibility of using the generative AI model Claude 3 Opus in three research projects involving complex data processing tasks: 1) Information extraction: We extract plant species names from historical seedlists (catalogues of seeds) published by botanical gardens. We share the lessons we learnt from these use cases: how to determine if generative AI is an appropriate tool for a given data processing task, and if so, how to maximise the accuracy and consistency of the results obtained. In this paper, we share our insights on the application of generative AI in research software engineering projects. Generative AI can potentially be used to perform a wide variety of research data processing tasks, such as interpreting documents, extracting information from them, and classifying text into categories. Since the tasks are specified through prompts in natural language, the barrier to entry is low. Therefore, this tool can be easily used by domain experts in a wide range of fields, with varying levels of programming skills and depth of knowledge of technical topics such as machine learning.
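The abstract does not say how the authors measured or improved consistency. One simple approach, assumed here purely for illustration, is to run the same extraction prompt several times, normalise trivial formatting differences, keep the majority answer per item, and report how often all runs agreed. The function names and the data layout are hypothetical, and the species names are illustrative rather than taken from the seedlists.

```python
from collections import Counter


def normalize(name: str) -> str:
    """Collapse whitespace and case so formatting noise is not counted as disagreement."""
    return " ".join(name.split()).casefold()


def consolidate(runs: list[list[str]]) -> tuple[list[str], float]:
    """Combine repeated extraction runs over the same items.

    runs[k][i] is the value the model returned for item i on run k (a hypothetical layout).
    Returns the most common normalised value per item and the share of items on which
    every run agreed.
    """
    if not runs or not runs[0]:
        raise ValueError("need at least one run with at least one item")
    if any(len(run) != len(runs[0]) for run in runs):
        raise ValueError("every run must cover the same items")
    majority: list[str] = []
    unanimous = 0
    for values in zip(*runs):  # the values for one item across all runs
        counts = Counter(normalize(v) for v in values)
        value, count = counts.most_common(1)[0]
        majority.append(value)
        if count == len(values):
            unanimous += 1
    return majority, unanimous / len(runs[0])


runs = [
    ["Rosa canina", "Bellis perennis"],
    ["rosa  canina", "Bellis perennis"],
    ["Rosa canina", "Bellis annua"],
]
values, agreement = consolidate(runs)
```

Here the runs agree on the first item after normalisation but split on the second, so the agreement rate is 0.5. Because `casefold` lowercases the genus, a real pipeline might map the majority value back to its original spelling; items with low agreement are natural candidates for manual review.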
A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models
Cao, Gengxian, Li, Fengyuan, Duan, Hong, Yang, Ye, Wang, Bofeng, Li, Donghe
This paper introduces a novel multi-agent framework that automates the end-to-end production of Qinqiang opera by integrating large language models (LLMs), visual generation, and text-to-speech (TTS) synthesis. Three specialized agents collaborate in sequence: Agent1 uses an LLM to craft coherent, culturally grounded scripts; Agent2 employs visual generation models to render contextually accurate stage scenes; and Agent3 leverages TTS to produce synchronized, emotionally expressive vocal performances. In a case study on Dou E Yuan, the system achieved expert ratings of 3.8 for script fidelity, 3.5 for visual coherence, and 3.8 for speech accuracy, culminating in an overall score of 3.6, a 0.3-point improvement over a single-agent baseline. Ablation experiments show that removing Agent2 or Agent3 leads to drops of 0.4 and 0.5 points, respectively, underscoring the value of modular collaboration. This work shows how AI-driven pipelines can streamline and scale the preservation of traditional performing arts, and points toward future enhancements in cross-modal alignment, richer emotional nuance, and support for additional opera genres.
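The three-stage hand-off can be pictured as a sequential pipeline in which each agent reads and extends a shared production state. In the minimal sketch below, the agents are stubs that return placeholder strings instead of calling an LLM, an image model, or a TTS engine; the abstract does not name the models, so `Production`, the agent functions, and their outputs are hypothetical. Passing a shorter agent list mimics an ablation like the one reported, though the paper's exact ablation setup is not described here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Production:
    """Artifacts accumulated as the work passes from one agent to the next."""
    title: str
    script: list[str] = field(default_factory=list)  # script lines (Agent1)
    scenes: list[str] = field(default_factory=list)  # placeholders for stage images (Agent2)
    audio: list[str] = field(default_factory=list)   # placeholders for TTS clips (Agent3)


Agent = Callable[[Production], Production]


def script_agent(p: Production) -> Production:
    # Stand-in for the LLM call that drafts the script.
    p.script = [f"{p.title}, scene {i}" for i in (1, 2)]
    return p


def scene_agent(p: Production) -> Production:
    # Stand-in for a visual generation model: one stage image per script line.
    p.scenes = [f"<image: {line}>" for line in p.script]
    return p


def voice_agent(p: Production) -> Production:
    # Stand-in for TTS: one vocal clip per script line, so audio stays aligned with the script.
    p.audio = [f"<audio: {line}>" for line in p.script]
    return p


def run_pipeline(title: str, agents: list[Agent]) -> Production:
    """Run the agents strictly in sequence; each sees everything produced before it."""
    production = Production(title)
    for agent in agents:
        production = agent(production)
    return production


full = run_pipeline("Dou E Yuan", [script_agent, scene_agent, voice_agent])
# Ablation-style run: drop the scene agent (Agent2) and keep the rest of the pipeline.
without_scenes = run_pipeline("Dou E Yuan", [script_agent, voice_agent])
```

Keeping each stage behind the same `Production -> Production` signature is what makes agents removable for ablation; later stages depend only on the fields earlier stages fill in.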