Generative AI
Culling Misinformation from Gen AI: Toward Ethical Curation and Refinement
Khatiwada, Prerana, Donaher, Grace, Navarro, Jasymyn, Bhatta, Lokesh
While Artificial Intelligence (AI) is not a new field, recent developments, especially the release of generative tools like ChatGPT, have brought it to the forefront of the minds of industry workers and academics alike. There is currently much talk about AI and its ability to reshape many everyday processes through automation. It also allows users to expand their ideas by suggesting things they may not have thought of on their own, and it provides easier access to information. However, not all of the changes this technology has brought or will bring are positive, which is why it is important for users to recognize and understand the risks before using these tools and allowing them to cause harm. This work takes a position on better understanding the equity concerns and the spread of misinformation that result from new AI, specifically ChatGPT and deepfakes, and on encouraging collaboration among law enforcement, developers, and users to reduce harm. Drawing on many academic sources, it warns against these issues, analyzing their causes and impact in fields including healthcare, education, science, academia, retail, and finance. Lastly, we propose a set of future-facing guidelines and policy considerations to address these issues while still enabling innovation in these fields, with responsibility falling upon users, developers, and government entities.
"Before, I Asked My Mom, Now I Ask ChatGPT": Visual Privacy Management with Generative AI for Blind and Low-Vision People
Sharma, Tanusree, Tseng, Yu-Yun, Zhang, Lotus, Ide, Ayae, Mack, Kelly Avery, Findlater, Leah, Gurari, Danna, Wang, Yang
Blind and low vision (BLV) individuals use Generative AI (GenAI) tools to interpret and manage visual content in their daily lives. While such tools can enhance the accessibility of visual content and so enable greater user independence, they also introduce complex challenges around visual privacy. In this paper, we investigate the current practices and future design preferences of blind and low vision individuals through an interview study with 21 participants. Our findings reveal a range of current practices with GenAI that balance privacy, efficiency, and emotional agency, with users accounting for privacy risks across six key scenarios, such as self-presentation, indoor/outdoor spatial privacy, social sharing, and handling professional content. They also reveal design preferences, including on-device processing, zero-retention guarantees, sensitive content redaction, privacy-aware appearance indicators, and multimodal tactile mirrored interaction methods. We conclude with actionable design recommendations to support user-centered visual privacy through GenAI, expanding the notion of privacy and responsible handling of others' data.
Specification and Evaluation of Multi-Agent LLM Systems -- Prototype and Cybersecurity Applications
Recent advancements in LLMs indicate potential for novel applications, as evidenced by the reasoning capabilities in the latest OpenAI and DeepSeek models. To apply these models to domain-specific applications beyond text generation, LLM-based multi-agent systems can be utilized to solve complex tasks, particularly by combining reasoning techniques, code generation, and software execution across multiple, potentially specialized LLMs. However, while many evaluations are performed on LLMs, reasoning techniques, and applications individually, their joint specification and combined application are not well understood. Defined specifications for multi-agent LLM systems are required to explore their potential and suitability for specific applications, allowing for systematic evaluations of LLMs, reasoning techniques, and related aspects. This paper reports the results of exploratory research on (1) multi-agent specification by introducing an agent schema language and (2) the execution and evaluation of the specifications through a multi-agent system architecture and prototype. The specification language, system architecture, and prototype are first presented in this work, building on an LLM system from prior research. Test cases involving cybersecurity tasks indicate the feasibility of the architecture and evaluation approach. As a result, evaluations could be demonstrated for question answering, server security, and network security tasks completed correctly by agents with LLMs from OpenAI and DeepSeek.
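To make the idea of a checkable multi-agent specification concrete, here is a minimal sketch in Python. It is not the paper's agent schema language, which the abstract does not show. The `AgentSpec` fields, the `validate` and `evaluate` helpers, and the model names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AgentSpec:
    """Hypothetical agent description; field names are assumptions."""
    name: str
    model: str  # placeholder model identifier, not a real model name
    role: str
    next_agents: List[str] = field(default_factory=list)


def validate(specs: List[AgentSpec]) -> List[str]:
    """Return a list of problems; an empty list means the spec is consistent."""
    names = [s.name for s in specs]
    problems = []
    # Report every agent name that is declared more than once.
    for name in sorted({n for n in names if names.count(n) > 1}):
        problems.append(f"duplicate agent name: {name}")
    # Report every routing target that is not a declared agent.
    known = set(names)
    for spec in specs:
        for target in spec.next_agents:
            if target not in known:
                problems.append(f"{spec.name} routes to unknown agent: {target}")
    return problems


def evaluate(run_task: Callable[[str], str],
             test_cases: List[Tuple[str, str]]) -> float:
    """Fraction of test cases whose output equals the expected answer."""
    passed = sum(1 for task, expected in test_cases if run_task(task) == expected)
    return passed / len(test_cases)


specs = [
    AgentSpec("planner", "model-a", "plan", next_agents=["executor"]),
    AgentSpec("executor", "model-b", "run", next_agents=["reviewer"]),
]
problems = validate(specs)  # "reviewer" is referenced but never declared


def stub_system(task: str) -> str:
    """Stand-in for a real multi-agent run; it only upper-cases the task."""
    return task.upper()


score = evaluate(stub_system, [("ok", "OK"), ("no", "yes")])
```

Keeping the specification as plain data separates static checks (such as dangling routes) from execution, so the same test cases can be rerun while swapping the model behind each agent.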
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Kim, Donghoon, Bae, Minji, Shim, Kyuhong, Shim, Byonghyo
Text-to-image generative models like DALL-E and Stable Diffusion have revolutionized visual content creation across various applications, including advertising, personalized media, and design prototyping. However, crafting effective textual prompts to guide these models remains challenging, often requiring extensive trial and error. Existing prompt inversion approaches, such as soft and hard prompt techniques, fall short because of limited interpretability and incoherent prompt generation. To address these issues, we propose Visually Guided Decoding (VGD), a gradient-free approach that leverages large language models (LLMs) and CLIP-based guidance to generate coherent and semantically aligned prompts. In essence, VGD utilizes the robust text generation capabilities of LLMs to produce human-readable prompts. Further, by employing CLIP scores to ensure alignment with user-specified visual concepts, VGD enhances the interpretability, generalization, and flexibility of prompt generation without the need for additional training. Our experiments demonstrate that VGD outperforms existing prompt inversion techniques in generating understandable and contextually relevant prompts, facilitating more intuitive and controllable interactions with text-to-image models.
Figure 1: Visually Guided Decoding (VGD) works with any LLM without extra training, making it easy to integrate into a chat-based interface that offers interpretable and controllable text-to-image generation.
In recent years, image generative models such as DALL-E and Stable Diffusion have shown remarkable success in generating high-fidelity images (Ramesh et al., 2022; Rombach et al., 2022; Podell et al., 2024). These models are widely used in a variety of applications, including visual content generation (e.g., advertisement, movie, game), personalized content generation (e.g., caricature, photo editing), and prototyping (e.g., architecture and product design).
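The VGD abstract describes gradient-free decoding in which a language model proposes tokens and an image-text score steers the choice. The sketch below shows that general pattern as a small beam search. It is not the authors' implementation: `toy_lm_logprob` and `toy_image_score` are hypothetical stand-ins for an LLM's next-token log-probability and a CLIP similarity, and the weight `alpha` and the additive combination rule are assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple


def guided_beam_search(
    vocab: Sequence[str],
    lm_logprob: Callable[[List[str], str], float],
    image_score: Callable[[List[str]], float],
    alpha: float = 1.0,
    beam_width: int = 2,
    max_len: int = 3,
) -> List[str]:
    """Beam search ranking candidates by LM log-prob + alpha * image score.

    No gradients are used: both scorers are treated as black boxes.
    """
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]  # (LM log-prob, tokens)
    for _ in range(max_len):
        candidates = []
        for lm_total, tokens in beams:
            for tok in vocab:
                new_tokens = tokens + [tok]
                new_lm = lm_total + lm_logprob(tokens, tok)
                combined = new_lm + alpha * image_score(new_tokens)
                candidates.append((combined, new_lm, new_tokens))
        # Keep the best-scoring partial prompts for the next step.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = [(lm, toks) for _, lm, toks in candidates[:beam_width]]
    return beams[0][1]


def toy_lm_logprob(prefix: List[str], token: str) -> float:
    """Hypothetical LM: near-uniform, but penalizes repeating the last token."""
    if prefix and prefix[-1] == token:
        return math.log(0.01)
    return math.log(0.33)


TARGET_CONCEPTS = {"red", "car"}  # stands in for the content of a target image


def toy_image_score(tokens: List[str]) -> float:
    """Hypothetical CLIP stand-in: fraction of target concepts mentioned."""
    return len(TARGET_CONCEPTS & set(tokens)) / len(TARGET_CONCEPTS)


VOCAB = ["a", "red", "car", "dog"]
prompt = guided_beam_search(VOCAB, toy_lm_logprob, toy_image_score,
                            alpha=5.0, beam_width=2, max_len=3)
# The returned prompt mentions both target concepts.
```

In a real system the two stand-ins would be replaced by calls to an actual language model and an actual image-text encoder; the search loop itself would not need to change.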
OpenAI signs deal with UK to find government uses for its models
Sam Altman, leader of one of the world's biggest artificial intelligence companies, has signed a deal with the British government to explore the deployment of advanced AI models in areas including justice, security and education. The chief executive of OpenAI, which has been valued at $300bn (£220bn) and provides the ChatGPT suite of large language models, agreed the memorandum of understanding with the science and technology secretary, Peter Kyle, on Monday. It follows a similarly wide-ranging deal between the UK government and OpenAI's rival US tech company, Google, which campaigners called "dangerously naive", citing fears that the arrangement could leave the public sector dependent on private technology providers and make it harder for politicians to regulate them. The latest agreement states that OpenAI and the government "will collaborate to identify opportunities for how advanced AI models can be deployed throughout government", including "to help civil servants work more efficiently" and to support "citizens to navigate public services more effectively". It said they will collaborate to develop AI solutions "to the UK's hardest problems, including in areas such as justice, defence and security, and education technology" and develop partnerships "to expand public engagement with AI technology".
OpenAI's New CEO of Applications Strikes Hyper-Optimistic Tone in First Memo to Staff
OpenAI's incoming CEO of applications, Fidji Simo, sent her first note to staff on Monday, telling employees the tools they're developing "will unlock more opportunities for more people than any other technology in history." "If we get this right, AI can give everyone more power than ever," Simo wrote, striking a hyper-optimistic tone, according to a copy of the memo viewed by WIRED. "But I also realize those opportunities won't magically appear on their own." Simo previously worked as the CEO of Instacart. Before that, she spent a decade at Meta, where she went from being a product manager on the company's news feed to the head of product for the Facebook app.
Jensen Huang, AI visionary in a leather jacket
Unknown to the general public just three years ago, Jensen Huang is now one of the most powerful entrepreneurs in the world as head of chip giant Nvidia. The unassuming 62-year-old draws stadium crowds of more than 10,000 people as his company's products push the boundaries of artificial intelligence. Chips designed by Nvidia, known as graphics cards or GPUs (Graphics Processing Units), are essential in developing the generative artificial intelligence powering technology like ChatGPT.
Generative AI-Driven High-Fidelity Human Motion Simulation
Iyer, Hari, Macwan, Neel, Hude, Atharva Jitendra, Jeong, Heejin, Guo, Shenghan
Human motion simulation (HMS) supports cost-effective evaluation of worker behavior, safety, and productivity in industrial tasks. However, existing methods often suffer from low motion fidelity. This study introduces Generative-AI-Enabled HMS (G-AI-HMS), which integrates text-to-text and text-to-motion models to enhance simulation quality for physical tasks. G-AI-HMS tackles two key challenges: (1) translating task descriptions into motion-aware language using Large Language Models aligned with MotionGPT's training vocabulary, and (2) validating AI-enhanced motions against real human movements using computer vision. Posture estimation algorithms are applied to real-time videos to extract joint landmarks, and motion similarity metrics are used to compare them with AI-enhanced sequences. In a case study involving eight tasks, the AI-enhanced motions showed lower error than human-created descriptions in most scenarios, performing better in six tasks based on spatial accuracy, four tasks based on alignment after pose normalization, and seven tasks based on overall temporal similarity. Statistical analysis showed that AI-enhanced prompts significantly (p < 0.0001) reduced joint error and temporal misalignment while retaining comparable posture accuracy.
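The abstract compares real and generated motion by spatial accuracy, by alignment after pose normalization, and by temporal similarity, without naming the exact metrics. The sketch below uses three common stand-ins, which are assumptions rather than the paper's definitions: mean per-joint Euclidean distance, centroid-and-scale normalization, and dynamic time warping (DTW). Poses are assumed to be lists of 2D joint landmarks.

```python
import math
from typing import Callable, List, Sequence, Tuple

Pose = Sequence[Tuple[float, float]]  # one (x, y) landmark per joint


def mean_joint_error(a: Pose, b: Pose) -> float:
    """Average Euclidean distance between corresponding joints."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def normalize_pose(pose: Pose) -> List[Tuple[float, float]]:
    """Center the pose on its centroid and scale it to unit RMS radius."""
    n = len(pose)
    cx = sum(x for x, _ in pose) / n
    cy = sum(y for _, y in pose) / n
    centered = [(x - cx, y - cy) for x, y in pose]
    # Fall back to 1.0 so a degenerate single-point pose does not divide by zero.
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n) or 1.0
    return [(x / scale, y / scale) for x, y in centered]


def dtw_distance(
    seq_a: Sequence[Pose],
    seq_b: Sequence[Pose],
    frame_cost: Callable[[Pose, Pose], float] = mean_joint_error,
) -> float:
    """Dynamic time warping cost between two pose sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_cost(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: skip in a, skip in b, or match both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


real = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]]
shifted = [(3.0, 4.0), (4.0, 4.0)]          # first real pose, translated
slowed = [real[0], real[0], real[1]]        # same motion with a held frame

raw_error = mean_joint_error(real[0], shifted)
normalized_error = mean_joint_error(normalize_pose(real[0]), normalize_pose(shifted))
temporal_cost = dtw_distance(real, slowed)
```

The translated pose has a large raw error that disappears after normalization, and the slowed sequence has zero DTW cost, which illustrates why the three views can rank the same motions differently.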
IConMark: Robust Interpretable Concept-Based Watermark For AI Images
Sadasivan, Vinu Sankar, Saberi, Mehrdad, Feizi, Soheil
With the rapid rise of generative AI and synthetic media, distinguishing AI-generated images from real ones has become crucial in safeguarding against misinformation and ensuring digital authenticity. Traditional watermarking techniques have shown vulnerabilities to adversarial attacks, undermining their effectiveness in the presence of attackers. We propose IConMark, a novel in-generation robust semantic watermarking method that embeds interpretable concepts into AI-generated images, as a first step toward interpretable watermarking. Unlike traditional methods, which rely on adding noise or perturbations to AI-generated images, IConMark incorporates meaningful semantic attributes, making it interpretable to humans and, hence, resilient to adversarial manipulation. This method is not only robust against various image augmentations but also human-readable, enabling manual verification of watermarks. We present a detailed evaluation of IConMark's effectiveness, demonstrating its superiority in terms of detection accuracy and maintaining image quality. Moreover, IConMark can be combined with existing watermarking techniques to further enhance and complement its robustness. We introduce IConMark+SS and IConMark+TM, hybrid approaches combining IConMark with StegaStamp and TrustMark, respectively, to further bolster robustness against multiple types of image manipulations. Our base watermarking technique (IConMark) and its variants (+TM and +SS) achieve 10.8%, 14.5%, and 15.9% higher mean area under the receiver operating characteristic curve (AUROC) scores for watermark detection, respectively, compared to the best baseline on various datasets.
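Since the IConMark results are reported as AUROC, a short sketch of how that metric can be computed may help. It uses the pairwise-ranking definition: the probability that a randomly chosen watermarked image receives a higher detection score than a randomly chosen unwatermarked one, with ties counted as half. The labels and scores below are made-up illustration values, not data from the paper.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positive and negative scores.

    labels: 1 for watermarked, 0 for unwatermarked.
    scores: detector outputs, where higher means "more likely watermarked".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up detector scores: one watermarked image (0.4) ranks below a clean one (0.5).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
detection_auroc = auroc(labels, scores)  # 8 of 9 pairs ranked correctly
```

The quadratic pairwise loop is fine for illustration; for large evaluation sets a rank-based computation gives the same value more efficiently.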
The role of large language models in UI/UX design: A systematic literature review
Ahmed, Ammar, Imran, Ali Shariq
User Interface (UI) and User Experience (UX) design are foundational components of the software development lifecycle, playing a very important role in shaping how users perceive, interact with, and derive value from digital products. UI design encompasses the visual and interactive elements of a system, including layout, typography, and on-screen components. In contrast, UX design encompasses the broader user journey, including the emotions, perceptions, and behaviors that emerge before, during, and after interaction with a product [34]. The quality of UI/UX design is a decisive factor in product success and user retention. Research consistently shows that poor UI/UX can drive users to abandon products altogether [9, 63].