Generative AI
Bridging the LLM Accessibility Divide? Performance, Fairness, and Cost of Closed versus Open LLMs for Automated Essay Scoring
Oketch, Kezia, Lalor, John P., Yang, Yi, Abbasi, Ahmed
The rapid development of machine learning (ML) technologies, particularly large language models (LLMs), has led to major advancements in natural language processing (NLP; Abbasi et al. 2023). While much of this advancement happened under the umbrella of the common task framework, which espouses transparency and openness (Abbasi et al. 2023), in recent years closed LLMs such as GPT-3 and GPT-4 have set new performance standards in tasks ranging from text generation to question answering, demonstrating unprecedented capabilities in zero-shot and few-shot learning scenarios (Brown et al. 2020, OpenAI 2023). Given the strong performance of closed LLMs such as GPT-4, many studies within the LLM-as-a-judge paradigm rely on their scores as ground-truth benchmarks for evaluating both open and closed LLMs (Chiang and Lee 2023), further entrenching the dominance of state-of-the-art (SOTA) closed LLMs (Vergho et al. 2024). Along with closed LLMs, there are also LLMs whose pre-trained models (i.e., model weights) and inference code are publicly available ("open LLMs"), such as Llama (Touvron et al. 2023, Dubey et al. 2024), as well as LLMs whose full training data and training code are also available ("open-source LLMs"), such as OLMo (Groeneveld et al. 2024). Open and open-source LLMs provide varying levels of transparency for developers and researchers (Liu et al. 2023). Access to model weights, training data, and inference code enables several benefits for the user-developer-researcher community, including lower costs per input/output token through third-party API services, support for local/offline pre-training and fine-tuning, and deeper analysis of model biases and debiasing strategies. However, the dominance of closed LLMs raises a number of concerns, including accessibility and fairness (Strubell et al. 2020, Bender 2021, Irugalbandara et al. 2024).
Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks
Gosmar, Diego, Dahl, Deborah A., Gosmar, Dario
Recent advances in generative AI have enabled increasingly sophisticated applications in various domains, from customer service chatbots to automated content generation. However, alongside these advancements, the vulnerability of large language models (LLMs) to adversarial inputs has emerged as a critical concern. Among these, prompt injection attacks pose a particularly insidious challenge, as they exploit the model's inherent instruction-following behavior to override intended constraints. While prompt injection is often discussed in theoretical contexts, its impact on deployed AI systems has been observed in practical settings. Research has demonstrated that even models with reinforced safety mechanisms, or with specific knowledge supplied through Retrieval-Augmented Generation (RAG), can be manipulated into disclosing sensitive data, executing unauthorized instructions, or producing harmful content [4].
Multi-Stage Generative Upscaler: Reconstructing Football Broadcast Images via Diffusion Models
Martini, Luca, Zolezzi, Daniele, Iacono, Saverio, Vercelli, Gianni Viardo
Generative Artificial Intelligence (genAI) represents a groundbreaking approach to creativity and automation, empowering machines to produce novel and highly realistic data, including images, text, and music. Among the diverse generative models, Diffusion Models have emerged as a powerful technique for high-quality image synthesis. Rooted in the principles of probabilistic modeling, Diffusion Models iteratively refine noise into detailed and coherent representations, achieving remarkable performance in domains like image generation, image inpainting and style transfer. Diffusion Models have gained traction due to their versatility and robustness, allowing them to excel in challenging tasks where conventional generative approaches, such as Generative Adversarial Networks (GANs), often struggle. These models leverage a forward-backward diffusion process, where images are progressively noised during the forward phase and restored to their original form during the reverse phase.
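The forward (noising) phase described above can be illustrated with a minimal sketch. It assumes the standard DDPM-style closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, which the abstract does not confirm the authors use; the linear schedule, its endpoint values, and all function names are hypothetical choices for illustration only. The reverse phase, which requires a trained denoising network, is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; the endpoint values are assumptions."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t is the running product of (1 - beta_s) for all s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

pixels = [0.5, -0.25, 1.0, 0.0]  # toy "image" as a flat list of values
slightly_noisy = forward_noise(pixels, 0, alpha_bars, rng)   # first step: almost unchanged
mostly_noise = forward_noise(pixels, 999, alpha_bars, rng)   # last step: close to pure noise
```

Under this schedule alpha_bar shrinks monotonically toward zero, so early steps keep most of the signal while the final step is dominated by Gaussian noise.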
Accessibility Considerations in the Development of an AI Action Plan
Mankoff, Jennifer, Light, Janice, Coughlan, James, Vogler, Christian, Glasser, Abraham, Vanderheiden, Gregg, Rice, Laura
AI has the potential to empower everyone to become more independent and self-sufficient. The increasing use of artificial intelligence (AI)-based technologies in everyday settings creates new opportunities to understand how disabled people might use these technologies [Glazko, 2023]. It also enables the development of new types of assistive technologies as well as new ways for people with disabilities to interact with technology in ways that are both simpler (for those who need things simpler) and more efficient and effective for those who cannot use the traditional interfaces effectively. AI has been rapidly taken up in almost all accessibility communities [Adnin 2024, Alharbi 2024, Jiang 2024, Bennett 2024, Valencia 2023]. Since becoming widely available to the public, Generative Artificial Intelligence (GAI) has steadily gained recognition for its potential as a valuable tool in the private sector and by government, as well as a tool for accessibility. Studies of blind and visually impaired individuals have found that they use GAI to 'offload' cognitively demanding tasks and obtain personal help such as fashion advice (e.g., [Xie 2024]), and to create content or retrieve information [Adnin 2024]. A study of GAI use by neurodiverse users found GAI can both support and complicate tasks like code-switching, emotional regulation, and accessing information [Glazko, 2025]. A study of people who use AAC found it helpful for text input [Valencia 2023]. However, there are concerns with a technology that is often based on probability and thus tends toward the most common case rather than those at the margins.
From Generative AI to Innovative AI: An Evolutionary Roadmap
Mohammadabadi, Seyed Mahmoud Sajjadi
This paper explores the critical transition from Generative Artificial Intelligence (GenAI) to Innovative Artificial Intelligence (InAI). While recent advancements in GenAI have enabled systems to produce high-quality content across various domains, these models often lack the capacity for true innovation. In this context, innovation is defined as the ability to generate novel and useful outputs that go beyond mere replication of learned data. The paper examines this shift and proposes a roadmap for developing AI systems that can generate content and engage in autonomous problem-solving and creative ideation. The work provides both theoretical insights and practical strategies for advancing AI to a stage where it can genuinely innovate, contributing meaningfully to science, technology, and the arts.
Optimizing Large Language Models for Detecting Symptoms of Comorbid Depression or Anxiety in Chronic Diseases: Insights from Patient Messages
Kim, Jiyeong, Ma, Stephen P., Chen, Michael L., Galatzer-Levy, Isaac R., Torous, John, van Roessel, Peter J., Sharp, Christopher, Pfeffer, Michael A., Rodriguez, Carolyn I., Linos, Eleni, Chen, Jonathan H.
Patients with diabetes are at increased risk of comorbid depression or anxiety, complicating their management. This study evaluated the performance of large language models (LLMs) in detecting these symptoms from secure patient messages. We applied multiple approaches, including engineered prompts, systemic persona, temperature adjustments, and zero-shot and few-shot learning, to identify the best-performing model and enhance performance. Three out of five LLMs demonstrated excellent performance (over 90% in both F-1 and accuracy), with Llama 3.1 405B achieving 93% in both F-1 and accuracy using a zero-shot approach. While LLMs showed promise in binary classification and handling complex metrics like Patient Health Questionnaire-4, inconsistencies in challenging cases warrant further real-life assessment. The findings highlight the potential of LLMs to assist in timely screening and referrals, providing valuable empirical knowledge for real-world triage systems that could improve mental health care for patients with chronic diseases.
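For readers unfamiliar with the two metrics reported above, the following sketch shows how accuracy and F-1 are computed for a binary classifier. The labels are made-up toy data, not the study's, and the function name is hypothetical; it only illustrates the standard definitions.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, f1) for binary labels where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Toy example: 1 = message flagged for depression/anxiety symptoms, 0 = not flagged.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, f1 = binary_metrics(y_true, y_pred)
```

Accuracy counts all correct decisions, while F-1 ignores true negatives, so the two can diverge sharply when positive cases are rare.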
Can Large Reasoning Models do Analogical Reasoning under Perceptual Uncertainty?
Camposampiero, Giacomo, Hersche, Michael, Wattenhofer, Roger, Sebastian, Abu, Rahimi, Abbas
This work presents a first evaluation of two state-of-the-art Large Reasoning Models (LRMs), OpenAI's o3-mini and DeepSeek R1, on analogical reasoning, focusing on well-established nonverbal human IQ tests based on Raven's progressive matrices. We benchmark with the I-RAVEN dataset and its more difficult extension, I-RAVEN-X, which tests the ability to generalize to longer reasoning rules and ranges of the attribute values. To assess the influence of visual uncertainties on these nonverbal analogical reasoning tests, we extend the I-RAVEN-X dataset, which otherwise assumes an oracle perception. We adopt a two-fold strategy to simulate this imperfect visual perception: 1) we introduce confounding attributes which, being sampled at random, do not contribute to the prediction of the correct answer of the puzzles and 2) we smooth the distributions of the input attributes' values. We observe a sharp decline in OpenAI's o3-mini task accuracy, dropping from 86.6% on the original I-RAVEN to just 17.0% -- approaching random chance -- on the more challenging I-RAVEN-X, which increases input length and range and emulates perceptual uncertainty. This drop occurred despite spending 3.4x more reasoning tokens. A similar trend is also observed for DeepSeek R1: from 80.6% to 23.2%. On the other hand, a neuro-symbolic probabilistic abductive model, ARLC, that achieves state-of-the-art performance on I-RAVEN, can robustly reason under all these out-of-distribution tests, maintaining strong accuracy with only a modest reduction from 98.6% to 88.0%. Our code is available at https://github.com/IBM/raven-large-language-models.
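The two-fold perturbation strategy described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the panel representation, the attribute names, the uniform-mixing smoothing scheme, and the parameter values are all hypothetical, and the actual procedure in the linked repository may differ.

```python
import random


def add_confounders(panel, num_confounders, num_values, rng):
    """Return a copy of the panel with randomly sampled, task-irrelevant attributes."""
    noisy = dict(panel)
    for k in range(num_confounders):
        # Values are drawn at random, so they carry no information about the answer.
        noisy[f"confounder_{k}"] = rng.randrange(num_values)
    return noisy


def smooth_one_hot(value, num_values, eps):
    """Mix a one-hot distribution with a uniform one (an assumed smoothing scheme)."""
    if not 0 <= value < num_values:
        raise ValueError("value out of range")
    uniform_mass = eps / num_values
    return [
        (1.0 - eps) + uniform_mass if i == value else uniform_mass
        for i in range(num_values)
    ]


rng = random.Random(0)
panel = {"shape": 2, "size": 4, "color": 7}  # toy attribute values in [0, 10)

noisy_panel = add_confounders(panel, num_confounders=3, num_values=10, rng=rng)
soft_panel = {
    name: smooth_one_hot(value, num_values=10, eps=0.2)
    for name, value in noisy_panel.items()
}
```

With eps = 0 each attribute stays a crisp one-hot value, which corresponds to the oracle-perception setting; larger eps spreads probability mass over the other values.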
Researchers Propose a Better Way to Report Dangerous AI Flaws
In late 2023, a team of third-party researchers discovered a troubling glitch in OpenAI's widely used artificial intelligence model GPT-3.5. When asked to repeat certain words a thousand times, the model began repeating the word over and over, then suddenly switched to spitting out incoherent text and snippets of personal information drawn from its training data, including parts of names, phone numbers, and email addresses. The team that discovered the problem worked with OpenAI to ensure the flaw was fixed before revealing it publicly. It is just one of scores of problems found in major AI models in recent years. In a proposal released today, more than 30 prominent AI researchers, including some who found the GPT-3.5 flaw, say that many other vulnerabilities affecting popular models are reported in problematic ways.
Snapchat launches generative AI video Lenses
Snapchat's future includes generative AI video Lenses, in which users can watch themselves cuddling with virtual animals on screen. The first three Lenses the app has launched include Raccoon and Fox, which animate the animals into a Snap. The third, called Spring Flowers, generates a bouquet of flowers and uses a zoom-out effect to reveal who's holding it. All three Lenses, as well as future ones Snapchat releases, are powered by a generative video model the company built in-house. Snap says it will be adding more every week to expand users' options.
Securing the AI future: How President Trump's action plan can position America for success
The Trump administration is prioritizing the critical role of artificial intelligence in creating and upholding freedom. Just three weeks in, Vice President JD Vance declared at a global AI summit in Paris that AI "will make people more productive, more prosperous, and more free. The United States of America is the leader in AI, and our administration plans to keep it that way." To achieve this, the White House is working toward an AI action plan and calling on leading American AI companies to submit our best ideas. OpenAI is pleased to submit proposals today on a range of important considerations for AI from national security, to infrastructure and energy, to the federal government's own use of AI.