Generative AI


Generative Models in Decision Making: A Survey

arXiv.org Artificial Intelligence

In recent years, the exceptional performance of generative models in generative tasks has sparked significant interest in their integration into decision-making processes. Due to their ability to handle complex data distributions and their strong model capacity, generative models can be effectively incorporated into decision-making systems by generating trajectories that guide agents toward high-reward state-action regions or intermediate sub-goals. This paper presents a comprehensive review of the application of generative models in decision-making tasks. We classify seven fundamental types of generative models: energy-based models, generative adversarial networks, variational autoencoders, normalizing flows, diffusion models, generative flow networks, and autoregressive models. Regarding their applications, we categorize their functions into three main roles: controllers, modelers and optimizers, and discuss how each role contributes to decision-making. Furthermore, we examine the deployment of these models across five critical real-world decision-making scenarios. Finally, we summarize the strengths and limitations of current approaches and propose three key directions for advancing next-generation generative directive models: high-performance algorithms, large-scale generalized decision-making models, and self-evolving and adaptive models.


When Discourse Stalls: Moving Past Five Semantic Stopsigns about Generative AI in Design Research

arXiv.org Artificial Intelligence

It has been roughly three years since the open-source release of Stable Diffusion ignited a Generative AI (GenAI) boom [Bengesi et al., 2023]. The proliferation of these technologies has since reshaped design practice and research. From early ideation to final implementation, these developments have significantly altered how design work is conceived, conducted, and evaluated [Hou et al., 2024]. This essay examines the critical juncture at which the design research community finds itself, seeking to understand and shape these developments while grappling with their implications for creative practice, design education, and professional identities. Popular discourse around GenAI often centers on simplified unequivocal narratives: AI as a threat to humanity, as a solution to global challenges, as a force of disruption, or as a replacement for humans [Gilardi et al., 2024]. While these narratives have sparked debate and interest, they can function as "semantic stopsigns"--conceptual framings that oversimplify complex issues, providing an illusion of resolution that hinders deeper inquiry [LessWrong Community, n.d., Lifton, 1961]. For instance, claims like "AI is unreliable" can lead to outright dismissal of its potential,


Fact-checking with Generative AI: A Systematic Cross-Topic Examination of LLMs Capacity to Detect Veracity of Political Information

arXiv.org Artificial Intelligence

The purpose of this study is to assess how large language models (LLMs) can be used for fact-checking and contribute to the broader debate on the use of automated means for veracity identification. To achieve this purpose, we use AI auditing methodology that systematically evaluates the performance of five LLMs (ChatGPT 4, Llama 3 (70B), Llama 3.1 (405B), Claude 3.5 Sonnet, and Google Gemini) using prompts regarding a large set of statements fact-checked by professional journalists (16,513). Specifically, we use topic modeling and regression analysis to investigate which factors (e.g. topic of the prompt or the LLM type) affect evaluations of true, false, and mixed statements. Our findings reveal that while ChatGPT 4 and Google Gemini achieved higher accuracy than other models, overall performance across models remains modest. Notably, the results indicate that models are better at identifying false statements, especially on sensitive topics such as COVID-19, American political controversies, and social issues, suggesting possible guardrails that may enhance accuracy on these topics. The major implication of our findings is that there are significant challenges for using LLMs for fact-checking, including significant variation in performance across different LLMs and unequal quality of outputs for specific topics, which can be attributed to deficits of training data. Our research highlights the potential and limitations of LLMs in political fact-checking, suggesting potential avenues for further improvements in guardrails as well as fine-tuning.


JurisTCU: A Brazilian Portuguese Information Retrieval Dataset with Query Relevance Judgments

arXiv.org Artificial Intelligence

This paper introduces JurisTCU, a Brazilian Portuguese dataset for legal information retrieval (LIR). The dataset is freely available and consists of 16,045 jurisprudential documents from the Brazilian Federal Court of Accounts, along with 150 queries annotated with relevance judgments. It addresses the scarcity of Portuguese-language LIR datasets with query relevance annotations. The queries are organized into three groups: real user keyword-based queries, synthetic keyword-based queries, and synthetic question-based queries. Relevance judgments were produced through a hybrid approach combining LLM-based scoring with expert domain validation. We used JurisTCU in 14 experiments using lexical search (document expansion methods) and semantic search (BERT-based and OpenAI embeddings). We show that the document expansion methods significantly improve the performance of standard BM25 search on this dataset, with improvements exceeding 45% in P@10, R@10, and nDCG@10 metrics when evaluating short keyword-based queries. Among the embedding models, the OpenAI models produced the best results, with improvements of approximately 70% in P@10, R@10, and nDCG@10 metrics for short keyword-based queries, suggesting that these dense embeddings capture semantic relationships in this domain, surpassing the reliance on lexical terms. Besides offering a dataset for the Portuguese-language IR research community, suitable for evaluating search systems, the results also contribute to enhancing a search system highly relevant to Brazilian citizens.
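For readers unfamiliar with the metrics named above, the following is a minimal sketch of how P@10, R@10, and nDCG@10 are commonly computed for a single query. It is not the authors' evaluation code: the function names, the document IDs, and the linear-gain form of DCG are assumptions chosen for illustration, and the abstract does not state which nDCG variant or relevance scale the paper uses.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k results that are relevant."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k


def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of all relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / len(relevant_ids)


def ndcg_at_k(ranked_ids, gains, k=10):
    """Normalized discounted cumulative gain with linear gains (an assumed variant).

    `gains` maps document ID -> graded relevance; unjudged documents count as 0.
    """
    dcg = sum(
        gains.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical ranking and judgments for one query.
    ranked = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3", "d9"}
    gains = {"d1": 1, "d3": 1, "d9": 1}
    print(precision_at_k(ranked, relevant, k=2))
    print(recall_at_k(ranked, relevant, k=2))
    print(ndcg_at_k(ranked, gains, k=2))
```

In practice these per-query values are averaged over the full query set; an established evaluation toolkit is usually preferable to hand-rolled functions when reporting results.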


DOGE's Plans to Replace Humans With AI Are Already Under Way

The Atlantic - Technology

If you have tips about the remaking of the federal government, you can contact Matteo Wong on Signal at @matteowong.52. A new phase of the president and the Department of Government Efficiency's attempts to downsize and remake the civil service is under way. The idea is simple: use generative AI to automate work that was previously done by people. The Trump administration is testing a new chatbot with 1,500 federal employees at the General Services Administration and may release it to the entire agency as soon as this Friday--meaning it could be used by more than 10,000 workers who are responsible for more than $100 billion in contracts and services. This article is based in part on conversations with several current and former GSA employees with knowledge of the technology, all of whom requested anonymity to speak about confidential information; it is also based on internal GSA documents that I reviewed, as well as the software's code base, which is visible on GitHub.


Microsoft cuts data centre plans and hikes prices in push to make users carry AI costs

AIHub

After a year of shoehorning generative AI into its flagship products, Microsoft is trying to recoup the costs by raising prices, putting ads in products, and cancelling data centre leases. Google is making similar moves, adding unavoidable AI features to its Workspace service while increasing prices. Is the tide finally turning on investments into generative AI? The situation is not quite so simple. Tech companies are fully committed to the new technology – but are struggling to find ways to make people pay for it.


Can Artificial Intelligence Stir-Fry?

The New Yorker

That year's game was known as the Dot-Com Bowl. Twenty years later, Super Bowl LVI was called the Crypto Bowl, and featured ads from Coinbase, Crypto.com, and FTX. Soon, FTX was bankrupt, and Bitcoin was sputtering. This year, the Super Bowl was all about artificial intelligence, as Google, Meta, OpenAI, and Salesforce ran ads showing off their A.I. tools. "It is such a bad sign," Ed Zitron, an A.I. skeptic and the host of the tech podcast "Better Offline," said the other day.


Generative Artificial Intelligence in Robotic Manipulation: A Survey

arXiv.org Artificial Intelligence

This survey provides a comprehensive review on recent advancements of generative learning models in robotic manipulation, addressing key challenges in the field. Robotic manipulation faces critical bottlenecks, including significant challenges in insufficient data and inefficient data acquisition, long-horizon and complex task planning, and the multi-modality reasoning ability for robust policy learning performance across diverse environments. To tackle these challenges, this survey introduces several generative model paradigms, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), diffusion models, probabilistic flow models, and autoregressive models, highlighting their strengths and limitations. The applications of these models are categorized into three hierarchical layers: the Foundation Layer, focusing on data generation and reward generation; the Intermediate Layer, covering language, code, visual, and state generation; and the Policy Layer, emphasizing grasp generation and trajectory generation. Each layer is explored in detail, along with notable works that have advanced the state of the art. Finally, the survey outlines future research directions and challenges, emphasizing the need for improved efficiency in data utilization, better handling of long-horizon tasks, and enhanced generalization across diverse robotic scenarios. All the related resources, including research papers, open-source data, and projects, are collected for the community in https://github.com/GAI4Manipulation/AwesomeGAIManipulation


NeuroChat: A Neuroadaptive AI Chatbot for Customizing Learning Experiences

arXiv.org Artificial Intelligence

Generative AI is transforming education by enabling personalized, on-demand learning experiences. However, AI tutors lack the ability to assess a learner's cognitive state in real time, limiting their adaptability. Meanwhile, electroencephalography (EEG)-based neuroadaptive systems have successfully enhanced engagement by dynamically adjusting learning content. This paper presents NeuroChat, a proof-of-concept neuroadaptive AI tutor that integrates real-time EEG-based engagement tracking with generative AI. NeuroChat continuously monitors a learner's cognitive engagement and dynamically adjusts content complexity, response style, and pacing using a closed-loop system. We evaluate this approach in a pilot study (n=24), comparing NeuroChat to a standard LLM-based chatbot. Results indicate that NeuroChat enhances cognitive and subjective engagement but does not show an immediate effect on learning outcomes. These findings demonstrate the feasibility of real-time cognitive feedback in LLMs, highlighting new directions for adaptive learning, AI tutoring, and human-AI interaction.


The Impact of Generative AI Coding Assistants on Developers Who Are Visually Impaired

arXiv.org Artificial Intelligence

The rapid adoption of generative AI in software development has impacted the industry, yet its effects on developers with visual impairments remain largely unexplored. To address this gap, we used an Activity Theory framework to examine how developers with visual impairments interact with AI coding assistants. For this purpose, we conducted a study where developers who are visually impaired completed a series of programming tasks using a generative AI coding assistant. We uncovered that, while participants found the AI assistant beneficial and reported significant advantages, they also highlighted accessibility challenges. Specifically, the AI coding assistant often exacerbated existing accessibility barriers and introduced new challenges. For example, it overwhelmed users with an excessive number of suggestions, leading developers who are visually impaired to express a desire for "AI timeouts." Additionally, the generative AI coding assistant made it more difficult for developers to switch contexts between the AI-generated content and their own code. Despite these challenges, participants were optimistic about the potential of AI coding assistants to transform the coding experience for developers with visual impairments. Our findings emphasize the need to apply activity-centered design principles to generative AI assistants, ensuring they better align with user behaviors and address specific accessibility needs. This approach can enable the assistants to provide more intuitive, inclusive, and effective experiences, while also contributing to the broader goal of enhancing accessibility in software development.