AITopics

Generative AI significantly enhances player agency in interactive narratives (IN) by enabling just-in-time content generation that adapts to player actions. While delegating generation to AI makes IN more interactive, it becomes challenging for authors to control the space of possible narratives - within which the final story experienced by the player emerges from their interaction with AI. In this paper, we present WhatELSE, an AI-bridged IN authoring system that creates narrative possibility spaces from example stories. WhatELSE provides three views (narrative pivot, outline, and variants) to help authors understand the narrative space and corresponding tools leveraging linguistic abstraction to control the boundaries of the narrative space. Taking innovative LLM-based narrative planning approaches, WhatELSE further unfolds the narrative space into executable game events. Through a user study (N=12) and technical evaluations, we found that WhatELSE enables authors to perceive and edit the narrative space and generates engaging interactive narratives at play-time.

narrative space, player action, whatelse, (15 more...)

2502.18641

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.05)
North America > United States > Washington > King County > Redmond (0.04)
North America > United States > New York > New York County > New York City (0.04)
(11 more...)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Tai, Yuehong Cassandra, Patni, Khushi Navin, Hemauer, Nicholas Daniel, Desmarais, Bruce, Lin, Yu-Ru

GenAI vs. Human Fact-Checkers: Accurate Ratings, Flawed Rationales

Despite recent advances in understanding the capabilities and limits of generative artificial intelligence (GenAI) models, we are just beginning to understand their capacity to assess and reason about the veracity of content. We evaluate multiple GenAI models across tasks that involve the rating of, and perceived reasoning about, the credibility of information. The information in our experiments comes from content that subnational U.S. politicians post to Facebook. We find that GPT-4o, one of the most used AI models in consumer applications, outperforms other models, but all models exhibit only moderate agreement with human coders. Importantly, even when GenAI models accurately identify low-credibility content, their reasoning relies heavily on linguistic features and ``hard'' criteria, such as the level of detail, source reliability, and language formality, rather than an understanding of veracity. We also assess the effectiveness of summarized versus full content inputs, finding that summarized content holds promise for improving efficiency without sacrificing accuracy. While GenAI has the potential to support human fact-checkers in scaling misinformation detection, our results caution against relying solely on these models.

credibility, misinformation, reasoning, (13 more...)

2502.14943

Country:

North America > United States > New Jersey > Middlesex County > New Brunswick (0.05)
North America > United States > Pennsylvania > Centre County > State College (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.95)
Media > News (0.53)
Government > Regional Government > North America Government > United States Government (0.47)
Government > Voting & Elections (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning

Xu, Pusheng, Wu, Yue, Jin, Kai, Chen, Xiaolan, He, Mingguang, Shi, Danli

Purpose: To evaluate the accuracy and reasoning ability of DeepSeek-R1 and three other recently released large language models (LLMs) in bilingual complex ophthalmology cases. Methods: A total of 130 multiple-choice questions (MCQs) related to diagnosis (n = 39) and management (n = 91) were collected from the Chinese ophthalmology senior professional title examination and categorized into six topics. These MCQs were translated into English using DeepSeek-R1. The responses of DeepSeek-R1, Gemini 2.0 Pro, OpenAI o1 and o3-mini were generated under default configurations between February 15 and February 20, 2025. Accuracy was calculated as the proportion of correctly answered questions, with omissions and extra answers considered incorrect. Reasoning ability was evaluated through analyzing reasoning logic and the causes of reasoning error. Results: DeepSeek-R1 demonstrated the highest overall accuracy, achieving 0.862 in Chinese MCQs and 0.808 in English MCQs. Gemini 2.0 Pro, OpenAI o1, and OpenAI o3-mini attained accuracies of 0.715, 0.685, and 0.692 in Chinese MCQs (all P<0.001 compared with DeepSeek-R1), and 0.746 (P=0.115), 0.723 (P=0.027), and 0.577 (P<0.001) in English MCQs, respectively. DeepSeek-R1 achieved the highest accuracy across five topics in both Chinese and English MCQs. It also excelled in management questions conducted in Chinese (all P<0.05). Reasoning ability analysis showed that the four LLMs shared similar reasoning logic. Ignoring key positive history, ignoring key positive signs, misinterpretation medical data, and too aggressive were the most common causes of reasoning errors. Conclusion: DeepSeek-R1 demonstrated superior performance in bilingual complex ophthalmology reasoning tasks than three other state-of-the-art LLMs. While its clinical applicability remains challenging, it shows promise for supporting diagnosis and clinical decision-making.

deepseek, diagnosis, gemini 2, (14 more...)

2502.17947

Country:

Asia > China > Hong Kong > Kowloon (0.04)
North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > Massachusetts > Hampshire County > Northampton (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

From System 1 to System 2: A Survey of Reasoning Large Language Models

Li, Zhong-Zhi, Zhang, Duzhen, Zhang, Ming-Liang, Zhang, Jiaxin, Liu, Zengyan, Yao, Yuxuan, Xu, Haotian, Zheng, Junhao, Wang, Pei-Jie, Chen, Xiuyi, Zhang, Yingying, Yin, Fei, Dong, Jiahua, Guo, Zhijiang, Song, Le, Liu, Cheng-Lin

Achieving human-level intelligence requires refining the transition from the fast, intuitive System 1 to the slower, more deliberate System 2 reasoning. While System 1 excels in quick, heuristic decisions, System 2 relies on logical reasoning for more accurate judgments and reduced biases. Foundational Large Language Models (LLMs) excel at fast decision-making but lack the depth for complex reasoning, as they have not yet fully embraced the step-by-step analysis characteristic of true System 2 thinking. Recently, reasoning LLMs like OpenAI's o1/o3 and DeepSeek's R1 have demonstrated expert-level performance in fields such as mathematics and coding, closely mimicking the deliberate reasoning of System 2 and showcasing human-like cognitive abilities. This survey begins with a brief overview of the progress in foundational LLMs and the early development of System 2 technologies, exploring how their combination has paved the way for reasoning LLMs. Next, we discuss how to construct reasoning LLMs, analyzing their features, the core methods enabling advanced reasoning, and the evolution of various reasoning LLMs. Additionally, we provide an overview of reasoning benchmarks, offering an in-depth comparison of the performance of representative reasoning LLMs. Finally, we explore promising directions for advancing reasoning LLMs and maintain a real-time \href{https://github.com/zzli2022/Awesome-Slow-Reason-System}{GitHub Repository} to track the latest developments. We hope this survey will serve as a valuable resource to inspire innovation and drive progress in this rapidly evolving field.

arxiv preprint arxiv, reasoning llm, zhang, (9 more...)

2502.17419

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
(11 more...)

Genre: Overview (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

WIREDFeb-24-2025, 18:43:31 GMT

Anthropic Launches the World's First 'Hybrid Reasoning' AI Model

Anthropic, an artificial intelligence company founded by exiles from OpenAI, has introduced the first AI model that can produce either conventional output or a controllable amount of "reasoning" needed to solve more grueling problems. Anthropic says the new hybrid model, called Claude 3.7, will make it easier for users and developers to tackle problems that require a mix of instinctive output and step-by-step cogitation. "The [user] has a lot of control over the behavior--how long it thinks, and can trade reasoning and intelligence with time and budget," says Michael Gerstenhaber, product lead, AI platform at Anthropic. Claude 3.7 also features a new "scratchpad" that reveals the model's reasoning process. A similar feature proved popular with the Chinese AI model DeepSeek.

large language model, machine learning, natural language, (10 more...)

WIRED

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

AIHubFeb-24-2025, 09:56:59 GMT

Generative AI, online platforms and compensation for content: the need for a new framework

The emergence of generative artificial intelligence has put the issue of compensation for content producers back on the table. Generative AI offers undeniable benefits but raises familiar fears tied to disruptive technologies. Legal battles are already emerging worldwide, with intellectual property owners and AI developers clashing over rights. Alongside these legal and ethical concerns lies the economic question: how should revenues generated by AI be fairly distributed? Individual contributions to AI-generated outputs are often too complex to quantify, making it difficult to apply the principle of proportional remuneration, which holds that payment for an individual work is tied to the revenue it generates.

machine learning, natural language, platform, (15 more...)

AIHub

Genre: Personal > Honors (0.31)

Industry: Law > Intellectual Property & Technology Law (0.52)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.93)

Ravera, Anna, Gena, Cristina

On the usability of generative AI: Human generative AI

arXiv.org Artificial IntelligenceFeb-24-2025

Generative AI systems are transforming content creation, but their usability remains a key challenge. This paper examines usability factors such as user experience, transparency, control, and cognitive load. Common challenges include unpredictability and difficulties in fine-tuning outputs. We review evaluation metrics like efficiency, learnability, and satisfaction, highlighting best practices from various domains. Improving interpretability, intuitive interfaces, and user feedback can enhance usability, making generative AI more accessible and effective.

ai system, generanullve ai system, interacnullon, (15 more...)

2502.17714

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre:

Overview (0.86)
Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

arXiv.org Artificial IntelligenceFeb-24-2025

SFLD: Reducing the content bias for AI-generated Image Detection

Gye, Seoyeon, Ko, Junwon, Shon, Hyounguk, Kwon, Minchan, Kim, Junmo

Identifying AI-generated content is critical for the safe and ethical use of generative AI. Recent research has focused on developing detectors that generalize to unknown generators, with popular methods relying either on high-level features or low-level fingerprints. However, these methods have clear limitations: biased towards unseen content, or vulnerable to common image degradations, such as JPEG compression. T o address these issues, we propose a novel approach, SFLD, which incorporates PatchShuffle to integrate high-level semantic and low-level textural information. SFLD applies PatchShuffle at multiple levels, improving robustness and generalization across various generative models. Additionally, current benchmarks face challenges such as low image quality, insufficient content preservation, and limited class diversity. In response, we introduce TwinSynths, a new benchmark generation methodology that constructs visually near-identical pairs of real and synthetic images to ensure high quality and content preservation. Our extensive experiments and analysis show that SFLD outperforms existing methods on detecting a wide variety of fake images sourced from GANs, diffusion models, and TwinSynths, demonstrating the state-of-the-art performance and generalization capabilities to novel generative models.

benchmark, patch size, sfld, (17 more...)

2502.17105

Country:

Asia > South Korea (0.04)
Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)

Genre: Research Report > Promising Solution (0.48)

Industry: Information Technology > Security & Privacy (0.70)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

arXiv.org Artificial IntelligenceFeb-24-2025

Reinforcement Learning for Generative AI: A Survey

Cao, Yuanjiang, Sheng, Quan Z., McAuley, Julian, Yao, Lina

Deep Generative AI has been a long-standing essential topic in the machine learning community, which can impact a number of application areas like text generation and computer vision. The major paradigm to train a generative model is maximum likelihood estimation, which pushes the learner to capture and approximate the target data distribution by decreasing the divergence between the model distribution and the target distribution. This formulation successfully establishes the objective of generative tasks, while it is incapable of satisfying all the requirements that a user might expect from a generative model. Reinforcement learning, serving as a competitive option to inject new training signals by creating new objectives that exploit novel signals, has demonstrated its power and flexibility to incorporate human inductive bias from multiple angles, such as adversarial learning, hand-designed rules and learned reward model to build a performant model. Thereby, reinforcement learning has become a trending research field and has stretched the limits of generative AI in both model design and application. It is reasonable to summarize and conclude advances in recent years with a comprehensive review. Although there are surveys in different application areas recently, this survey aims to shed light on a high-level review that spans a range of application areas. We provide a rigorous taxonomy in this area and make sufficient coverage on various models and applications. Notably, we also surveyed the fast-developing large language model area. We conclude this survey by showing the potential directions that might tackle the limit of current models and expand the frontiers for generative AI.

arxiv preprint arxiv, proceedings, reinforcement learning, (11 more...)

2308.14328

Country:

Europe > Denmark > Capital Region > Copenhagen (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(13 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Information Technology (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)

arXiv.org Artificial IntelligenceFeb-23-2025

Beyond Release: Access Considerations for Generative AI Systems

Solaiman, Irene, Bommasani, Rishi, Hendrycks, Dan, Herbert-Voss, Ariel, Jernite, Yacine, Skowron, Aviya, Trask, Andrew

Generative AI release decisions determine whether system components are made available, but release does not address many other elements that change how users and stakeholders are able to engage with a system. Beyond release, access to system components informs potential risks and benefits. Access refers to practical needs, infrastructurally, technically, and societally, in order to use available components in some way. We deconstruct access along three axes: resourcing, technical usability, and utility. Within each category, a set of variables per system component clarify tradeoffs. For example, resourcing requires access to computing infrastructure to serve model weights. We also compare the accessibility of four high performance language models, two open-weight and two closed-weight, showing similar considerations for all based instead on access variables. Access variables set the foundation for being able to scale or increase access to users; we examine the scale of access and how scale affects ability to manage and intervene on risks. This framework better encompasses the landscape and risk-benefit tradeoffs of system releases to inform system release decisions, research, and policy.

arxiv, wang, zhang, (16 more...)

2502.16701

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Asia > Myanmar (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.71)