AITopics

2503.04691

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Maryland > Montgomery County > Bethesda (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Internal Medicine (1.00)
Health & Medicine > Therapeutic Area > Genetic Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Bapat, Nachiket U., Paffenroth, Randy C., Cowlagi, Raghvendra V.

Synthetic Data Generation for Minimum-Exposure Navigation in a Time-Varying Environment using Generative AI Models

We study the problem of generating synthetic samples of environmental features for autonomous vehicle navigation. These features are described by a spatiotemporally varying scalar field that we refer to as a threat field. The threat field is known to have some underlying dynamics subject to process noise. Some "real-world" observations of various threat fields are also available, and the volume of these data is assumed to be relatively small. The objective is to synthesize samples that are statistically similar to the data. The proposed solution is a generative artificial intelligence model that we refer to as a split variational recurrent neural network (S-VRNN). The S-VRNN merges the capabilities of a variational autoencoder, a widely used generative model, and a recurrent neural network, which learns temporal dependencies in data. The main innovation of this work is to split the latent space of the S-VRNN into two subspaces. The latent variables in one subspace are learned from the "real-world" data alone, whereas those in the other subspace are learned from the data together with the known underlying system dynamics. Through numerical experiments we demonstrate that the proposed S-VRNN can synthesize data that are statistically similar to the training data, even when the volume of "real-world" training data is very small.
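The split latent space described above can be pictured with a small NumPy sketch. Everything here is an assumption for illustration, not the authors' S-VRNN: the encoder is taken to output a mean `mu` and log-variance `logvar` for the whole latent vector, the first `d_data` coordinates are treated as the data-only subspace, and the remaining coordinates as the dynamics-informed subspace. The hypothetical `dynamics_penalty` stands in for "learned using the known underlying system dynamics" by assuming linear latent dynamics with a matrix `A`.

```python
import numpy as np


def split_reparameterize(mu, logvar, d_data, rng):
    """Sample z with the reparameterization trick, then split it in two.

    Hypothetical layout: z[..., :d_data] is the data-only subspace and
    z[..., d_data:] is the dynamics-informed subspace.
    """
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps
    return z[..., :d_data], z[..., d_data:]


def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over the last axis."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)


def dynamics_penalty(z_dyn_prev, z_dyn_next, A):
    """Hypothetical consistency term: squared error between the next latent
    state and the prediction A @ z_prev (rows are batch elements)."""
    pred = z_dyn_prev @ A.T
    return np.sum((z_dyn_next - pred) ** 2, axis=-1)
```

In a training loop one would add the KL terms of both subspaces to the reconstruction loss, and add `dynamics_penalty` only for the dynamics-informed subspace; how the paper actually weights and combines these terms is not stated in the abstract.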

artificial intelligence, machine learning, training data

2503.06619

Country:

North America > United States > Massachusetts > Worcester County > Worcester (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.84)

Li, Peizheng, Aijaz, Adnan

Task-Oriented Connectivity for Networked Robotics with Generative AI and Semantic Communications

The convergence of robotics, advanced communication networks, and artificial intelligence (AI) holds the promise of transforming industries through fully automated and intelligent operations. In this work, we introduce a novel co-working framework for robots that unifies goal-oriented semantic communication (SemCom) with a Generative AI (GenAI)-agent under a semantic-aware network. The GenAI-agent leverages generative AI models to interpret high-level task instructions, allocate resources, and adapt to dynamic changes in both the network and the robotic environment. This agent-driven paradigm ushers in a new level of autonomy and intelligence, enabling networked robots to conduct complex tasks with minimal human intervention. We validate our approach through a simulated multi-robot anomaly detection use case, in which robots detect, compress, and transmit relevant information for classification. Simulation results confirm that SemCom significantly reduces data traffic while preserving critical semantic details, and that the GenAI-agent ensures task coordination and network adaptation. This synergy provides a robust, efficient, and scalable solution for modern industrial environments.
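A toy sketch of the traffic saving that goal-oriented SemCom aims for, assuming (hypothetically) that each robot runs an on-board anomaly detector and, instead of streaming raw frames, transmits a compact semantic vector only for frames whose score passes a threshold. `semantic_encode` is a hypothetical average-pooling stand-in for a learned encoder; the paper's actual encoder, threshold, and payload format are not described here.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Observation:
    image: np.ndarray      # raw frame, hypothetical H x W x 3 uint8 array
    anomaly_score: float   # hypothetical on-robot detector output in [0, 1]


def semantic_encode(image: np.ndarray, dim: int = 32) -> np.ndarray:
    """Toy stand-in for a learned semantic encoder: pool the frame into
    `dim` chunk means and store them as float16."""
    flat = image.astype(np.float32).reshape(-1)
    chunks = np.array_split(flat, dim)
    return np.array([c.mean() for c in chunks], dtype=np.float16)


def bytes_sent(observations, threshold: float = 0.5, dim: int = 32):
    """Return (raw_bytes, semcom_bytes): streaming every raw frame versus
    sending a semantic vector only for frames flagged as anomalous."""
    raw = sum(o.image.nbytes for o in observations)
    sem = sum(
        semantic_encode(o.image, dim).nbytes
        for o in observations
        if o.anomaly_score >= threshold
    )
    return raw, sem
```

The two filters (transmit only task-relevant frames, and only their semantic content) are what make the traffic reduction possible; the classification quality the abstract reports depends on the real encoder, which this toy pooling does not attempt to reproduce.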

genai-agent, machine learning, natural language

2503.06771

Country: Europe (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.83)

Text-to-Image Diffusion Models Cannot Count, and Prompt Refinement Cannot Help

Cao, Yuefan, Guo, Xuyang, Huo, Jiayan, Liang, Yingyu, Shi, Zhenmei, Song, Zhao, Zhang, Jiahao, Zhuang, Zhen

Generative modeling is widely regarded as one of the most essential problems in today's AI community, with text-to-image generation having gained unprecedented real-world impact. Among various approaches, diffusion models have achieved remarkable success and have become the de facto solution for text-to-image generation. However, despite their impressive performance, these models exhibit fundamental limitations in adhering to numerical constraints in user instructions, frequently generating images with an incorrect number of objects. While several prior works have mentioned this issue, a comprehensive and rigorous evaluation of this limitation remains lacking. To address this gap, we introduce T2ICountBench, a novel benchmark designed to rigorously evaluate the counting ability of state-of-the-art text-to-image diffusion models. Our benchmark encompasses a diverse set of generative models, including both open-source and proprietary systems. It explicitly isolates counting performance from other capabilities, provides structured difficulty levels, and incorporates human evaluations to ensure high reliability. Extensive evaluations with T2ICountBench reveal that all state-of-the-art diffusion models fail to generate the correct number of objects, with accuracy dropping significantly as the number of objects increases. Additionally, an exploratory study on prompt refinement demonstrates that such simple interventions generally do not improve counting accuracy. Our findings highlight the inherent challenges of numerical understanding in diffusion models and point to promising directions for future improvements.
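Measuring counting ability as described above comes down to comparing the requested object count with the count actually present in each generated image, then grouping by requested count to see how accuracy changes as the number grows. A minimal Python sketch, assuming the observed counts come from human annotation; `counting_accuracy` is a hypothetical helper, not part of T2ICountBench.

```python
from collections import defaultdict


def counting_accuracy(results):
    """results: iterable of (requested_count, observed_count) pairs, one per image.

    Returns {requested_count: fraction of images containing exactly that
    many objects}, ordered by requested count.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for requested, observed in results:
        totals[requested] += 1
        hits[requested] += int(observed == requested)
    return {n: hits[n] / totals[n] for n in sorted(totals)}
```

Grouping by the requested count, rather than reporting a single overall accuracy, is what exposes the trend the abstract describes (accuracy falling as the number of objects increases).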

flux 1, gemini 2, generation result

2503.06884

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Arizona (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Boulard, Cécile, Viswanathan, Sruthi, Fey, Wanda, Jacquin, Thierry

Actionable AI: Enabling Non Experts to Understand and Configure AI Systems

Interaction between humans and AI systems raises the question of how people understand AI systems. This question has been addressed through explainable AI, through the interpretability that arises from users' domain expertise, or through collaboration with AI in a stable environment. In the absence of these elements, we discuss designing Actionable AI, which allows non-experts to configure black-box agents. In this paper, we experiment with an AI-powered cartpole game and observe 22 pairs of participants as they configure it via direct manipulation. Our findings suggest that, in uncertain conditions, non-experts were able to achieve good levels of performance. By influencing the behaviour of the agent, they exhibited an operational understanding of it, which proved sufficient to reach their goals. Based on this, we derive implications for designing Actionable AI systems. In conclusion, we propose Actionable AI as a way to open up access to AI-based agents, giving end users the agency to influence such agents towards their own goals.

cartpole, experiment, participant

2503.06803

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.06)
North America > United States > New York > New York County > New York City (0.05)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (0.93)
Leisure & Entertainment > Sports (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Agent models: Internalizing Chain-of-Action Generation into Reasoning models

Zhang, Yuxiang, Yang, Yuqi, Shu, Jiangming, Wen, Xinyan, Sang, Jitao

Traditional agentic workflows rely on external prompts to manage interactions with tools and the environment, which limits the autonomy of reasoning models. We put forward Large Agent Models (LAMs), which internalize the generation of Chain-of-Action (CoA), enabling the model to decide autonomously when and how to use external tools. Our proposed AutoCoA framework combines supervised fine-tuning (SFT) and reinforcement learning (RL), allowing the model to switch seamlessly between reasoning and action while efficiently managing environment interactions. Its main components are step-level action triggering, trajectory-level CoA optimization, and an internal world model that reduces the cost of interacting with the real environment. Evaluations on open-domain QA tasks demonstrate that AutoCoA-trained agent models significantly outperform ReAct-based workflows in task completion, especially in tasks that require long-term reasoning and multi-step actions. Code and dataset are available at https://github.com/

OpenAI has outlined five progressive stages on the path to Artificial General Intelligence (AGI). The first stage, characterized as Chatbot, is exemplified by Large Language Models (LLMs) like GPT-3.5 and GPT-4 OpenAI (2023). The second stage, termed Reasoner, introduces Large Reasoning Models (LRMs) such as o1 OpenAI (2024) and o3. Recently, OpenAI released Operator OpenAI (2025a) and Deep Research OpenAI (2025b), signaling the arrival of the third stage: Agent. These systems reportedly combine reasoning with autonomous tool usage, enabling independent execution of multi-round workflows by interacting with the real-world environment. It is believed that the technology behind Operator and Deep Research is not merely the integration of existing LLMs or LRMs with agentic workflows (e.g., ReAct Yao et al. (2022), Reflexion Shinn et al. (2023)). Instead, it represents a further upgrade in model capabilities: the new models are capable of long-term planning, tool manipulation, and environmental interaction.
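The step-level action triggering described above can be pictured as a runtime loop in which the model's own output decides whether to call a tool. This is a generic sketch, not the AutoCoA implementation: the `<action tool="...">` and `<observation>` tag syntax, the `generate` callable, and the `tools` dictionary are all hypothetical. The contrast with a prompted ReAct workflow is that no external prompt template tells the model when to act; a trained agent model would emit the action markup on its own.

```python
import re
from typing import Callable, Dict

# Hypothetical action syntax emitted by the model, e.g.
#   <action tool="search">capital of France</action>
ACTION_RE = re.compile(r'<action tool="(\w+)">(.*?)</action>', re.DOTALL)


def run_chain_of_action(
    generate: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    prompt: str,
    max_steps: int = 5,
) -> str:
    """Alternate model generation and tool calls until the model stops acting."""
    context = prompt
    for _ in range(max_steps):
        step = generate(context)
        context += step
        match = ACTION_RE.search(step)
        if match is None:
            # No action emitted: the model chose to reason or answer directly.
            return context
        tool, arg = match.group(1), match.group(2)
        observation = tools[tool](arg) if tool in tools else f"unknown tool: {tool}"
        # Feed the tool result back so the next step can condition on it.
        context += f"<observation>{observation}</observation>"
    return context
```

The internal world model mentioned in the abstract would, in this picture, replace some entries of `tools` with a learned simulator so that training rollouts avoid real-environment calls; that substitution is an interpretation, not something the abstract spells out.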

reasoning, reasoning model, sequence

2503.0658

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Workflow (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Generative AI as Digital Media

Abiri, Gilad

Generative AI is frequently portrayed as revolutionary or even apocalyptic, prompting calls for novel regulatory approaches. This essay argues that such views are misguided. Instead, generative AI should be understood as an evolutionary step in the broader algorithmic media landscape, alongside search engines and social media. Like these platforms, generative AI centralizes information control, relies on complex algorithms to shape content, and extensively uses user data, thus perpetuating common problems: unchecked corporate power, echo chambers, and weakened traditional gatekeepers. Regulation should therefore share a consistent objective: ensuring that media institutions remain trustworthy. Without trust, public discourse risks fragmenting into isolated communities dominated by comforting, tribal beliefs, a threat intensified by generative AI's capacity to bypass gatekeepers and personalize truth. Current governance frameworks, such as the EU's AI Act and the US Executive Order 14110, emphasize reactive risk mitigation, addressing measurable threats like national security, public health, and algorithmic bias. While effective for novel technological risks, this reactive approach fails to adequately address broader issues of trust and legitimacy inherent to digital media. Proactive regulation fostering transparency, accountability, and public confidence is essential. Viewing generative AI exclusively as revolutionary risks repeating past regulatory failures that left social media and search engines insufficiently regulated. Instead, regulation must proactively shape an algorithmic media environment that serves the public good, supporting quality information and robust civic discourse.

generative ai, platform, regulation

2503.06523

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Arizona (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre:

Overview (1.00)
Research Report (0.81)

Industry:

Social Sector (1.00)
Media > News (1.00)
Media > Film (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Scarlatos, Alexander, Liu, Naiming, Lee, Jaewook, Baraniuk, Richard, Lan, Andrew

Training LLM-based Tutors to Improve Student Learning Outcomes in Dialogues

arXiv.org Artificial Intelligence, Mar-8-2025

Recent advances in generative artificial intelligence (AI), including large language models (LLMs), have opened new possibilities in education, in particular for scaling up personalization. One form of personalization that generative AI powers is interactive learning via tutoring dialogues between AI-powered tutors and students. These interactions have the potential to tailor instruction to each student's needs and progress while offering personalized feedback, all in real time and at scale. Given the widespread success of human tutors in improving student outcomes [29], many recent works have developed LLM-based tutors, showing promise across various educational domains [15, 25, 30, 32, 33, 39, 42, 50]. Many LLM-based tutors are even deployed in practice, such as Khan Academy's Khanmigo [21] and Carnegie Learning's LiveHint [4]. Several preliminary studies have shown that interacting with LLMs can increase student learning [52], although others have shown that students can develop an over-reliance on LLMs that negatively impacts their learning [23]. Many prior works have focused on improving LLMs' ability to follow effective tutoring principles, adapting them to the tutoring task, for which they are not pre-trained. One approach, explored in [46], analyzes the decision-making process underlying human tutor utterances, showing that integrating expert decisions enhances LLM-based tutoring. Another study [28] examines tutor moves in interactions with an LLM-powered simulated student agent, demonstrating that move annotation data contributes to better tutoring performance.

dialogue, student, tutor utterance

2503.06424

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)

Genre:

Research Report > New Finding (0.68)
Instructional Material > Course Syllabus & Notes (0.52)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.68)

arXiv.org Artificial Intelligence, Mar-8-2025

GenAI for Simulation Model in Model-Based Systems Engineering

Zhang, Lin, Zhang, Yuteng, Niyato, Dusit, Ren, Lei, Gu, Pengfei, Chen, Zhen, Laili, Yuanjun, Cai, Wentong, Bruzzone, Agostino

Generative AI (GenAI) has demonstrated remarkable capabilities in code generation, and integrating it into the generation of complex product modeling and simulation code can significantly enhance the efficiency of the system design phase in Model-Based Systems Engineering (MBSE). In this study, we introduce a generative system design methodology framework for MBSE, offering a practical approach to the intelligent generation of simulation models for system physical properties. First, we employ inference techniques, generative models, and integrated modeling and simulation languages to construct simulation models for system physical properties from product design documents. Next, we fine-tune the language model used for simulation model generation on an existing library of simulation models and on additional datasets produced through generative modeling. Finally, we introduce evaluation metrics for the generated simulation models of system physical properties. Our approach to simulation model generation introduces the concept of scalable templates for simulation models. Using these templates, GenAI generates simulation models for system physical properties through code completion. The experimental results demonstrate that, for mainstream open-source Transformer-based models, the proposed generation method significantly improves the quality of the generated simulation models.

class model, simulation model, transformer-based model

2503.06422

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
North America > United States > Massachusetts > Middlesex County > Burlington (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Promising Solution (0.48)

Industry: Transportation (0.95)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial Intelligence, Mar-8-2025

InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback

Zhao, Henry Hengyuan, Pei, Wenqi, Tao, Yifei, Mei, Haiyang, Shou, Mike Zheng

Existing benchmarks do not test Large Multimodal Models (LMMs) on their interactive intelligence with human users, which is vital for developing general-purpose AI assistants. We design InterFeedback, an interactive framework that can be applied to any LMM and dataset to assess this ability autonomously. On top of this, we introduce InterFeedback-Bench, which evaluates interactive intelligence using two representative datasets, MMMU-Pro and MathVerse, to test 10 different open-source LMMs. Additionally, we present InterFeedback-Human, a newly collected dataset of 120 cases designed for manually testing interactive performance in leading models such as OpenAI-o1 and Claude-3.5-Sonnet. Our evaluation results indicate that even the state-of-the-art LMM, OpenAI-o1, struggles to refine its responses based on human feedback, achieving an average score of less than 50%. Our findings point to the need for methods that can enhance LMMs' capabilities to interpret and benefit from feedback.

In this paper, we investigate the question "Can Large Multimodal Models evolve through Interactive Human Feedback?", which is central to developing general-purpose AI assistants with Large Multimodal Models (LMMs). While these models show exceptional performance when tackling multimodal tasks directly, their ability to interact with humans remains largely unknown. We argue that an LMM functioning as a general assistant should possess two capabilities: 1) exceptional problem-solving ability and 2) the ability to improve itself through feedback (e.g., human feedback, execution results).
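The interactive evaluation described above can be sketched as a loop: the model answers, and if the answer is wrong it receives feedback and tries again for a bounded number of rounds. A minimal Python sketch under stated assumptions: exact-match grading, a fixed textual feedback template, and a `model(question, feedback_history)` callable are all hypothetical, and `correction_rate` is only one plausible way to score how well a model benefits from feedback, not necessarily InterFeedback's metric.

```python
from typing import Callable, List, Tuple


def interactive_eval(
    model: Callable[[str, List[str]], str],
    items: List[Tuple[str, str]],
    max_rounds: int = 3,
) -> dict:
    """items: (question, gold_answer) pairs.

    Returns initial accuracy and the fraction of initially wrong items
    that the model corrects within `max_rounds` rounds of feedback.
    """
    solved_first = corrected = initially_wrong = 0
    for question, gold in items:
        feedback: List[str] = []
        answer = model(question, feedback)
        if answer == gold:
            solved_first += 1
            continue
        initially_wrong += 1
        for _ in range(max_rounds):
            # Hypothetical feedback template; a human or a simulated user could supply richer hints.
            feedback.append(f"Your answer {answer!r} is incorrect.")
            answer = model(question, feedback)
            if answer == gold:
                corrected += 1
                break
    return {
        "initial_accuracy": solved_first / len(items) if items else 0.0,
        "correction_rate": corrected / initially_wrong if initially_wrong else 0.0,
    }
```

Scoring only the items a model initially gets wrong separates "can solve the task" from "can use feedback", which matches the two capabilities the paragraph argues a general assistant needs.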

incorrect, lmm, zhang

2502.15027

Country:

North America > Mexico > Mexico City > Mexico City (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)