AITopics

2412.07935

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Scarlatos, Alexander, Baker, Ryan S., Lan, Andrew

Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs

arXiv.org Artificial IntelligenceDec-10-2024

Recent advances in large language models (LLMs) have led to the development of artificial intelligence (AI)-powered tutoring chatbots, showing promise in providing broad access to high-quality personalized education. Existing works have studied how to make LLMs follow tutoring principles, but have not studied broader uses of LLMs for supporting tutoring. Up until now, tracing student knowledge and analyzing misconceptions has been difficult and time-consuming to implement for open-ended dialogue tutoring. In this work, we investigate whether LLMs can be supportive of this task: we first use LLM prompting methods to identify the knowledge components/skills involved in each dialogue turn, i.e., a tutor utterance posing a task or a student utterance that responds to it. We also evaluate whether the student responds correctly to the tutor and verify the LLM's accuracy using human expert annotations. We then apply a range of knowledge tracing (KT) methods on the resulting labeled data to track student knowledge levels over an entire dialogue. We conduct experiments on two tutoring dialogue datasets, and show that a novel yet simple LLM-based method, LLMKT, significantly outperforms existing KT methods in predicting student response correctness in dialogues. We perform extensive qualitative analyses to highlight the challenges in dialogueKT and outline multiple avenues for future work.

large language model, machine learning, natural language, (17 more...)

2409.1649

Country:

Europe > Switzerland (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(10 more...)

Genre: Research Report (1.00)

Industry:

Education > Curriculum > Subject-Specific Education (0.93)
Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Kollarčík, Adam, Hanzálek, Zdeněk

POMDP-Based Trajectory Planning for On-Ramp Highway Merging

arXiv.org Artificial IntelligenceDec-10-2024

This paper addresses the trajectory planning problem for automated vehicle on-ramp highway merging. To tackle this challenge, we extend our previous work on trajectory planning at unsignalized intersections using Partially Observable Markov Decision Processes (POMDPs). The method utilizes the Adaptive Belief Tree (ABT) algorithm, an approximate sampling-based approach to solve POMDPs efficiently. We outline the POMDP formulation process, beginning with discretizing the highway topology to reduce problem complexity. Additionally, we describe the dynamics and measurement models used to predict future states and establish the relationship between available noisy measurements and predictions. Building on our previous work, the dynamics model is expanded to account for lateral movements necessary for lane changes during the merging process. We also define the reward function, which serves as the primary mechanism for specifying the desired behavior of the automated vehicle, combining multiple goals such as avoiding collisions or maintaining appropriate velocity. Our simulation results, conducted on three scenarios based on real-life traffic data from German highways, demonstrate the method's ability to generate safe, collision-free, and efficient merging trajectories. This work shows the versatility of this POMDP-based approach in tackling various automated driving problems.

artificial intelligence, machine learning, vehicle, (16 more...)

2412.07567

Country:

Europe > Germany (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Infrastructure & Services (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Machine LearningDec-10-2024

Dynamic Classification of Latent Disease Progression with Auxiliary Surrogate Labels

Cai, Zexi, Zeng, Donglin, Marder, Karen S., Honig, Lawrence S., Wang, Yuanjia

Disease progression prediction based on patients' evolving health information is challenging when true disease states are unknown due to diagnostic capabilities or high costs. For example, the absence of gold-standard neurological diagnoses hinders distinguishing Alzheimer's disease (AD) from related conditions such as AD-related dementias (ADRDs), including Lewy body dementia (LBD). Combining temporally dependent surrogate labels and health markers may improve disease prediction. However, existing literature models informative surrogate labels and observed variables that reflect the underlying states using purely generative approaches, limiting the ability to predict future states. We propose integrating the conventional hidden Markov model as a generative model with a time-varying discriminative classification model to simultaneously handle potentially misspecified surrogate labels and incorporate important markers of disease progression. We develop an adaptive forward-backward algorithm with subjective labels for estimation, and utilize the modified posterior and Viterbi algorithms to predict the progression of future states or new patients based on objective markers only. Importantly, the adaptation eliminates the need to model the marginal distribution of longitudinal markers, a requirement in traditional algorithms. Asymptotic properties are established, and significant improvement with finite samples is demonstrated via simulation studies. Analysis of the neuropathological dataset of the National Alzheimer's Coordinating Center (NACC) shows much improved accuracy in distinguishing LBD from AD.

diagnosis, disease state, progression, (17 more...)

arXiv.org Machine Learning

2412.08088

Country:

North America > United States > New York (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Vision-Based Deep Reinforcement Learning of UAV Autonomous Navigation Using Privileged Information

Wang, Junqiao, Yu, Zhongliang, Zhou, Dong, Shi, Jiaqi, Deng, Runran

The capability of UAVs for efficient autonomous navigation and obstacle avoidance in complex and unknown environments is critical for applications in agricultural irrigation, disaster relief and logistics. In this paper, we propose the DPRL (Distributed Privileged Reinforcement Learning) navigation algorithm, an end-to-end policy designed to address the challenge of high-speed autonomous UAV navigation under partially observable environmental conditions. Our approach combines deep reinforcement learning with privileged learning to overcome the impact of observation data corruption caused by partial observability. We leverage an asymmetric Actor-Critic architecture to provide the agent with privileged information during training, which enhances the model's perceptual capabilities. Additionally, we present a multi-agent exploration strategy across diverse environments to accelerate experience collection, which in turn expedites model convergence. We conducted extensive simulations across various scenarios, benchmarking our DPRL algorithm against the state-of-the-art navigation algorithms. The results consistently demonstrate the superior performance of our algorithm in terms of flight efficiency, robustness and overall success rate.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2412.06313

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Industry:

Transportation > Air (0.68)
Food & Agriculture > Agriculture (0.48)
Information Technology > Robotics & Automation (0.47)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Effective Reward Specification in Deep Reinforcement Learning

Roy, Julien

In the last decade, Deep Reinforcement Learning has evolved into a powerful tool for complex sequential decision-making problems. It combines deep learning's proficiency in processing rich input signals with reinforcement learning's adaptability across diverse control tasks. At its core, an RL agent seeks to maximize its cumulative reward, enabling AI algorithms to uncover novel solutions previously unknown to experts. However, this focus on reward maximization also introduces a significant difficulty: improper reward specification can result in unexpected, misaligned agent behavior and inefficient learning. The complexity of accurately specifying the reward function is further amplified by the sequential nature of the task, the sparsity of learning signals, and the multifaceted aspects of the desired behavior. In this thesis, we survey the literature on effective reward specification strategies, identify core challenges relating to each of these approaches, and propose original contributions addressing the issue of sample efficiency and alignment in deep reinforcement learning. Reward specification represents one of the most challenging aspects of applying reinforcement learning in real-world domains. Our work underscores the absence of a universal solution to this complex and nuanced challenge; solving it requires selecting the most appropriate tools for the specific requirements of each unique application.

artificial intelligence, machine learning, survey article, (20 more...)

2412.07177

Country:

North America > Canada (0.45)
Europe > Switzerland (0.27)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.67)
Research Report > Promising Solution (0.65)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology (1.00)
Education (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.92)
(2 more...)

A Review of Human Emotion Synthesis Based on Generative Technology

Ma, Fei, Li, Yukan, Xie, Yifan, He, Ying, Zhang, Yi, Ren, Hongwei, Liu, Zhou, Yao, Wei, Ren, Fuji, Yu, Fei Richard, Ni, Shiguang

Human emotion synthesis is a crucial aspect of affective computing. It involves using computational methods to mimic and convey human emotions through various modalities, with the goal of enabling more natural and effective human-computer interactions. Recent advancements in generative models, such as Autoencoders, Generative Adversarial Networks, Diffusion Models, Large Language Models, and Sequence-to-Sequence Models, have significantly contributed to the development of this field. However, there is a notable lack of comprehensive reviews in this field. To address this problem, this paper aims to address this gap by providing a thorough and systematic overview of recent advancements in human emotion synthesis based on generative models. Specifically, this review will first present the review methodology, the emotion models involved, the mathematical principles of generative models, and the datasets used. Then, the review covers the application of different generative models to emotion synthesis based on a variety of modalities, including facial images, speech, and text. It also examines mainstream evaluation metrics. Additionally, the review presents some major findings and suggests future research directions, providing a comprehensive understanding of the role of generative technology in the nuanced domain of emotion synthesis.

artificial intelligence, machine learning, natural language, (18 more...)

2412.07116

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
Asia > Singapore (0.04)
(6 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology (0.67)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Bolland, Adrien, Lambrechts, Gaspard, Ernst, Damien

Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures

arXiv.org Machine LearningDec-9-2024

We introduce a new maximum entropy reinforcement learning framework based on the distribution of states and actions visited by a policy. More precisely, an intrinsic reward function is added to the reward function of the Markov decision process that shall be controlled. For each state and action, this intrinsic reward is the relative entropy of the discounted distribution of states and actions (or features from these states and actions) visited during the next time steps. We first prove that an optimal exploration policy, which maximizes the expected discounted sum of intrinsic rewards, is also a policy that maximizes a lower bound on the state-action value function of the decision process under some assumptions. We also prove that the visitation distribution used in the intrinsic reward definition is the fixed point of a contraction operator. Following, we describe how to adapt existing algorithms to learn this fixed point and compute the intrinsic rewards to enhance exploration. A new practical off-policy maximum entropy reinforcement learning algorithm is finally introduced. Empirically, exploration policies have good state-action space coverage, and high-performing control policies are computed efficiently.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

2412.06655

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Portugal > Braga > Braga (0.04)
Europe > Belgium > Wallonia > Liège Province > Liège (0.04)

Genre: Research Report (0.64)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Bredell, F., Engelbrecht, H. A., Schoeman, J. C.

Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi

The card game Hanabi is considered a strong medium for the testing and development of multi-agent reinforcement learning (MARL) algorithms, due to its cooperative nature, hidden information, limited communication and remarkable complexity. Previous research efforts have explored the capabilities of MARL algorithms within Hanabi, focusing largely on advanced architecture design and algorithmic manipulations to achieve state-of-the-art performance for a various number of cooperators. However, this often leads to complex solution strategies with high computational cost and requiring large amounts of training data. For humans to solve the Hanabi game effectively, they require the use of conventions, which often allows for a means to implicitly convey ideas or knowledge based on a predefined, and mutually agreed upon, set of ``rules''. Multi-agent problems containing partial observability, especially when limited communication is present, can benefit greatly from the use of implicit knowledge sharing. In this paper, we propose a novel approach to augmenting the action space using conventions, which act as special cooperative actions that span over multiple time steps and multiple agents, requiring agents to actively opt in for it to reach fruition. These conventions are based on existing human conventions, and result in a significant improvement on the performance of existing techniques for self-play and cross-play across a various number of cooperators within Hanabi.

convention, machine learning, reinforcement learning, (16 more...)

2412.06333

Country: Africa > South Africa (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Hanzálek, Adam Kollarčík adn Zdeněk

Parameter Adjustments in POMDP-Based Trajectory Planning for Unsignalized Intersections

This paper investigates the problem of trajectory planning for autonomous vehicles at unsignalized intersections, specifically focusing on scenarios where the vehicle lacks the right of way and yet must cross safely. To address this issue, we have employed a method based on the Partially Observable Markov Decision Processes (POMDPs) framework designed for planning under uncertainty. The method utilizes the Adaptive Belief Tree (ABT) algorithm as an approximate solver for the POMDPs. We outline the POMDP formulation, beginning with discretizing the intersection's topology. Additionally, we present a dynamics model for the prediction of the evolving states of vehicles, such as their position and velocity. Using an observation model, we also describe the connection of those states with the imperfect (noisy) available measurements. Our results confirmed that the method is able to plan collision-free trajectories in a series of simulations utilizing real-world traffic data from aerial footage of two distinct intersections. Furthermore, we studied the impact of parameter adjustments of the ABT algorithm on the method's performance. This provides guidance in determining reasonable parameter settings, which is valuable for future method applications.

artificial intelligence, machine learning, vehicle, (16 more...)

doi: 10.5220/0012742400003702

2412.06405

Country: Europe > Czechia > Prague (0.05)

Genre: Research Report > New Finding (0.48)

Industry:

Transportation > Ground > Road (0.30)
Transportation > Infrastructure & Services (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)