behavior model
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation
Konstantinidis, Fabian, Sackmann, Moritz, Hofmann, Ulrich, Stiller, Christoph
Scalable multi-agent driving simulation requires behavior models that are both realistic and computationally efficient. We address this by optimizing the behavior model that controls individual traffic participants. To improve efficiency, we adopt an instance-centric scene representation, where each traffic participant and map element is modeled in its own local coordinate frame. This design enables efficient, viewpoint-invariant scene encoding and allows static map tokens to be reused across simulation steps. To model interactions, we employ a query-centric symmetric context encoder with relative positional encodings between local frames. We use Adversarial Inverse Reinforcement Learning to learn the behavior model and propose an adaptive reward transformation that automatically balances robustness and realism during training. Experiments demonstrate that our approach scales efficiently with the number of tokens, significantly reducing training and inference times, while outperforming several agent-centric baselines in terms of positional accuracy and robustness.
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- North America > United States (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
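The relative positional encodings between local frames mentioned in the abstract above rest on a simple geometric fact: the pose of one local frame expressed in another does not change under any global rotation or translation of the scene. The following is a minimal Python sketch of that idea only. The `(x, y, heading)` pose layout and the names `relative_pose` and `apply_global_transform` are illustrative assumptions, not the authors' implementation.

```python
import math


def relative_pose(frame_i, frame_j):
    """Pose of frame j expressed in the local coordinates of frame i.

    Each frame is an assumed (x, y, heading) tuple in a shared global frame,
    with heading in radians.
    """
    xi, yi, hi = frame_i
    xj, yj, hj = frame_j
    dx, dy = xj - xi, yj - yi
    cos_h, sin_h = math.cos(hi), math.sin(hi)
    # Rotate the global offset by -hi to express it in frame i.
    rel_x = cos_h * dx + sin_h * dy
    rel_y = -sin_h * dx + cos_h * dy
    # Wrap the heading difference to (-pi, pi].
    rel_h = math.atan2(math.sin(hj - hi), math.cos(hj - hi))
    return rel_x, rel_y, rel_h


def apply_global_transform(frame, tx, ty, theta):
    """Rotate a frame by theta about the origin, then translate by (tx, ty)."""
    x, y, h = frame
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty, h + theta)


agent = (0.0, 0.0, math.pi / 2)      # hypothetical agent facing +y
lane_point = (0.0, 2.0, math.pi / 2)  # hypothetical map element 2 m ahead

before = relative_pose(agent, lane_point)
moved_agent = apply_global_transform(agent, 10.0, -3.0, 0.7)
moved_lane = apply_global_transform(lane_point, 10.0, -3.0, 0.7)
after = relative_pose(moved_agent, moved_lane)
```

Because `before` and `after` agree up to floating-point error, features built from such relative poses do not depend on the global viewpoint. That is consistent with the abstract's claim that static map tokens can be reused across simulation steps.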
Nested-ReFT: Efficient Reinforcement Learning for Large Language Model Fine-Tuning via Off-Policy Rollouts
Heuillet, Maxime, Cui, Yufei, Chen, Boxing, Durand, Audrey, Parthasarathi, Prasanna
Advanced reasoning in LLMs on challenging domains such as mathematical reasoning can be tackled using reinforced fine-tuning (ReFT) based on verifiable rewards. In standard ReFT frameworks, a behavior model generates multiple completions with answers per problem, and each answer is then scored by a reward function. While such RL post-training methods demonstrate significant performance improvements across challenging reasoning domains, generating completions during training requires multiple inference steps, which makes the training cost non-trivial. To address this, we draw inspiration from off-policy RL and speculative decoding to introduce a novel ReFT framework, dubbed Nested-ReFT, where a subset of layers of the target model acts as the behavior model to generate off-policy completions during training. Configuring the behavior model with dynamic layer skipping per batch during training decreases the inference cost compared to standard ReFT frameworks. Our theoretical analysis shows that Nested-ReFT yields unbiased gradient estimates with controlled variance. Our empirical analysis demonstrates improved computational efficiency, measured in tokens/sec, across multiple math reasoning benchmarks and model sizes. Additionally, we explore three variants of bias mitigation that minimize the off-policyness in the gradient updates, which allows performance to match that of baseline ReFT.
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)
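The Nested-ReFT abstract above does not say what its three bias-mitigation variants are. For orientation only, the sketch below shows one common off-policy correction: a sequence-level importance ratio between target and behavior model, optionally truncated to limit variance. The function name, the `clip_max` parameter, and the example log-probabilities are all hypothetical and should not be read as the paper's method.

```python
import math


def sequence_importance_ratio(target_logps, behavior_logps, clip_max=None):
    """Ratio p_target(completion) / p_behavior(completion) from per-token log-probs.

    clip_max is an assumed truncation threshold; None disables truncation.
    """
    if len(target_logps) != len(behavior_logps):
        raise ValueError("log-prob sequences must have the same length")
    # Summing per-token log-ratios gives the log of the sequence-level ratio.
    log_ratio = sum(t - b for t, b in zip(target_logps, behavior_logps))
    ratio = math.exp(log_ratio)
    if clip_max is not None:
        ratio = min(ratio, clip_max)
    return ratio


# Hypothetical two-token completion sampled from the cheaper behavior model.
target = [math.log(0.5), math.log(0.5)]
behavior = [math.log(0.25), math.log(0.5)]

raw_ratio = sequence_importance_ratio(target, behavior)
clipped_ratio = sequence_importance_ratio(target, behavior, clip_max=1.5)
```

Truncating the ratio trades some bias for lower variance, which is the general tension any off-policy scheme of this kind has to manage.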
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
- Information Technology > Artificial Intelligence > Robots (0.67)
DigiT4TAF -- Bridging Physical and Digital Worlds for Future Transportation Systems
Zipfl, Maximilian, Zwick, Pascal, Schulz, Patrick, Zofka, Marc Rene, Schotschneider, Albert, Gremmelmaier, Helen, Polley, Nikolai, Mütsch, Ferdinand, Simon, Kevin, Gottselig, Fabian, Frey, Michael, Marschall, Sergio, Stark, Akim, Müller, Maximilian, Wehmer, Marek, Kocsis, Mihai, Waldenmayer, Dominic, Schnepf, Florian, Heinrich, Erik, Pletz, Sabrina, Kölle, Matthias, Langbein-Euchner, Karin, Viehl, Alexander, Zöllner, Raoul, Zöllner, J. Marius
In the future, mobility will be strongly shaped by increasing digitalization. Not only will individual road users be highly interconnected, but so will the road and its associated infrastructure. At that point, a Digital Twin becomes particularly appealing because, unlike a basic simulation, it offers a continuous, bilateral connection between the real and virtual environments. This paper describes the digital reconstruction used to develop the Digital Twin of the Test Area Autonomous Driving-Baden-Württemberg (TAF-BW), Germany. The TAF-BW offers a variety of road sections, from high-traffic urban intersections and tunnels to multilane motorways. The test area is equipped with a comprehensive Vehicle-to-Everything (V2X) communication infrastructure and multiple intelligent intersections whose camera sensors enable real-time traffic flow monitoring. Authentic input data for the Digital Twin was generated by extracting object lists at the intersections, combining camera images from the intelligent infrastructure with data from LiDAR sensors mounted on a test vehicle. Using a unified interface, recordings of real-world detections of traffic participants can be resimulated. Additionally, the simulation framework's design and the reconstruction process are discussed. The resulting framework is publicly available for download and use at https://digit4taf-bw.fzi.de. Two case studies illustrate the application of the Digital Twin and its interfaces: the analysis of traffic signal systems to optimize traffic flow and the simulation of security-related scenarios in the communications sector.
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.05)
- Europe > Greece (0.04)
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
Off-Policy Evaluation of Ranking Policies via Embedding-Space User Behavior Modeling
Takahashi, Tatsuki, Maru, Chihiro, Shoji, Hiroko
Off-policy evaluation (OPE) in ranking settings with large ranking action spaces, which stem from an increase in both the number of unique actions and the length of the ranking, is essential for assessing new recommender policies using only logged bandit data from previous versions. To address the high variance of existing estimators, we introduce two new assumptions: no direct effect on rankings, and a user behavior model on ranking embedding spaces. We then propose the generalized marginalized inverse propensity score (GMIPS) estimator, which has statistically desirable properties compared to existing ones. Finally, we demonstrate that GMIPS achieves the lowest MSE. Notably, among GMIPS variants, the marginalized reward interaction IPS (MRIPS) incorporates a doubly marginalized importance weight based on a cascade behavior assumption on ranking embeddings. MRIPS effectively balances the trade-off between bias and variance, even as the ranking action space grows and when the above assumptions may not hold, as evidenced by our experiments.
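As background for the estimators named above, the textbook inverse propensity score (IPS) estimator reweights each logged reward by the ratio of the evaluation policy's action probability to the logging policy's. The sketch below implements only that plain estimator. It is not GMIPS or MRIPS, whose marginalization over ranking embeddings is not described in enough detail in the abstract to reproduce. The function name and the toy log are assumptions.

```python
def ips_estimate(rewards, logging_probs, target_probs):
    """Vanilla IPS estimate of the evaluation policy's value.

    rewards[i]        observed reward for the i-th logged action
    logging_probs[i]  probability the logging policy gave that action (> 0)
    target_probs[i]   probability the evaluation policy gives that action
    """
    if not (len(rewards) == len(logging_probs) == len(target_probs)):
        raise ValueError("all inputs must have the same length")
    if any(p <= 0.0 for p in logging_probs):
        raise ValueError("logging probabilities must be strictly positive")
    weighted = [
        r * pe / p0
        for r, p0, pe in zip(rewards, logging_probs, target_probs)
    ]
    return sum(weighted) / len(weighted)


# Hypothetical logged bandit data with four interactions.
rewards = [1.0, 0.0, 1.0, 1.0]
logging_probs = [0.5, 0.5, 0.25, 0.5]
target_probs = [0.5, 0.25, 0.5, 0.5]

value = ips_estimate(rewards, logging_probs, target_probs)
```

When a ranking has many positions and many candidate items, the probability of any exact ranking under the logging policy becomes tiny, so these weights can become very large. That is the high-variance problem the abstract refers to.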
AI2-Active Safety: AI-enabled Interaction-aware Active Safety Analysis with Vehicle Dynamics
Wu, Keshu, Li, Zihao, Li, Sixu, Ye, Xinyue, Lord, Dominique, Zhou, Yang
This paper introduces an AI-enabled, interaction-aware active safety analysis framework that accounts for groupwise vehicle interactions. Specifically, the framework employs a bicycle model, augmented with road gradient considerations, to accurately capture vehicle dynamics. In parallel, a hypergraph-based AI model is developed to predict probabilistic trajectories of ambient traffic. By integrating these two components, the framework derives vehicle intra-spacing over a 3D road surface as the solution of a stochastic ordinary differential equation, yielding high-fidelity surrogate safety measures such as time-to-collision (TTC). To demonstrate its effectiveness, the framework is analyzed using stochastic numerical methods comprising 4th-order Runge-Kutta integration and AI inference, generating probability-weighted high-fidelity TTC (HF-TTC) distributions that reflect complex multi-agent maneuvers and behavioral uncertainties. Evaluated with HF-TTC against traditional constant-velocity TTC and non-interaction-aware approaches on highway datasets, the proposed framework offers a systematic methodology for active safety analysis with enhanced potential for improving safety perception in complex traffic environments.
- Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
- North America > United States > Texas > Brazos County > College Station (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
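Two building blocks named in the AI2-Active Safety abstract above have standard textbook forms: the constant-velocity TTC used as the baseline, and a classical 4th-order Runge-Kutta step. The sketch below shows both for a scalar, deterministic case. The paper's stochastic formulation, its gradient-aware bicycle model, and its hypergraph predictor are not reproduced, and all function names are assumptions.

```python
import math


def constant_velocity_ttc(gap_m, v_follower, v_leader):
    """Time-to-collision in seconds if both vehicles keep their current speed.

    Returns math.inf when the follower is not closing in on the leader.
    """
    closing_speed = v_follower - v_leader
    if closing_speed <= 0.0:
        return math.inf
    return gap_m / closing_speed


def rk4_step(f, t, y, h):
    """One classical 4th-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(f, y0, t0, t1, steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 using equal RK4 steps."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


# Hypothetical scenario: 20 m gap, follower at 30 m/s, leader at 20 m/s.
ttc = constant_velocity_ttc(20.0, 30.0, 20.0)

# Sanity check of the integrator on dy/dt = y, whose exact solution is e^t.
approx_e = integrate(lambda t, y: y, 1.0, 0.0, 1.0, 10)
```

The constant-velocity form ignores both vehicle dynamics and the reactions of surrounding traffic, which is the gap the abstract's HF-TTC is intended to address.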
CHARMS: A Cognitive Hierarchical Agent for Reasoning and Motion Stylization in Autonomous Driving
Wang, Jingyi, Chu, Duanfeng, Deng, Zejian, Lu, Liping, Wang, Jinxiang, Sun, Chen
To address the limitations of these approaches, we propose CHARMS, a decision-making model based on Level-k game theory [20]. The distinction between our approach and the existing methods is illustrated in Figure 1. CHARMS incorporates cognitive hierarchy theory to model diverse reasoning depths among agents, coupled with Social Value Orientation (SVO) to capture individual preferences in driving behavior. We employ a two-stage training process consisting of reinforcement learning pretraining and supervised fine-tuning (SFT) to generate decision-making models that exhibit a wide range of human-like driving styles. Additionally, we integrate Poisson cognitive hierarchy (PCH) theory to enable CHARMS to generate more complex simulation scenarios with diverse vehicle styles. The main contributions of this paper can be summarized as follows. A behavior model integrating Level-k reasoning and SVO is proposed to simulate cognitively diverse driving styles. A two-stage training scheme (DRL + SFT) ensures both style distinctiveness and behavioral realism. A scenario generation method based on PCH theory is used to control driving style distributions, with the aim of creating more realistic and behaviorally diverse simulation scenarios.
- Asia > China > Hubei Province > Wuhan (0.05)
- Asia > China > Hong Kong (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
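The CHARMS excerpt above names two ingredients with well-known generic forms. In Poisson cognitive hierarchy theory, reasoning levels are distributed as Poisson with mean tau. A common way to express SVO in the driving literature weights an agent's own reward and others' reward by the cosine and sine of an SVO angle. The sketch below illustrates only these generic forms. How CHARMS parameterizes them is not stated in the excerpt, so the function names, the truncation at `max_level`, and the example values are assumptions.

```python
import math
import random


def poisson_level_probs(tau, max_level):
    """Poisson(tau) probabilities for levels 0..max_level, renormalized to sum to 1."""
    weights = [
        math.exp(-tau) * tau**k / math.factorial(k)
        for k in range(max_level + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def sample_levels(tau, max_level, n, seed=0):
    """Draw n reasoning levels for simulated vehicles (seeded for repeatability)."""
    rng = random.Random(seed)
    probs = poisson_level_probs(tau, max_level)
    return rng.choices(range(max_level + 1), weights=probs, k=n)


def svo_utility(r_ego, r_other, svo_angle):
    """Blend own and others' reward; angle 0 is purely egoistic (assumed convention)."""
    return math.cos(svo_angle) * r_ego + math.sin(svo_angle) * r_other


probs = poisson_level_probs(1.0, 1)     # two levels, truncated and renormalized
levels = sample_levels(1.5, 3, 100)     # hypothetical scenario with 100 vehicles
egoistic = svo_utility(1.0, 2.0, 0.0)   # ignores the other agent's reward
```

Raising `tau` shifts the sampled population toward deeper reasoners, which is one way a scenario generator could control the mix of driving styles.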