Huang, Yu-Kai
ModeSeq: Taming Sparse Multimodal Motion Prediction with Sequential Mode Modeling
Zhou, Zikang, Zhou, Hengjian, Hu, Haibo, Wen, Zihao, Wang, Jianping, Li, Yung-Hui, Huang, Yu-Kai
Anticipating the multimodality of future events lays the foundation for safe autonomous driving. However, multimodal motion prediction for traffic agents has been clouded by the lack of multimodal ground truth. Existing works predominantly adopt the winner-take-all training strategy to tackle this challenge, yet still suffer from limited trajectory diversity and misaligned mode confidence. While some approaches address these limitations by generating an excessive number of trajectory candidates, they necessitate a post-processing stage to identify the most representative modes, a process lacking universal principles and compromising trajectory accuracy. We are thus motivated to introduce ModeSeq, a new multimodal prediction paradigm that models modes as sequences. Unlike the common practice of decoding multiple plausible trajectories in one shot, ModeSeq requires motion decoders to infer the next mode step by step, thereby more explicitly capturing the correlation between modes and significantly enhancing the ability to reason about multimodality. Leveraging the inductive bias of sequential mode prediction, we also propose the Early-Match-Take-All (EMTA) training strategy to diversify the trajectories further. Without relying on dense mode prediction or rule-based trajectory selection, ModeSeq considerably improves the diversity of multimodal output while attaining satisfactory trajectory accuracy, resulting in balanced performance on motion prediction benchmarks. Moreover, ModeSeq naturally exhibits the capability of mode extrapolation, which supports forecasting more behavior modes when the future is highly uncertain.
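The following is a minimal PyTorch sketch of the sequential-mode idea described in the abstract, in which each mode query is conditioned on the modes already decoded. The module names, the GRU-based mode memory, and all dimensions are illustrative assumptions rather than the authors' implementation, and the EMTA training strategy is not shown.

```python
# Hypothetical sketch of sequential mode decoding (not the ModeSeq code).
# Each mode query is conditioned on the modes decoded so far, so the k-th
# trajectory can account for what the previous k-1 trajectories already cover.
import torch
import torch.nn as nn

class SequentialModeDecoder(nn.Module):
    def __init__(self, d_model=128, num_modes=6, horizon=60):
        super().__init__()
        self.num_modes = num_modes
        self.mode_query = nn.Parameter(torch.randn(1, d_model))   # shared seed query
        self.mode_rnn = nn.GRUCell(d_model, d_model)               # carries the mode history
        self.scene_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.traj_head = nn.Linear(d_model, horizon * 2)           # (x, y) per future step
        self.conf_head = nn.Linear(d_model, 1)

    def forward(self, scene_tokens):
        # scene_tokens: [batch, num_tokens, d_model] from any scene encoder
        batch = scene_tokens.size(0)
        state = self.mode_query.expand(batch, -1)                  # running mode context
        trajs, confs = [], []
        for _ in range(self.num_modes):
            # attend to the scene conditioned on the modes decoded so far
            ctx, _ = self.scene_attn(state.unsqueeze(1), scene_tokens, scene_tokens)
            ctx = ctx.squeeze(1)
            trajs.append(self.traj_head(ctx).view(batch, -1, 2))
            confs.append(self.conf_head(ctx).squeeze(-1))
            state = self.mode_rnn(ctx, state)                      # roll the mode sequence forward
        return torch.stack(trajs, dim=1), torch.stack(confs, dim=1)
```

Because each step sees the running mode context, later trajectories can be steered away from modes already produced, which is the kind of inter-mode correlation the abstract argues one-shot decoding does not capture explicitly.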
BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction
Zhou, Zikang, Hu, Haibo, Chen, Xinhong, Wang, Jianping, Guan, Nan, Wu, Kui, Li, Yung-Hui, Huang, Yu-Kai, Xue, Chun Jason
Simulating realistic interactions among traffic agents is crucial for efficiently validating the safety of autonomous driving systems. Existing leading simulators primarily use an encoder-decoder structure to encode the historical trajectories for future simulation. However, such a paradigm complicates the model architecture, and the manual separation of history and future trajectories leads to low data utilization. To address these challenges, we propose Behavior Generative Pre-trained Transformers (BehaviorGPT), a decoder-only, autoregressive architecture designed to simulate the sequential motion of multiple agents. Crucially, our approach discards the traditional separation between "history" and "future," treating each time step as the "current" one, resulting in a simpler, more parameter- and data-efficient design that scales seamlessly with data and computation. Additionally, we introduce the Next-Patch Prediction Paradigm (NP3), which enables models to reason at the patch level of trajectories and capture long-range spatial-temporal interactions. BehaviorGPT ranks first across several metrics on the Waymo Sim Agents Benchmark, demonstrating its exceptional performance in multi-agent and agent-map interactions. It outperforms state-of-the-art models with a realism score of 0.741, improves the minADE metric to 1.540, and uses approximately 91.6% fewer model parameters.
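As a rough illustration of next-patch prediction, the sketch below chunks a single agent's trajectory into fixed-length patches and trains a causal Transformer to regress each following patch. The patch length, backbone size, and single-agent simplification are assumptions made for illustration, not BehaviorGPT's actual architecture, which also models agent-agent and agent-map interactions.

```python
# Hypothetical, single-agent sketch of next-patch prediction (NP3-style), not the paper's code.
import torch
import torch.nn as nn

class NextPatchModel(nn.Module):
    def __init__(self, patch_len=5, d_model=128, num_layers=4, max_patches=64):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len * 2, d_model)               # one patch = patch_len (x, y) points
        self.pos = nn.Parameter(torch.randn(1, max_patches, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, patch_len * 2)                # regress the next patch

    def forward(self, traj):
        # traj: [batch, T, 2] with T divisible by patch_len
        b, t, _ = traj.shape
        patches = traj.reshape(b, t // self.patch_len, -1)           # [batch, P, patch_len * 2]
        x = self.embed(patches) + self.pos[:, :patches.size(1)]
        p = x.size(1)
        causal = torch.triu(torch.full((p, p), float("-inf"), device=x.device), diagonal=1)
        h = self.backbone(x, mask=causal)                            # every patch only attends to its past
        return self.head(h).reshape(b, t, 2)                         # prediction of each patch's successor

def next_patch_loss(model, traj):
    # Teacher-forced objective: patch p predicts patch p + 1, so no history/future split is needed.
    pred = model(traj[:, :-model.patch_len])                         # drop the final patch from the input
    target = traj[:, model.patch_len:]                               # shift targets forward by one patch
    return nn.functional.smooth_l1_loss(pred, target)
```

Treating every patch as the "current" one means the whole trajectory contributes supervision, which is the data-efficiency argument made in the abstract.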
QCNeXt: A Next-Generation Framework For Joint Multi-Agent Trajectory Prediction
Zhou, Zikang, Wen, Zihao, Wang, Jianping, Li, Yung-Hui, Huang, Yu-Kai
Estimating the joint distribution of on-road agents' future trajectories is essential for autonomous driving. In this technical report, we propose a next-generation framework for joint multi-agent trajectory prediction called QCNeXt. First, we adopt the query-centric encoding paradigm for the task of joint multi-agent trajectory prediction. Powered by this encoding scheme, our scene encoder is equipped with permutation equivariance on the set elements, roto-translation invariance in the space dimension, and translation invariance in the time dimension. These invariance properties not only lay the foundation for accurate multi-agent forecasting but also enable the encoder to process scenes in a streaming fashion. Second, we propose a multi-agent DETR-like decoder, which facilitates joint multi-agent trajectory prediction by modeling agents' interactions at future time steps. For the first time, we show that a joint prediction model can outperform marginal prediction models even on the marginal metrics, which opens up new research opportunities in trajectory prediction. Our approach ranks 1st on the Argoverse 2 multi-agent motion forecasting benchmark, winning the championship of the Argoverse Challenge at the CVPR 2023 Workshop on Autonomous Driving.
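To make the joint decoding idea concrete, here is a small PyTorch sketch in which each scene-level mode query is shared across agents and agents self-attend within a mode, so every mode yields one consistent future for the whole scene. The query construction, attention layout, and dimensions are hypothetical and far simpler than QCNeXt's query-centric encoder and DETR-like decoder.

```python
# Hypothetical sketch of a joint, DETR-style multi-agent decoder (not the QCNeXt code).
import torch
import torch.nn as nn

class JointModeDecoder(nn.Module):
    def __init__(self, d_model=128, num_modes=6, horizon=60):
        super().__init__()
        self.mode_queries = nn.Parameter(torch.randn(num_modes, d_model))      # one query per joint scene mode
        self.agent_attn = nn.MultiheadAttention(d_model, 8, batch_first=True)  # agents interact within a mode
        self.scene_attn = nn.MultiheadAttention(d_model, 8, batch_first=True)  # cross-attend to scene context
        self.traj_head = nn.Linear(d_model, horizon * 2)

    def forward(self, agent_tokens, scene_tokens):
        # agent_tokens: [batch, A, d_model], scene_tokens: [batch, S, d_model]
        b, a, d = agent_tokens.shape
        k = self.mode_queries.size(0)
        # broadcast each joint mode query onto every agent: [batch * K, A, d_model]
        q = (agent_tokens.unsqueeze(1) + self.mode_queries.view(1, k, 1, d)).reshape(b * k, a, d)
        scene = scene_tokens.repeat_interleave(k, dim=0)
        q, _ = self.scene_attn(q, scene, scene)       # gather map/agent context per mode
        q, _ = self.agent_attn(q, q, q)               # model agent-agent interaction within each joint mode
        return self.traj_head(q).view(b, k, a, -1, 2) # one scene-consistent future per mode
```

The per-mode self-attention over agents is only meant to show where interaction at future time steps could enter a joint decoder.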
Modeling Melodic Feature Dependency with Modularized Variational Auto-Encoder
Wang, Yu-An, Huang, Yu-Kai, Lin, Tzu-Chuan, Su, Shang-Yu, Chen, Yun-Nung
Automatic melody generation has been a long-time aspiration for both AI researchers and musicians. However, learning to generate euphonious melodies has turned out to be highly challenging. This paper introduces 1) a new variant of variational autoencoder (VAE), whose model structure is designed in a modularized manner to model polyphonic and dynamic music with domain knowledge, and 2) a hierarchical encoding/decoding strategy, which explicitly models the dependency between melodic features. The proposed framework is capable of generating distinct melodies that sound natural, and experiments evaluating the generated music clips show that the proposed model outperforms the baselines in human evaluation.
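As a loose illustration of modularized, hierarchical encoding and decoding, the sketch below uses separate encoders for rhythm and pitch features and decodes pitch conditioned on the decoded rhythm. The feature split, dimensions, and single latent variable are assumptions for illustration, not the paper's model.

```python
# Hypothetical sketch of a modularized VAE with a pitch-on-rhythm dependency
# (one reading of the abstract, not the authors' implementation).
import torch
import torch.nn as nn

class MelodicFeatureVAE(nn.Module):
    def __init__(self, rhythm_dim=16, pitch_dim=64, z_dim=32, hidden=128):
        super().__init__()
        self.enc_rhythm = nn.GRU(rhythm_dim, hidden, batch_first=True)
        self.enc_pitch = nn.GRU(pitch_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(2 * hidden, z_dim)
        self.to_logvar = nn.Linear(2 * hidden, z_dim)
        self.dec_rhythm = nn.GRU(z_dim, hidden, batch_first=True)               # decode rhythm first
        self.rhythm_out = nn.Linear(hidden, rhythm_dim)
        self.dec_pitch = nn.GRU(z_dim + rhythm_dim, hidden, batch_first=True)   # pitch depends on rhythm
        self.pitch_out = nn.Linear(hidden, pitch_dim)

    def forward(self, rhythm, pitch):
        # rhythm: [batch, T, rhythm_dim], pitch: [batch, T, pitch_dim]
        _, hr = self.enc_rhythm(rhythm)
        _, hp = self.enc_pitch(pitch)
        h = torch.cat([hr[-1], hp[-1]], dim=-1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)                 # reparameterization trick
        z_seq = z.unsqueeze(1).repeat(1, rhythm.size(1), 1)
        r_hat = self.rhythm_out(self.dec_rhythm(z_seq)[0])
        p_hat = self.pitch_out(self.dec_pitch(torch.cat([z_seq, r_hat], dim=-1))[0])
        return r_hat, p_hat, mu, logvar                                          # reconstruction + KL terms
```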