Goto

Collaborating Authors

 Instructional Material


The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective

arXiv.org Machine Learning

We study the sample complexity of online reinforcement learning for nonlinear dynamical systems with continuous state and action spaces. Our analysis accommodates a large class of dynamical systems ranging from a finite set of nonlinear candidate models to models with bounded and Lipschitz continuous dynamics, to systems that are parametrized by a compact and real-valued set of parameters. In the most general setting, our algorithm achieves a policy regret of $\mathcal{O}(N \epsilon^2 + \mathrm{ln}(m(\epsilon))/\epsilon^2)$, where $N$ is the time horizon, $\epsilon$ is a user-specified discretization width, and $m(\epsilon)$ measures the complexity of the function class under consideration via its packing number. In the special case where the dynamics are parametrized by a compact and real-valued set of parameters (such as neural networks, transformers, etc.), we prove a policy regret of $\mathcal{O}(\sqrt{N p})$, where $p$ denotes the number of parameters, recovering earlier sample-complexity results that were derived for linear time-invariant dynamical systems. While this article focuses on characterizing sample complexity, the proposed algorithms are likely to be useful in practice, due to their simplicity, the ability to incorporate prior knowledge, and their benign transient behavior.


Data Mining in Transportation Networks with Graph Neural Networks: A Review and Outlook

arXiv.org Artificial Intelligence

Data mining in transportation networks (DMTNs) refers to using diverse types of spatio-temporal data for various transportation tasks, including pattern analysis, traffic prediction, and traffic controls. Graph neural networks (GNNs) are essential in many DMTN problems due to their capability to represent spatial correlations between entities. Between 2016 and 2024, the notable applications of GNNs in DMTNs have extended to multiple fields such as traffic prediction and operation. However, existing reviews have primarily focused on traffic prediction tasks. To fill this gap, this study provides a timely and insightful summary of GNNs in DMTNs, highlighting new progress in prediction and operation from academic and industry perspectives since 2023. First, we present and analyze various DMTN problems, followed by classical and recent GNN models. Second, we delve into key works in three areas: (1) traffic prediction, (2) traffic operation, and (3) industry involvement, such as Google Maps, Amap, and Baidu Maps. Along these directions, we discuss new research opportunities based on the significance of transportation problems and data availability. Finally, we compile resources such as data, code, and other learning materials to foster interdisciplinary communication. This review, driven by recent trends in GNNs in DMTN studies since 2023, could democratize abundant datasets and efficient GNN methods for various transportation problems including prediction and operation.


Improving Vision-Language-Action Model with Online Reinforcement Learning

arXiv.org Artificial Intelligence

Recent studies have successfully integrated large vision-language models (VLMs) into low-level robotic control by supervised fine-tuning (SFT) with expert robotic datasets, resulting in what we term vision-language-action (VLA) models. Although the VLA models are powerful, how to improve these large models during interaction with environments remains an open question. In this paper, we explore how to further improve these VLA models via Reinforcement Learning (RL), a commonly used fine-tuning technique for large models. However, we find that directly applying online RL to large VLA models presents significant challenges, including training instability that severely impacts the performance of large models, and computing burdens that exceed the capabilities of most local machines. To address these challenges, we propose iRe-VLA framework, which iterates between Reinforcement Learning and Supervised Learning to effectively improve VLA models, leveraging the exploratory benefits of RL while maintaining the stability of supervised learning. Experiments in two simulated benchmarks and a real-world manipulation suite validate the effectiveness of our method.


3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow

arXiv.org Artificial Intelligence

In recent years, 3D instruction-following data has become more common, and with the advent of large language models 3D vision and spatial reasoning have long been (LLMs), a range of multi-modal LLMs (MLLMs) has recognized as preferable for accurately perceiving emerged. Following the success of LLaVA (Liu et al., our three-dimensional world, especially when 2023a) for 2D images, recent approaches (e.g., LEO (Huang compared with traditional visual reasoning based et al., 2024) and ShapeLLM (Qi et al., 2024)) also integrate on 2D images. Due to the difficulties in collecting 3D encoders into LLMs through simple linear projection high-quality 3D data, research in this area has layers. Although these models handle tasks such as 3D only recently gained momentum. With the advent question answering, 3D dialogue, and some embodied tasks, of powerful large language models (LLMs), multimodal they devote relatively little attention to optimizing the LLM LLMs for 3D vision have been developed itself for multi-modal data.


Reviews: Online EXP3 Learning in Adversarial Bandits with Delayed Feedback

Neural Information Processing Systems

However I have a major issue with their proposed algorithms which seems erroneous due to the choice of the learning rate (\eta) which requires the knowledge of the delays (d_t) -- but according to the problem formulation this is unknown to the learner. Then the whole technique seems to be pointless! The optimal learning rate seem to depend on the delays (d_t), e.g. Thm 1, Line-146 etc., but those are unknown to the learner. The claims of the paper stands vacuous if the proposed technique requires the knowledge of delays, where lies the major challenge of the problem addressed.


It's only 30 to learn how to automate your job with AI

PCWorld

TL;DR: Learn how to use AI to simplify your job with the ChatGPT and Automation E-Degree on sale for 29.99 (reg. For many, AI tools kind of came out of nowhere. It can be overwhelming, especially if your industry is rapidly changing, but you don't have to get swept away in the AI sprint. Instead, check out the ChatGPT and Automation E-Degree for beginner-friendly guidance to help you use AI tools to streamline your workload. This course has 12 lectures and about 25 hours of content that you can access anywhere at any time.


CODA: Generalizing to Open and Unseen Domains with Compaction and Disambiguation

Neural Information Processing Systems

Recently, Domain Generalization (DG) has been gaining momentum in enabling machine learning models to generalize to unseen domains. However, most DG methods assume that training and test data share an identical label space, ignoring the potential unseen categories in many real-world applications. In this paper, we delve into a more general but difficult problem termed Open Test-Time DG (OTDG), where both domain shift and open class may occur on the unseen test data. We propose Compaction and Disambiguation (CODA), a novel two-stage framework for learning compact representations and adapting to open classes in the wild. To meaningfully regularize the model's decision boundary, CODA introduces virtual unknown classes and optimizes a new training objective to insert unknowns into the latent space by compacting the embedding space of source known classes.


LLM-powered Multi-agent Framework for Goal-oriented Learning in Intelligent Tutoring System

arXiv.org Artificial Intelligence

Intelligent Tutoring Systems (ITSs) have revolutionized education by offering personalized learning experiences. However, as goal-oriented learning, which emphasizes efficiently achieving specific objectives, becomes increasingly important in professional contexts, existing ITSs often struggle to deliver this type of targeted learning experience. In this paper, we propose GenMentor, an LLM-powered multi-agent framework designed to deliver goal-oriented, personalized learning within ITS. GenMentor begins by accurately mapping learners' goals to required skills using a fine-tuned LLM trained on a custom goal-to-skill dataset. After identifying the skill gap, it schedules an efficient learning path using an evolving optimization approach, driven by a comprehensive and dynamic profile of learners' multifaceted status. Additionally, GenMentor tailors learning content with an exploration-drafting-integration mechanism to align with individual learner needs. Extensive automated and human evaluations demonstrate GenMentor's effectiveness in learning guidance and content quality. Furthermore, we have deployed it in practice and also implemented it as an application. Practical human study with professional learners further highlights its effectiveness in goal alignment and resource targeting, leading to enhanced personalization. Supplementary resources are available at https://github.com/GeminiLight/gen-mentor.


Exploring the Collaborative Co-Creation Process with AI: A Case Study in Novice Music Production

arXiv.org Artificial Intelligence

Artificial intelligence is reshaping creative domains, yet its co-creative processes, especially in group settings with novice users, remain under explored. To bridge this gap, we conducted a case study in a college-level course where nine undergraduate students were tasked with creating three original music tracks using AI tools over 10 weeks. The study spanned the entire creative journey from ideation to releasing these songs on Spotify. Participants leveraged AI for music and lyric production, cover art, and distribution. Our findings highlight how AI transforms creative workflows: accelerating ideation but compressing the traditional preparation stage, and requiring novices to navigate a challenging idea selection and validation phase. We also identified a new "collaging and refinement" stage, where participants creatively combined diverse AI-generated outputs into cohesive works. Furthermore, AI influenced group social dynamics and role division among human creators. Based on these insights, we propose the Human-AI Co-Creation Stage Model and the Human-AI Agency Model, offering new perspectives on collaborative co-creation with AI.


Prompting ChatGPT for Chinese Learning as L2: A CEFR and EBCL Level Study

arXiv.org Artificial Intelligence

The use of chatbots in language learning has evolved significantly since the 1960s, becoming more sophisticated platforms as generative AI emerged. These tools now simulate natural conversations, adapting to individual learners' needs, including those studying Chinese. Our study explores how learners can use specific prompts to engage Large Language Models (LLM) as personalized chatbots, aiming to target their language level based on the Common European Framework of Reference for Languages (CEFR) and the European Benchmarking Chinese Language (EBCL) project. Focusing on A1, A1+ and A2 levels, we examine the teaching of Chinese, which presents unique challenges due to its logographic writing system. Our goal is to develop prompts that integrate oral and written skills, using high-frequency character lists and controlling oral lexical productions. These tools, powered by generative AI, aim to enhance language practice by crossing lexical and sinographic recurrence. While generative AI shows potential as a personalized tutor, further evaluation is needed to assess its effectiveness. We conducted a systematic series of experiments using ChatGPT models to evaluate their adherence to constraints specified in the prompts. The results indicate that incorporating level A1 and A1+ characters, along with the associated reference list, significantly enhances compliance with the EBCL character set. Properly prompted, LLMs can increase exposure to the target language and offer interactive exchanges to develop language skills.