Xu, Yaosheng
NeuralMOVES: A lightweight and microscopic vehicle emission estimation model based on reverse engineering and surrogate learning
Ramirez-Sanchez, Edgar, Tang, Catherine, Xu, Yaosheng, Renganathan, Nrithya, Jayawardana, Vindula, He, Zhengbing, Wu, Cathy
The transportation sector's significant contribution to greenhouse gas emissions makes it a critical sector for climate change mitigation, as reducing emissions from transportation is essential for achieving global climate goals. The sector's transformation through electrification, automation, and intelligent infrastructure offers promising avenues for substantial emissions reductions (Sciarretta et al., 2020; International Energy Agency, 2023; McKinsey Center for Future Mobility, 2023). However, the success of these innovations depends critically on the availability of suitable and accurate emission estimation models to guide the design and deployment of new technologies. The Motor Vehicle Emission Simulator (MOVES) (U.S. Environmental Protection Agency, 2022), one of the most well-established emission estimation models, serves as the official, state-of-the-art emission estimation model in the U.S., provided, enforced, and maintained by the U.S. Environmental Protection Agency (EPA). Despite its technical certification, MOVES' processing and software are tailored to two specific governmental uses: State Implementation Plans and Conformity Analyses (U.S. Environmental Protection Agency, 2021), which help states achieve and maintain air quality standards. Its use beyond trained practitioners and these specific analyses faces two main limitations. First, a steep learning curve, heavy computational demands, and complex inputs make MOVES difficult for researchers and practitioners to use; in particular, it has rigid input requirements, including a combination of toggle-based settings within its GUI and structured input files in specific formats. Second, MOVES is tailored to macroscopic analysis and is unsuitable for microscopic applications, such as control and optimization, which commonly require second-by-second emission calculations for individual actions and vehicles.
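To make the microscopic use case concrete, the following is a hypothetical sketch of the kind of per-second, per-vehicle query that control and optimization applications need and that a lightweight surrogate such as NeuralMOVES is meant to support. The feature set (speed, acceleration, road grade), the class name, and the tiny network are illustrative assumptions, not the actual NeuralMOVES model or API.

```python
import numpy as np

# Hypothetical surrogate that maps one vehicle's instantaneous state to an
# instantaneous CO2 estimate. Weights would be fit offline to reproduce
# MOVES outputs; here they are random placeholders for illustration only.
class SurrogateEmissionModel:
    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def co2_grams_per_second(self, speed_mps, accel_mps2, road_grade):
        """Second-by-second CO2 estimate for a single vehicle and action."""
        x = np.array([speed_mps, accel_mps2, road_grade])
        h = np.tanh(self.w1 @ x + self.b1)          # small hidden layer
        return float(self.w2 @ h + self.b2)

# A microscopic simulator or controller would call the model at every timestep:
rng = np.random.default_rng(0)
model = SurrogateEmissionModel(rng.normal(size=(8, 3)), rng.normal(size=8),
                               rng.normal(size=8), 0.0)
trajectory = [(12.0, 0.3, 0.00), (12.3, 0.1, 0.00), (12.4, -0.2, 0.01)]
total_co2 = sum(model.co2_grams_per_second(*step) for step in trajectory)
```

This per-timestep interface contrasts with MOVES's file-based, macroscopic workflow, which is the gap the abstract describes.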
Target Entropy Annealing for Discrete Soft Actor-Critic
Xu, Yaosheng, Hu, Dailin, Liang, Litian, McAleer, Stephen, Abbeel, Pieter, Fox, Roy
Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings. It uses the maximum entropy framework for efficiency and stability, and applies a heuristic temperature Lagrange term to tune the temperature $\alpha$, which determines how "soft" the policy should be. Counter-intuitively, empirical evidence shows that SAC does not perform well in discrete domains. In this paper, we investigate possible explanations for this phenomenon and propose Target Entropy Scheduled SAC (TES-SAC), an annealing method for the target entropy parameter of SAC. The target entropy is a constant in the temperature Lagrange term and represents the target policy entropy in discrete SAC. We compare our method on Atari 2600 games against SAC with different constant target entropies, and analyze how our scheduling affects SAC.
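The mechanism the abstract refers to is the temperature Lagrange update in discrete SAC, where $\alpha$ is adjusted so that the policy entropy tracks a target value. Below is a minimal sketch of that update with an annealed target entropy, in the spirit of TES-SAC; the linear decay schedule, its start/end fractions of the maximum entropy $\log|\mathcal{A}|$, and the learning rate are assumptions for illustration, not the paper's exact scheme.

```python
import torch

# Learn log(alpha) so the temperature stays positive.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)

def target_entropy(step, total_steps, n_actions, start_frac=0.98, end_frac=0.30):
    """Illustrative schedule: anneal the target entropy from a high fraction of
    the maximum entropy log(|A|) down to a lower fraction over training."""
    frac = start_frac + (end_frac - start_frac) * min(step / total_steps, 1.0)
    return frac * torch.log(torch.tensor(float(n_actions)))

def update_alpha(action_probs, step, total_steps):
    """action_probs: (batch, n_actions) categorical policy output pi(.|s)."""
    n_actions = action_probs.shape[-1]
    log_probs = torch.log(action_probs + 1e-8)
    policy_entropy = -(action_probs * log_probs).sum(dim=-1)   # H(pi(.|s))
    h_target = target_entropy(step, total_steps, n_actions)

    # Minimizing alpha * (H - H_target) raises alpha when entropy is below
    # the target (encouraging exploration) and lowers it when above.
    alpha_loss = (log_alpha.exp() * (policy_entropy - h_target).detach()).mean()
    alpha_opt.zero_grad()
    alpha_loss.backward()
    alpha_opt.step()
    return log_alpha.exp().item()
```

A constant-target-entropy baseline corresponds to fixing `h_target` throughout training, which is the comparison the abstract describes.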
Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates
Liang, Litian, Xu, Yaosheng, McAleer, Stephen, Hu, Dailin, Ihler, Alexander, Abbeel, Pieter, Fox, Roy
Temporal-Difference (TD) learning methods, such as Q-Learning, have proven effective at learning a policy to perform control tasks. One issue with methods like Q-Learning is that the value update introduces bias when predicting the TD target of an unfamiliar state. Estimation noise becomes a bias after the max operator in the policy improvement step and carries over to the value estimates of other states, causing Q-Learning to overestimate the Q value. Algorithms like Soft Q-Learning (SQL) introduce the notion of a soft-greedy policy, which reduces this estimation bias via soft updates in the early stages of training. However, the inverse temperature $\beta$ that controls the softness of an update is usually set by a hand-designed heuristic, which can fail to capture the uncertainty in the target estimate. Under the belief that $\beta$ is closely related to the (state-dependent) model uncertainty, Entropy Regularized Q-Learning (EQL) further introduces a principled scheduling of $\beta$ by maintaining a collection of model parameters that characterizes the model uncertainty. In this paper, we present Unbiased Soft Q-Learning (UQL), which extends EQL from two-action, finite-state spaces to multi-action, infinite-state-space Markov Decision Processes. We also provide a principled, numerical scheduling of $\beta$ during the optimization process, extended from SQL and based on model uncertainty. We show theoretical guarantees and demonstrate the effectiveness of this update method in experiments on several discrete control environments.
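The core idea the abstract describes is replacing the hard max in the TD target with a soft-greedy backup whose inverse temperature $\beta$ reflects model uncertainty. Below is an illustrative sketch of such a target, with $\beta$ set from the disagreement of a Q-value ensemble; the specific rule $\beta = 1/\text{std}$ and the use of an ensemble as the uncertainty proxy are assumptions for illustration, not the paper's exact scheduling.

```python
import numpy as np

def soft_td_target(reward, next_q_ensemble, gamma=0.99, eps=1e-6):
    """
    reward:          scalar reward r.
    next_q_ensemble: (n_models, n_actions) array of Q(s', .) estimates whose
                     spread stands in for (state-dependent) model uncertainty.
    """
    q_mean = next_q_ensemble.mean(axis=0)        # (n_actions,)
    q_std = next_q_ensemble.std(axis=0).mean()   # scalar disagreement

    # Low uncertainty -> large beta -> backup approaches the hard max;
    # high uncertainty -> small beta -> backup approaches a uniform average,
    # which limits how much estimation noise turns into overestimation bias.
    beta = 1.0 / (q_std + eps)

    # Soft-greedy (Boltzmann) policy over next actions and its expected value.
    logits = beta * (q_mean - q_mean.max())      # shift for numerical stability
    pi = np.exp(logits) / np.exp(logits).sum()
    soft_value = float(pi @ q_mean)

    return reward + gamma * soft_value

# Example: a confident ensemble yields a target close to the max backup.
q_next = np.array([[1.0, 2.0, 0.5],
                   [1.1, 2.1, 0.4]])
print(soft_td_target(reward=0.0, next_q_ensemble=q_next))
```

Early in training, when the ensemble disagrees strongly, the backup stays soft; as estimates converge, it approaches the standard greedy update.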