Fuzzy Logic
COM Adjustment Mechanism Control for Multi-Configuration Motion Stability of Unmanned Deformable Vehicle
Liu, Jun, Liu, Hongxun, Zhang, Cheng, Xing, Jiandang, Jiang, Shang, Jiang, Ping
An unmanned deformable vehicle is a wheel-legged robot that can transform between two configurations: a vehicular state and a humanoid state, which have different motion modes and stability characteristics. To address the motion stability of an unmanned deformable vehicle in multiple configurations, a center-of-mass adjustment mechanism was designed in this study. Further, a hierarchical motion stability control algorithm was proposed based on this mechanism, and an electromechanical model based on a two-degree-of-freedom center-of-mass adjustment mechanism was established. A steady-state steering dynamics model for the vehicular state and a gait-planning kinematic model for humanoid-state walking were established. A hierarchical stability control strategy was designed based on the hybrid automata model, Fuzzy-PID control, the K-means clustering algorithm, and variable universe fuzzy control - active disturbance rejection control (VUFC-ADRC) to realize the stability control of the unmanned deformable vehicle in multi-configuration motion. The simulation and test results showed that the steady-state steering stability in the vehicular state and the walking stability in the humanoid state could be significantly improved by controlling the slider motion in the center-of-mass adjustment mechanism.
Agnostic Q-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity
The current paper studies the problem of agnostic Q-learning with function approximation in deterministic systems where the optimal Q-function is approximable by a function in the class \mathcal{F} with approximation error \delta \ge 0. We propose a novel recursion-based algorithm and show that if \delta = O\left(\rho/\sqrt{\dim_E}\right), then one can find the optimal policy using O(\dim_E) trajectories, where \rho is the gap between the optimal Q-value of the best actions and that of the second-best actions and \dim_E is the Eluder dimension of \mathcal{F}. Our result has two implications; in particular, in conjunction with the lower bound in [Du et al., 2020], our upper bound suggests that the condition \delta = \widetilde{\Theta}\left(\rho/\sqrt{\dim_E}\right) is necessary and sufficient for algorithms with polynomial sample complexity. We further extend our algorithm to the stochastic reward setting and obtain similar results.
A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation
The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for RL with general function approximation. Our key algorithmic design includes (1) a general deterministic policy-switching strategy that achieves low switching cost, (2) a monotonic value function structure with carefully controlled function class complexity, and (3) a variance-weighted regression scheme that exploits historical trajectories with high data efficiency. MQL-UCB achieves minimax optimal regret of \tilde{O}(d\sqrt{HK}) when K is sufficiently large and near-optimal policy switching cost of \tilde{O}(dH), with d being the eluder dimension of the function class, H being the planning horizon, and K being the number of episodes.
On Reward-Free Reinforcement Learning with Linear Function Approximation
Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch RL setting and the setting where there are many reward functions of interest. During the exploration phase, an agent collects samples without using a pre-specified reward function. After the exploration phase, a reward function is given, and the agent uses samples collected during the exploration phase to compute a near-optimal policy. Jin et al. [2020] showed that in the tabular setting, the agent only needs to collect a polynomial number of samples (in terms of the number of states, the number of actions, and the planning horizon) for reward-free RL. However, in practice, the number of states and actions can be large, and thus function approximation schemes are required for generalization.
Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle
Q-learning with function approximation is one of the most popular methods in reinforcement learning. Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i.e., approximating Q-functions with linear functions, it is still an open problem how to design a provably efficient algorithm that learns a near-optimal policy. The key challenges are how to efficiently explore the state space and how to decide when to stop exploring in conjunction with the function approximation scheme. The current paper presents a provably efficient algorithm for Q-learning with linear function approximation. Under certain regularity assumptions, our algorithm, Difference Maximization Q-learning, combined with linear function approximation, returns a near-optimal policy using a polynomial number of trajectories.
Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation
We study reinforcement learning with _multinomial logistic_ (MNL) function approximation where the underlying transition probability kernel of the _Markov decision processes_ (MDPs) is parametrized by an unknown transition core with features of state and action. For the finite horizon episodic setting with inhomogeneous state transitions, we propose provably efficient algorithms with randomized exploration having frequentist regret guarantees. Here, d is the dimension of the transition core, H is the horizon length, T is the total number of steps, and \kappa is a problem-dependent constant. Despite the simplicity and practicality of \texttt{RRL-MNL}, its regret bound scales with \kappa^{-1}, which is potentially large in the worst case. To improve the dependence on \kappa^{-1}, we propose \texttt{ORRL-MNL}, which estimates the value function using local gradient information of the MNL transition model.
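The MNL parametrization described above can be illustrated with a minimal sketch. It assumes the common form in which the probability of each reachable next state is a softmax over inner products between a feature vector and the transition core; the function name `mnl_transition_probs`, the feature values, and the core `theta` below are hypothetical and are not taken from the paper.

```python
import math


def mnl_transition_probs(features, theta):
    """Return next-state probabilities under an MNL transition model.

    features: list of feature vectors, one per reachable next state
              (hypothetical values standing in for phi(s, a, s')).
    theta:    the transition core, a list of the same dimension d.
    """
    # One logit per reachable next state: inner product phi^T theta.
    logits = [sum(f * t for f, t in zip(phi, theta)) for phi in features]
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: three reachable next states, d = 2.
phi = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
theta = [0.5, -0.5]
probs = mnl_transition_probs(phi, theta)
```

Because the outputs are normalized exponentials, they are positive and sum to one for any value of the core, which is the sense in which the model yields valid probability distributions.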
Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations
Reinforcement learning (RL) algorithms combined with modern function approximators such as kernel functions and deep neural networks have achieved significant empirical successes in large-scale application problems with a massive number of states. From a theoretical perspective, however, RL with functional approximation poses a fundamental challenge to developing algorithms with provable computational and statistical efficiency, due to the need to take into consideration both the exploration-exploitation tradeoff that is inherent in RL and the bias-variance tradeoff that is innate in statistical estimation. To address such a challenge, focusing on the episodic setting where the action-value functions are represented by a kernel function or over-parametrized neural network, we propose the first provable RL algorithm with both polynomial runtime and sample complexity, without additional assumptions on the data-generating model. In particular, for both the kernel and neural settings, we prove that an optimistic modification of the least-squares value iteration algorithm incurs an \tilde{\mathcal{O}}(\delta_{\mathcal{F}} H^2 \sqrt{T}) regret, where \delta_{\mathcal{F}} characterizes the intrinsic complexity of the function class \mathcal{F}, H is the length of each episode, and T is the total number of episodes. Our regret bounds are independent of the number of states and therefore even allow it to diverge, which exhibits the benefit of function approximation.
Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation
As a prominent category of imitation learning methods, adversarial imitation learning (AIL) has garnered significant practical success powered by neural network approximation. However, existing theoretical studies on AIL are primarily limited to simplified scenarios such as tabular and linear function approximation and involve complex algorithmic designs that hinder practical implementation, highlighting a gap between theory and practice. In this paper, we explore the theoretical underpinnings of online AIL with general function approximation. We introduce a new method called optimization-based AIL (OPT-AIL), which centers on performing online optimization for reward functions and optimism-regularized Bellman error minimization for Q-value functions. Theoretically, we prove that OPT-AIL achieves polynomial expert sample complexity and interaction complexity for learning near-expert policies.
Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation
We study a new class of MDPs that employs multinomial logit (MNL) function approximation to ensure valid probability distributions over the state space. Despite its significant benefits, incorporating the non-linear function raises substantial challenges in both *statistical* and *computational* efficiency. The best-known result of Hwang and Oh [2023] has achieved an \widetilde{\mathcal{O}}(\kappa^{-1}dH^2\sqrt{K}) regret upper bound, where \kappa is a problem-dependent quantity, d is the feature dimension, H is the episode length, and K is the number of episodes. However, we observe that \kappa^{-1} exhibits polynomial dependence on the number of reachable states, which can be as large as the state space size in the worst case and thus undermines the motivation for function approximation. Additionally, their method requires storing all historical data and the time complexity scales linearly with the episode count, which is computationally expensive. In this work, we propose a statistically efficient algorithm that achieves a regret of \widetilde{\mathcal{O}}(dH^2\sqrt{K} + \kappa^{-1}d^2H^2), eliminating the dependence on \kappa^{-1} in the dominant term for the first time.
Indoor Air Quality Detection Robot Model Based on the Internet of Things (IoT)
Simamora, Anggiat Mora, Denih, Asep, Suriansyah, Mohamad Iqbal
This paper presents the design, implementation, and evaluation of an IoT-based robotic system for mapping and monitoring indoor air quality. The primary objective was to develop a mobile robot capable of autonomously mapping a closed environment, detecting concentrations of CO$_2$, volatile organic compounds (VOCs), smoke, temperature, and humidity, and transmitting real-time data to a web interface. The system integrates a set of sensors (SGP30, MQ-2, DHT11, VL53L0X, MPU6050) with an ESP32 microcontroller. It employs a mapping algorithm for spatial data acquisition and utilizes a Mamdani fuzzy logic system for air quality classification. Empirical tests in a model room demonstrated average localization errors below $5\%$, actuator motion errors under $2\%$, and sensor measurement errors within $12\%$ across all modalities. The contributions of this work include: (1) a low-cost, integrated IoT robotic platform for simultaneous mapping and air quality detection; (2) a web-based user interface for real-time visualization and control; and (3) validation of system accuracy under laboratory conditions.
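As an illustration of how a Mamdani fuzzy system can classify air quality from sensor readings, the sketch below uses two inputs (CO2 in ppm, VOCs in ppb), two rules, min/max inference, and centroid defuzzification. Every membership breakpoint, the 0-100 output scale, the two-rule base, and the names `trap` and `classify_air_quality` are assumptions made for illustration; the abstract does not specify the rule base or membership functions used by the authors.

```python
def trap(x, a, b, c, d):
    """Trapezoidal membership: rises a->b, equals 1 on [b, c], falls c->d."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def classify_air_quality(co2_ppm, tvoc_ppb):
    """Return (score, label); score is on a hypothetical 0-100 scale."""
    # Clamp inputs to the assumed universes of discourse.
    co2 = min(max(co2_ppm, 0.0), 5000.0)
    tvoc = min(max(tvoc_ppb, 0.0), 2000.0)

    # Fuzzification with assumed (hypothetical) breakpoints.
    co2_low = trap(co2, 0, 0, 600, 1000)
    co2_high = trap(co2, 600, 1000, 5000, 5000)
    tvoc_low = trap(tvoc, 0, 0, 200, 600)
    tvoc_high = trap(tvoc, 200, 600, 2000, 2000)

    # Rule 1: IF co2 is low AND tvoc is low THEN quality is good (AND = min).
    fire_good = min(co2_low, tvoc_low)
    # Rule 2: IF co2 is high OR tvoc is high THEN quality is poor (OR = max).
    fire_poor = max(co2_high, tvoc_high)

    # Mamdani inference: clip each output set at its firing strength,
    # aggregate with max, then take the centroid over a discretized universe.
    num = 0.0
    den = 0.0
    for y in range(0, 101):
        good = min(fire_good, trap(y, 0, 0, 30, 60))
        poor = min(fire_poor, trap(y, 40, 70, 100, 100))
        mu = max(good, poor)
        num += y * mu
        den += mu
    score = num / den if den > 0 else 50.0
    label = "good" if score < 50 else "poor"
    return score, label


clean_score, clean_label = classify_air_quality(450, 50)
dirty_score, dirty_label = classify_air_quality(2000, 800)
```

A deployed system would likely add more inputs (smoke, temperature, humidity), more linguistic levels, and calibrated breakpoints; the sketch only shows the fuzzify, infer, aggregate, and defuzzify steps.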