Goto

Collaborating Authors

 Fuzzy Logic


An Actor-Critic Algorithm with Function Approximation for Risk Sensitive Cost Markov Decision Processes

arXiv.org Machine Learning

In this paper, we consider the risk-sensitive cost criterion with exponentiated costs for Markov decision processes and develop a model-free policy gradient algorithm in this setting. Unlike additive cost criteria such as average or discounted cost, the risk-sensitive cost criterion is less studied due to the complexity resulting from the multiplicative structure of the resulting Bellman equation. We develop an actor-critic algorithm with function approximation in this setting and provide its asymptotic convergence analysis. We also show the results of numerical experiments that demonstrate the superiority in performance of our algorithm over other recent algorithms in the literature.


Robot Deformable Object Manipulation via NMPC-generated Demonstrations in Deep Reinforcement Learning

arXiv.org Artificial Intelligence

In this work, we conducted research on deformable object manipulation by robots based on demonstration-enhanced reinforcement learning (RL). To improve the learning efficiency of RL, we enhanced the utilization of demonstration data from multiple aspects and proposed the HGCR-DDPG algorithm. It uses a novel high-dimensional fuzzy approach for grasping-point selection, a refined behavior-cloning method to enhance data-driven learning in Rainbow-DDPG, and a sequential policy-learning strategy. Compared to the baseline algorithm (Rainbow-DDPG), our proposed HGCR-DDPG achieved 2.01 times the global average reward and reduced the global average standard deviation to 45% of that of the baseline algorithm. To reduce the human labor cost of demonstration collection, we proposed a low-cost demonstration collection method based on Nonlinear Model Predictive Control (NMPC). Simulation experiment results show that demonstrations collected through NMPC can be used to train HGCR-DDPG, achieving comparable results to those obtained with human demonstrations. To validate the feasibility of our proposed methods in real-world environments, we conducted physical experiments involving deformable object manipulation. We manipulated fabric to perform three tasks: diagonal folding, central axis folding, and flattening. The experimental results demonstrate that our proposed method achieved success rates of 83.3%, 80%, and 100% for these three tasks, respectively, validating the effectiveness of our approach. Compared to current large-model approaches for robot manipulation, the proposed algorithm is lightweight, requires fewer computational resources, and offers task-specific customization and efficient adaptability for specific tasks.


Convergence of Policy Mirror Descent Beyond Compatible Function Approximation

arXiv.org Machine Learning

Modern policy optimization methods roughly follow the policy mirror descent (PMD) algorithmic template, for which there are by now numerous theoretical convergence results. However, most of these either target tabular environments, or can be applied effectively only when the class of policies being optimized over satisfies strong closure conditions, which is typically not the case when working with parametric policy classes in large-scale environments. In this work, we develop a theoretical framework for PMD for general policy classes where we replace the closure conditions with a strictly weaker variational gradient dominance assumption, and obtain upper bounds on the rate of convergence to the best-in-class policy. Our main result leverages a novel notion of smoothness with respect to a local norm induced by the occupancy measure of the current policy, and casts PMD as a particular instance of smooth non-convex optimization in non-Euclidean space.


Analysis of Off-Policy $n$-Step TD-Learning with Linear Function Approximation

arXiv.org Artificial Intelligence

This paper analyzes multi-step temporal difference (TD)-learning algorithms within the ``deadly triad'' scenario, characterized by linear function approximation, off-policy learning, and bootstrapping. In particular, we prove that $n$-step TD-learning algorithms converge to a solution as the sampling horizon $n$ increases sufficiently. The paper is divided into two parts. In the first part, we comprehensively examine the fundamental properties of their model-based deterministic counterparts, including projected value iteration, gradient descent algorithms, which can be viewed as prototype deterministic algorithms whose analysis plays a pivotal role in understanding and developing their model-free reinforcement learning counterparts. In particular, we prove that these algorithms converge to meaningful solutions when $n$ is sufficiently large. Based on these findings, in the second part, two $n$-step TD-learning algorithms are proposed and analyzed, which can be seen as the model-free reinforcement learning counterparts of the model-based deterministic algorithms.


Enhancing Neural Function Approximation: The XNet Outperforming KAN

arXiv.org Artificial Intelligence

XNet is a single-layer neural network architecture that leverages Cauchy integral-based activation functions for high-order function approximation. Through theoretical analysis, we show that the Cauchy activation functions used in XNet can achieve arbitrary-order polynomial convergence, fundamentally outperforming traditional MLPs and Kolmogorov-Arnold Networks (KANs) that rely on increased depth or B-spline activations. Our extensive experiments on function approximation, PDE solving, and reinforcement learning demonstrate XNet's superior performance - reducing approximation error by up to 50000 times and accelerating training by up to 10 times compared to existing approaches. These results establish XNet as a highly efficient architecture for both scientific computing and AI applications.


Application of Artificial Intelligence (AI) in Civil Engineering

arXiv.org Artificial Intelligence

Hard computing generally deals with precise data, which provides ideal solutions to problems. However, in the civil engineering field, amongst other disciplines, that is not always the case as real-world systems are continuously changing. Here lies the need to explore soft computing methods and artificial intelligence to solve civil engineering shortcomings. The integration of advanced computational models, including Artificial Neural Networks (ANNs), Fuzzy Logic, Genetic Algorithms (GAs), and Probabilistic Reasoning, has revolutionized the domain of civil engineering. These models have significantly advanced diverse sub-fields by offering innovative solutions and improved analysis capabilities. Sub-fields such as: slope stability analysis, bearing capacity, water quality and treatment, transportation systems, air quality, structural materials, etc. ANNs predict non-linearities and provide accurate estimates. Fuzzy logic uses an efficient decision-making process to provide a more precise assessment of systems. Lastly, while GAs optimizes models (based on evolutionary processes) for better outcomes, probabilistic reasoning lowers their statistical uncertainties.


Infinite-Horizon Value Function Approximation for Model Predictive Control

arXiv.org Artificial Intelligence

Model Predictive Control has emerged as a popular tool for robots to generate complex motions. However, the real-time requirement has limited the use of hard constraints and large preview horizons, which are necessary to ensure safety and stability. In practice, practitioners have to carefully design cost functions that can imitate an infinite horizon formulation, which is tedious and often results in local minima. In this work, we study how to approximate the infinite horizon value function of constrained optimal control problems with neural networks using value iteration and trajectory optimization. Furthermore, we demonstrate how using this value function approximation as a terminal cost provides global stability to the model predictive controller. The approach is validated on two toy problems and a real-world scenario with online obstacle avoidance on an industrial manipulator where the value function is conditioned to the goal and obstacle.


Integrating Artificial Intelligence and Geophysical Insights for Earthquake Forecasting: A Cross-Disciplinary Review

arXiv.org Artificial Intelligence

Earthquake forecasting remains a significant scientific challenge, with current methods falling short of achieving the performance necessary for meaningful societal benefits. Traditional models, primarily based on past seismicity and geomechanical data, struggle to capture the complexity of seismic patterns and often overlook valuable non-seismic precursors such as geophysical, geochemical, and atmospheric anomalies. The integration of such diverse data sources into forecasting models, combined with advancements in AI technologies, offers a promising path forward. AI methods, particularly deep learning, excel at processing complex, large-scale datasets, identifying subtle patterns, and handling multidimensional relationships, making them well-suited for overcoming the limitations of conventional approaches. This review highlights the importance of combining AI with geophysical knowledge to create robust, physics-informed forecasting models. It explores current AI methods, input data types, loss functions, and practical considerations for model development, offering guidance to both geophysicists and AI researchers. While many AI-based studies oversimplify earthquake prediction, neglecting critical features such as data imbalance and spatio-temporal clustering, the integration of specialized geophysical insights into AI models can address these shortcomings. We emphasize the importance of interdisciplinary collaboration, urging geophysicists to experiment with AI architectures thoughtfully and encouraging AI experts to deepen their understanding of seismology. By bridging these disciplines, we can develop more accurate, reliable, and societally impactful earthquake forecasting tools.


Weighted importance sampling for off-policy learning with linear function approximation

Neural Information Processing Systems

Importance sampling is an essential component of off-policy model-free reinforcement learning algorithms. However, its most effective variant, weighted importance sampling, does not carry over easily to function approximation and, because of this, it is not utilized in existing off-policy learning algorithms. In this paper, we take two steps toward bridging this gap. First, we show that weighted importance sampling can be viewed as a special case of weighting the error of individual training samples, and that this weighting has theoretical and empirical benefits similar to those of weighted importance sampling. Second, we show that these benefits extend to a new weighted-importance-sampling version of offpolicy LSTD(). We show empirically that our new WIS-LSTD() algorithm can result in much more rapid and reliable convergence than conventional off-policy LSTD() (Yu 2010, Bertsekas & Yu 2009).


Bayes-Adaptive Simulation-based Search with Value Function Approximation

Neural Information Processing Systems

Bayes-adaptive planning offers a principled solution to the explorationexploitation trade-off under model uncertainty. It finds the optimal policy in belief space, which explicitly accounts for the expected effect on future rewards of reductions in uncertainty. However, the Bayes-adaptive solution is typically intractable in domains with large or continuous state spaces. We present a tractable method for approximating the Bayes-adaptive solution by combining simulationbased search with a novel value function approximation technique that generalises appropriately over belief space. Our method outperforms prior approaches in both discrete bandit tasks and simple continuous navigation and control tasks.