Goto

Collaborating Authors

 Undirected Networks


Learning to reflect: A unifying approach for data-driven stochastic control strategies

arXiv.org Machine Learning

Stochastic optimal control problems have a long tradition in applied probability, with the questions addressed being of high relevance in a multitude of fields. Even though theoretical solutions are well understood in many scenarios, their practicability suffers from the assumption of known dynamics of the underlying stochastic process, raising the statistical challenge of developing purely data-driven strategies. For the mathematically separated classes of continuous diffusion processes and L\'evy processes, we show that developing efficient strategies for related singular stochastic control problems can essentially be reduced to finding rate-optimal estimators with respect to the sup-norm risk of objects associated to the invariant distribution of ergodic processes which determine the theoretical solution of the control problem. From a statistical perspective, we exploit the exponential $\beta$-mixing property as the common factor of both scenarios to drive the convergence analysis, indicating that relying on general stability properties of Markov processes is a sufficiently powerful and flexible approach to treat complex applications requiring statistical methods. We show moreover that in the L\'evy case $-$ even though per se jump processes are more difficult to handle both in statistics and control theory $-$ a fully data-driven strategy with regret of significantly better order than in the diffusion case can be constructed.


DRL: Deep Reinforcement Learning for Intelligent Robot Control -- Concept, Literature, and Future

arXiv.org Artificial Intelligence

Combination of machine learning (for generating machine intelligence), computer vision (for better environment perception), and robotic systems (for controlled environment interaction) motivates this work toward proposing a vision-based learning framework for intelligent robot control as the ultimate goal (vision-based learning robot). This work specifically introduces deep reinforcement learning as the the learning framework, a General-purpose framework for AI (AGI) meaning application-independent and platform-independent. In terms of robot control, this framework is proposing specifically a high-level control architecture independent of the low-level control, meaning these two required level of control can be developed separately from each other. In this aspect, the high-level control creates the required intelligence for the control of the platform using the recorded low-level controlling data from that same platform generated by a trainer. The recorded low-level controlling data is simply indicating the successful and failed experiences or sequences of experiments conducted by a trainer using the same robotic platform. The sequences of the recorded data are composed of observation data (input sensor), generated reward (feedback value) and action data (output controller). For experimental platform and experiments, vision sensors are used for perception of the environment, different kinematic controllers create the required motion commands based on the platform application, deep learning approaches generate the required intelligence, and finally reinforcement learning techniques incrementally improve the generated intelligence until the mission is accomplished by the robot.


CVLight: Deep Reinforcement Learning for Adaptive Traffic Signal Control with Connected Vehicles

arXiv.org Artificial Intelligence

This paper develops a reinforcement learning (RL) scheme for adaptive traffic signal control (ATSC), called "CVLight", that leverages data collected only from connected vehicles (CV). Seven types of RL models are proposed within this scheme that contain various state and reward representations, including incorporation of CV delay and green light duration into state and the usage of CV delay as reward. To further incorporate information of both CV and non-CV into CVLight, an algorithm based on actor-critic, A2C-Full, is proposed where both CV and non-CV information is used to train the critic network, while only CV information is used to update the policy network and execute optimal signal timing. These models are compared at an isolated intersection under various CV market penetration rates. A full model with the best performance (i.e., minimum average travel delay per vehicle) is then selected and applied to compare with state-of-the-art benchmarks under different levels of traffic demands, turning proportions, and dynamic traffic demands, respectively. Two case studies are performed on an isolated intersection and a corridor with three consecutive intersections located in Manhattan, New York, to further demonstrate the effectiveness of the proposed algorithm under real-world scenarios. Compared to other baseline models that use all vehicle information, the trained CVLight agent can efficiently control multiple intersections solely based on CV data and can achieve a similar or even greater performance when the CV penetration rate is no less than 20%.


Network-wide traffic signal control optimization using a multi-agent deep reinforcement learning

arXiv.org Artificial Intelligence

Inefficient traffic control may cause numerous problems such as traffic congestion and energy waste. This paper proposes a novel multi-agent reinforcement learning method, named KS-DDPG (Knowledge Sharing Deep Deterministic Policy Gradient) to achieve optimal control by enhancing the cooperation between traffic signals. By introducing the knowledge-sharing enabled communication protocol, each agent can access to the collective representation of the traffic environment collected by all agents. The proposed method is evaluated through two experiments respectively using synthetic and real-world datasets. The comparison with state-of-the-art reinforcement learning-based and conventional transportation methods demonstrate the proposed KS-DDPG has significant efficiency in controlling large-scale transportation networks and coping with fluctuations in traffic flow. In addition, the introduced communication mechanism has also been proven to speed up the convergence of the model without significantly increasing the computational burden.


Markov models and Markov chains explained in real life: probabilistic workout routine

#artificialintelligence

Andrei Markov didn't agree with Pavel Nebrasov, when he said independence between variables was necessary for the Weak Law of Large Numbers to be applied. When you collect independent samples, as the number of samples gets bigger, the mean of those samples converges to the true mean of the population. But Markov believed independence was not a necessary condition for the mean to converge. So he set out to define how the average of the outcomes from a process involving dependent random variables could converge over time. Thanks to this intellectual disagreement, Markov created a way to describe how random, also called stochastic, systems or processes evolve over time.


A Simulated Experiment to Explore Robotic Dialogue Strategies for People with Dementia

arXiv.org Artificial Intelligence

People with Alzheimer's disease and related dementias (ADRD) often show the problem of repetitive questioning, which brings a great burden on persons with ADRD (PwDs) and their caregivers. Conversational robots hold promise of coping with this problem and hence alleviating the burdens on caregivers. In this paper, we proposed a partially observable markov decision process (POMDP) model for the PwD-robot interaction in the context of repetitive questioning, and used Q-learning to learn an adaptive conversation strategy (i.e., rate of follow-up question and difficulty of follow-up question) towards PwDs with different cognitive capabilities and different engagement levels. The results indicated that Q-learning was helpful for action selection for the robot. This may be a useful step towards the application of conversational social robots to cope with repetitive questioning in PwDs.


A Self-Supervised Auxiliary Loss for Deep RL in Partially Observable Settings

arXiv.org Artificial Intelligence

In this work we explore an auxiliary loss useful for reinforcement learning in environments where strong performing agents are required to be able to navigate a spatial environment. The auxiliary loss proposed is to minimize the classification error of a neural network classifier that predicts whether or not a pair of states sampled from the agents current episode trajectory are in order. The classifier takes as input a pair of states as well as the agent's memory. The motivation for this auxiliary loss is that there is a strong correlation with which of a pair of states is more recent in the agents episode trajectory and which of the two states is spatially closer to the agent. Our hypothesis is that learning features to answer this question encourages the agent to learn and internalize in memory representations of states that facilitate spatial reasoning. We tested this auxiliary loss on a navigation task in a gridworld and achieved 9.6% increase in accumulative episode reward compared to a strong baseline approach.


Financial Engineering and Artificial Intelligence in Python

#artificialintelligence

Have you ever thought about what would happen if you combined the power of machine learning and artificial intelligence with financial engineering? Today, you can stop imagining, and start doing. This course will teach you the core fundamentals of financial engineering, with a machine learning twist. We will learn about the greatest flub made in the past decade by marketers posing as "machine learning experts" who promise to teach unsuspecting students how to "predict stock prices with LSTMs". You will learn exactly why their methodology is fundamentally flawed and why their results are complete nonsense.


Extractive Summarization of Call Transcripts

arXiv.org Artificial Intelligence

Text summarization is the process of extracting the most important information from the text and presenting it concisely in fewer sentences. Call transcript is a text that involves textual description of a phone conversation between a customer (caller) and agent(s) (customer representatives). This paper presents an indigenously developed method that combines topic modeling and sentence selection with punctuation restoration in condensing ill-punctuated or un-punctuated call transcripts to produce summaries that are more readable. Extensive testing, evaluation and comparisons have demonstrated the efficacy of this summarizer for call transcript summarization.


Self-Supervised Exploration via Latent Bayesian Surprise

arXiv.org Artificial Intelligence

Training with Reinforcement Learning requires a reward function that is used to guide the agent towards achieving its objective. However, designing smooth and well-behaved rewards is in general not trivial and requires significant human engineering efforts. Generating rewards in self-supervised way, by inspiring the agent with an intrinsic desire to learn and explore the environment, might induce more general behaviours. In this work, we propose a curiosity-based bonus as intrinsic reward for Reinforcement Learning, computed as the Bayesian surprise with respect to a latent state variable, learnt by reconstructing fixed random features. We extensively evaluate our model by measuring the agent's performance in terms of environment exploration, for continuous tasks, and looking at the game scores achieved, for video games. Our model is computationally cheap and empirically shows state-of-the-art performance on several problems. Furthermore, experimenting on an environment with stochastic actions, our approach emerged to be the most resilient to simple stochasticity. Further visualization is available on the project webpage.(https://lbsexploration.github.io/)