AITopics

2009.11459

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)

Genre: Research Report (0.82)

Industry:

Transportation > Air (0.46)
Government > Regional Government > North America Government > United States Government (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Vajjha, Koundinya, Shinnar, Avraham, Pestun, Vasily, Trager, Barry, Fulton, Nathan

CertRL: Formalizing Convergence Proofs for Value and Policy Iteration in Coq

arXiv.org Artificial IntelligenceSep-23-2020

Reinforcement learning algorithms solve sequential decision-making problems in probabilistic environments by optimizing for long-term reward. The desire to use reinforcement learning in safety-critical settings inspires a recent line of work on formally constrained reinforcement learning; however, these methods place the implementation of the learning algorithm in their Trusted Computing Base. The crucial correctness property of these implementations is a guarantee that the learning algorithm converges to an optimal policy. This paper begins the work of closing this gap by developing a Coq formalization of two canonical reinforcement learning algorithms: value and policy iteration for finite state Markov decision processes. The central results are a formalization of Bellman's optimality principle and its proof, which uses a contraction property of Bellman optimality operator to establish that a sequence converges in the infinite horizon limit. The CertRL development exemplifies how the Giry monad and mechanized metric coinduction streamline optimality proofs for reinforcement learning algorithms. The CertRL library provides a general framework for proving properties about Markov decision processes and reinforcement learning algorithms, paving the way for further work on formalization of reinforcement learning algorithms.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2009.11403

Country:

North America > United States > Oregon > Multnomah County > Portland (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report (0.40)

Industry: Information Technology (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.73)

Seddik, Bassem, Amara, Najoua Essoukri Ben

Visual Methods for Sign Language Recognition: A Modality-Based Review

Sign language visual recognition from continuous multi-modal streams is still one of the most challenging fields. Recent advances in human actions recognition are exploiting the ascension of GPU-based learning from massive data, and are getting closer to human-like performances. They are then prone to creating interactive services for the deaf and hearing-impaired communities. A population that is expected to grow considerably in the years to come. This paper aims at reviewing the human actions recognition literature with the sign-language visual understanding as a scope. The methods analyzed will be mainly organized according to the different types of unimodal inputs exploited, their relative multi-modal combinations and pipeline steps. In each section, we will detail and compare the related datasets, approaches then distinguish the still open contribution paths suitable for the creation of sign language related services. Special attention will be paid to the approaches and commercial solutions handling facial expressions and continuous signing.

artificial intelligence, machine learning, recognition, (15 more...)

2009.1037

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York > New York County > New York City (0.04)
(16 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.93)
(2 more...)

Wells, Andrew M., Lahijanian, Morteza, Kavraki, Lydia E., Vardi, Moshe Y.

LTLf Synthesis on Probabilistic Systems

Many systems are naturally modeled as Markov Decision Processes (MDPs), combining probabilities and strategic actions. Given a model of a system as an MDP and some logical specification of system behavior, the goal of synthesis is to find a policy that maximizes the probability of achieving this behavior. A popular choice for defining behaviors is Linear Temporal Logic (LTL). Policy synthesis on MDPs for properties specified in LTL has been well studied. LTL, however, is defined over infinite traces, while many properties of interest are inherently finite. Linear Temporal Logic over finite traces (LTLf) has been used to express such properties, but no tools exist to solve policy synthesis for MDP behaviors given finite-trace properties. We present two algorithms for solving this synthesis problem: the first via reduction of LTLf to LTL and the second using native tools for LTLf. We compare the scalability of these two approaches for synthesis and show that the native approach offers better scalability compared to existing automaton generation tools for LTL.

artificial intelligence, logic & formal reasoning, machine learning, (20 more...)

doi: 10.4204/EPTCS.326.11

2009.10883

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > Maryland (0.04)
(6 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.42)

The relationship between dynamic programming and active inference: the discrete, finite-horizon case

Da Costa, Lancelot, Sajid, Noor, Parr, Thomas, Friston, Karl, Smith, Ryan

Active inference is a normative framework for generating behaviour based upon the free energy principle, a theory of self-organisation. This framework has been successfully used to solve reinforcement learning and stochastic control problems, yet, the formal relation between active inference and reward maximisation has not been fully explicated. In this paper, we consider the relation between active inference and dynamic programming under the Bellman equation, which underlies many approaches to reinforcement learning and control. We show that, on partially observable Markov decision processes, dynamic programming is a limiting case of active inference. In active inference, agents select actions to minimise expected free energy. In the absence of ambiguity about states, this reduces to matching expected states with a target distribution encoding the agent's preferences. When target states correspond to rewarding states, this maximises expected reward, as in reinforcement learning. When states are ambiguous, active inference agents will choose actions that simultaneously minimise ambiguity. This allows active inference agents to supplement their reward maximising (or exploitative) behaviour with novelty-seeking (or exploratory) behaviour. This clarifies the connection between active inference and reinforcement learning, and how both frameworks may benefit from each other.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2009.08111

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.14)
Asia > Middle East > Jordan (0.05)
(5 more...)

Genre:

Research Report (0.50)
Instructional Material > Course Syllabus & Notes (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

HasanzadeZonuzy, Aria, Bura, Archana, Kalathil, Dileep, Shakkottai, Srinivas

Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs

arXiv.org Machine LearningSep-22-2020

Many physical systems have underlying safety considerations that require that the policy employed ensures the satisfaction of a set of constraints. The analytical formulation usually takes the form of a Constrained Markov Decision Process (CMDP). We focus on the case where the CMDP is unknown, and RL algorithms obtain samples to discover the model and compute an optimal constrained policy. Our goal is to characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy---both objective maximization and constraint satisfaction---in a PAC sense. We explore two classes of RL algorithms, namely, (i) a generative model based approach, wherein samples are taken initially to estimate a model, and (ii) an online approach, wherein the model is updated as samples are obtained. Our main finding is that compared to the best known bounds of the unconstrained regime, the sample complexity of constrained RL algorithms are increased by a factor that is logarithmic in the number of constraints, which suggests that the approach may be easily utilized in real systems.

log 4, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

2008.00311

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Texas > Brazos County > College Station (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Khan, Tariq M., Robles-Kelly, Antonio

A Derivative-free Method for Quantum Perceptron Training in Multi-layered Neural Networks

In this paper, we present a gradient-free approach for training multi-layered neural networks based upon quantum perceptrons. Here, we depart from the classical perceptron and the elemental operations on quantum bits, i.e. qubits, so as to formulate the problem in terms of quantum perceptrons. We then make use of measurable operators to define the states of the network in a manner consistent with a Markov process. This yields a Dirac-Von Neumann formulation consistent with quantum mechanics. Moreover, the formulation presented here has the advantage of having a computational efficiency devoid of the number of layers in the network. This, paired with the natural efficiency of quantum computing, can imply a significant improvement in efficiency, particularly for deep networks. Finally, but not least, the developments here are quite general in nature since the approach presented here can also be used for quantum-inspired neural networks implemented on conventional computers.

artificial intelligence, machine learning, neural network, (18 more...)

2009.13264

Country:

Oceania > Australia (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Satsangi, Yash, Whiteson, Shimon, Oliehoek, Frans A., Spaan, Matthijs T. J.

Exploiting Submodular Value Functions For Scaling Up Active Perception

arXiv.org Artificial IntelligenceSep-21-2020

In active perception tasks, an agent aims to select sensory actions that reduce its uncertainty about one or more hidden variables. While partially observable Markov decision processes (POMDPs) provide a natural model for such problems, reward functions that directly penalize uncertainty in the agent's belief can remove the piecewise-linear and convex property of the value function required by most POMDP planners. Furthermore, as the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially with it, making POMDP planning infeasible with traditional methods. In this article, we address a twofold challenge of modeling and planning for active perception tasks. We show the mathematical equivalence of $\rho$POMDP and POMDP-IR, two frameworks for modeling active perception tasks, that restore the PWLC property of the value function. To efficiently plan for active perception tasks, we identify and exploit the independence properties of POMDP-IR to reduce the computational cost of solving POMDP-IR (and $\rho$POMDP). We propose greedy point-based value iteration (PBVI), a new POMDP planning method that uses greedy maximization to greatly improve scalability in the action space of an active perception POMDP. Furthermore, we show that, under certain conditions, including submodularity, the value function computed using greedy PBVI is guaranteed to have bounded error with respect to the optimal value function. We establish the conditions under which the value function of an active perception POMDP is guaranteed to be submodular. Finally, we present a detailed empirical analysis on a dataset collected from a multi-camera tracking system employed in a shopping mall. Our method achieves similar performance to existing methods but at a fraction of the computational cost leading to better scalability for solving active perception tasks.

artificial intelligence, machine learning, value function, (14 more...)

doi: 10.1007/s10514-017-9666-5

2009.09696

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Florida > Hillsborough County > University (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Akalin, Neziha, Loutfi, Amy

Reinforcement Learning Approaches in Social Robotics

arXiv.org Artificial IntelligenceSep-21-2020

In order to facilitate natural interaction, researchers in social robotics have focused on robots that can adapt to diverse conditions and to the different users with whom they interact. Recently, there has been great interest in the use of machine learning methods for adaptive social robots [48], [29], [106], [45], [49], [86]. Machine Learning (ML) algorithms can be categorized into three subfields [2]: supervised learning, unsupervised learning and reinforcement learning. In supervised learning, correct input/output pairs are available and the goal is to find a correct mapping from input to output space. In unsupervised learning, output data is not available and the goal is to find patterns in the input data. Reinforcement Learning (RL) [96] is a framework for decision-making problems in which an agent interacts through trial-and-error with its environment to discover an optimal behavior. The agent does not receive direct feedback of correctness, instead it receives scarce feedback about the actions it has taken in the past.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2009.09689

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Europe > Sweden > Örebro County > Örebro (0.04)
North America > United States > New York > New York County > New York City (0.04)
(9 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine > Consumer Health (1.00)
Education > Educational Setting (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial IntelligenceSep-20-2020

An Improved Approach of Intention Discovery with Machine Learning for POMDP-based Dialogue Management

Raval, Ruturaj

An Embodied Conversational Agent (ECA) is an intelligent agent that works as the front end of software applications to interact with users through verbal/nonverbal expressions and to provide online assistance without the limits of time, location, and language. To help to improve the experience of human-computer interaction, there is an increasing need to empower ECA with not only the realistic look of its human counterparts but also a higher level of intelligence. This thesis first highlights the main topics related to the construction of ECA, including different approaches of dialogue management, and then discusses existing techniques of trend analysis for its application in user classification. As a further refinement and enhancement to prior work on ECA, this thesis research proposes a cohesive framework to integrate emotion-based facial animation with improved intention discovery. In addition, a machine learning technique is introduced to support sentiment analysis for the adjustment of policy design in POMDP-based dialogue management. The proposed research work is going to improve the accuracy of intention discovery while reducing the length of dialogues.

artificial intelligence, machine learning, natural language, (17 more...)

2009.09354

Country:

Europe > Netherlands (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)