AITopics

2210.00121

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Pace, Alizée, Chan, Alex J., van der Schaar, Mihaela

POETREE: Interpretable Policy Learning with Adaptive Decision Trees

arXiv.org Artificial Intelligence, Sep-30-2022

Building models of human decision-making from observed behaviour is critical to better understand, diagnose and support real-world policies such as clinical care. As established policy learning approaches remain focused on imitation performance, they fall short of explaining the demonstrated decision-making process. Policy Extraction through decision Trees (POETREE) is a novel framework for interpretable policy learning, compatible with fully-offline and partially-observable clinical decision environments -- and builds probabilistic tree policies determining physician actions based on patients' observations and medical history. Fully-differentiable tree architectures are grown incrementally during optimization to adapt their complexity to the modelling task, and learn a representation of patient history through recurrence, resulting in decision tree policies that adapt over time with patient information. This policy learning method outperforms the state-of-the-art on real and synthetic medical datasets, both in terms of understanding, quantifying and evaluating observed behaviour as well as in accurately replicating it -- with potential to improve future decision support systems.

artificial intelligence, decision tree learning, machine learning, (20 more...)

2203.08057

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Texas (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

arXiv.org Artificial Intelligence, Sep-29-2022

Gradient flows and randomised thresholding: sparse inversion and classification

Latz, Jonas

Sparse inversion and classification problems are ubiquitous in modern data science and imaging. They are often formulated as non-smooth minimisation problems. In sparse inversion, we minimise, e.g., the sum of a data fidelity term and an L1/LASSO regulariser. In classification, we consider, e.g., the sum of a data fidelity term and a non-smooth Ginzburg--Landau energy. Standard (sub)gradient descent methods have been shown to be inefficient when approaching such problems. Splitting techniques are much more useful: here, the target function is partitioned into a sum of two subtarget functions -- each of which can be efficiently optimised. Splitting proceeds by performing optimisation steps alternately with respect to each of the two subtarget functions. In this work, we study splitting from a stochastic continuous-time perspective. Indeed, we define a differential inclusion that follows the negative subdifferential of one of the two subtarget functions at each point in time. The choice of the subtarget function is controlled by a binary continuous-time Markov process. The resulting dynamical system is a stochastic approximation of the underlying subgradient flow. We investigate this stochastic approximation for an L1-regularised sparse inversion flow and for a discrete Allen--Cahn equation minimising a Ginzburg--Landau energy. In both cases, we study the long-time behaviour of the stochastic dynamical system and its ability to approximate the underlying subgradient flow at any accuracy. We illustrate our theoretical findings in a simple sparse estimation problem and also in low- and high-dimensional classification problems.
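
The alternating splitting step described in this abstract can be illustrated with a minimal sketch. It assumes a toy problem, minimising 0.5*||x - y||^2 + lam*||x||_1, and uses the standard deterministic forward-backward alternation (a gradient step on the data fidelity term, then soft-thresholding for the L1 term). It is not the paper's randomised continuous-time scheme, and the names `soft_threshold`, `split_l1`, `tau` and `lam` are hypothetical.

```python
import math


def soft_threshold(v, t):
    """Proximal step for t * |x|: shrink v towards zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def split_l1(y, lam, tau=0.5, iterations=100):
    """Alternate between the two subtarget functions (illustrative only).

    Subtarget 1: data fidelity 0.5 * ||x - y||^2 (handled by a gradient step).
    Subtarget 2: L1 regulariser lam * ||x||_1 (handled by soft-thresholding).
    """
    x = [0.0] * len(y)
    for _ in range(iterations):
        # Step on the smooth data fidelity term.
        x = [xi - tau * (xi - yi) for xi, yi in zip(x, y)]
        # Step on the non-smooth L1 term.
        x = [soft_threshold(xi, tau * lam) for xi in x]
    return x


estimate = split_l1([3.0, -0.5, 1.2], lam=1.0)
```

For this toy problem the minimiser is known in closed form (soft-thresholding `y` by `lam`), which makes the sketch easy to check; entries of `y` smaller in magnitude than `lam` are set to zero, which is the sparsity effect the L1 term is used for.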

artificial intelligence, machine learning, stochastic approximation, (16 more...)

doi: 10.1088/1361-6420/ac9b84

2203.11555

Country:

North America > United States > New York > New York County > New York City (0.14)
Africa (0.04)

Genre: Research Report (0.81)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Poudel, Rudra P. K., Pandya, Harit, Cipolla, Roberto

Contrastive Unsupervised Learning of World Model with Invariant Causal Features

arXiv.org Artificial Intelligence, Sep-29-2022

In this paper we present a world model, which learns causal features using the invariance principle. In particular, we use contrastive unsupervised learning to learn the invariant causal features, which enforces invariance across augmentations of irrelevant parts or styles of the observation. The world-model-based reinforcement learning methods independently optimize representation learning and the policy. Thus naive contrastive loss implementation collapses due to a lack of supervisory signals to the representation learning module. We propose an intervention invariant auxiliary task to mitigate this issue. Specifically, we utilize depth prediction to explicitly enforce the invariance and use data augmentation as style intervention on the RGB observation space. Our design leverages unsupervised representation learning to learn the world model with invariant causal features. Our proposed method significantly outperforms current state-of-the-art model-based and model-free reinforcement learning methods on out-of-distribution point navigation tasks on the iGibson dataset. Moreover, our proposed model excels at the sim-to-real transfer of our perception learning module. Finally, we evaluate our approach on the DeepMind control suite and enforce invariance only implicitly since depth is not available. Nevertheless, our proposed model performs on par with the state-of-the-art counterpart.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2209.14932

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > Austria > Vienna (0.14)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.45)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Hamilton, Jordan, Stefanakos, Ioannis, Calinescu, Radu, Cámara, Javier

Towards Adaptive Planning of Assistive-care Robot Tasks

Whilst assistive robots [7] have been embedded into social and health care environments [1, 2, 10], they have largely been limited to simple applications, such as support for social and physical activities and hall monitoring, often without considering potential interactions with humans. To expand the range of these applications, the human user and the robot need to interact in order to perform tasks together [4]. As such, this interaction, which is still underexplored in the social care domain, should be prioritised, with an emphasis on the safety of the human [3, 9]. To enable the development of applications that support such interaction and to ensure its safety, we propose an adaptive mission and path finding framework for an autonomous robot operating in a homecare environment. The framework models the environment as a graph, with nodes representing key locations within the environment where the robot can perform local tasks. Missions are modelled as a repertoire of locations within the environment where a task requires completion. The main contributions of our 'research preview' paper are: (i) a generalised approach for modelling environments as graphs with edges represented as levels of risk, (ii) a modified Dijkstra's algorithm for performing path finding in uncertain environments with a cost function to reduce risk, (iii) a simple human predictive behaviour model that forecasts human intention, allowing for adaptive path finding using heat maps to artificially increase the risk associated with specific edges in the graph, (iv) a framework that combines modelling methods, adaptive path finding techniques and run-time probabilistic model generation for safety verification into an end-to-end solution for autonomous robotic mission planning, and (v) a simulation-based case study that shows the effectiveness of the framework.
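
The idea of path finding with a risk-aware cost function can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the actual cost function, so the edge cost `distance + risk_weight * risk`, the function name `safest_path` and the example graph are all hypothetical.

```python
import heapq


def safest_path(graph, start, goal, risk_weight=1.0):
    """Dijkstra's algorithm with an assumed risk-penalised edge cost.

    graph maps node -> {neighbour: (distance, risk)}.
    The cost of an edge is distance + risk_weight * risk (an assumption).
    """
    frontier = [(0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, (dist, risk) in graph.get(node, {}).items():
            new_cost = cost + dist + risk_weight * risk
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return float("inf"), []


# Hypothetical home layout: the short route via B is risky, the route via C is not.
GRAPH = {
    "A": {"B": (1.0, 0.9), "C": (2.0, 0.0)},
    "B": {"D": (1.0, 0.9)},
    "C": {"D": (2.0, 0.0)},
}

shortest = safest_path(GRAPH, "A", "D", risk_weight=0.0)
cautious = safest_path(GRAPH, "A", "D", risk_weight=5.0)
```

With `risk_weight=0.0` the search reduces to ordinary shortest-path finding and takes the route via `B`; raising the weight makes the longer but risk-free route via `C` cheaper. Raising the risk on individual edges at run time, as the heat-map mechanism in the abstract does, would shift the chosen path in the same way.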

artificial intelligence, machine learning, robot, (18 more...)

doi: 10.4204/EPTCS.371.12

2209.14041

Country:

Asia > Middle East > Jordan (0.05)
Europe > United Kingdom > England > North Yorkshire > York (0.05)
Europe > Spain > Andalusia > Málaga Province > Málaga (0.04)

Genre: Research Report (0.40)

Industry:

Health & Medicine (0.87)
Government > Military (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots > Robots in the Home (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.34)

Murthy, Surya, Neogi, Natasha A., Bharadwaj, Suda

Scheduling for Urban Air Mobility using Safe Learning

This work considers the scheduling problem for Urban Air Mobility (UAM) vehicles travelling between origin-destination pairs with both hard and soft trip deadlines. Each route is described by a discrete probability distribution over trip completion times (or delay) and over inter-arrival times of requests (or demand) for the route, along with a fixed hard or soft deadline. Soft deadlines carry a cost that is incurred when the deadline is missed. An online, safe scheduler is developed that ensures that hard deadlines are never missed, and that the average cost of missing soft deadlines is minimized. The system is modelled as a Markov Decision Process (MDP) and safe model-based learning is used to find the probability distributions over route delays and demand. Monte Carlo Tree Search (MCTS) Earliest Deadline First (EDF) is used to safely explore the learned models in an online fashion and develop a near-optimal non-preemptive scheduling policy. These results are compared with Value Iteration (VI) and MCTS (Random) scheduling solutions.
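
For readers unfamiliar with Earliest Deadline First, the ordering rule alone can be sketched as below. This is a simplified illustration, assuming deterministic trip durations and all requests available at time zero; it does not include the MCTS search, the learned delay and demand distributions, or the hard/soft deadline distinction from the abstract. The name `edf_schedule` and the example jobs are hypothetical.

```python
def edf_schedule(jobs):
    """Non-preemptive Earliest Deadline First over jobs released at time 0.

    jobs is a list of (name, duration, deadline) tuples.
    Returns the completion time of each job and the names that miss their deadline.
    """
    time = 0
    schedule, missed = [], []
    # Serve the job with the earliest deadline first; each job runs to completion.
    for name, duration, deadline in sorted(jobs, key=lambda job: job[2]):
        time += duration
        schedule.append((name, time))
        if time > deadline:
            missed.append(name)
    return schedule, missed


schedule, missed = edf_schedule([("a", 3, 10), ("b", 2, 4), ("c", 1, 5)])
```

Here the jobs are served in the order `b`, `c`, `a`, and none finishes after its deadline. Serving them in the listed order instead would finish `b` at time 5, past its deadline of 4.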

artificial intelligence, machine learning, planning & scheduling, (17 more...)

doi: 10.4204/EPTCS.371.7

2209.15457

Country:

North America > United States > Illinois > Champaign County > Urbana (0.14)
North America > United States > Virginia > Hampton (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.50)

Industry: Transportation > Air (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Hierarchical Reinforcement Learning with AI Planning Models

Lee, Junkyu, Katz, Michael, Agravante, Don Joven, Liu, Miao, Tasse, Geraud Nangue, Klinger, Tim, Sohrabi, Shirin

Two common approaches to sequential decision-making are AI planning (AIP) and reinforcement learning (RL). Each has strengths and weaknesses. AIP is interpretable, easy to integrate with symbolic knowledge, and often efficient, but requires an up-front logical domain specification and is sensitive to noise; RL only requires specification of rewards and is robust to noise but is sample inefficient and not easily supplied with external knowledge. We propose an integrative approach that combines high-level planning with RL, retaining interpretability, transfer, and efficiency, while allowing for robust learning of the lower-level plan actions. Our approach defines options in hierarchical reinforcement learning (HRL) from AIP operators by establishing a correspondence between the state transition model of an AI planning problem and the abstract state transition system of a Markov Decision Process (MDP). Options are learned by adding intrinsic rewards to encourage consistency between the MDP and AIP transition models. We demonstrate the benefit of our integrated approach by comparing the performance of RL and HRL algorithms in both MiniGrid and N-rooms environments, showing the advantage of our method over the existing ones.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2203.00669

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Online Policy Optimization for Robust MDP

Dong, Jing, Li, Jingwei, Wang, Baoxiang, Zhang, Jingzhao

The rapid progress of reinforcement learning (RL) algorithms enables trained agents to navigate complicated environments and solve complex tasks. Standard reinforcement learning methods, however, may fail catastrophically in another environment, even if the two environments differ only slightly in dynamics [Farebrother et al., 2018, Packer et al., 2018, Cobbe et al., 2019, Song et al., 2019, Raileanu and Fergus, 2021]. In practical applications, such mismatches in environment dynamics are common and can have a number of causes, e.g., model deviation due to incomplete data, unexpected perturbations and possible adversarial attacks. Part of the sensitivity of standard RL algorithms stems from the formulation of the underlying Markov decision process (MDP). In a sequence of interactions, an MDP assumes the dynamics to be unchanged, and the trained agent to be tested on the same dynamics thereafter. To model the potential mismatch between system dynamics, the framework of robust MDP is introduced to account for the uncertainty of the parameters of the MDP [Satia and Lave Jr, 1973, White III and Eldeib, 1994, Nilim and El Ghaoui, 2005, Iyengar, 2005].

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2209.13841

Country:

North America > United States > Washington > King County > Seattle (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

Tiapkin, Daniil, Belomestny, Denis, Calandriello, Daniele, Moulines, Eric, Munos, Remi, Naumov, Alexey, Rowland, Mark, Valko, Michal, Menard, Pierre

We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions. The performance of an agent is measured by the regret after interacting with the environment for $T$ episodes. We propose an optimistic posterior sampling algorithm for reinforcement learning (OPSRL), a simple variant of posterior sampling that only needs a number of posterior samples logarithmic in $H$, $S$, $A$, and $T$ per state-action pair. For OPSRL we guarantee a high-probability regret bound of order at most $\widetilde{\mathcal{O}}(\sqrt{H^3SAT})$ ignoring $\text{poly}\log(HSAT)$ terms. The key novel technical ingredient is a new sharp anti-concentration inequality for linear forms which may be of independent interest. Specifically, we extend the normal approximation-based lower bound for Beta distributions by Alfers and Dinges [1984] to Dirichlet distributions. Our bound matches the lower bound of order $\Omega(\sqrt{H^3SAT})$, thereby answering the open problems raised by Agrawal and Jia [2017b] for the episodic setting.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2209.14414

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Massachusetts > Norfolk County > Wellesley (0.04)

Genre:

Research Report (0.50)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Vázquez, Gricel, Calinescu, Radu, Cámara, Javier

Scheduling of Missions with Constrained Tasks for Heterogeneous Robot Systems

We present a formal tasK AllocatioN and scheduling apprOAch for multi-robot missions (KANOA). KANOA supports two important types of task constraints: task ordering, which requires the execution of several tasks in a specified order; and joint tasks, which indicates tasks that must be performed by more than one robot. To mitigate the complexity of robotic mission planning, KANOA handles the allocation of the mission tasks to robots, and the scheduling of the allocated tasks separately. To that end, the task allocation problem is formalised in first-order logic and resolved using the Alloy model analyzer, and the task scheduling problem is encoded as a Markov decision process and resolved using the PRISM probabilistic model checker. We illustrate the application of KANOA through a case study in which a heterogeneous robotic team is assigned a hospital maintenance mission.

artificial intelligence, logic & formal reasoning, machine learning, (19 more...)

doi: 10.4204/EPTCS.371.11

2209.1404

Country:

Europe > United Kingdom > England > North Yorkshire > York (0.04)
Europe > Spain > Andalusia > Málaga Province > Málaga (0.04)
Europe > Middle East > Malta (0.04)
Europe > Ireland (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine (1.00)
Government > Military (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)