AITopics

In recent years, robust Markov decision processes (MDPs) have emerged as a prominent modeling framework for dynamic decision problems affected by uncertainty. In contrast to classical MDPs, which only account for stochasticity by modeling the dynamics through a stochastic process with a known transition kernel, robust MDPs additionally account for ambiguity by optimizing in view of the most adverse transition kernel from a prescribed ambiguity set. In this paper, we develop a novel solution framework for robust MDPs with s-rectangular ambiguity sets that decomposes the problem into a sequence of robust Bellman updates and simplex projections. Exploiting the rich structure present in the simplex projections corresponding to phi-divergence ambiguity sets, we show that the associated s-rectangular robust MDPs can be solved substantially faster than with state-of-the-art commercial solvers as well as a recent first-order solution scheme, thus rendering them attractive alternatives to classical MDPs in practical applications.
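As a rough illustration of the simplex-projection building block named in the abstract, the sketch below implements the standard sort-based Euclidean projection onto the probability simplex. This is an assumption chosen for illustration: the paper's projections are tailored to phi-divergence ambiguity sets and are not reproduced here, and the function name `project_onto_simplex` is hypothetical.

```python
def project_onto_simplex(v):
    """Euclidean projection of a real vector v onto {x : x >= 0, sum(x) == 1}.

    Standard sort-based method (illustrative only; not the paper's
    phi-divergence-specific projection).
    """
    if not v:
        raise ValueError("v must be non-empty")
    u = sorted(v, reverse=True)  # entries in decreasing order
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        # The condition holds for a prefix of indices; keep the last shift that satisfies it.
        if u_j - candidate > 0:
            theta = candidate
    # Shift every entry by theta and clip negatives to zero.
    return [max(x - theta, 0.0) for x in v]


if __name__ == "__main__":
    print(project_onto_simplex([2.0, 0.0]))
    print(project_onto_simplex([0.5, 0.5, 0.5]))
```

A point already on the simplex is returned unchanged up to rounding, and the result always sums to one, which is what makes the projection usable as an inner step of an iterative scheme.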

ambiguity, artificial intelligence, machine learning, (16 more...)

2205.14202

Country:

North America > United States > New Hampshire (0.04)
North America > United States > Massachusetts (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Industry:

Energy (0.67)
Transportation (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Multitask Weakly Supervised Learning for Origin-Destination Travel Time Estimation

Wang, Hongjun, Zhang, Zhiwen, Fan, Zipei, Chen, Jiyuan, Zhang, Lingyu, Shibasaki, Ryosuke, Song, Xuan

Travel time estimation from GPS trips is of great importance to order duration, ridesharing, taxi dispatching, etc. However, dense trajectories are not always available due to limitations on data privacy and acquisition, while origin-destination (OD) data, such as NYC taxi data, NYC bike data, and Capital Bikeshare data, is more accessible. To address this issue, this paper estimates the travel time of OD trips in combination with the road network. To this end, a Multitask Weakly Supervised Learning Framework for Travel Time Estimation (MWSL-TTE) is proposed to infer the transition probability between road segments and the travel time on road segments and intersections simultaneously. Technically, given an OD pair, the transition probability is used to recover the most probable route. The output travel time is then the sum of the travel times of all segments and intersections on this route. A novel route recovery function is proposed to iteratively maximize the current route's co-occurrence probability and minimize the discrepancy between the routes' probability distribution and the inverse distribution of the routes' estimation loss. Moreover, an expected log-likelihood function based on a weakly supervised framework is used to optimize the travel time of road segments and intersections concurrently. We conduct experiments on a wide range of real-world taxi datasets in Xi'an and Chengdu and demonstrate our method's effectiveness on route recovery and travel time estimation.
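The aggregation step described in the abstract, where a route's travel time is the sum over its segments and intersections, can be sketched as follows. The data layout is an assumption made for illustration: a route is a list of segment identifiers, and an intersection is keyed by the pair of consecutive segments it joins. The name `route_travel_time` is hypothetical and the learned per-segment and per-intersection times are taken as given.

```python
def route_travel_time(route, segment_time, intersection_time):
    """Sum segment times plus the times of intersections between consecutive segments.

    route: list of segment ids in travel order (assumed layout)
    segment_time: dict mapping segment id -> travel time in seconds
    intersection_time: dict mapping (segment id, next segment id) -> time in seconds
    """
    total = sum(segment_time[s] for s in route)
    # Each pair of consecutive segments is joined by one intersection.
    total += sum(intersection_time[(a, b)] for a, b in zip(route, route[1:]))
    return total


if __name__ == "__main__":
    seg = {"a": 30.0, "b": 45.0, "c": 20.0}
    inter = {("a", "b"): 5.0, ("b", "c"): 10.0}
    print(route_travel_time(["a", "b", "c"], seg, inter))
```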

artificial intelligence, data mining, machine learning, (21 more...)

2301.05336

Country:

Asia > China > Shaanxi Province > Xi'an (0.25)
Asia > China > Sichuan Province > Chengdu (0.25)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)

Genre: Research Report (0.50)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Simão, Thiago D., Suilen, Marnix, Jansen, Nils

Safe Policy Improvement for POMDPs via Finite-State Controllers

We study safe policy improvement (SPI) for partially observable Markov decision processes (POMDPs). SPI is an offline reinforcement learning (RL) problem that assumes access to (1) historical data about an environment, and (2) the so-called behavior policy that previously generated this data by interacting with the environment. SPI methods require access to neither a model nor the environment itself, and aim to reliably improve the behavior policy in an offline manner. Existing methods make the strong assumption that the environment is fully observable. In our novel approach to the SPI problem for POMDPs, we assume that a finite-state controller (FSC) represents the behavior policy and that finite memory is sufficient to derive optimal policies. This assumption allows us to map the POMDP to a finite-state fully observable MDP, the history MDP. We estimate this MDP by combining the historical data and the memory of the FSC, and compute an improved policy using an off-the-shelf SPI algorithm. The underlying SPI method constrains the policy space according to the available data, such that the newly computed policy only differs from the behavior policy when sufficient data was available. We show that this new policy, converted into a new FSC for the (unknown) POMDP, outperforms the behavior policy with high probability. Experimental results on several well-established benchmarks show the applicability of the approach, even in cases where finite memory is not sufficient.

artificial intelligence, behavior policy, machine learning, (17 more...)

2301.04939

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Netherlands > Gelderland > Nijmegen (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.70)
Overview (0.48)

Industry: Government > Regional Government > North America Government > United States Government (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Gallici, Matteo, Martin, Mario, Masmitja, Ivan

TransfQMix: Transformers for Leveraging the Graph Structure of Multi-Agent Reinforcement Learning Problems

Coordination is one of the most difficult aspects of multi-agent reinforcement learning (MARL). One reason is that agents normally choose their actions independently of one another. In order to see coordination strategies emerge from the combination of independent policies, recent research has focused on the use of a centralized function (CF) that learns each agent's contribution to the team reward. However, the structure in which the environment is presented to the agents and to the CF is typically overlooked. We have observed that the features used to describe the coordination problem can be represented as vertex features of a latent graph structure. Here, we present TransfQMix, a new approach that uses transformers to leverage this latent structure and learn better coordination policies. Our transformer agents perform graph reasoning over the state of the observable entities. Our transformer Q-mixer learns a monotonic mixing function from a larger graph that includes the internal and external states of the agents. TransfQMix is designed to be entirely transferable, meaning that the same parameters can be used to control and train larger or smaller teams of agents. This makes it possible to deploy promising approaches for saving training time and deriving general policies in MARL, such as transfer learning, zero-shot transfer, and curriculum learning. We report TransfQMix's performance in the Spread and StarCraft II environments. In both settings, it outperforms state-of-the-art Q-learning models, and it proves effective in solving problems that other methods cannot solve.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2301.05334

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Games > Computer Games (0.49)
Education > Focused Education > Special Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Dave, Aditya, Venkatesh, Nishanth, Malikopoulos, Andreas A.

Approximate Information States for Worst-Case Control and Learning in Uncertain Systems

In this paper, we investigate discrete-time decision-making problems in uncertain systems with partially observed states. We consider a non-stochastic model, where uncontrolled disturbances acting on the system take values in bounded sets with unknown distributions. We present a general framework for decision-making in such problems by developing the notions of information states and approximate information states. In our definition of an information state, we introduce conditions under which an uncertain variable is sufficient to construct a dynamic program (DP) that computes an optimal strategy. We show that many information states from the literature on worst-case control actions, e.g., the conditional range, are examples of our more general definition. Next, we relax these conditions to define approximate information states using only output variables, which can be learned from output data without knowledge of system dynamics. We use this notion to formulate an approximate DP that yields a strategy with a bounded performance loss. Finally, we illustrate the application of our results in control and reinforcement learning using numerical examples.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2301.05089

Country:

North America > United States > Delaware > New Castle County > Newark (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Asia > India (0.04)

Genre: Research Report (0.70)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Ouyang, Yi, Tavafoghi, Hamidreza, Teneketzis, Demosthenis

An Approach to Stochastic Dynamic Games with Asymmetric Information and Hidden Actions

We study, in discrete time, a general class of sequential stochastic dynamic games with asymmetric information. We consider a setting where the underlying system has Markovian dynamics controlled by the agents' joint actions. Each agent's instantaneous utility depends on the agents' joint actions and the system state. At each time instant each agent makes a private noisy observation that depends on the current system state and the agents' actions in the previous time instant. In addition, at each time instant all agents may have a common noisy observation of the system state and their actions in the previous time instant. The agents' actions are hidden, that is, each agent's actions are not directly observable by the other agents. Therefore, at every time instant agents have asymmetric and imperfect information about the game's history. Dynamic games with the above features arise in engineering (cybersecurity, transportation, energy markets), in economics (industrial organization), and in socio-technological applications. As pointed out in Tang et al (2022), the key challenges in the study of dynamic games with asymmetric information are: (i) The domain of agents' strategies increases with time, as the agents acquire information over time.

artificial intelligence, information, machine learning, (15 more...)

2301.05288

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial Intelligence, Jan-11-2023

Graph based Environment Representation for Vision-and-Language Navigation in Continuous Environments

Wang, Ting, Wu, Zongkai, Yao, Feiyu, Wang, Donglin

Vision-and-Language Navigation in Continuous Environments (VLN-CE) is a navigation task that requires an agent to follow a language instruction in a realistic environment. Understanding the environment is a crucial part of the VLN-CE task, but existing methods understand the environment in a relatively simple and direct way, without delving into the relationship between language instructions and visual environments. Therefore, we propose a new environment representation to address these problems. First, we propose an Environment Representation Graph (ERG), built through object detection, to express the environment at the semantic level. This operation strengthens the relationship between language and environment. Then, the object-object and object-agent relational representations in the ERG are learned through a GCN, so as to obtain a continuous expression of the ERG. Subsequently, we combine the ERG expression with object label embeddings to obtain the environment representation. Finally, a new cross-modal attention navigation framework is proposed, incorporating our environment representation and a special loss function dedicated to training the ERG. Experimental results show that our method achieves satisfactory performance in terms of success rate on VLN-CE tasks. Further analysis shows that our method attains better cross-modal matching and strong generalization ability.

machine learning, natural language, navigation, (15 more...)

2301.04352

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Abbas, Ammar N., Chasparis, Georgios, Kelleher, John D.

Interpretable Hidden Markov Model-Based Deep Reinforcement Learning Hierarchical Framework for Predictive Maintenance of Turbofan Engines

arXiv.org Artificial Intelligence, Jan-11-2023

An open research question in deep reinforcement learning is how to focus the policy learning of key decisions within a sparse domain. This paper emphasizes combining the advantages of input-output hidden Markov models and reinforcement learning towards interpretable maintenance decisions. We propose a novel hierarchical-modeling methodology that, at a high level, detects and interprets the root cause of a failure as well as the health degradation of the turbofan engine, while, at a low level, it provides the optimal replacement policy. It outperforms the baseline performance of deep reinforcement learning methods applied directly to the raw data or when using a hidden Markov model without such a specialized hierarchy. It also provides performance comparable to prior work, with the additional benefit of interpretability.

artificial intelligence, deep reinforcement learning hierarchical framework, machine learning, (2 more...)

doi: 10.1007/978-3-031-12670-3_12

2206.13433

Genre: Research Report (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial Intelligence, Jan-10-2023

Unifying Consciousness and Time to Enhance Artificial Intelligence

Samarawickrama, Mahendra

Consciousness is a sequential process of awareness which can focus on one piece of information at a time. This process of awareness experiences causation which underpins the notion of time while it interplays with matter and energy, forming reality. The study of Consciousness, time and reality is complex and evolving fast in many fields, including metaphysics and fundamental physics. Reality composes patterns in human Consciousness in response to the regularities in nature. These regularities could be physical (e.g., astronomical, environmental), biological, chemical, mental, social, etc. The patterns that emerged in Consciousness were correlated to the environment, life and social behaviours followed by constructed frameworks, systems and structures. The complex constructs evolved as cultures, customs, norms and values, which created a diverse society. In the evolution of responsible AI, it is important to be attuned to the evolved cultural, ethical and moral values through Consciousness. This requires the advocated design of self-learning AI aware of time perception and human ethics.

artificial intelligence, consciousness, machine learning, (14 more...)

2301.08742

Country:

Oceania > Australia (0.04)
Europe > Greece (0.04)

Genre: Research Report (0.83)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.69)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Issues > Philosophy (0.35)

Dash, Adyasha, Agres, Kat R.

AI-Based Affective Music Generation Systems: A Review of Methods, and Challenges

arXiv.org Artificial Intelligence, Jan-10-2023

Music is a powerful medium for altering the emotional state of the listener. In recent years, with significant advancement in computing capabilities, artificial intelligence-based (AI-based) approaches have become popular for creating affective music generation (AMG) systems that are empowered with the ability to generate affective music. Entertainment, healthcare, and sensor-integrated interactive system design are a few of the areas in which AI-based affective music generation (AI-AMG) systems may have a significant impact. Given the surge of interest in this topic, this article aims to provide a comprehensive review of AI-AMG systems. The main building blocks of an AI-AMG system are discussed, and existing systems are formally categorized based on the core algorithm used for music generation. In addition, this article discusses the main musical features employed to compose affective music, along with the respective AI-based approaches used for tailoring them. Lastly, the main challenges and open questions in this field, as well as their potential solutions, are presented to guide future research. We hope that this review will be useful for readers seeking to understand the state-of-the-art in AI-AMG systems, and gain an overview of the methods used for developing them, thereby helping them explore this field in the future.

artificial intelligence, evolutionary algorithm, machine learning, (18 more...)

2301.0689

Country:

Asia > Singapore > Central Region > Singapore (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Finland (0.04)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.34)

Industry:

Media > Music (1.00)
Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)