AITopics

1912.03535

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > United States > New York (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Genre: Research Report (1.00)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Kamalzadeh, Hossein, Ahmadi, Abbas, Mansour, Saeed

Clustering Time-Series by a Novel Slope-Based Similarity Measure Considering Particle Swarm Optimization

arXiv.org Machine Learning Dec-5-2019

Recently there has been an increase in studies on time-series data mining, specifically time-series clustering, due to the vast existence of time-series in various domains. The large volume of data in the form of time-series makes it necessary to employ various techniques such as clustering to understand the data and to extract information and hidden patterns. In the field of clustering, and specifically time-series clustering, the most important aspects are the similarity measure used and the algorithm employed to conduct the clustering. In this paper, a new similarity measure for time-series clustering is developed based on a combination of a simple representation of time-series, the slope of each segment of the time-series, Euclidean distance, and so-called dynamic time warping. It is proved in this paper that the proposed distance measure is a metric and thus indexing can be applied. For the task of clustering, the Particle Swarm Optimization algorithm is employed. The proposed similarity measure is compared to three existing measures in terms of various criteria used for the evaluation of clustering algorithms. The results indicate that the proposed similarity measure outperforms the rest in almost every dataset used in this paper.
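As a rough illustration of two ingredients named in the abstract (segment slopes and dynamic time warping), the sketch below implements the classic textbook DTW recurrence and a unit-step slope representation. It is not the paper's similarity measure: the function names `dtw_distance` and `slopes`, the absolute-difference local cost, and the unit time step are all assumptions made for this example. Note that classic DTW on its own does not satisfy the triangle inequality in general, so the metric property claimed in the abstract must come from the paper's own construction.

```python
def slopes(series):
    """Slope of each unit-length segment (assumes equally spaced samples)."""
    return [series[i + 1] - series[i] for i in range(len(series) - 1)]


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric sequences.

    Uses |x - y| as the local cost and allows match, insertion, and
    deletion steps. This is the textbook recurrence, not the paper's measure.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal warping cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats against b[j-1]
                cost[i][j - 1],      # b[j-1] repeats against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Comparing slope sequences instead of raw values makes the comparison
# insensitive to a constant vertical offset between the two series.
x = [1, 2, 4, 7]
y = [11, 12, 14, 17]  # same shape as x, shifted up by 10
shape_gap = dtw_distance(slopes(x), slopes(y))
```

Here `shape_gap` is zero because the two series have identical slopes, whereas `dtw_distance(x, y)` on the raw values would not be.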

algorithm, distance measure, sin 1, (14 more...)

1912.02405

Country:

North America > United States > Texas > Dallas County > Dallas (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Gaon, Maor, Brafman, Ronen I.

Reinforcement Learning with Non-Markovian Rewards

arXiv.org Artificial Intelligence Dec-5-2019

The standard RL world model is that of a Markov Decision Process (MDP). A basic premise of MDPs is that the rewards depend on the last state and action only. Yet, many real-world rewards are non-Markovian. For example, a reward for bringing coffee only if it was requested earlier and not yet served is non-Markovian if the state only records current requests and deliveries. Past work considered the problem of modeling and solving MDPs with non-Markovian rewards (NMR), but we know of no principled approaches for RL with NMR. Here, we address the problem of policy learning from experience with such rewards. We describe and evaluate empirically four combinations of the classical RL algorithms Q-learning and R-max with automata learning algorithms to obtain new RL algorithms for domains with NMR. We also prove that some of these variants converge to an optimal policy in the limit.
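The coffee example can be made concrete with a small sketch. The code below is only an illustration of why such a reward is non-Markovian and how one bit of automaton memory restores the Markov property; it is not one of the paper's algorithms, and the event names (`"request"`, `"deliver"`, `"noop"`) and the function `coffee_rewards` are hypothetical.

```python
def coffee_rewards(events):
    """Assign a reward to each event in a history.

    A delivery earns reward 1 only if a request is pending. The flag
    `pending` plays the role of a two-state automaton that remembers
    the part of the history the reward depends on.
    """
    pending = False  # automaton state: is there an unserved request?
    rewards = []
    for event in events:
        reward = 0
        if event == "request":
            pending = True
        elif event == "deliver" and pending:
            reward = 1
            pending = False
        rewards.append(reward)
    return rewards


# The same event ("deliver") earns different rewards depending on what
# happened earlier, so the reward is not a function of the current event.
history = ["deliver", "request", "noop", "deliver", "deliver"]
rewards = coffee_rewards(history)
```

If the pair (environment state, `pending`) is treated as the state, the reward depends only on that augmented state and the current event, which is the general idea behind pairing an RL algorithm with a learned automaton.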

algorithm, automata, automaton, (16 more...)

1912.02552

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Southern District > Beer-Sheva (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Kamalzadeh, Hossein, Sobhan, Saeid Nassim, Boskabadi, Azam, Hatami, Mohsen, Gharehyakheh, Amin

Modeling and Prediction of Iran's Steel Consumption Based on Economic Activity Using Support Vector Machines

arXiv.org Machine Learning Dec-4-2019

The steel industry has great impacts on the economy and the environment of both developed and underdeveloped countries. The importance of this industry and these impacts have led many researchers to investigate the relationship between a country's steel consumption and its economic activity, resulting in the so-called intensity of use model. This paper investigates the validity of the intensity of use model for the case of Iran's steel consumption and extends this hypothesis by using the indexes of economic activity to model the steel consumption. We use the proposed model to train support vector machines and predict the future values for Iran's steel consumption. The paper provides detailed correlation tests for the factors used in the model to check for their relationships with the steel consumption. The results indicate that Iran's steel consumption is strongly correlated with its economic activity, following the same pattern that the economy has followed over the last four decades.

artificial intelligence, banking & finance, steel consumption, (18 more...)

1912.02373

Country:

Asia > Middle East > Iran (1.00)
North America > United States > Texas (0.28)
Europe > United Kingdom (0.14)
North America > United States > Florida > Alachua County > Gainesville (0.14)

Genre: Research Report > Experimental Study (0.48)

Industry:

Materials > Metals & Mining > Steel (1.00)
Energy > Oil & Gas (1.00)
Banking & Finance > Economy (1.00)
Transportation (0.96)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Russel, Reazul Hasan, Behzadian, Bahram, Petrik, Marek

Optimizing Norm-Bounded Weighted Ambiguity Sets for Robust MDPs

arXiv.org Artificial Intelligence Dec-4-2019

Optimal policies in Markov decision processes (MDPs) are very sensitive to model misspecification. This raises serious concerns about deploying them in high-stakes domains. Robust MDPs (RMDPs) provide a promising framework to mitigate vulnerabilities by computing policies with worst-case guarantees in reinforcement learning. The solution quality of an RMDP depends on the ambiguity set, which is a quantification of model uncertainties. In this paper, we propose a new approach for optimizing the shape of the ambiguity sets for RMDPs. Our method departs from the conventional idea of constructing a norm-bounded uniform and symmetric ambiguity set. We instead argue that the structure of a near-optimal ambiguity set is problem specific. Our proposed method computes a weight parameter from the value functions, and these weights then drive the shape of the ambiguity sets. Our theoretical analysis demonstrates the rationale of the proposed idea. We apply our method to several different problem domains, and the empirical results further support the practical promise of weighted near-optimal ambiguity sets.

ambiguity, artificial intelligence, machine learning, (13 more...)

1912.02696

Country:

North America > United States > New Hampshire (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Linzner, Dominik, Koeppl, Heinz

A Variational Perturbative Approach to Planning in Graph-based Markov Decision Processes

arXiv.org Machine Learning Dec-4-2019

Coordinating multiple interacting agents to achieve a common goal is a difficult task with huge applicability. This problem remains hard to solve, even when limiting interactions to be mediated via a static interaction-graph. We present a novel approximate solution method for multi-agent Markov decision problems on graphs, based on variational perturbation theory. We adopt the strategy of planning via inference, which has been explored in various prior works. We employ a non-trivial extension of a novel high-order variational method that allows for approximate inference in large networks and has been shown to surpass the accuracy of existing variational methods. To compare our method to two state-of-the-art methods for multi-agent planning on graphs, we apply the method to different standard GMDP problems. We show that in cases where the goal is encoded as a non-local cost function, our method performs well, while state-of-the-art methods approach the performance of random guessing. In a final experiment, we demonstrate that our method brings significant improvement for synchronization tasks.

gmdp, reward function, sabbadin, (14 more...)

1912.01849

Country: Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)

Genre: Research Report > Promising Solution (0.54)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.83)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)

arXiv.org Artificial Intelligence Dec-3-2019

Optimal Farsighted Agents Tend to Seek Power

Turner, Alexander Matt

Some researchers have speculated that capable reinforcement learning (RL) agents pursuing misspecified objectives are often incentivized to seek resources and power in pursuit of those objectives. An agent seeking power is incentivized to behave in undesirable ways, including rationally preventing deactivation and correction. Others have voiced skepticism: humans seem idiosyncratic in their urges to power, which need not be present in the agents we design. We formalize a notion of power within the context of finite deterministic Markov decision processes (MDPs). We prove that, with respect to a wide class of reward function distributions, optimal policies tend to seek power over the environment.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1912.01683

Country:

North America > United States > Oregon (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Machine Learning Dec-3-2019

Continuous Online Learning and New Insights to Online Imitation Learning

Lee, Jonathan, Cheng, Ching-An, Goldberg, Ken, Boots, Byron

Online learning is a powerful tool for analyzing iterative algorithms. However, the classic adversarial setup sometimes fails to capture certain regularity in online problems in practice. Motivated by this, we establish a new setup, called Continuous Online Learning (COL), where the gradient of online loss function changes continuously across rounds with respect to the learner's decisions. We show that COL covers and more appropriately describes many interesting applications, from general equilibrium problems (EPs) to optimization in episodic MDPs. Using this new setup, we revisit the difficulty of achieving sublinear dynamic regret. We prove that there is a fundamental equivalence between achieving sublinear dynamic regret in COL and solving certain EPs, and we present a reduction from dynamic regret to both static regret and convergence rate of the associated EP. At the end, we specialize these new insights into online imitation learning and show improved understanding of its learning stability.

algorithm, dynamic regret, sublinear dynamic regret, (15 more...)

1912.01261

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada (0.04)

Genre: Research Report (0.50)

Industry: Education > Educational Setting > Online (0.85)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Hafner, Danijar, Lillicrap, Timothy, Ba, Jimmy, Norouzi, Mohammad

Dream to Control: Learning Behaviors by Latent Imagination

arXiv.org Artificial Intelligence Dec-3-2019

Learned world models summarize an agent's experience to facilitate learning complex behaviors. While learning world models from high-dimensional sensory inputs is becoming feasible through deep learning, there are many potential ways for deriving behaviors from them. We present Dreamer, a reinforcement learning agent that solves long-horizon tasks from images purely by latent imagination. We efficiently learn behaviors by propagating analytic gradients of learned state values back through trajectories imagined in the compact state space of a learned world model. On 20 challenging visual control tasks, Dreamer exceeds existing approaches in data-efficiency, computation time, and final performance.

arxiv preprint arxiv, dreamer, world model, (15 more...)

1912.01603

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

#artificialintelligence Dec-2-2019, 01:59:04 GMT

Introduction to Markov Chains

Imagine that there were two possible states for weather: sunny or cloudy. You can always directly observe the current weather state, and it is guaranteed to always be one of the two aforementioned states. Now, you decide you want to be able to predict what the weather will be like tomorrow. Intuitively, you assume that there is an inherent transition in this process, in that the current weather has some bearing on what the next day's weather will be. So, being the dedicated person that you are, you collect weather data over several years, and calculate that the chance of a sunny day occurring after a cloudy day is 0.25. You also note that, by extension, the chance of a cloudy day occurring after a cloudy day must be 0.75, since there are only two possible states. You can now use this distribution to predict weather for days to come, based on what the current weather state is at the time.
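The prediction step described above can be sketched in a few lines. The transition probabilities out of a cloudy day (0.25 and 0.75) come from the text; the probabilities out of a sunny day are not given there, so the values 0.8 and 0.2 below are placeholders assumed purely for illustration, as are the names `P`, `step`, and `forecast`.

```python
# Transition probabilities: P[today][tomorrow].
# The "cloudy" row is taken from the text; the "sunny" row is an
# assumed placeholder, since the text does not state it.
P = {
    "cloudy": {"sunny": 0.25, "cloudy": 0.75},
    "sunny": {"sunny": 0.8, "cloudy": 0.2},  # assumption
}


def step(dist):
    """Advance a probability distribution over weather states by one day."""
    nxt = {state: 0.0 for state in P}
    for today, p_today in dist.items():
        for tomorrow, p_trans in P[today].items():
            nxt[tomorrow] += p_today * p_trans
    return nxt


def forecast(today, days):
    """Distribution over the weather `days` days ahead, given today's state."""
    dist = {state: (1.0 if state == today else 0.0) for state in P}
    for _ in range(days):
        dist = step(dist)
    return dist


tomorrow = forecast("cloudy", 1)   # one-step prediction from a cloudy day
day_after = forecast("cloudy", 2)  # two-step prediction
```

A one-day forecast from a cloudy day simply reads off the cloudy row. Forecasts further ahead apply the same transition table repeatedly, which is why only the current state is needed and not the earlier history.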

cloudy day, markov chain, weather, (2 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)