AITopics

1901.08296

Country:

North America > United States > California > San Francisco County > San Francisco (0.13)
Europe > France (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre:

Workflow (0.92)
Overview (0.92)
Research Report > New Finding (0.45)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy (0.92)
Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Azizzadenesheli, Kamyar, Anandkumar, Animashree

Efficient Exploration through Bayesian Deep Q-Networks

We propose Bayesian Deep Q-Networks (BDQN), a Thompson sampling approach for Deep Reinforcement Learning (DRL) in Markov decision processes (MDP). BDQN is an efficient exploration-exploitation algorithm which combines Thompson sampling with deep-Q networks (DQN) and directly incorporates uncertainty over the Q-value in the last layer of the DQN, on the feature representation layer. This allows us to efficiently carry out Thompson sampling through Gaussian sampling and Bayesian Linear Regression (BLR), which has fast closed-form updates. We apply our method to a wide range of Atari games and compare BDQN to a powerful baseline: the double deep Q-network (DDQN). Since BDQN carries out more efficient exploration, it is able to reach higher rewards substantially faster: in less than 5M±1M interactions for almost half of the games to reach DDQN scores. We also establish theoretical guarantees for the special case when the feature representation is d-dimensional and fixed. We provide the Bayesian regret of posterior sampling RL (PSRL) and frequentist regret of the optimism in the face of uncertainty (OFU) for episodic MDPs.
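The last-layer idea in this abstract can be illustrated with a minimal sketch, assuming a fixed feature vector per state, one independent linear model per action, and a zero-mean Gaussian prior. The function names and the `sigma_noise` and `sigma_prior` parameters are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def blr_posterior(Phi, y, sigma_noise=1.0, sigma_prior=1.0):
    """Closed-form Gaussian posterior over weights w for y = Phi @ w + noise.

    Assumes a zero-mean isotropic Gaussian prior on w (an illustrative choice).
    """
    d = Phi.shape[1]
    precision = Phi.T @ Phi / sigma_noise**2 + np.eye(d) / sigma_prior**2
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ y / sigma_noise**2
    return mean, cov

def thompson_action(phi_s, posteriors, rng):
    """Sample one weight vector per action, then act greedily on the sampled Q-values."""
    q = [rng.multivariate_normal(m, c) @ phi_s for m, c in posteriors]
    return int(np.argmax(q))
```

In use, `posteriors` would hold one `(mean, cov)` pair per action, refreshed from stored features and regression targets; sampling from the posterior rather than using its mean is what drives exploration.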

computer game, exploration, upstream oil & gas, (20 more...)

1802.04412

Country:

Europe > Hungary (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games > Computer Games (0.69)
Energy > Oil & Gas > Upstream (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Rocher, Gérald, Tigli, Jean-Yves, Lavirotte, Stéphane, Thanh, Nhan Le

Effectiveness Assessment of Cyber-Physical Systems

arXiv.org Artificial Intelligence Jan-23-2019

By achieving their purposes through interactions with the physical world, Cyber Physical Systems (CPS) pose new challenges. Indeed, the evolution of the physical systems they control with transducers can be affected by surrounding physical processes over which they have no control and which may potentially hamper the achievement of their purposes. While it is illusory to hope for a comprehensive model of the physical environment at design time to anticipate and remove faults that may occur once these systems are deployed, it becomes necessary to evaluate their degree of effectiveness in vivo. In this paper, the degree of effectiveness is formally defined and generalized in the context of measure theory, and the mathematical properties it has to comply with are detailed. The measure is developed in the context of the Transferable Belief Model (TBM), an elaboration on the Dempster Shafer Theory (DST) of evidence, so as to handle epistemic and aleatory uncertainties pertaining respectively to the users' expectations and the natural variability of the physical environment. This theoretical framework has several advantages over the probability and the possibility theories: (1) it is built on the Open World Assumption (OWA), and (2) it can cope with dependent and possibly unreliable sources of information. The TBM is used in conjunction with the Input Output Hidden Markov Modeling framework (IOHMM) to specify the expected evolution of the physical system controlled by the CPS and the tolerances towards uncertainties. The measure of effectiveness is obtained from the forward algorithm, leveraging the conflict entailed by the successive combinations of the beliefs obtained from observations of the physical system and the beliefs corresponding to its expected evolution. The conflict, inherent to OWA, is meant to quantify the inability of the model to explain observations.
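The notion of conflict mentioned in this abstract can be sketched with the unnormalized conjunctive combination rule used in the TBM, where the mass that lands on the empty set is kept rather than renormalized away. This is only a single combination step under assumed inputs; the two-element frame and the mass values are hypothetical, and the paper's full measure, built on the IOHMM forward algorithm, is not reproduced here.

```python
from itertools import product

def conjunctive_combine(m1, m2):
    """Unnormalized conjunctive rule on mass functions given as {frozenset: mass}."""
    out = {}
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        c = a & b  # intersection of the two focal sets
        out[c] = out.get(c, 0.0) + wa * wb
    return out

def conflict(m):
    """Mass assigned to the empty set, read as the degree of conflict."""
    return m.get(frozenset(), 0.0)

# Hypothetical frame {"ok", "fault"}: expected behaviour vs. an observation.
OMEGA = frozenset({"ok", "fault"})
expected = {frozenset({"ok"}): 0.9, OMEGA: 0.1}
observed = {frozenset({"fault"}): 0.6, OMEGA: 0.4}
combined = conjunctive_combine(expected, observed)
```

Here the conflict grows as the observation contradicts the expectation more strongly, which matches the abstract's reading of conflict as the model's inability to explain what was observed.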

data mining, effectiveness, machine learning, (21 more...)

arXiv.org Artificial Intelligence

1901.06343

Country:

Europe > Austria > Vienna (0.14)
Europe > Switzerland (0.04)
Europe > Romania > Sud-Est Development Region > Tulcea County > Tulcea (0.04)
(3 more...)

Genre: Research Report (0.40)

Industry:

Information Technology (0.46)
Water & Waste Management > Water Management (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
(2 more...)

Radanovic, Goran, Devidze, Rati, Parkes, David, Singla, Adish

Learning to Collaborate in Markov Decision Processes

We consider a two-agent MDP framework where agents repeatedly solve a task in a collaborative setting. We study the problem of designing a learning algorithm for the first agent (A1) that facilitates a successful collaboration even in cases when the second agent (A2) is adapting its policy in an unknown way. The key challenge in our setting is that the presence of the second agent leads to non-stationarity and non-obliviousness of rewards and transitions for the first agent. We design novel online learning algorithms for agent A1 whose regret decays as $O(T^{1-\frac{3}{7} \cdot \alpha})$ with $T$ learning episodes provided that the magnitude of agent A2's policy changes between any two consecutive episodes is upper bounded by $O(T^{-\alpha})$. Here, the parameter $\alpha$ is assumed to be strictly greater than $0$, and we show that this assumption is necessary provided that the learning parity with noise problem is computationally hard. We show that sub-linear regret of agent A1 further implies near-optimality of the agents' joint return for MDPs that manifest the properties of a smooth game.
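To make the rate concrete, and reading the bound as a cumulative regret over $T$ episodes (an interpretation of the abstract, not a statement taken from the paper), the exponent works out as follows for two illustrative values of $\alpha$, together with the implied per-episode average:

```latex
\[
\alpha = 1:\quad T^{1-\frac{3}{7}\cdot 1} = T^{4/7},
\qquad
\alpha = \tfrac{1}{2}:\quad T^{1-\frac{3}{7}\cdot\frac{1}{2}} = T^{11/14}.
\]
\[
\frac{O\!\left(T^{1-\frac{3}{7}\alpha}\right)}{T}
  = O\!\left(T^{-\frac{3}{7}\alpha}\right) \to 0
\quad\text{as } T\to\infty \text{ for any } \alpha > 0 .
\]
```

Under this reading, the slower agent A2 changes its policy (larger $\alpha$), the smaller the exponent and the faster the average regret vanishes.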

agent, collaborate, learning, (14 more...)

1901.08029

Genre: Research Report (0.50)

Industry: Education (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.52)

Thirty Years of Machine Learning: The Road to Pareto-Optimal Next-Generation Wireless Networks

Wang, Jingjing, Jiang, Chunxiao, Zhang, Haijun, Ren, Yong, Chen, Kwang-Cheng, Hanzo, Lajos

Next-generation wireless networks (NGWN) have substantial potential for supporting a broad range of complex, compelling applications in both military and civilian fields, where users are able to enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making because of the complex heterogeneous nature of the network structures and wireless services. Machine learning algorithms have had great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of machine learning by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning, respectively. Furthermore, we investigate their employment in the compelling applications of NGWNs, including heterogeneous networks (HetNets), cognitive radios (CR), Internet of things (IoT), machine to machine networks (M2M), and so on. This article aims to help readers clarify the motivation and methodology of the various machine learning algorithms, so as to invoke them for hitherto unexplored services as well as scenarios of future wireless networks.

algorithm, ieee transaction, wireless network, (14 more...)

1902.01946

Country:

North America > United States > Florida > Hillsborough County > Tampa (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > South Korea (0.14)
(37 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Transportation (1.00)
Telecommunications > Networks (1.00)
Information Technology > Security & Privacy (1.00)
(5 more...)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(8 more...)

Bhattacharyya, Rajarshi, Bura, Archana, Rengarajan, Desik, Rumuly, Mason, Shakkottai, Srinivas, Kalathil, Dileep, Mok, Ricky K. P., Dhamdhere, Amogh

QFlow: A Reinforcement Learning Approach to High QoE Video Streaming over Wireless Networks

Wireless Internet access has brought legions of heterogeneous applications all sharing the same resources. However, current wireless edge networks that cater to worst or average case performance lack the agility to best serve these diverse sessions. Simultaneously, software reconfigurable infrastructure has become increasingly mainstream to the point that dynamic per packet and per flow decisions are possible at multiple layers of the communications stack. Exploiting such reconfigurability requires the design of a system that can enable a configuration, measure the impact on the application performance (Quality of Experience), and adaptively select a new configuration. Effectively, this feedback loop is a Markov Decision Process whose parameters are unknown. The goal of this work is to design, develop and demonstrate QFlow that instantiates this feedback loop as an application of reinforcement learning (RL). Our context is that of reconfigurable (priority) queueing, and we use the popular application of video streaming as our use case. We develop both model-free and model-based RL approaches that are tailored to the problem of determining which clients should be assigned to which queue at each decision period. Through experimental validation, we show how the RL-based control policies on QFlow are able to schedule the right clients for prioritization in a high-load scenario to outperform the status quo, as well as the best known solutions with over 25% improvement in QoE, and a perfect QoE score of 5 over 85% of the time.
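The feedback loop described above (enable a configuration, measure QoE, select a new configuration) can be sketched with generic tabular Q-learning. This is a textbook model-free update under assumed discrete states and actions, not the QFlow implementation; the state and action encodings, `lr`, `gamma` and `eps` are hypothetical.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, lr=0.5, gamma=0.9):
    """One Q-learning step: move Q[s, a] toward r + gamma * max_a' Q[s_next, a']."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += lr * (target - Q[s, a])
    return Q

def epsilon_greedy(Q, s, eps, rng):
    """Pick a random action with probability eps, otherwise the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))
```

In a queueing setting, one might let the action index encode which client assignment to apply in the next decision period and let the reward be the measured QoE, but that mapping is an assumption of this sketch.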

application, qoe, queue, (15 more...)

1901.00959

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)

Genre: Research Report (0.82)

Industry:

Telecommunications (0.46)
Leisure & Entertainment > Games (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

arXiv.org Machine Learning Jan-22-2019

Minimal penalties and the slope heuristics: a survey

Arlot, Sylvain

Birgé and Massart proposed in 2001 the slope heuristics as a way to choose optimally from data an unknown multiplicative constant in front of a penalty. It is built upon the notion of minimal penalty, and it has been generalized since to some 'minimal-penalty algorithms'. This paper reviews the theoretical results obtained for such algorithms, with a self-contained proof in the simplest framework, precise proof ideas for further generalizations, and a few new results. Explicit connections are made with residual-variance estimators (with an original contribution on this topic, showing that for this task the slope heuristics performs almost as well as a residual-based estimator with the best model choice) and some classical algorithms such as L-curve or elbow heuristics, Mallows' $C_p$, and Akaike's FPE. Practical issues are also addressed, including two new practical definitions of minimal-penalty algorithms that are compared on synthetic data to previously-proposed definitions. Finally, several conjectures and open problems are suggested as future research directions.
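One common way to apply the slope heuristics can be sketched as follows, assuming a penalty proportional to model dimension: estimate the minimal-penalty constant from the slope of the empirical risk over the largest models, then select with twice that penalty. The function name and the `frac_large` cutoff are hypothetical, and this is not one of the specific definitions compared in the survey.

```python
import numpy as np

def slope_heuristic_select(dims, emp_risk, frac_large=0.5):
    """Select a model dimension with a data-driven penalty 2 * c_min * dim.

    c_min is minus the fitted slope of empirical risk versus dimension,
    estimated on the largest `frac_large` fraction of the models.
    """
    dims = np.asarray(dims, dtype=float)
    emp_risk = np.asarray(emp_risk, dtype=float)
    order = np.argsort(dims)
    large = order[int(len(order) * (1 - frac_large)):]
    slope, _ = np.polyfit(dims[large], emp_risk[large], 1)
    c_min = -slope                              # minimal-penalty constant
    crit = emp_risk + 2.0 * c_min * dims        # optimal penalty ~ 2 x minimal
    return int(dims[np.argmin(crit)]), c_min
```

The factor of two reflects the heuristic's central claim that the optimal penalty is roughly twice the minimal one; the choice of which models count as "large" is the main practical tuning decision.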

estimator, penalty, soumis au journal, (15 more...)

1901.07277

Country:

North America > United States > New York (0.04)
North America > United States > Ohio > Montgomery County > Dayton (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
Information Technology > Data Science > Data Mining (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

#artificialintelligence Jan-20-2019, 14:11:23 GMT

How AI could help you learn sign language

Sign languages aren't easy to learn and are even harder to teach. They use not just hand gestures but also mouthings, facial expressions and body posture to communicate meaning. This complexity means professional teaching programmes are still rare and often expensive. But this could all change soon, with a little help from artificial intelligence (AI). My colleagues and I are working on software for teaching yourself sign languages in an automated, intuitive way.

artificial intelligence, machine learning, sign language, (10 more...)

#artificialintelligence

Industry: Education > Curriculum > Subject-Specific Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.32)

#artificialintelligence Jan-19-2019, 15:36:13 GMT

How AI Could Help You Learn Sign Language

artificial intelligence, machine learning, sign language, (10 more...)

#artificialintelligence

Industry: Education > Curriculum > Subject-Specific Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.32)

Wong, Melvin, Farooq, Bilal

A combined entropy and utility based generative model for large scale multiple discrete-continuous travel behaviour data

arXiv.org Machine Learning Jan-18-2019

Generative models, whether simple clustering algorithms or deep neural network architectures, have been developed as a probabilistic estimation method for dimension reduction or to model the underlying properties of data structures. Although their apparent use has largely been limited to image recognition and classification, generative machine learning algorithms can be a powerful tool for travel behaviour research. In this paper, we examine the generative machine learning approach for analyzing multiple discrete-continuous (MDC) travel behaviour data to understand the underlying heterogeneity and correlation, increasing the representational power of such travel behaviour models. We show that generative models are conceptually similar to the choice selection behaviour process through information entropy and variational Bayesian inference. Specifically, we consider a restricted Boltzmann machine (RBM) based algorithm with a multiple discrete-continuous layer, formulated as a variational Bayesian inference optimization problem. We systematically describe the proposed machine learning algorithm and develop a process of analyzing travel behaviour data from a generative learning perspective. We show parameter stability from model analysis and simulation tests on an open dataset with multiple discrete-continuous dimensions and a size of 293,330 observations. For interpretability, we derive analytical methods for conditional probabilities as well as elasticities. Our results indicate that latent variables in generative models can consistently and accurately represent the joint distribution of multiple discrete-continuous variables. Lastly, we show that our model can generate statistically similar data distributions for travel forecasting and prediction.

algorithm, generative model, latent variable, (16 more...)

1901.06415

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Quebec > Montreal (0.05)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)