AITopics | Edmonton

Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target

Multi-step methods such as Retrace($\lambda$) and $n$-step $Q$-learning have become a crucial component of modern deep reinforcement learning agents. These methods are often evaluated as part of larger architectures, and their evaluations rarely include enough samples to draw statistically significant conclusions about their performance. This methodology makes it difficult to understand how particular algorithmic details of multi-step methods influence learning. In this paper we combine the $n$-step action-value algorithms Retrace, $Q$-learning, Tree Backup, Sarsa, and $Q(\sigma)$ with an architecture analogous to DQN. We test the performance of all these algorithms in the mountain car environment; this choice of environment allows for faster training times and larger sample sizes. We present statistical analyses of the effects of the off-policy correction, the backup length parameter $n$, and the update frequency of the target network on the performance of these algorithms. Our results show that (1) using off-policy correction can have an adverse effect on the performance of Sarsa and $Q(\sigma)$; (2) increasing the backup length $n$ consistently improved performance across all the algorithms; and (3) in this particular task, the performance of Sarsa and $Q$-learning was more robust to changes in the target network update frequency than that of Tree Backup, $Q(\sigma)$, and Retrace.
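
The algorithms compared above differ mainly in how the $n$-step target is formed. The following Python sketch is a minimal illustration, not the paper's code. It assumes a recursive control-variate form of the target, $G_k = R_{k+1} + \gamma\,[\bar V(S_{k+1}) + c_{k+1}(G_{k+1} - Q(S_{k+1}, A_{k+1}))]$ with $\bar V(s) = \sum_a \pi(a|s) Q(s,a)$, in which each corrected algorithm is identified only by its trace coefficient $c$, and it assumes no episode ends inside the $n$-step window. The uncorrected variant simply bootstraps from $Q(S_{t+n}, A_{t+n})$ (Sarsa) or $\max_a Q(S_{t+n}, a)$ ($Q$-learning). All function names are hypothetical, and `q_next` stands for values produced by the target network.

```python
import numpy as np

# Sketch only: function names are hypothetical. Assumes rewards[k] = R_{t+k+1}
# and that no episode terminates inside the n-step window.


def uncorrected_nstep_target(rewards, bootstrap, gamma):
    """n-step return with no off-policy correction:
    sum_k gamma^k * R_{t+k+1} + gamma^n * bootstrap.
    bootstrap = Q(S_{t+n}, A_{t+n}) gives n-step Sarsa;
    bootstrap = max_a Q(S_{t+n}, a) gives n-step Q-learning."""
    g = bootstrap
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def corrected_nstep_target(rewards, q_next, actions_next, pi_next, mu_next,
                           gamma, trace):
    """Recursive target
        G_k = R_{k+1} + gamma * (Vbar(S_{k+1}) + c_{k+1} * (G_{k+1} - Q(S_{k+1}, A_{k+1})))
    with Vbar(s) = sum_a pi(a|s) Q(s, a). q_next[k] and pi_next[k] are vectors over
    actions for state S_{t+k+1}; mu_next[k] = mu(A_{t+k+1} | S_{t+k+1})."""
    # At the horizon G = Q(S_{t+n}, A_{t+n}), so the last correction term is zero
    # and the final step bootstraps from the expectation Vbar(S_{t+n}).
    g = q_next[-1][actions_next[-1]]
    for k in reversed(range(len(rewards))):
        q = np.asarray(q_next[k], dtype=float)
        pi = np.asarray(pi_next[k], dtype=float)
        a = actions_next[k]
        v_bar = float(pi @ q)
        c = trace(pi[a], mu_next[k])
        g = rewards[k] + gamma * (v_bar + c * (g - q[a]))
    return g


# Trace coefficients c = c(pi(A|S), mu(A|S)) for the corrected algorithms.
def tree_backup(pi_a, mu_a):
    return pi_a


def retrace(lam=1.0):
    return lambda pi_a, mu_a: lam * min(1.0, pi_a / mu_a)


def q_sigma(sigma):
    # sigma = 1: per-decision importance sampling (Sarsa-style correction)
    # sigma = 0: reduces to Tree Backup
    return lambda pi_a, mu_a: sigma * pi_a / mu_a + (1.0 - sigma) * pi_a


# Usage on a hand-made 3-step window with two actions.
rewards = [1.0, 0.0, 2.0]
q_next = [[1.0, 3.0], [2.0, 0.0], [0.0, 4.0]]  # Q(S_{t+1..t+3}, .)
actions_next = [1, 0, 1]
pi_next = [[0.5, 0.5]] * 3
mu_next = [0.5, 0.5, 0.5]
sarsa = uncorrected_nstep_target(rewards, q_next[-1][actions_next[-1]], gamma=0.9)
tb = corrected_nstep_target(rewards, q_next, actions_next, pi_next, mu_next, 0.9, tree_backup)
```

Treating Sarsa's off-policy correction as `q_sigma(1.0)` (per-decision importance sampling) is an assumption made for this sketch; the study's own definitions may differ.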

algorithm, off-policy correction, update frequency

arXiv.org Machine Learning

1901.0751

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Private Q-Learning with Functional Noise in Continuous Spaces

Wang, Baoxiang; Hegde, Nidhi

arXiv.org Machine Learning, Jan-29-2019

We consider privacy-preserving algorithms for deep reinforcement learning. State-of-the-art methods that guarantee differential privacy do not extend to very large state spaces, because the noise level necessary to ensure privacy would scale to infinity. We address the problem of providing differential privacy in Q-learning when a neural network is used as the function approximator. We develop a rigorous and efficient algorithm by inspecting the reproducing kernel Hilbert space in which the neural network is embedded. Our approach uses functional noise to guarantee privacy, while the noise level scales linearly with the complexity of the neural network architecture. Although there are no known theoretical guarantees on the performance of deep reinforcement learning, we gain some insight by providing a utility analysis in the discrete-space setting.
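
The central idea here, perturbing the learned Q-function with a random function rather than with independent noise at each query, can be sketched in Python. This is an illustration under stated assumptions, not the authors' algorithm: the noise is an approximate Gaussian-process sample path with an RBF kernel built from random Fourier features, and the noise multiplier $\Delta\sqrt{2\ln(2/\delta)}/\epsilon$ is an assumed Gaussian-process mechanism whose sensitivity $\Delta$ the caller must supply. `RandomFourierGPNoise`, `gp_mechanism_scale`, and `private_q_values` are hypothetical names.

```python
import numpy as np

# Sketch only: class and function names are hypothetical, and the noise scale
# below is an assumed Gaussian-process mechanism, not the paper's derivation.


class RandomFourierGPNoise:
    """Approximate sample path g(s) of a zero-mean Gaussian process with kernel
    k(s, s') = exp(-||s - s'||^2 / (2 * lengthscale^2)), one path per action,
    built from random Fourier features. The function is fixed at construction,
    so querying the same state twice returns the same noise."""

    def __init__(self, state_dim, n_actions, lengthscale=1.0, n_features=500, seed=0):
        rng = np.random.default_rng(seed)
        self.omega = rng.normal(0.0, 1.0 / lengthscale, size=(n_features, state_dim))
        self.phase = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.weights = rng.normal(0.0, 1.0, size=(n_features, n_actions))
        self.norm = np.sqrt(2.0 / n_features)

    def __call__(self, states):
        states = np.atleast_2d(np.asarray(states, dtype=float))
        features = self.norm * np.cos(states @ self.omega.T + self.phase)
        return features @ self.weights  # shape (batch, n_actions)


def gp_mechanism_scale(sensitivity, epsilon, delta):
    """Assumed noise multiplier: sensitivity * sqrt(2 ln(2/delta)) / epsilon,
    where `sensitivity` is an RKHS-norm sensitivity supplied by the caller."""
    return sensitivity * np.sqrt(2.0 * np.log(2.0 / delta)) / epsilon


def private_q_values(q_fn, noise, states, sensitivity, epsilon, delta):
    """Release Q(s, .) + scale * g(s) for a batch of states."""
    return q_fn(states) + gp_mechanism_scale(sensitivity, epsilon, delta) * noise(states)


# Usage with a hypothetical Q-function that returns zeros for 3 actions.
noise = RandomFourierGPNoise(state_dim=2, n_actions=3, lengthscale=0.5, seed=1)
states = np.array([[0.1, -0.2], [0.3, 0.4]])
q_fn = lambda s: np.zeros((np.atleast_2d(s).shape[0], 3))
released = private_q_values(q_fn, noise, states, sensitivity=1.0, epsilon=1.0, delta=1e-5)
```

Because the same noise function is reused for every query, repeated or nearby states receive identical or correlated noise. The random-feature approximation is only for illustration; a formal privacy guarantee would require the exact mechanism and the paper's sensitivity analysis.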

algorithm, exp, privacy

arXiv.org Machine Learning

1901.10634

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Supervised autoencoders: Improving generalization performance with unsupervised regularizers

Le, Lei; Patterson, Andrew; White, Martha

Neural Information Processing Systems, Dec-31-2018

Generalization performance is a central goal in machine learning, particularly when learning representations with large neural networks. A common strategy for improving generalization has been to use regularizers, typically a norm constraining the parameters. Regularizing hidden layers in a neural network architecture, however, is not straightforward. There have been a few effective layer-wise suggestions, but without theoretical guarantees of improved performance. In this work, we theoretically and empirically analyze one such model, called a supervised auto-encoder: a neural network that jointly predicts both inputs (reconstruction error) and targets. We provide a novel generalization result for linear auto-encoders, proving uniform stability based on the inclusion of the reconstruction error, particularly as an improvement on simplistic regularization such as norms or even on more advanced regularization such as the use of auxiliary tasks. Empirically, we then demonstrate that, across an array of architectures with different numbers of hidden units and activation functions, the supervised auto-encoder never harms performance compared with the corresponding standard neural network and can significantly improve generalization.
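
A supervised auto-encoder adds a reconstruction head to an ordinary predictor, so the shared hidden layer must also explain the inputs. The sketch below shows the linear case in NumPy, trained by full-batch gradient descent on prediction error plus a weighted reconstruction error. It is a hypothetical illustration: the names, the squared-error losses, the weight `recon_weight`, and the hyperparameters are assumptions, not the paper's experimental setup.

```python
import numpy as np

# Sketch only: a linear supervised autoencoder trained by full-batch gradient
# descent; function names and hyperparameters are hypothetical.


def sae_loss(X, Y, W, U, V, recon_weight=1.0):
    """Mean squared prediction error + recon_weight * mean squared reconstruction error."""
    H = X @ W  # shared hidden representation
    pred = np.mean(np.sum((H @ U - Y) ** 2, axis=1))
    recon = np.mean(np.sum((H @ V - X) ** 2, axis=1))
    return pred + recon_weight * recon


def train_linear_sae(X, Y, k, recon_weight=1.0, lr=0.01, epochs=500, seed=0):
    """X: (m, d) inputs, Y: (m, p) targets, k: number of hidden units.
    recon_weight = 0 recovers an ordinary two-layer linear network."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    p = Y.shape[1]
    W = rng.normal(0.0, 0.1, (d, k))  # encoder shared by both heads
    U = rng.normal(0.0, 0.1, (k, p))  # target-prediction head
    V = rng.normal(0.0, 0.1, (k, d))  # input-reconstruction head
    for _ in range(epochs):
        H = X @ W
        err_y = H @ U - Y
        err_x = H @ V - X
        # Gradients of sae_loss with respect to U, V and (through H) W.
        grad_U = 2.0 * H.T @ err_y / m
        grad_V = 2.0 * recon_weight * H.T @ err_x / m
        grad_H = 2.0 * (err_y @ U.T + recon_weight * err_x @ V.T) / m
        grad_W = X.T @ grad_H
        W -= lr * grad_W
        U -= lr * grad_U
        V -= lr * grad_V
    return W, U, V


# Usage on synthetic data (hypothetical): 5 inputs, 2 targets, 3 hidden units.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
Y = X @ rng.normal(size=(5, 2)) + 0.1 * rng.normal(size=(200, 2))
W0, U0, V0 = train_linear_sae(X, Y, k=3, epochs=0)    # initial weights
W1, U1, V1 = train_linear_sae(X, Y, k=3, epochs=500)  # trained weights
```

Setting `recon_weight=0` removes the reconstruction term and leaves a plain two-layer linear network, which makes it straightforward to compare the two models on held-out data.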

artificial intelligence, machine learning, reconstruction error

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Indiana > Monroe County > Bloomington (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Championing Creativity, Artificial Intelligence and Machine Learning At Canada's C-Tribe Festival

#artificialintelligence, Oct-30-2018, 22:00:59 GMT

#artificialintelligence

Country: North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.33)

Industry: Media > News (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Google's AI powerhouse DeepMind is opening its first international lab in Canada

#artificialintelligence, Oct-9-2018, 02:08:20 GMT

Although it was bought by Google in 2014, AI firm DeepMind has always been true to its British roots, expanding its offices in London, working closely with UK institutions like the NHS, and even teaching in the country's universities. Now, though, the company is opening its "first ever international AI office" in Edmonton, Canada. It's a natural fit for DeepMind, which has close links with the AI research community at the University of Alberta in Edmonton. The company says nearly a dozen Alberta graduates have joined its ranks, and the firm has sponsored the university's machine learning lab for a number of years. Richard Sutton, professor of computing science at Alberta, was also DeepMind's first outside advisor and will head up the company's new base along with colleagues Michael Bowling and Patrick Pilarski.

large language model, machine learning, natural language

#artificialintelligence

Country: North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.27)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines

Schmid, Martin; Burch, Neil; Lanctot, Marc; Moravcik, Matej; Kadlec, Rudolf; Bowling, Michael

arXiv.org Artificial Intelligence, Sep-9-2018

Learning strategies for imperfect-information games from samples of interaction is a challenging problem. A common method for this setting, Monte Carlo Counterfactual Regret Minimization (MCCFR), can converge slowly in the long term because of high variance. In this paper, we introduce a variance reduction technique (VR-MCCFR) that applies to any sampling variant of MCCFR. Using this technique, per-iteration estimated values and updates are reformulated as a function of sampled values and state-action baselines, similar to their use in policy gradient reinforcement learning. The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates. Finally, we show that given a perfect baseline, the variance of the value estimates can be reduced to zero. Experimental evaluation shows that VR-MCCFR brings an order-of-magnitude speedup, while the empirical variance decreases by three orders of magnitude. The decreased variance allows CFR+ to be used with sampling for the first time, increasing the speedup to two orders of magnitude.
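
The baseline correction described above can be illustrated at a single decision node. In this Python sketch (all names hypothetical), the sampled action's value is replaced by $b(a) + (u(a) - b(a))/q(a)$, where $q(a)$ is that action's sampling probability, and every unsampled action falls back to its baseline $b(a)$. For simplicity it assumes the child values $u(a)$ are known exactly; in full VR-MCCFR they would themselves be sampled, bootstrapped estimates from deeper in the tree.

```python
import numpy as np

# Sketch only: one decision node, exact child values, hypothetical names.


def baseline_corrected_values(sampled_action, sampled_value, baseline, sample_probs):
    """Per-action estimates: the sampled action gets b(a) + (u(a) - b(a)) / q(a);
    every other action falls back to its baseline b(a)."""
    est = np.array(baseline, dtype=float)
    a = sampled_action
    est[a] = baseline[a] + (sampled_value - baseline[a]) / sample_probs[a]
    return est


def node_value_estimate(sampled_action, sampled_value, baseline, sample_probs, policy):
    """Node value estimate: policy-weighted sum of the per-action estimates."""
    est = baseline_corrected_values(sampled_action, sampled_value, baseline, sample_probs)
    return float(np.dot(policy, est))


def exact_mean_and_variance(true_values, baseline, sample_probs, policy):
    """Enumerate every possible sampled action to get E[estimate] and Var[estimate]."""
    ests = np.array([node_value_estimate(a, true_values[a], baseline, sample_probs, policy)
                     for a in range(len(true_values))])
    mean = float(np.dot(sample_probs, ests))
    var = float(np.dot(sample_probs, (ests - mean) ** 2))
    return mean, var


# Usage: three actions sampled uniformly, with a zero baseline and a perfect one.
true_values = np.array([1.0, -2.0, 0.5])
policy = np.array([0.2, 0.5, 0.3])
sample_probs = np.full(3, 1.0 / 3.0)
mean_zero_b, var_zero_b = exact_mean_and_variance(true_values, np.zeros(3), sample_probs, policy)
mean_perfect_b, var_perfect_b = exact_mean_and_variance(true_values, true_values, sample_probs, policy)
```

Enumerating every possible sample checks both properties at this node: the estimate's mean equals the true value for any baseline, and a baseline equal to the true child values drives the variance to zero.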

baseline, counterfactual value, information

arXiv.org Artificial Intelligence

1809.03057

Country:

North America > United States > Texas (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Distributed dynamic modeling and monitoring for large-scale industrial processes under closed-loop control

Li, Wenqing; Zhao, Chunhui; Huang, Biao

arXiv.org Machine Learning, Sep-7-2018

For large-scale industrial processes under closed-loop control, process dynamics resulting directly from control actions are a typical characteristic, and they may behave differently under real faults than under normal changes of operating conditions. However, conventional distributed monitoring approaches do not consider the closed-loop control mechanism and explore only static characteristics; they are therefore unable to distinguish real process faults from nominal changes of operating conditions, which leads to unnecessary alarms. To address this, this paper proposes a distributed monitoring method for closed-loop industrial processes that concurrently explores static and dynamic characteristics. First, the large-scale closed-loop process is decomposed into several subsystems by a sparse slow feature analysis (SSFA) algorithm, which captures changes in both static and dynamic information. Second, distributed models are developed to separately capture static and dynamic characteristics from local and global perspectives. Based on the distributed monitoring system, a two-level monitoring strategy is proposed to check the different influences on process characteristics resulting from changes in operating conditions and in control action, so that the two kinds of change can be clearly distinguished. Case studies based on both benchmark data and real industrial process data illustrate the effectiveness of the proposed method.
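
Slow feature analysis extracts linear projections of the process variables ordered by how slowly they vary, and monitoring statistics are then built on those features. The Python sketch below is plain linear SFA, not the sparse SSFA algorithm of the paper: it omits sparsity, the subsystem decomposition, and control limits. The $T^2$ statistic (on the slow features) and the $S^2$ statistic (on their first differences) follow a common convention in SFA-based monitoring and are assumptions here, as are all function names.

```python
import numpy as np

# Sketch only: plain linear slow feature analysis (no sparsity, no subsystem
# decomposition); function names are hypothetical.


def fit_sfa(X):
    """X: (T, d) training data ordered in time. Returns the mean, a projection
    W whose columns give unit-variance, decorrelated features ordered from
    slowest to fastest, and the variance of each feature's first difference."""
    mu = X.mean(axis=0)
    Xc = X - mu
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    whiten = evecs / np.sqrt(evals)  # whitened data has identity covariance
    dZ = np.diff(Xc @ whiten, axis=0)
    d_evals, d_evecs = np.linalg.eigh(np.cov(dZ, rowvar=False))  # ascending = slowest first
    return mu, whiten @ d_evecs, d_evals


def monitoring_stats(X, mu, W, d_evals):
    """T^2 on the slow features (static part) and S^2 on their first
    differences scaled by the training difference variances (dynamic part)."""
    S = (X - mu) @ W
    t2 = np.sum(S ** 2, axis=1)
    s2 = np.sum(np.diff(S, axis=0) ** 2 / d_evals, axis=1)
    return t2, s2


# Usage on synthetic data: one slow source and two fast noise sources, linearly mixed.
rng = np.random.default_rng(0)
t = np.arange(2000)
slow = np.sin(2 * np.pi * t / 500)
sources = np.column_stack([slow, rng.normal(size=2000), rng.normal(size=2000)])
A = np.array([[1.0, 0.5, 0.2], [0.3, 1.0, 0.4], [0.6, 0.2, 1.0]])
X = sources @ A.T
mu, W, d_evals = fit_sfa(X)
t2, s2 = monitoring_stats(X, mu, W, d_evals)
```

In practice, $T^2$ and $S^2$ would be computed on new data with the training mean and projection and compared against control limits estimated from normal operating data.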

artificial intelligence, machine learning, subset

arXiv.org Machine Learning

1809.03343

Country:

North America > United States > Tennessee (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.50)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities > Geothermal System for Power Generation > Advanced Geothermal System (AGS) (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)

Deep Learning-Based Decoding for Constrained Sequence Codes

Cao, Congzhe; Li, Duanshun; Fair, Ivan

arXiv.org Machine Learning, Sep-6-2018

Constrained sequence codes have been widely used in modern communication and data storage systems. Sequences encoded with constrained sequence codes satisfy constraints imposed by the physical channel, hence enabling efficient and reliable transmission of coded symbols. Traditional encoding and decoding of constrained sequence codes rely on table look-up, which is prone to errors that occur during transmission. In this paper, we introduce constrained sequence decoding based on deep learning. With multilayer perceptron (MLP) networks and convolutional neural networks (CNNs), we are able to achieve low bit error rates that are close to those of maximum a posteriori probability (MAP) decoding, as well as improve the system throughput. Moreover, deep learning-based decoding makes it practical to implement capacity-achieving fixed-length codes, for which table look-up decoding is prohibitively complex.
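
Why a learned decoder can beat table look-up is easiest to see on a toy code. The Python sketch below uses a hypothetical code mapping 2-bit messages to 3-bit codewords with no two adjacent 1s (constraints across codeword boundaries are ignored), sends BPSK symbols over an additive white Gaussian noise channel, and compares hard-decision table look-up with soft maximum-likelihood decoding, which equals MAP decoding for equiprobable messages. An MLP or CNN decoder would be trained to approximate that soft decision; here the exact decision is computed directly instead, and all names are assumptions.

```python
import numpy as np

# Hypothetical toy code: 2-bit messages -> 3-bit codewords with no two adjacent 1s
# (constraints across codeword boundaries are ignored in this sketch).
CODEBOOK = {(0, 0): (0, 0, 0), (0, 1): (0, 0, 1), (1, 0): (1, 0, 0), (1, 1): (1, 0, 1)}
MESSAGES = np.array(list(CODEBOOK.keys()))
CODEWORDS = np.array(list(CODEBOOK.values()))
LOOKUP = {cw: msg for msg, cw in CODEBOOK.items()}


def bpsk(bits):
    return 1.0 - 2.0 * bits  # 0 -> +1, 1 -> -1


def table_lookup_decode(received):
    """Hard decisions followed by table look-up; invalid patterns fall back to (0, 0)."""
    hard = (received < 0).astype(int)
    return np.array([LOOKUP.get(tuple(int(b) for b in row), (0, 0)) for row in hard])


def ml_decode(received):
    """Soft maximum-likelihood decoding on AWGN: pick the codeword whose
    BPSK signal is nearest in Euclidean distance."""
    dists = ((received[:, None, :] - bpsk(CODEWORDS)[None, :, :]) ** 2).sum(axis=2)
    return MESSAGES[np.argmin(dists, axis=1)]


def bit_error_rate(n_words=20000, noise_std=0.6, seed=0):
    """Monte Carlo bit error rate of both decoders on the same received words."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(MESSAGES), n_words)
    sent = MESSAGES[idx]
    received = bpsk(CODEWORDS[idx]) + rng.normal(0.0, noise_std, (n_words, 3))
    return {name: float(np.mean(decode(received) != sent))
            for name, decode in (("table", table_lookup_decode), ("ml", ml_decode))}


ber = bit_error_rate()
```

Table look-up fails whenever noise produces a pattern outside the table (here it falls back to `(0, 0)`), whereas the soft decoder always returns the most likely valid codeword.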

artificial intelligence, machine learning, mlp network

arXiv.org Machine Learning

1809.01859

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > California > Monterey County > Pacific Grove (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Improbotics: Exploring the Imitation Game using Machine Intelligence in Improvised Theatre

Mathewson, Kory W.; Mirowski, Piotr

arXiv.org Artificial Intelligence, Sep-5-2018

Theatrical improvisation (impro or improv) is a demanding form of live, collaborative performance. Improv is a humorous and playful art form built on an open-ended narrative structure that simultaneously celebrates effort and failure. It is thus an ideal test bed for the development and deployment of interactive artificial intelligence (AI)-based conversational agents, or artificial improvisors. This case study introduces an improv show experiment featuring human actors and artificial improvisors. We have previously developed a deep-learning-based artificial improvisor, trained on movie subtitles, that can generate plausible, context-based lines of dialogue suitable for theatre (Mathewson and Mirowski 2017). In this work, we employ it to control what a subset of human actors say during an improv performance. We also give human-generated lines to a different subset of performers. All lines are delivered to the actors through headphones, and all performers wear headphones. This paper describes a Turing test, or imitation game, taking place in a theatre, with both the audience members and the performers left to guess who is a human and who is a machine. To test scientific hypotheses about the perception of humans versus machines, we collect anonymous feedback from volunteer performers and audience members. Our results suggest that rehearsal increases proficiency and the ability to control events in the performance. That said, consistency with real-world experience is limited by the interface and the mechanisms used to perform the show. We also show that human-generated lines are shorter, are more positive, use fewer difficult words, and contain more grammar and spelling mistakes than lines generated by the artificial improvisor.

machine learning, natural language, performer

arXiv.org Artificial Intelligence

1809.01807

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.14)
Europe > United Kingdom > England > Greater London > London (0.14)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.86)

Industry:

Media (1.00)
Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Edmonton

Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target

Private Q-Learning with Functional Noise in Continuous Spaces

Supervised autoencoders: Improving generalization performance with unsupervised regularizers

Championing Creativity, Artificial Intelligence and Machine Learning At Canada's C-Tribe Festival

Google's AI powerhouse DeepMind is opening its first international lab in Canada

Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines

Distributed dynamic modeling and monitoring for large-scale industrial processes under closed-loop control

Deep Learning-Based Decoding for Constrained Sequence Codes

Improbotics: Exploring the Imitation Game using Machine Intelligence in Improvised Theatre