

Learning Expected Emphatic Traces for Deep RL

arXiv.org Machine Learning

Off-policy sampling and experience replay are key to improving the sample efficiency and scalability of model-free temporal-difference learning methods. When further combined with function approximation, such as neural networks, this combination is known as the deadly triad and is potentially unstable. Recently, it has been shown that stability and good performance at scale can be achieved by combining emphatic weightings and multi-step updates. This approach, however, is generally limited to sampling complete trajectories in order, so as to compute the required emphatic weighting. In this paper we investigate how to combine emphatic weightings with non-sequential, off-line data sampled from a replay buffer. We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed $n$-step TD learning algorithm to learn the required emphatic weighting. We show that these state weightings reduce variance compared with prior approaches, while providing convergence guarantees. We tested the approach at scale on Atari 2600 video games and observed that the new X-ETD($n$) agent improved over baseline agents, highlighting both the scalability and broad applicability of our approach.
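As background for the emphatic weighting the abstract builds on, a minimal tabular ETD(0) sketch (with every state's interest set to 1, a common default) might look like the following; the paper's multi-step, replay-compatible X-ETD($n$) weighting is more involved than this.

```python
def etd0_episode(transitions, v, alpha=0.1, gamma=0.99):
    """One episode of tabular emphatic TD(0).

    transitions: list of (s, r, s_next, rho) tuples, where rho is the
    importance-sampling ratio pi(a|s) / mu(a|s) of the sampled action.
    v: list (or array) of state-value estimates, updated in place.
    The interest i(s) is taken to be 1 for every state.
    """
    F, rho_prev = 0.0, 1.0
    for s, r, s_next, rho in transitions:
        F = gamma * rho_prev * F + 1.0        # follow-on trace: F_t = gamma * rho_{t-1} * F_{t-1} + i(S_t)
        M = F                                 # emphasis (lambda = 0 case)
        delta = r + gamma * v[s_next] - v[s]  # one-step TD error
        v[s] += alpha * M * rho * delta       # emphasis-weighted off-policy update
        rho_prev = rho
    return v
```

Because the follow-on trace F accumulates discounted, reweighted emphasis across the trajectory, plain emphatic TD needs complete trajectories sampled in order, which is exactly the limitation the paper's replay-compatible weighting removes.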


A Two-stage Framework and Reinforcement Learning-based Optimization Algorithms for Complex Scheduling Problems

arXiv.org Artificial Intelligence

Owing to their diversity and complexity, scheduling problems hardly admit an efficient general-purpose solver. In this study, we develop a two-stage framework in which reinforcement learning (RL) and traditional operations research (OR) algorithms are combined to efficiently deal with complex scheduling problems. The scheduling problem is solved in two stages: a finite Markov decision process (MDP) and a mixed-integer programming process. This offers a novel and general paradigm that combines RL with OR approaches to solve scheduling problems, leveraging the respective strengths of both: the MDP narrows down the search space of the original problem through an RL method, while the mixed-integer programming process is solved by an OR algorithm. The two stages are performed iteratively and interactively until the termination criterion is met. Following this idea, we put forward two implementations of the combined RL and OR method. The agile Earth observation satellite scheduling problem is selected as an example to demonstrate the effectiveness of the proposed scheduling framework and methods. The convergence and generalization capability of the methods are verified by their performance on training scenarios, while efficiency and accuracy are tested on 50 untrained scenarios. The results show that the proposed algorithms stably and efficiently obtain satisfactory scheduling schemes for agile Earth observation satellite scheduling problems. In addition, the RL-based optimization algorithms show stronger scalability than non-learning algorithms. This work reveals the advantage of combining reinforcement learning methods with heuristic methods or mathematical programming methods for solving complex combinatorial optimization problems.
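The alternating two-stage loop the abstract describes can be sketched generically; the function names and the no-improvement stopping rule below are illustrative assumptions, not the paper's exact interface.

```python
def two_stage_schedule(problem, rl_policy, or_solver, max_iters=10):
    """Skeleton of an iterative two-stage RL + OR scheduling loop.

    Stage 1: rl_policy (the MDP stage) narrows the original search space
    to a candidate set. Stage 2: or_solver (the mixed-integer programming
    stage) exactly solves the reduced problem. The stages alternate until
    a termination criterion (here: no cost improvement) is met.
    Returns the best (cost, schedule) pair found.
    """
    best_cost, best = float("inf"), None
    for _ in range(max_iters):
        candidates = rl_policy(problem, best)            # MDP stage: RL prunes the search space
        cost, schedule = or_solver(problem, candidates)  # MIP stage: OR solves the sub-problem
        if cost >= best_cost:
            break                                        # termination: no improvement
        best_cost, best = cost, schedule
    return best_cost, best
```

Any concrete RL policy and MIP solver that fit these two call signatures could be plugged in; the paper's two implementation versions differ in how the stages interact, which this skeleton does not capture.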


Future of distracted driving technology makes Edmonton pitch CBC News

#artificialintelligence

An Australian-based technology firm that uses artificial intelligence to catch distracted drivers made a pitch to an Edmonton conference on Friday. Acusensus presented its automatic camera enforcement technology at the International Conference on Urban Traffic Safety. Founded in early 2018, the company made international headlines with a pilot program in Australia earlier this year. The Acusensus camera system is mounted on the side or above the road, like photo radar. But unlike photo radar, the system takes high-resolution pictures of every passing car.


Predicting the Long-Term Outcomes of Biologics in Psoriasis Patients Using Machine Learning

arXiv.org Machine Learning

Background. Real-world data show that approximately 50% of psoriasis patients treated with a biologic agent will discontinue the drug because of loss of efficacy. A history of previous therapy with another biologic, female sex and obesity have been identified as predictors of drug discontinuation, but their individual predictive value is low. Objectives. To determine whether machine learning algorithms can produce models that accurately predict outcomes of biologic therapy in psoriasis at the individual patient level. Results. All tested machine learning algorithms could accurately predict the risk of drug discontinuation and its cause (e.g. lack of efficacy vs adverse event). The learned generalized linear model achieved a diagnostic accuracy of 82%, requiring under 2 seconds per patient on the psoriasis patient dataset. Input optimization analysis established the profile of a patient with the best chances of long-term treatment success: a biologic-naive patient under 49 years, with early-onset plaque psoriasis without psoriatic arthritis, weight < 100 kg, and moderate-to-severe psoriasis activity (DLQI $\geq$ 16; PASI $\geq$ 10). Moreover, a separate generalized linear model predicts the length of treatment for each patient with a mean absolute error (MAE) of 4.5 months, and a Pearson correlation coefficient of 0.935 indicates a strong linear dependence between the actual and predicted treatment lengths. Conclusions. Machine learning algorithms predict the risk of drug discontinuation and treatment duration with accuracy exceeding 80%, based on a small set of predictive variables. This approach can be used as a decision-making tool to communicate expected outcomes to the patient and to develop evidence-based guidelines.
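A generalized linear model for discontinuation risk of the kind described can be sketched with plain logistic regression; the features, data and training loop below are synthetic stand-ins, not the study's model or dataset.

```python
import numpy as np

def fit_glm(X, y, lr=0.1, epochs=500):
    """Fit a logistic-regression GLM by batch gradient descent.

    X: (n_patients, n_features) matrix of predictors (e.g. age, weight,
    biologic-naive flag); y: binary labels (1 = discontinued the drug).
    """
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted discontinuation risk
        grad = X.T @ (p - y) / len(y)           # gradient of the log-loss
        w -= lr * grad
        b -= lr * np.mean(p - y)
    return w, b

def predict_risk(w, b, x):
    """Discontinuation probability for one patient's feature vector."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))
```

A second regression of the same family, with a continuous target and identity link, would play the role of the treatment-length model the abstract mentions.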


Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures

arXiv.org Artificial Intelligence

Predictions and predictive knowledge have seen recent success in improving not only robot control but also other applications ranging from industrial process control to rehabilitation. A property that makes these predictive approaches well suited for robotics is that they can be learned online and incrementally through interaction with the environment. However, a remaining challenge for many prediction-learning approaches is an appropriate choice of prediction-learning parameters, especially parameters that control the magnitude of a learning machine's updates to its predictions (the learning rate or step size). To begin to address this challenge, we examine the use of online step-size adaptation using a sensor-rich robotic arm. Our method of choice, Temporal-Difference Incremental Delta-Bar-Delta (TIDBD), learns and adapts step sizes on a feature level; importantly, TIDBD allows step-size tuning and representation learning to occur at the same time. We show that TIDBD is a practical alternative to classic Temporal-Difference (TD) learning tuned via an extensive parameter search: both approaches perform comparably in terms of predicting future aspects of a robotic data stream. Furthermore, the use of a step-size adaptation method like TIDBD appears to allow a system to automatically detect and characterize common sensor failures in a robotic application. Together, these results promise to improve the ability of robotic devices to learn from interactions with their environments in a robust way, providing key capabilities for autonomous agents and robots.
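TIDBD adapts a separate step size per feature by meta-gradient descent on the log step sizes; a minimal semi-gradient TD sketch in the spirit of IDBD/TIDBD follows (the paper's exact trace handling may differ from this simplification).

```python
import numpy as np

def tidbd_step(w, beta, h, x, r, x_next, gamma=0.99, theta=0.01):
    """One semi-gradient TD(0) step with IDBD-style step-size adaptation.

    w: weights; beta: log step sizes (one per feature); h: auxiliary
    trace of recent weight movement; x, x_next: feature vectors; r: reward.
    theta is the meta step size that controls how fast step sizes adapt.
    """
    delta = r + gamma * w @ x_next - w @ x              # TD error
    beta += theta * delta * x * h                       # meta-gradient on log step sizes
    alpha = np.exp(beta)                                # per-feature step sizes
    w += alpha * delta * x                              # TD update with feature-wise rates
    h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
    return w, beta, h, delta
```

Features whose TD errors are consistently correlated with past updates get a growing step size, while noisy or irrelevant features get a shrinking one, which is the mechanism behind the sensor-failure detection the abstract mentions.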


Meta-descent for Online, Continual Prediction

arXiv.org Machine Learning

This paper investigates different vector step-size adaptation approaches for non-stationary online, continual prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step-sizes. Many methods, including AdaGrad, RMSProp, and AMSGrad, keep statistics about the learning process to approximate a second order update---a vector approximation of the inverse Hessian. Another family of approaches use meta-gradient descent to adapt the step-size parameters to minimize prediction error. These meta-descent strategies are promising for non-stationary problems, but have not been as extensively explored as quasi-second order methods. We first derive a general, incremental meta-descent algorithm, called AdaGain, designed to be applicable to a much broader range of algorithms, including those with semi-gradient updates or even those with accelerations, such as RMSProp. We provide an empirical comparison of methods from both families. We conclude that methods from both families can perform well, but in non-stationary prediction problems the meta-descent methods exhibit advantages. Our method is particularly robust across several prediction problems, and is competitive with the state-of-the-art method on a large-scale, time-series prediction problem on real data from a mobile robot.
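The quasi-second-order family the abstract compares against keeps exactly such running statistics; for instance, an RMSProp-style vector step size (one of the named baselines) can be written in a few lines.

```python
import numpy as np

def rmsprop_step(w, v, grad, alpha=0.01, beta=0.99, eps=1e-8):
    """One RMSProp update: the running second-moment statistic v yields a
    per-coordinate (vector) step size alpha / sqrt(v), a rough diagonal
    approximation of inverse curvature."""
    v = beta * v + (1 - beta) * grad ** 2       # per-coordinate statistic
    w = w - alpha * grad / (np.sqrt(v) + eps)   # vector-scaled gradient step
    return w, v
```

Meta-descent methods such as the AdaGain algorithm proposed here instead treat the step-size vector itself as a parameter and descend on the prediction error with respect to it, which is what gives them an edge under non-stationarity.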


Ease-of-Teaching and Language Structure from Emergent Communication

arXiv.org Artificial Intelligence

Artificial agents have been shown to learn to communicate when needed to complete a cooperative task. Some level of language structure (e.g., compositionality) has been found in the learned communication protocols. This observed structure is often the result of specific environmental pressures during training. By introducing new agents periodically to replace old ones, sequentially and within a population, we explore such a new pressure -- ease of teaching -- and show its impact on the structure of the resulting language.


DeepMind is opening a huge new London headquarters in 2020

#artificialintelligence

DeepMind, the Alphabet-owned artificial intelligence company, will be moving into a new flagship building in 2020. While a precise date is yet to be determined, the company hopes that the new headquarters - which will feature a double-helix staircase descending through a library, a roof garden, a lecture theatre and lobby artwork by creatives working with data and artificial intelligence - will be operational in the first half of next year. DeepMind is currently located near the new site in London's Kings Cross, where it has two floors in the Google building. It also has smaller, satellite offices in Paris; Edmonton, Alberta; and Mountain View, California. The new, 11-storey DeepMind headquarters will cement the reputation of the Kings Cross area as London's so-called 'knowledge quarter': as well as Google, Facebook has taken office space nearby, as has Samsung.


Differentially Private Contextual Linear Bandits

Neural Information Processing Systems

We study the contextual linear bandit problem, a version of the standard stochastic multi-armed bandit (MAB) problem in which a learner sequentially selects actions to maximize a reward that also depends on a user-provided per-round context. Though the context is chosen arbitrarily or adversarially, the reward is assumed to be a stochastic function of a feature vector that encodes the context and selected action. Our goal is to devise private learners for the contextual linear bandit problem. We first show that using the standard definition of differential privacy results in linear regret. So instead, we adopt the notion of joint differential privacy, where we assume that the action chosen on day t is only revealed to user t and thus needn't be kept private that day, only on following days. We give a general scheme converting the classic linear-UCB algorithm into a joint differentially private algorithm using the tree-based algorithm. We then apply either Gaussian noise or Wishart noise to achieve joint differentially private algorithms and bound the resulting algorithms' regrets. In addition, we give the first lower bound on the additional regret any private algorithm for the MAB problem must incur.
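The tree-based algorithm referred to is the standard binary counting mechanism: one persistent noise draw per dyadic interval, so each running sum touches only O(log T) noisy nodes. A sketch with Gaussian noise follows; the noise scale here is a free parameter, not calibrated to a privacy budget.

```python
import random

def tree_private_prefix_sums(values, sigma):
    """Tree-based aggregation ('binary counting mechanism').

    Each dyadic block of rounds receives a single persistent Gaussian noise
    draw, and the prefix sum at time t is assembled from the O(log t) blocks
    in t's binary decomposition, so per-round noise stays polylogarithmic.
    """
    noise = {}   # one noise value per dyadic block, keyed by (size, index)
    prefix = []
    for t in range(1, len(values) + 1):
        total, i = 0.0, t
        # decompose [1..t] into dyadic blocks via t's binary representation
        while i > 0:
            low_bit = i & -i                  # size of the current dyadic block
            key = (low_bit, i // low_bit)     # uniquely identifies the block
            if key not in noise:
                noise[key] = random.gauss(0.0, sigma)
            total += sum(values[i - low_bit:i]) + noise[key]
            i -= low_bit
        prefix.append(total)
    return prefix
```

With `sigma = 0` this reduces to exact prefix sums; the paper applies the same tree structure to linear-UCB's running statistics rather than to a scalar stream.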