
Collaborating Authors

 Mann, Timothy A.


Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems

arXiv.org Machine Learning

Predicting delayed outcomes is an important problem in recommender systems (e.g., if customers will finish reading an ebook). We formalize the problem as an adversarial, delayed online learning problem and consider how a proxy for the delayed outcome (e.g., if customers read a third of the book in 24 hours) can help minimize regret, even though the proxy is not available when making a prediction. Motivated by our regret analysis, we propose two neural network architectures: the Factored Forecaster (FF), which is ideal if the proxy is informative of the outcome in hindsight, and the Residual Factored Forecaster (RFF), which is robust to a non-informative proxy. Experiments on two real-world datasets for predicting human behavior show that RFF outperforms both FF and a direct forecaster that does not make use of the proxy. Our results suggest that exploiting proxies by factorization is a promising way to mitigate the impact of long delays in human-behavior prediction tasks.
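
A minimal sketch of the factored idea described in the abstract, assuming a discrete proxy with a handful of values: predict the proxy from the context, predict the outcome given context and proxy, and marginalize over proxy values at prediction time (when the proxy is not yet observed). The class name, layer sizes, and heads are illustrative assumptions, not the paper's exact FF/RFF architecture.

```python
import torch
import torch.nn as nn

class FactoredForecaster(nn.Module):
    def __init__(self, ctx_dim: int, n_proxy: int, hidden: int = 64):
        super().__init__()
        # p(proxy | context)
        self.proxy_head = nn.Sequential(
            nn.Linear(ctx_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_proxy))
        # p(outcome | context, proxy) -- one logit per proxy value
        self.outcome_head = nn.Sequential(
            nn.Linear(ctx_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_proxy))

    def forward(self, ctx: torch.Tensor) -> torch.Tensor:
        proxy_probs = torch.softmax(self.proxy_head(ctx), dim=-1)    # (B, n_proxy)
        outcome_given_proxy = torch.sigmoid(self.outcome_head(ctx))  # (B, n_proxy)
        # Marginalize over the unobserved proxy to get p(outcome | context).
        return (proxy_probs * outcome_given_proxy).sum(dim=-1)       # (B,)

model = FactoredForecaster(ctx_dim=16, n_proxy=4)
print(model(torch.randn(2, 16)))  # predicted outcome probabilities
```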


Soft-Robust Actor-Critic Policy-Gradient

arXiv.org Machine Learning

Robust reinforcement learning aims to derive an optimal behavior that accounts for model uncertainty in dynamical systems. However, previous studies have shown that by considering the worst-case scenario, robust policies can be overly conservative. Our soft-robust framework is an attempt to overcome this issue. In this paper, we present a novel Soft-Robust Actor-Critic algorithm (SR-AC). It learns an optimal policy with respect to a distribution over an uncertainty set, staying robust to model uncertainty while avoiding the conservativeness of worst-case robust strategies. We show convergence of SR-AC and test the efficiency of our approach on different domains by comparing it against regular learning methods and their robust formulations.
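
A minimal numpy sketch of the soft-robust criterion itself (not the SR-AC algorithm): instead of scoring a policy by its worst-case value over an uncertainty set of transition models, score it by its expected value under a prior weighting over that set. The toy dynamics, rewards, and uniform weighting below are illustrative assumptions.

```python
import numpy as np

def policy_value(P, R, pi, gamma=0.9):
    """Exact value of a stationary policy pi in a tabular MDP (P, R)."""
    n = R.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state transition matrix under pi
    r_pi = np.einsum("sa,sa->s", pi, R)     # expected reward under pi
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

n_states, n_actions, n_models = 4, 2, 5
rng = np.random.default_rng(0)
R = rng.random((n_states, n_actions))
models = [rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
          for _ in range(n_models)]           # uncertainty set of dynamics
weights = np.full(n_models, 1.0 / n_models)   # prior over the uncertainty set
pi = np.full((n_states, n_actions), 1.0 / n_actions)

values = np.array([policy_value(P, R, pi)[0] for P in models])
print("robust (worst-case) value:", values.min())
print("soft-robust (expected) value:", weights @ values)
```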


Optimizing Slate Recommendations via Slate-CVAE

arXiv.org Machine Learning

The slate recommendation problem aims to find the "optimal" ordering of a subset of documents to be presented on a surface that we call a "slate". The definition of "optimal" changes depending on the underlying application, but a typical goal is to maximize user engagement with the slate. Solving this problem at scale is hard due to the combinatorial explosion of documents to show and their display positions on the slate. In this paper, we introduce Slate Conditional Variational Auto-Encoders (Slate-CVAE) to generate optimal slates. To the best of our knowledge, this is the first conditional generative model that provides a unified framework for slate recommendation by direct generation. Slate-CVAE automatically takes into account the format of the slate and any biases that the representation causes, thus truly proposing the optimal slate. Additionally, to deal with large corpora of documents, we present a novel approach that uses pretrained document embeddings combined with a soft-nearest-neighbors layer within our CVAE model. Experiments on simulated and real-world datasets show that Slate-CVAE outperforms recommender systems that greedily rank documents by a significant margin while remaining scalable.
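
A minimal sketch of a soft-nearest-neighbors decoding step like the one mentioned above: the decoder emits one embedding per slate position, and a softmax over (negative) distances to a table of pretrained document embeddings turns it into a distribution over documents. The temperature, distance choice, and shapes are assumptions, and the encoder and ELBO training of the CVAE are omitted.

```python
import torch

def soft_nearest_neighbors(decoded: torch.Tensor,
                           doc_embeddings: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """Map decoded slate-position embeddings to a soft distribution over docs.

    decoded:        (slate_size, emb_dim) embeddings produced by the decoder
    doc_embeddings: (n_docs, emb_dim) pretrained document embeddings
    returns:        (slate_size, n_docs) softmax over documents per position
    """
    # Squared Euclidean distance between each decoded position and each doc.
    dists = torch.cdist(decoded, doc_embeddings) ** 2
    return torch.softmax(-dists / temperature, dim=-1)

doc_embeddings = torch.randn(1000, 32)   # stand-in for pretrained corpus embeddings
decoded = torch.randn(5, 32)             # decoder output for a 5-slot slate
probs = soft_nearest_neighbors(decoded, doc_embeddings)
slate = probs.argmax(dim=-1)             # hard selection at serving time
print(slate)
```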


Learning Robust Options

arXiv.org Machine Learning

Robust reinforcement learning aims to produce policies that have strong guarantees even in the face of environments/transition models whose parameters have strong uncertainty. Existing work uses value-based methods and the usual primitive action setting. In this paper, we propose robust methods for learning temporally abstract actions, in the framework of options. We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty. We utilize ROPI to learn robust options with the Robust Options Deep Q Network (RO-DQN) that solves multiple tasks and mitigates model misspecification due to model uncertainty. We present experimental results which suggest that policy iteration with linear features may have an inherent form of robustness when using coarse feature representations. In addition, we present experimental results which demonstrate that robustness helps policy iteration implemented on top of deep neural networks to generalize over a much broader range of dynamics than non-robust policy iteration.
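
A minimal tabular sketch of the robust backup that value-based robust methods build on (not the paper's option-level ROPI/RO-DQN algorithms): each Bellman update backs up against the worst transition model in a finite uncertainty set. The dynamics and rewards below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, n_models, gamma = 5, 3, 4, 0.95
R = rng.random((n_states, n_actions))
# Uncertainty set: several candidate transition tensors P[k, s, a, s'].
P = rng.dirichlet(np.ones(n_states), size=(n_models, n_states, n_actions))

V = np.zeros(n_states)
for _ in range(500):
    # Worst-case expected next value over the uncertainty set, per (s, a).
    next_values = np.einsum("ksat,t->ksa", P, V).min(axis=0)
    V = (R + gamma * next_values).max(axis=1)
print("robust value function:", np.round(V, 3))
```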


Learning Robust Options

AAAI Conferences

Robust reinforcement learning aims to produce policies that have strong guarantees even in the face of environments/transition models whose parameters have strong uncertainty. Existing work uses value-based methods and the usual primitive action setting. In this paper, we propose robust methods for learning temporally abstract actions, in the framework of options. We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty. We utilize ROPI to learn robust options with the Robust Options Deep Q Network (RO-DQN) that solves multiple tasks and mitigates model misspecification due to model uncertainty. We present experimental results which suggest that policy iteration with linear features may have an inherent form of robustness when using coarse feature representations. In addition, we present experimental results which demonstrate that robustness helps policy iteration implemented on top of deep neural networks to generalize over a much broader range of dynamics than non-robust policy iteration.


Adaptive Skills, Adaptive Partitions (ASAP)

Neural Information Processing Systems

We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them. We believe that both (1) and (2) are necessary for a truly general skill learning framework, which is a key building block needed to scale up to lifelong learning agents. The ASAP framework is also able to solve related new tasks simply by adapting where it applies its existing learned skills. We prove that ASAP converges to a local optimum under natural conditions. Finally, our experimental results, which include a RoboCup domain, demonstrate the ability of ASAP to learn where to reuse skills as well as solve multiple tasks with considerably less experience than solving each task from scratch.
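
A minimal sketch of the "adaptive partition" idea: a small set of learned hyperplanes softly partitions the state space, and each cell of the partition is assigned its own skill (sub-policy). The Bernoulli-product parameterization below is an illustrative assumption, not ASAP's exact formulation.

```python
import numpy as np

def skill_probabilities(state, hyperplanes, biases):
    """Probability of selecting each of 2**K skills for a given state."""
    # Probability of falling on the positive side of each hyperplane.
    p_side = 1.0 / (1.0 + np.exp(-(hyperplanes @ state + biases)))  # (K,)
    K = len(p_side)
    probs = np.ones(2 ** K)
    for skill in range(2 ** K):
        for k in range(K):
            bit = (skill >> k) & 1
            probs[skill] *= p_side[k] if bit else (1.0 - p_side[k])
    return probs  # sums to 1: one skill per partition cell

rng = np.random.default_rng(2)
hyperplanes, biases = rng.normal(size=(2, 3)), rng.normal(size=2)
print(skill_probabilities(rng.normal(size=3), hyperplanes, biases))
```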


Adaptive Lambda Least-Squares Temporal Difference Learning

arXiv.org Machine Learning

Temporal Difference learning, or TD($\lambda$), is a fundamental algorithm in the field of reinforcement learning. However, setting TD's $\lambda$ parameter, which controls the timescale of TD updates, is generally left up to the practitioner. We formalize the $\lambda$ selection problem as a bias-variance trade-off where the solution is the value of $\lambda$ that leads to the smallest Mean Squared Value Error (MSVE). To solve this trade-off we suggest applying Leave-One-Trajectory-Out Cross-Validation (LOTO-CV) to search the space of $\lambda$ values. Unfortunately, this approach is too computationally expensive for most practical applications. For Least Squares TD (LSTD), we show that LOTO-CV can be implemented efficiently to automatically tune $\lambda$ and apply function optimization methods to efficiently search the space of $\lambda$ values. The resulting algorithm, ALLSTD, is parameter-free, and our experiments demonstrate that it runs significantly faster than the naïve LOTO-CV implementation while achieving similar performance.
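
A minimal numpy sketch of the naive leave-one-trajectory-out grid search for $\lambda$ with LSTD (the baseline the abstract calls expensive, not the efficient ALLSTD implementation). The feature map, toy data, and use of Monte Carlo returns as validation targets are illustrative assumptions.

```python
import numpy as np

def lstd_lambda(trajs, lam, gamma=0.95, reg=1e-3):
    """Fit LSTD(lambda) weights from a list of (features, rewards) trajectories."""
    d = trajs[0][0].shape[1]
    A, b = reg * np.eye(d), np.zeros(d)
    for phi, r in trajs:
        z = np.zeros(d)
        for t in range(len(r)):
            z = gamma * lam * z + phi[t]                 # eligibility trace
            phi_next = phi[t + 1] if t + 1 < len(r) else np.zeros(d)
            A += np.outer(z, phi[t] - gamma * phi_next)
            b += z * r[t]
    return np.linalg.solve(A, b)

def loto_cv_score(trajs, lam, gamma=0.95):
    """MSE of LSTD(lambda) value predictions on each held-out trajectory."""
    errs = []
    for i, (phi, r) in enumerate(trajs):
        w = lstd_lambda(trajs[:i] + trajs[i + 1:], lam, gamma)
        returns = np.array([sum(gamma ** k * r[t + k] for k in range(len(r) - t))
                            for t in range(len(r))])
        errs.append(np.mean((phi @ w - returns) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(3)
trajs = [(rng.normal(size=(20, 4)), rng.normal(size=20)) for _ in range(5)]
scores = {lam: loto_cv_score(trajs, lam) for lam in (0.0, 0.5, 0.9, 1.0)}
print("selected lambda:", min(scores, key=scores.get))
```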


Iterative Hierarchical Optimization for Misspecified Problems (IHOMP)

arXiv.org Artificial Intelligence

For complex, high-dimensional Markov Decision Processes (MDPs), it may be necessary to represent the policy with function approximation. A problem is misspecified whenever the representation cannot express any policy with acceptable performance. We introduce IHOMP: an approach for solving misspecified problems. IHOMP iteratively learns a set of context-specialized options and combines these options to solve an otherwise misspecified problem. Our main contribution is proving that IHOMP enjoys theoretical convergence guarantees. In addition, we extend IHOMP to exploit Option Interruption (OI), enabling it to decide where the learned options can be reused. Our experiments demonstrate that IHOMP can find near-optimal solutions to otherwise misspecified problems and that OI can further improve the solutions.
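
A minimal sketch of the misspecification intuition behind IHOMP, using a supervised analogy rather than the RL algorithm itself: a single linear model cannot fit a piecewise target (it is "misspecified"), but splitting the input space into contexts and fitting one linear specialist per context can. The target function and partition below are illustrative assumptions.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 200)
y = np.where(x < 0, -x, 2 * x)            # piecewise-linear target

def fit_linear(xs, ys):
    X = np.column_stack([xs, np.ones_like(xs)])
    return np.linalg.lstsq(X, ys, rcond=None)[0]

def mse(w, xs, ys):
    return float(np.mean((xs * w[0] + w[1] - ys) ** 2))

w_global = fit_linear(x, y)                                   # misspecified
left, right = x < 0, x >= 0
w_left, w_right = fit_linear(x[left], y[left]), fit_linear(x[right], y[right])
err_partitioned = (mse(w_left, x[left], y[left]) * left.mean()
                   + mse(w_right, x[right], y[right]) * right.mean())
print("single model MSE: ", round(mse(w_global, x, y), 4))
print("partitioned MSE:  ", round(err_partitioned, 4))
```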


Adaptive Skills, Adaptive Partitions (ASAP)

arXiv.org Artificial Intelligence

We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them. We believe that both (1) and (2) are necessary for a truly general skill learning framework, which is a key building block needed to scale up to lifelong learning agents. The ASAP framework can also solve related new tasks simply by adapting where it applies its existing learned skills. We prove that ASAP converges to a local optimum under natural conditions. Finally, our experimental results, which include a RoboCup domain, demonstrate the ability of ASAP to learn where to reuse skills as well as solve multiple tasks with considerably less experience than solving each task from scratch.


Actively Learning to Attract Followers on Twitter

arXiv.org Machine Learning

Twitter, a popular social network, presents great opportunities for on-line machine learning research. However, previous research has focused almost entirely on learning from passively collected data. We study the problem of learning to acquire followers through normative user behavior, as opposed to the mass-following policies applied by many bots. We formalize the problem as a contextual bandit problem, in which retweeting a piece of content is the chosen action and each tweet (the content) is accompanied by context. We design reward signals based on the change in followers. The results of our month-long experiment with 60 agents suggest that (1) aggregating experience across agents can adversely impact prediction accuracy and (2) the Twitter community's response to different actions is non-stationary. Our findings suggest that actively learning on-line can provide deeper insights about how to attract followers than machine learning over passively collected data alone.
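
A minimal LinUCB-style contextual bandit sketch for a setup like the one described above: at each round the agent picks one candidate tweet (arm) based on its context features and observes the change in follower count as reward. The feature construction, simulated reward, and hyperparameters are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

class LinUCB:
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)          # regularized design matrix
        self.b = np.zeros(dim)        # reward-weighted feature sum
        self.alpha = alpha

    def choose(self, contexts):
        """contexts: (n_candidates, dim) features of candidate tweets."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        ucb = contexts @ theta + self.alpha * np.sqrt(
            np.einsum("nd,dk,nk->n", contexts, A_inv, contexts))
        return int(np.argmax(ucb))

    def update(self, context, reward):
        self.A += np.outer(context, context)
        self.b += reward * context

rng = np.random.default_rng(5)
true_theta = rng.normal(size=8)
agent = LinUCB(dim=8, alpha=0.5)
for _ in range(200):
    candidates = rng.normal(size=(10, 8))   # today's candidate tweets
    i = agent.choose(candidates)
    # Simulated follower change: linear signal plus noise.
    reward = candidates[i] @ true_theta + rng.normal(scale=0.1)
    agent.update(candidates[i], reward)
print("estimated preference vector:", np.round(np.linalg.inv(agent.A) @ agent.b, 2))
```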