
 Bouneffouf, Djallel


Split Q Learning: Reinforcement Learning with Two-Stream Rewards

arXiv.org Artificial Intelligence

Drawing inspiration from behavioral studies of human decision making, we propose here a general parametric framework for a reinforcement learning problem, which extends the standard Q-learning approach to incorporate a two-stream framework of reward processing with biases biologically associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. For the AI community, the development of agents that react differently to different types of rewards can enable us to understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions as well as user preferences in long-term recommendation systems.
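
As a rough illustration of the two-stream idea, the sketch below keeps separate tabular Q-values for the positive and negative components of the reward, each with its own weight and memory parameter. The parameter names (w_pos, lam_pos, ...) and the exact update rule are illustrative assumptions, not the paper's notation.

```python
import numpy as np

class SplitQLearner:
    """Tabular Q-learning with separate value streams for positive and
    negative rewards, each scaled and decayed by its own parameters
    (an illustrative sketch of the two-stream framework)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 w_pos=1.0, w_neg=1.0, lam_pos=1.0, lam_neg=1.0, eps=0.1):
        self.q_pos = np.zeros((n_states, n_actions))   # stream for rewards > 0
        self.q_neg = np.zeros((n_states, n_actions))   # stream for rewards < 0
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.w_pos, self.w_neg = w_pos, w_neg           # reward-stream weights
        self.lam_pos, self.lam_neg = lam_pos, lam_neg   # stream memory factors

    def act(self, s, rng):
        if rng.random() < self.eps:
            return int(rng.integers(self.q_pos.shape[1]))
        # decisions use the combined value of both streams
        return int(np.argmax(self.q_pos[s] + self.q_neg[s]))

    def update(self, s, a, r, s_next):
        r_pos, r_neg = max(r, 0.0), min(r, 0.0)
        # bootstrap both streams with the action that maximizes their sum
        a_next = int(np.argmax(self.q_pos[s_next] + self.q_neg[s_next]))
        td_pos = self.w_pos * r_pos + self.gamma * self.q_pos[s_next, a_next]
        td_neg = self.w_neg * r_neg + self.gamma * self.q_neg[s_next, a_next]
        self.q_pos[s, a] = self.lam_pos * self.q_pos[s, a] + self.alpha * (td_pos - self.q_pos[s, a])
        self.q_neg[s, a] = self.lam_neg * self.q_neg[s, a] + self.alpha * (td_neg - self.q_neg[s, a])
```

Setting the stream parameters asymmetrically (e.g., a larger w_neg) is how a bias toward over- or under-weighting punishments versus rewards would be expressed in this toy version.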


Optimal Exploitation of Clustering and History Information in Multi-Armed Bandit

arXiv.org Machine Learning

We consider the stochastic multi-armed bandit problem and the contextual bandit problem with historical observations and pre-clustered arms. The historical observations can contain any number of instances for each arm, and the pre-clustering information is a fixed clustering of arms provided as part of the input. We develop a variety of algorithms which incorporate this offline information effectively during the online exploration phase and derive their regret bounds. In particular, we develop the META algorithm, which effectively hedges between two other algorithms: one which uses both historical observations and clustering, and another which uses only the historical observations. The former outperforms the latter when the clustering quality is good, and vice versa. Extensive experiments on synthetic and real-world datasets on Warfarin drug dosage and web server selection for latency minimization validate our theoretical insights and demonstrate that META is a robust strategy for optimally exploiting the pre-clustering information.
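
The hedging idea can be pictured with a toy meta-learner that places exponential weights over two base bandit algorithms and plays whichever it currently trusts. MeanBandit is a stand-in base learner, and the exponential-weights rule (with rewards assumed in [0, 1]) is an assumption here, not the paper's exact META procedure.

```python
import numpy as np

class MeanBandit:
    """Toy base learner; stands in for the history-only and the
    history + clustering algorithms from the paper."""
    def __init__(self, n_arms):
        self.n = np.ones(n_arms)
        self.s = np.zeros(n_arms)
    def select(self):
        return int(np.argmax(self.s / self.n))
    def update(self, arm, r):
        self.n[arm] += 1
        self.s[arm] += r

class MetaHedge:
    """Illustrative meta-level hedge (exponential weights, not the exact
    META rule) that adaptively trusts whichever base algorithm has been
    earning more reward."""
    def __init__(self, base_algos, eta=0.1):
        self.base, self.eta = base_algos, eta
        self.w = np.ones(len(base_algos))
    def select(self, rng):
        p = self.w / self.w.sum()
        self.chosen = int(rng.choice(len(self.base), p=p))
        return self.base[self.chosen].select()
    def update(self, arm, reward):
        for b in self.base:                       # both learners see the feedback
            b.update(arm, reward)
        p = self.w / self.w.sum()
        self.w[self.chosen] *= np.exp(self.eta * reward / p[self.chosen])

rng = np.random.default_rng(0)
meta = MetaHedge([MeanBandit(5), MeanBandit(5)])
arm = meta.select(rng)
meta.update(arm, reward=1.0)
```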


Automated Machine Learning via ADMM

arXiv.org Machine Learning

We study the automated machine learning (AutoML) problem of jointly selecting appropriate algorithms from an algorithm portfolio as well as optimizing their hyper-parameters for certain learning tasks. The main challenges include a) the coupling between algorithm selection and hyper-parameter optimization (HPO), and b) the black-box optimization nature of the problem where the optimizer cannot access the gradients of the loss function but may query function values. To circumvent these difficulties, we propose a new AutoML framework by leveraging the alternating direction method of multipliers (ADMM) scheme. Due to the splitting properties of ADMM, algorithm selection and HPO can be decomposed through the augmented Lagrangian function. As a result, HPO with mixed continuous and integer constraints is efficiently handled through a query-efficient Bayesian optimization approach and a Euclidean projection operator that yields a closed-form solution. Algorithm selection in ADMM is naturally interpreted as a combinatorial bandit problem. The effectiveness of our proposed methodology is compared to state-of-the-art AutoML schemes such as TPOT and Auto-sklearn on numerous benchmark data sets.
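
A toy sketch of the decomposition: a hyper-parameter vector is split into a continuous copy x and an integer-projected copy z linked by a consensus constraint, and the two copies plus the dual variable are updated in turn. The black-box loss is minimized here by crude random search where the paper would use query-efficient Bayesian optimization; admm_mixed and the toy loss are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def admm_mixed(loss, x0, rho=1.0, n_iter=50):
    """Toy ADMM split of a hyper-parameter vector into a continuous copy x
    and an integer-projected copy z (consensus constraint x = z). `loss`
    stands in for a black-box validation loss queried by the optimizer."""
    x = np.asarray(x0, dtype=float)
    z = np.round(x)                      # integer copy
    u = np.zeros_like(x)                 # scaled dual variable
    for _ in range(n_iter):
        # x-update: minimize loss(x) + (rho/2)||x - z + u||^2 over the
        # continuous copy; crude random search here, BO in the paper
        cands = x + 0.1 * np.random.randn(20, x.size)
        aug = [loss(c) + 0.5 * rho * np.sum((c - z + u) ** 2) for c in cands]
        x = cands[int(np.argmin(aug))]
        # z-update: Euclidean projection onto the integer set (closed form)
        z = np.round(x + u)
        # dual update
        u = u + x - z
    return x, z

# toy usage: two "hyper-parameters", the second effectively integer-valued
val_loss = lambda v: (v[0] - 1.3) ** 2 + (v[1] - 4.0) ** 2
print(admm_mixed(val_loss, x0=[0.0, 0.0]))
```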


A Survey on Practical Applications of Multi-Armed and Contextual Bandits

arXiv.org Machine Learning

In recent years, the multi-armed bandit (MAB) framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance, due to its stellar performance combined with certain attractive properties, such as learning from limited feedback. The multi-armed bandit field is currently flourishing, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical bandit problem. This article aims to provide a comprehensive review of top recent developments in multiple real-life applications of the multi-armed bandit. Specifically, we introduce a taxonomy of common MAB-based applications and summarize the state of the art for each of those domains. Furthermore, we identify important current trends and provide new perspectives pertaining to the future of this exciting and fast-growing field.


Scalable Recollections for Continual Lifelong Learning

arXiv.org Artificial Intelligence

Given the recent success of Deep Learning applied to a variety of single tasks, it is natural to consider more human-realistic settings. Perhaps the most difficult of these settings is that of continual lifelong learning, where the model must learn online over a continuous stream of non-stationary data. A successful continual lifelong learning system must have three key capabilities: it must learn and adapt over time, it must not forget what it has learned, and it must be efficient in both training time and memory. Recent techniques have focused their efforts primarily on the first two capabilities while questions of efficiency remain largely unexplored. In this paper, we consider the problem of efficient and effective storage of experiences over very large time-frames. In particular we consider the case where typical experiences are O(n) bits and memories are limited to O(k) bits for k << n. We present a novel scalable architecture and training algorithm in this challenging domain and provide an extensive evaluation of its performance. Our results show that we can achieve considerable gains on top of state-of-the-art methods such as GEM.
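
To make the O(k)-versus-O(n) storage idea concrete, the sketch below stores each experience as a k-bit code produced by a fixed random projection with sign quantization. The paper instead learns the compressor (an autoencoder), so this is purely an illustration of the storage/recall interface, not the published method.

```python
import numpy as np

class RecollectionBuffer:
    """Illustrative memory that stores experiences as k-bit sketches rather
    than raw O(n)-bit observations. A fixed random projection with sign
    quantization stands in for the learned compressor of the paper."""

    def __init__(self, n_dim, k_bits, seed=0):
        rng = np.random.default_rng(seed)
        self.proj = rng.standard_normal((k_bits, n_dim)) / np.sqrt(n_dim)
        self.codes = []                                  # each entry costs k bits

    def store(self, x):
        self.codes.append(np.signbit(self.proj @ x))     # k-bit code

    def recall(self, idx):
        # crude decoding: back-project the signs; a learned decoder would go here
        signs = np.where(self.codes[idx], -1.0, 1.0)
        return self.proj.T @ signs

buf = RecollectionBuffer(n_dim=1024, k_bits=64)
buf.store(np.random.randn(1024))
approx = buf.recall(0)   # lossy recollection used for replay/rehearsal
```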


Interpretable Multi-Objective Reinforcement Learning through Policy Orchestration

arXiv.org Artificial Intelligence

Autonomous cyber-physical agents and systems play an increasingly large role in our lives. To ensure that agents behave in ways aligned with the values of the societies in which they operate, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. These constraints and norms can come from any number of sources including regulations, business process guidelines, laws, ethical principles, social norms, and moral values. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations of the task, and reinforcement learning to learn to maximize the environment rewards. More precisely, we assume that an agent can observe traces of behavior of members of the society but has no access to the explicit set of constraints that give rise to the observed behavior. Inverse reinforcement learning is used to learn such constraints, which are then combined with a possibly orthogonal value function through the use of a contextual bandit-based orchestrator that makes a contextually appropriate choice between the two policies (constraint-based and environment reward-based) when taking actions. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward maximizing or constrained policy. In addition, the orchestrator is transparent about which policy is being employed at each time step. We test our algorithms using a Pac-Man domain and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways.
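
The orchestration step can be sketched as linear Thompson Sampling over two "arms", each arm being one of the two fixed policies. The policies here are caller-supplied placeholders, and the linear-TS model is an assumption for illustration rather than the paper's exact orchestrator.

```python
import numpy as np

class PolicyOrchestrator:
    """Minimal contextual-bandit orchestrator: at every step it picks one of
    two fixed policies (e.g., constraint-obeying vs. reward-maximizing) via
    linear Thompson Sampling on the state context."""

    def __init__(self, policies, ctx_dim, v=0.5):
        self.policies, self.v = policies, v
        self.B = [np.eye(ctx_dim) for _ in policies]     # per-arm precision
        self.f = [np.zeros(ctx_dim) for _ in policies]   # per-arm reward stats

    def act(self, context, state, rng):
        scores = []
        for B, f in zip(self.B, self.f):
            mu = np.linalg.solve(B, f)
            theta = rng.multivariate_normal(mu, self.v**2 * np.linalg.inv(B))
            scores.append(context @ theta)
        self.chosen = int(np.argmax(scores))             # which policy to trust here
        return self.policies[self.chosen](state), self.chosen

    def update(self, context, reward):
        self.B[self.chosen] += np.outer(context, context)
        self.f[self.chosen] += reward * context

# toy usage with placeholder policies
orch = PolicyOrchestrator([lambda s: "constrained_action",
                           lambda s: "greedy_action"], ctx_dim=4)
```

Because the orchestrator's choice is an explicit arm pull, logging self.chosen at each step is what gives the transparency about which policy was in control.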


Incorporating Behavioral Constraints in Online AI Systems

arXiv.org Artificial Intelligence

AI systems that learn through reward feedback about the actions they take are increasingly deployed in domains that have significant impact on our daily life. However, in many cases the online rewards should not be the only guiding criteria, as there are additional constraints and/or priorities imposed by regulations, values, preferences, or ethical principles. We detail a novel online agent that learns a set of behavioral constraints by observation and uses these learned constraints as a guide when making decisions in an online setting while still being reactive to reward feedback. To define this agent, we adopt a novel extension of the classical contextual multi-armed bandit setting and provide a new algorithm called Behavior Constrained Thompson Sampling (BCTS) that allows for online learning while obeying exogenous constraints. Our agent learns a constrained policy that implements the observed behavioral constraints demonstrated by a teacher agent, and then uses this constrained policy to guide the reward-based online exploration and exploitation. We characterize the upper bound on the expected regret of the contextual bandit algorithm that underlies our agent and provide a case study with real-world data in two application domains. Our experiments show that the designed agent is able to act within the set of behavior constraints without significantly degrading its overall reward performance.
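
A simplified picture of the constrained agent: a contextual Thompson Sampling learner whose candidate actions are first filtered by a constraint model fit to the teacher's demonstrations. The allowed(context) hook and the linear-Gaussian model are assumptions for illustration, not the exact BCTS algorithm.

```python
import numpy as np

class BehaviorConstrainedTS:
    """Illustrative contextual Thompson Sampling whose action set is filtered
    by a constraint model learned from teacher demonstrations.
    `allowed(context)` is assumed to return the arms the demonstrated
    behavior permits in that context."""

    def __init__(self, n_arms, ctx_dim, allowed, v=0.3):
        self.allowed, self.v = allowed, v
        self.B = [np.eye(ctx_dim) for _ in range(n_arms)]
        self.f = [np.zeros(ctx_dim) for _ in range(n_arms)]

    def select(self, context, rng):
        best, best_score = None, -np.inf
        for a in self.allowed(context):                  # constraint-based filtering
            mu = np.linalg.solve(self.B[a], self.f[a])
            theta = rng.multivariate_normal(mu, self.v**2 * np.linalg.inv(self.B[a]))
            score = context @ theta
            if score > best_score:
                best, best_score = a, score
        return best                                      # reward-driven choice inside the constraint set

    def update(self, a, context, reward):
        self.B[a] += np.outer(context, context)
        self.f[a] += reward * context

# toy usage: constraint model permits only a fixed subset of arms
bcts = BehaviorConstrainedTS(n_arms=5, ctx_dim=4, allowed=lambda ctx: [0, 1, 2])
```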


Beyond Backprop: Alternating Minimization with co-Activation Memory

arXiv.org Machine Learning

We propose a novel online algorithm for training deep feedforward neural networks that employs alternating minimization (block-coordinate descent) between the weights and activation variables. It extends offline alternating minimization approaches to online, continual learning, and improves over stochastic gradient descent (SGD) with backpropagation in several ways: it avoids the vanishing gradient issue, it allows for non-differentiable nonlinearities, and it permits parallel weight updates across the layers. Unlike SGD, our approach employs co-activation memory inspired by the online sparse coding algorithm of [Mairal et al., 2009]. Furthermore, local iterative optimization with explicit activation updates is a potentially more biologically plausible learning mechanism than backpropagation. We provide theoretical convergence analysis and promising empirical results on several datasets.
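
An offline toy version of one alternating-minimization sweep for a two-layer network is sketched below: the hidden activations are treated as free variables and each block is updated in closed form while the others are held fixed. The co-activation memory and the online variant from the paper are omitted, and the ReLU handling is a deliberate simplification.

```python
import numpy as np

def ridge(X, Y, lam):
    """Closed-form ridge regression, used for each weight block."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def alt_min_epoch(X, Y, W1, W2, rho=1.0, lam=1e-2):
    """One block-coordinate sweep for Y ~ relu(X W1) W2: update the
    activations A, then the output weights W2, then the input weights W1,
    with no gradient flowing through the whole network."""
    H = np.maximum(X @ W1, 0.0)                          # forward pre-estimate of A
    # A-update: argmin_A rho*||A - H||^2 + ||Y - A W2||^2 (closed form)
    A = (rho * H + Y @ W2.T) @ np.linalg.inv(rho * np.eye(W2.shape[0]) + W2 @ W2.T)
    A = np.maximum(A, 0.0)                               # keep activations feasible
    W2 = ridge(A, Y, lam)                                # output-layer block
    W1 = ridge(X, A, lam)                                # input-layer block (ReLU relaxed)
    return W1, W2

# toy usage on random data
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((200, 10)), rng.standard_normal((200, 3))
W1, W2 = rng.standard_normal((10, 16)), rng.standard_normal((16, 3))
for _ in range(20):
    W1, W2 = alt_min_epoch(X, Y, W1, W2)
```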


Adaptive Representation Selection in Contextual Bandit with Unlabeled History

arXiv.org Machine Learning

We consider an extension of the contextual bandit setting, motivated by several practical applications, where an unlabeled history of contexts can become available for pre-training before the online decision-making begins. We propose an approach for improving the performance of contextual bandit in such a setting, via adaptive, dynamic representation learning, which combines offline pre-training on an unlabeled history of contexts with online selection and modification of embedding functions. Our experiments on a variety of datasets and in different nonstationary environments demonstrate clear advantages of our approach over the standard contextual bandit.
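
A minimal sketch of the offline/online split, assuming PCA as a stand-in for the learned embeddings: the projection is fit on the unlabeled history, and the online bandit then consumes embedded contexts instead of raw ones. The paper additionally selects among and adapts several such embeddings as the stream evolves, which is not shown here.

```python
import numpy as np

def pretrain_embedding(unlabeled_contexts, k):
    """Offline step: fit a k-dimensional linear embedding (PCA here, as a
    stand-in for the paper's learned representations) on the unlabeled
    history of contexts."""
    Xc = unlabeled_contexts - unlabeled_contexts.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                                      # d x k projection

# offline phase: unlabeled contexts observed before deployment
history = np.random.randn(1000, 50)
P = pretrain_embedding(history, k=8)

# online phase: any contextual bandit (LinUCB, Thompson Sampling, ...)
# scores arms on the embedded context z rather than the raw context x
x = np.random.randn(50)
z = x @ P
```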


Context Attentive Bandits: Contextual Bandit with Restricted Context

arXiv.org Machine Learning

We consider a novel formulation of the multi-armed bandit model, which we call the contextual bandit with restricted context, where only a limited number of features can be accessed by the learner at every iteration. This novel formulation is motivated by different online problems arising in clinical trials, recommender systems and attention modeling. Herein, we adapt the standard multi-armed bandit algorithm known as Thompson Sampling to take advantage of our restricted context setting, and propose two novel algorithms, called the Thompson Sampling with Restricted Context (TSRC) and the Windows Thompson Sampling with Restricted Context (WTSRC), for handling stationary and nonstationary environments, respectively. Our empirical results demonstrate advantages of the proposed approaches on several real-life datasets.
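
A simplified sketch of the restricted-context idea: a Beta-Bernoulli sampler picks which k features to observe each round, and a per-arm linear Thompson Sampling model scores arms on the masked context. The reveal(feats) hook and the feature-relevance update (rewards assumed in [0, 1]) are illustrative assumptions rather than the published TSRC pseudocode.

```python
import numpy as np

class RestrictedContextTS:
    """Illustrative restricted-context bandit: choose k features to observe,
    then run linear Thompson Sampling on the partially observed context."""

    def __init__(self, n_arms, n_features, k, v=0.3):
        self.k, self.v = k, v
        self.a_feat = np.ones(n_features)    # feature-relevance successes
        self.b_feat = np.ones(n_features)    # feature-relevance failures
        self.B = [np.eye(n_features) for _ in range(n_arms)]
        self.f = [np.zeros(n_features) for _ in range(n_arms)]

    def select(self, reveal, rng):
        # choose which k features to look at this round
        theta_feat = rng.beta(self.a_feat, self.b_feat)
        feats = np.argsort(theta_feat)[-self.k:]
        x = np.zeros(len(self.a_feat))
        x[feats] = reveal(feats)             # environment reveals only these features
        # standard linear TS over arms on the masked context
        scores = []
        for B, f in zip(self.B, self.f):
            mu = np.linalg.solve(B, f)
            scores.append(x @ rng.multivariate_normal(mu, self.v**2 * np.linalg.inv(B)))
        self.last = (x, feats)
        return int(np.argmax(scores))

    def update(self, arm, reward):
        x, feats = self.last
        self.B[arm] += np.outer(x, x)
        self.f[arm] += reward * x
        self.a_feat[feats] += reward         # assumes rewards in [0, 1]
        self.b_feat[feats] += 1.0 - reward
```

A windowed variant in the spirit of WTSRC would simply recompute the feature-relevance statistics over a sliding window of recent rounds to cope with nonstationarity.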