Plotting

 Durand, Audrey


WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings

arXiv.org Machine Learning

Aiming for more effective experiment design, such as in video content advertising where different content options compete for user engagement, these scenarios can be modeled as multi-arm bandit problems. In cases where limited interactions are available due to external factors, such as the cost of conducting experiments, recommenders often face constraints due to the small number of user interactions. In addition, there is a trade-off between selecting the best treatment and the ability to personalize and contextualize based on individual factors. A popular solution to this dilemma is the Contextual Bandit framework. It aims to maximize outcomes while incorporating personalization (contextual) factors, customizing treatments such as a user's profile to individual preferences. Despite their advantages, Contextual Bandit algorithms face challenges like measurement bias and the 'curse of dimensionality.' These issues complicate the management of numerous interventions and often lead to data sparsity through participant segmentation. To address these problems, we introduce the Weighted Allocation Probability Adjusted Thompson Sampling (WAPTS) algorithm. WAPTS builds on the contextual Thompson Sampling method by using a dynamic weighting parameter. This improves the allocation process for interventions and enables rapid optimization in data-sparse environments. We demonstrate the performance of our approach on different numbers of arms and effect sizes.


Neural Active Learning Meets the Partial Monitoring Framework

arXiv.org Artificial Intelligence

We focus on the online-based active learning (OAL) setting where an agent operates over a stream of observations and trades-off between the costly acquisition of information (labelled observations) and the cost of prediction errors. We propose a novel foundation for OAL tasks based on partial monitoring, a theoretical framework specialized in online learning from partially informative actions. We show that previously studied binary and multi-class OAL tasks are instances of partial monitoring. We expand the real-world potential of OAL by introducing a new class of cost-sensitive OAL tasks. We propose NeuralCBP, the first PM strategy that accounts for predictive uncertainty with deep neural networks. Our extensive empirical evaluation on open source datasets shows that NeuralCBP has favorable performance against state-of-the-art baselines on multiple binary, multi-class and cost-sensitive OAL tasks.


Randomized Confidence Bounds for Stochastic Partial Monitoring

arXiv.org Artificial Intelligence

The partial monitoring (PM) framework provides a theoretical formulation of sequential learning problems with incomplete feedback. On each round, a learning agent plays an action while the environment simultaneously chooses an outcome. The agent then observes a feedback signal that is only partially informative about the (unobserved) outcome. The agent leverages the received feedback signals to select actions that minimize the (unobserved) cumulative loss. In contextual PM, the outcomes depend on some side information that is observable by the agent before selecting the action on each round. In this paper, we consider the contextual and non-contextual PM settings with stochastic outcomes. We introduce a new class of strategies based on the randomization of deterministic confidence bounds, that extend regret guarantees to settings where existing stochastic strategies are not applicable. Our experiments show that the proposed RandCBP and RandCBPside* strategies improve state-of-the-art baselines in PM games. To encourage the adoption of the PM framework, we design a use case on the real-world problem of monitoring the error rate of any deployed classification system.


Association Rules Mining with Auto-Encoders

arXiv.org Artificial Intelligence

Association rule mining (ARM) was first introduced by Agrawal [1] to solve the grocery basket problem, and since then it has found numerous applications in Knowledge Discovery in Database (KDD) problems ranging from financial analysis [2] to medical diagnostics [3]. An association rule (AR) is an implication of the form A C, which can be read as "if antecedent A is true then consequent C must be true", where A and C are sets of different items (itemsets) in a database. An AR is defined by its antecedent, its consequent and two measures [4].The first one is the support, which is the proportion of rows in the dataset where both the antecedent and the consequent appear. The second measure is the confidence, the conditional probability to observe the consequent given an observation of the antecedent. The most widely-used mining strategies Apriori [1] and other exhaustive strategies [5, 6, 7] typically work by first mining frequent itemsets, then combining those itemsets to produce association rules. However, all these algorithms face the same problems: the number of rules they produce increases exponentially with the number of items in the database, and thus it becomes impossible for a human to sort through the rules returned to pick out the best ones [8]. Their execution time also become an issue with massive datasets [8]. Finally, these algorithms need support and confidence thresholds in order to efficiently search through the solution space, and those thresholds need to be carefully chosen: low values can lead to long execution times and an overabundance of rules, while high values cause the algorithm to miss interesting rules.


Neural Bandits for Data Mining: Searching for Dangerous Polypharmacy

arXiv.org Artificial Intelligence

Polypharmacy, most often defined as the simultaneous consumption of five or more drugs at once, is a prevalent phenomenon in the older population. Some of these polypharmacies, deemed inappropriate, may be associated with adverse health outcomes such as death or hospitalization. Considering the combinatorial nature of the problem as well as the size of claims database and the cost to compute an exact association measure for a given drug combination, it is impossible to investigate every possible combination of drugs. Therefore, we propose to optimize the search for potentially inappropriate polypharmacies (PIPs). To this end, we propose the OptimNeuralTS strategy, based on Neural Thompson Sampling and differential evolution, to efficiently mine claims datasets and build a predictive model of the association between drug combinations and health outcomes. We benchmark our method using two datasets generated by an internally developed simulator of polypharmacy data containing 500 drugs and 100 000 distinct combinations. Empirically, our method can detect up to 72% of PIPs while maintaining an average precision score of 99% using 30 000 time steps.


Interpret Your Care: Predicting the Evolution of Symptoms for Cancer Patients

arXiv.org Artificial Intelligence

Cancer treatment is an arduous process for patients and causes many side-effects during and post-treatment. The treatment can affect almost all body systems and result in pain, fatigue, sleep disturbances, cognitive impairments, etc. These conditions are often under-diagnosed or under-treated. In this paper, we use patient data to predict the evolution of their symptoms such that treatment-related impairments can be prevented or effects meaningfully ameliorated. The focus of this study is on predicting the pain and tiredness level of a patient post their diagnosis. We implement an interpretable decision tree based model called LightGBM on real-world patient data consisting of 20163 patients. There exists a class imbalance problem in the dataset which we resolve using the oversampling technique of SMOTE. Our empirical results show that the value of the previous level of a symptom is a key indicator for prediction and the weighted average deviation in prediction of pain level is 3.52 and of tiredness level is 2.27.


Deep interpretability for GWAS

arXiv.org Machine Learning

These effects Genome-Wide Association Studies are typically may play an important role in complex diseases (Bell conducted using linear models to find genetic variants et al., 2011). Since deep networks are known to be able to associated with common diseases. In these model arbitrarily complicated nonlinear functions of their studies, association testing is done on a variant-byvariant inputs (Goodfellow et al., 2016), we aim to use them to basis, possibly missing out on nonlinear predict complex disease outcomes from genetic data, and interaction effects between variants. Deep networks then interpret them (sample by sample) to discover novel can be used to model these interactions, but interactions between SNPs that are significant predictors of they are difficult to train and interpret on large the disease outcome.


Temporal Regularization for Markov Decision Process

Neural Information Processing Systems

Several applications of Reinforcement Learning suffer from instability due to high variance. This is especially prevalent in high dimensional domains. Regularization is a commonly used technique in machine learning to reduce variance, at the cost of introducing some bias. Most existing regularization techniques focus on spatial (perceptual) regularization. Yet in reinforcement learning, due to the nature of the Bellman equation, there is an opportunity to also exploit temporal regularization based on smoothness in value estimates over trajectories.


Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

arXiv.org Machine Learning

We propose $\tt RandUCB$, a bandit strategy that uses theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms, but akin to Thompson sampling (TS), uses randomization to trade off exploration and exploitation. In the $K$-armed bandit setting, we show that there are infinitely many variants of $\tt RandUCB$, all of which achieve the minimax-optimal $\widetilde{O}(\sqrt{K T})$ regret after $T$ rounds. Moreover, in a specific multi-armed bandit setting, we show that both UCB and TS can be recovered as special cases of $\tt RandUCB.$ For structured bandits, where each arm is associated with a $d$-dimensional feature vector and rewards are distributed according to a linear or generalized linear model, we prove that $\tt RandUCB$ achieves the minimax-optimal $\widetilde{O}(d \sqrt{T})$ regret even in the case of infinite arms. We demonstrate the practical effectiveness of $\tt RandUCB$ with experiments in both the multi-armed and structured bandit settings. Our results illustrate that $\tt RandUCB$ matches the empirical performance of TS while obtaining the theoretically optimal regret bounds of UCB algorithms, thus achieving the best of both worlds.


Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning

arXiv.org Artificial Intelligence

Continuous control tasks in reinforcement learning are important because they provide an important framework for learning in high-dimensional state spaces with deceptive rewards, where the agent can easily become trapped into suboptimal solutions. One way to avoid local optima is to use a population of agents to ensure coverage of the policy space, yet learning a population with the "best" coverage is still an open problem. In this work, we present a novel approach to population-based RL in continuous control that leverages properties of normalizing flows to perform attractive and repulsive operations between current members of the population and previously observed policies. Empirical results on the MuJoCo suite demonstrate a high performance gain for our algorithm compared to prior work, including Soft-Actor Critic (SAC).