Aouali, Imad
Unified PAC-Bayesian Study of Pessimism for Offline Policy Learning with Regularized Importance Sampling
Aouali, Imad, Brunel, Victor-Emmanuel, Rohde, David, Korba, Anna
Off-policy learning (OPL) often involves minimizing a risk estimator based on importance weighting to correct bias from the logging policy used to collect data. However, this method can produce an estimator with high variance. A common solution is to regularize the importance weights and learn the policy by minimizing an estimator with penalties derived from generalization bounds specific to the estimator. This approach, known as pessimism, has gained recent attention but lacks a unified framework for analysis. To address this gap, we introduce a comprehensive PAC-Bayesian framework to examine pessimism with regularized importance weighting. We derive a tractable PAC-Bayesian generalization bound that applies universally to common importance weight regularizations, enabling their comparison within a single framework. Our empirical results challenge common understanding, demonstrating the effectiveness of standard IW regularization techniques.
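For intuition, here is a minimal sketch of the kind of objective this line of work minimizes: a clipped importance-weighted risk plus an additive pessimism penalty. The clipping threshold, penalty form, and confidence level below are illustrative assumptions, not the bound derived in the paper.

```python
import numpy as np

def pessimistic_risk(costs, pi_target, pi_logging, tau=10.0, delta=0.05):
    """Clipped importance-weighted risk plus an illustrative pessimism penalty.

    costs      : observed costs (lower is better) for the logged actions
    pi_target  : target-policy probabilities of the logged actions
    pi_logging : logging-policy probabilities of the logged actions
    tau        : clipping threshold for the importance weights (assumed)
    delta      : confidence level for the schematic penalty term (assumed)
    """
    costs, pi_target, pi_logging = map(np.asarray, (costs, pi_target, pi_logging))
    n = len(costs)
    w = np.minimum(pi_target / pi_logging, tau)               # regularized importance weights
    risk = np.mean(w * costs)                                  # plug-in risk estimate
    penalty = tau * np.sqrt(np.log(1.0 / delta) / (2.0 * n))   # schematic concentration-style penalty
    return risk + penalty                                      # pessimistic (upper-bound style) objective
```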
Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning
Sakhi, Otmane, Aouali, Imad, Alquier, Pierre, Chopin, Nicolas
This work investigates the offline formulation of the contextual bandit problem, where the goal is to leverage past interactions collected under a behavior policy to evaluate, select, and learn new, potentially better-performing, policies. Motivated by critical applications, we move beyond point estimators. Instead, we adopt the principle of pessimism where we construct upper bounds that assess a policy's worst-case performance, enabling us to confidently select and learn improved policies. Precisely, we introduce novel, fully empirical concentration bounds for a broad class of importance weighting risk estimators. These bounds are general enough to cover most existing estimators and pave the way for the development of new ones. In particular, our pursuit of the tightest bound within this class motivates a novel estimator (LS), that logarithmically smooths large importance weights. The bound for LS is provably tighter than all its competitors, and naturally results in improved policy selection and learning strategies. Extensive policy evaluation, selection, and learning experiments highlight the versatility and favorable performance of LS.
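As a rough illustration of what logarithmic smoothing of importance weights can look like (the transform and the smoothing parameter lam below are assumed for the sketch and are not necessarily the exact LS estimator of the paper):

```python
import numpy as np

def log_smoothed_value(rewards, pi_target, pi_logging, lam=0.1):
    """Illustrative log-smoothed importance-weighted value estimate.

    Large importance weights are damped through a concave log transform,
    trading a small bias for a large reduction in variance.
    lam controls the amount of smoothing (lam -> 0 recovers plain IPS).
    """
    rewards, pi_target, pi_logging = map(np.asarray, (rewards, pi_target, pi_logging))
    w = pi_target / pi_logging               # raw importance weights
    w_smooth = np.log1p(lam * w) / lam       # logarithmic smoothing (assumed form)
    return np.mean(w_smooth * rewards)
```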
Bayesian Off-Policy Evaluation and Learning for Large Action Spaces
Aouali, Imad, Brunel, Victor-Emmanuel, Rohde, David, Korba, Anna
In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations through structured and informative priors. In this framework, we propose sDM, a generic Bayesian approach designed for OPE and OPL, grounded in both algorithmic and theoretical foundations. Notably, sDM leverages action correlations without compromising computational efficiency. Moreover, inspired by online Bayesian bandits, we introduce Bayesian metrics that assess the average performance of algorithms across multiple problem instances, deviating from the conventional worst-case assessments. We analyze sDM in OPE and OPL, highlighting the benefits of leveraging action correlations. Empirical evidence showcases the strong performance of sDM.
Diffusion Models Meet Contextual Bandits with Large Action Spaces
Aouali, Imad
Efficient exploration is a key challenge in contextual bandits with large action spaces, where uninformed exploration can result in computational and statistical inefficiencies. Fortunately, the rewards of actions are often correlated, and this can be leveraged to explore them efficiently. In this work, we capture such correlations using pre-trained diffusion models, upon which we design diffusion Thompson sampling (dTS). Both theoretical and algorithmic foundations are developed for dTS, and empirical evaluation shows its favorable performance.
Prior-Dependent Allocations for Bayesian Fixed-Budget Best-Arm Identification in Structured Bandits
Nguyen, Nicolas, Aouali, Imad, György, András, Vernade, Claire
Best arm identification (BAI) addresses the challenge of finding the optimal arm in a bandit environment (Lattimore and Szepesvári, 2020), with wide-ranging applications in online advertising, drug discovery or hyperparameter tuning. BAI is commonly approached through two primary paradigms: fixed-confidence and fixed-budget. In the fixed-confidence setting (Even-Dar et al., 2006; Kaufmann et al., 2016), the objective is to find the optimal arm with a pre-specified confidence level. Conversely, fixed-budget BAI (Audibert et al., 2010; Karnin et al., 2013; Carpentier and Locatelli, 2016) involves identifying the optimal arm within a fixed number of observations. Within this fixed-budget context, two main metrics are used: the probability of error (PoE) (Audibert et al., 2010; Karnin et al., 2013; Carpentier and Locatelli, 2016)--the likelihood of incorrectly identifying the optimal arm--and the simple regret (Bubeck et al., 2009; Russo, 2016; Komiyama et al., 2023)--the expected performance disparity between the chosen and the optimal arm.
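For concreteness, the two fixed-budget metrics can be written as follows, where $\hat{A}_n$ denotes the arm returned after the budget of $n$ observations, $A_\star$ the optimal arm, and $\mu_a$ the mean reward of arm $a$ (notation assumed for this sketch):

```latex
\text{PoE:}\quad \mathbb{P}\big(\hat{A}_n \neq A_\star\big),
\qquad
\text{simple regret:}\quad \mathbb{E}\big[\mu_{A_\star} - \mu_{\hat{A}_n}\big].
```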
Exponential Smoothing for Off-Policy Learning
Aouali, Imad, Brunel, Victor-Emmanuel, Rohde, David, Korba, Anna
Off-policy learning (OPL) aims at finding improved policies from logged bandit data, often by minimizing the inverse propensity scoring (IPS) estimator of the risk. In this work, we investigate a smooth regularization for IPS, for which we derive a two-sided PAC-Bayes generalization bound. The bound is tractable, scalable, interpretable, and provides learning certificates. In particular, it is also valid for standard IPS without assuming that the importance weights are bounded. We demonstrate the relevance of our approach and its favorable performance through a set of learning tasks. Since our bound holds for standard IPS, we are able to provide insight into when regularizing IPS is useful; namely, we identify cases where regularization might not be needed. This goes against the belief that, in practice, clipped IPS often enjoys more favorable performance than standard IPS in OPL.
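A minimal sketch of an exponentially smoothed IPS estimate, assuming the smoothing acts by raising the importance weights to a power alpha in [0, 1] (alpha = 1 recovers standard IPS); the exact estimator and its PAC-Bayes bound are given in the paper:

```python
import numpy as np

def exp_smoothed_ips(rewards, pi_target, pi_logging, alpha=0.7):
    """Illustrative IPS estimate with exponentially smoothed importance weights.

    The raw weights w = pi_target / pi_logging are raised to a power
    alpha in [0, 1]: alpha = 1 gives standard IPS, while alpha < 1
    shrinks large weights and keeps the estimator's variance in check.
    """
    rewards, pi_target, pi_logging = map(np.asarray, (rewards, pi_target, pi_logging))
    w = (pi_target / pi_logging) ** alpha   # smoothed importance weights (assumed form)
    return np.mean(w * rewards)
```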
Mixed-Effect Thompson Sampling
Aouali, Imad, Kveton, Branislav, Katariya, Sumeet
A contextual bandit is a popular framework for online learning to act under uncertainty. In practice, the number of actions is huge and their expected rewards are correlated. In this work, we introduce a general framework for capturing such correlations through a mixed-effect model where actions are related through multiple shared effect parameters. To explore efficiently using this structure, we propose Mixed-Effect Thompson Sampling (meTS) and bound its Bayes regret. The regret bound has two terms, one for learning the action parameters and the other for learning the shared effect parameters. The terms reflect the structure of our model and the quality of priors. Our theoretical findings are validated empirically using both synthetic and real-world problems. We also propose numerous extensions of practical interest. While they do not come with guarantees, they perform well empirically and show the generality of the proposed framework.
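A minimal sketch of the kind of correlation structure a mixed-effect model can encode: action parameters are built from a small set of shared effect parameters plus action-specific noise. The Gaussian and Dirichlet forms, dimensions, and mixing below are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)
num_actions, num_effects, dim = 100, 5, 8

# Shared effect parameters, drawn from a common prior.
effects = rng.normal(0.0, 1.0, size=(num_effects, dim))

# Each action mixes the shared effects and adds an action-specific offset,
# so actions that load on the same effects have correlated parameters.
mixing = rng.dirichlet(np.ones(num_effects), size=num_actions)   # (num_actions, num_effects)
theta = mixing @ effects + 0.1 * rng.normal(size=(num_actions, dim))

def expected_reward(context, action):
    """Linear expected reward of an action for a given context vector."""
    return float(context @ theta[action])
```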
Probabilistic Rank and Reward: A Scalable Model for Slate Recommendation
Aouali, Imad, Hammou, Achraf Ait Sidi, Ivanov, Sergey, Sakhi, Otmane, Rohde, David, Vasile, Flavian
We introduce Probabilistic Rank and Reward (PRR), a scalable probabilistic model for personalized slate recommendation. Our approach allows state-of-the-art estimation of user interests in the ubiquitous scenario where the user interacts with at most one item from a slate of K items. We show that the probability of a slate being successful can be learned efficiently by combining the reward, whether the user successfully interacted with the slate, and the rank, the item that was selected within the slate. PRR outperforms competing approaches that use only one of these signals and is far more scalable to large action spaces. Moreover, PRR allows fast delivery of recommendations powered by maximum inner product search (MIPS), making it suitable for low-latency domains such as computational advertising.
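A rough sketch of why MIPS makes delivery fast: once user and item representations are learned, recommending a slate reduces to finding the items with the largest inner product against the user embedding. The brute-force search and dimensions below are illustrative; a production system would use an approximate MIPS index.

```python
import numpy as np

rng = np.random.default_rng(0)
num_items, dim, K = 10_000, 32, 5

item_embeddings = rng.normal(size=(num_items, dim))   # learned item representations (assumed)
user_embedding = rng.normal(size=dim)                  # representation of the current user/context

# Maximum inner product search: the slate is the K items whose embeddings
# have the largest inner product with the user embedding.
scores = item_embeddings @ user_embedding
slate = np.argsort(-scores)[:K]
print("recommended slate:", slate)
```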