AITopics | Garcelon, Evrard

Plotting

Garcelon, Evrard

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation

Daoudi, Paul, Formoso, Mathias, Gaizi, Othman, Azize, Achraf, Garcelon, Evrard

arXiv.org Machine LearningDec-24-2023

A precondition for the deployment of a Reinforcement Learning agent to a real-world system is to provide guarantees on the learning process. While a learning algorithm will eventually converge to a good policy, there are no guarantees on the performance of the exploratory policies. We study the problem of conservative exploration, where the learner must at least be able to guarantee its performance is at least as good as a baseline policy. We propose the first conservative provably efficient model-free algorithm for policy optimization in continuous finite-horizon problems. We leverage importance sampling techniques to counterfactually evaluate the conservative condition from the data self-generated by the algorithm. We derive a regret bound and show that (w.h.p.) the conservative constraint is never violated during learning. Finally, we leverage these insights to build a general schema for conservative exploration in DeepRL via off-policy policy evaluation techniques. We show empirically the effectiveness of our methods.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2312.15458

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

SALSA PICANTE: a machine learning attack on LWE with binary secrets

Li, Cathy, Sotáková, Jana, Wenger, Emily, Malhou, Mohamed, Garcelon, Evrard, Charton, Francois, Lauter, Kristin

arXiv.org Artificial IntelligenceOct-31-2023

Learning with Errors (LWE) is a hard math problem underpinning many proposed post-quantum cryptographic (PQC) systems. The only PQC Key Exchange Mechanism (KEM) standardized by NIST is based on module~LWE, and current publicly available PQ Homomorphic Encryption (HE) libraries are based on ring LWE. The security of LWE-based PQ cryptosystems is critical, but certain implementation choices could weaken them. One such choice is sparse binary secrets, desirable for PQ HE schemes for efficiency reasons. Prior work, SALSA, demonstrated a machine learning-based attack on LWE with sparse binary secrets in small dimensions ($n \le 128$) and low Hamming weights ($h \le 4$). However, this attack assumes access to millions of eavesdropped LWE samples and fails at higher Hamming weights or dimensions. We present PICANTE, an enhanced machine learning attack on LWE with sparse binary secrets, which recovers secrets in much larger dimensions (up to $n=350$) and with larger Hamming weights (roughly $n/10$, and up to $h=60$ for $n=350$). We achieve this dramatic improvement via a novel preprocessing step, which allows us to generate training data from a linear number of eavesdropped LWE samples ($4n$) and changes the distribution of the data to improve transformer training. We also improve the secret recovery methods of SALSA and introduce a novel cross-attention recovery mechanism allowing us to read off the secret directly from the trained models. While PICANTE does not threaten NIST's proposed LWE standards, it demonstrates significant improvement over SALSA and could scale further, highlighting the need for future investigation into machine learning attacks on LWE with sparse binary secrets.

artificial intelligence, machine learning, picante, (18 more...)

arXiv.org Artificial Intelligence

2303.04178

Country: North America > United States (0.93)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Top $K$ Ranking for Multi-Armed Bandit with Noisy Evaluations

Garcelon, Evrard, Avadhanula, Vashist, Lazaric, Alessandro, Pirotta, Matteo

arXiv.org Machine LearningDec-14-2021

Consider an idealized content reviewing task in a large social media firm, where the objective is to identify harmful content that violates the platforms' community standards. Given the large volume of content generated on a daily basis, it may not be possible to ask human reviewers to provide a thorough assessment of each piece of content. For this reason, the platform may automatically assign a badness score for each piece of content depending on their estimated level of severity. For example, a hate speech related post may be assigned a higher badness score in comparison to a click bait post. The content with higher badness score may then be prioritized for human review, which eventually leads to what we can consider as a "ground-truth" evaluation of the severity of the content. The more accurate the badness score is in predicting the actual severity, the higher the chance that harmful content is passed for human review and properly identified. In practice, the badness score may be obtained by aggregating predictions returned by different automatic systems (e.g., rule-based, ML-based systems). For instance, the platform could rely on NLP-based classifiers for hostile speech detection, or CV-based classifiers for graphic images. As such, it is crucial to properly calibrate the predictions returned by each of these classifiers to ensure that the scores can be compared meaningfully and then return an aggregate and reliable badness score that correctly prioritizes the most harmful content for human review.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2112.06517

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.82)

Add feedback

A Unified Framework for Conservative Exploration

Yang, Yunchang, Wu, Tianhao, Zhong, Han, Garcelon, Evrard, Pirotta, Matteo, Lazaric, Alessandro, Wang, Liwei, Du, Simon S.

arXiv.org Machine LearningJun-22-2021

We study bandits and reinforcement learning (RL) subject to a conservative constraint where the agent is asked to perform at least as well as a given baseline policy. This setting is particular relevant in real-world domains including digital marketing, healthcare, production, finance, etc. For multi-armed bandits, linear bandits and tabular RL, specialized algorithms and theoretical analyses were proposed in previous work. In this paper, we present a unified framework for conservative bandits and RL, in which our core technique is to calculate the necessary and sufficient budget obtained from running the baseline policy. For lower bounds, our framework gives a black-box reduction that turns a certain lower bound in the nonconservative setting into a new lower bound in the conservative setting. We strengthen the existing lower bound for conservative multi-armed bandits and obtain new lower bounds for conservative linear bandits, tabular RL and low-rank MDP. For upper bounds, our framework turns a certain nonconservative upper-confidence-bound (UCB) algorithm into a conservative algorithm with a simple analysis. For multi-armed bandits, linear bandits and tabular RL, our new upper bounds tighten or match existing ones with significantly simpler analyses. We also obtain a new upper bound for conservative low-rank MDP.

algorithm, big data, health & medicine, (22 more...)

arXiv.org Machine Learning

2106.11692

Country: Asia (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.55)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

Adversarial Attacks on Linear Contextual Bandits

Garcelon, Evrard, Roziere, Baptiste, Meunier, Laurent, Tarbouriech, Jean, Teytaud, Olivier, Lazaric, Alessandro, Pirotta, Matteo

arXiv.org Machine LearningFeb-11-2020

Contextual bandit algorithms are applied in a wide range of domains, from advertising to recommender systems, from clinical trials to education. In many of these domains, malicious agents may have incentives to attack the bandit algorithm to induce it to perform a desired behavior. For instance, an unscrupulous ad publisher may try to increase their own revenue at the expense of the advertisers; a seller may want to increase the exposure of their products, or thwart a competitor's advertising campaign. In this paper, we study several attack scenarios and show that a malicious agent can force a linear contextual bandit algorithm to pull any desired arm $T - o(T)$ times over a horizon of $T$ steps, while applying adversarial modifications to either rewards or contexts that only grow logarithmically as $O(\log T)$. We also investigate the case when a malicious agent is interested in affecting the behavior of the bandit algorithm in a single context (e.g., a specific user). We first provide sufficient conditions for the feasibility of the attack and we then propose an efficient algorithm to perform the attack. We validate our theoretical results on experiments performed on both synthetic and real-world datasets.

algorithm, artificial intelligence, big data, (19 more...)

arXiv.org Machine Learning

2002.03839

Country: Europe (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.66)

Add feedback

No-Regret Exploration in Goal-Oriented Reinforcement Learning

Tarbouriech, Jean, Garcelon, Evrard, Valko, Michal, Pirotta, Matteo, Lazaric, Alessandro

arXiv.org Machine LearningDec-7-2019

Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the so-called episodic setting or stochastic shortest path (SSP) problem, where an agent has to achieve a predefined goal state (e.g., the top of the hill) while maximizing the cumulative reward or minimizing the cumulative cost. Despite its popularity, most of the literature studying the exploration-exploitation dilemma either focused on different problems (i.e., fixed-horizon and infinite-horizon) or made the restrictive loop-free assumption (which implies that no same state can be visited twice during any episode). In this paper, we study the general SSP setting and introduce the algorithm UC-SSP whose regret scales as $\displaystyle \widetilde{O}(c_{\max}^{3/2} c_{\min}^{-1/2} D S \sqrt{ A D K})$ after $K$ episodes for any unknown SSP with $S$ non-terminal states, $A$ actions, an SSP-diameter of $D$ and positive costs in $[c_{\min}, c_{\max}]$. UC-SSP is thus the first learning algorithm with vanishing regret in the theoretically challenging setting of episodic RL.

artificial intelligence, computer game, null, (18 more...)

arXiv.org Machine Learning

1912.03517

Country: Europe (0.28)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Bandits with Side Observations: Bounded vs. Logarithmic Regret

Degenne, Rémy, Garcelon, Evrard, Perchet, Vianney

arXiv.org Machine LearningJul-10-2018

We consider the classical stochastic multi-armed bandit but where, from time to time and roughly with frequency $\epsilon$, an extra observation is gathered by the agent for free. We prove that, no matter how small $\epsilon$ is the agent can ensure a regret uniformly bounded in time. More precisely, we construct an algorithm with a regret smaller than $\sum_i \frac{\log(1/\epsilon)}{\Delta_i}$, up to multiplicative constant and loglog terms. We also prove a matching lower-bound, stating that no reasonable algorithm can outperform this quantity.

algorithm, artificial intelligence, big data, (20 more...)

arXiv.org Machine Learning

1807.03558

Country: Europe > France (0.28)

Genre: Research Report (0.64)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback