

Strategic Linear Contextual Bandits

Neural Information Processing Systems

Recommendation algorithms that select the most relevant item for sequentially arriving users or queries have become vital for navigating the internet and its many online platforms.




Neural Information Processing Systems

Robust learning is a critical field that seeks to develop efficient algorithms capable of recovering an underlying model despite possibly malicious corruptions in the data. In recent decades, the ability to deal with corrupted measurements has become crucially important.


Tumbleweeds inspire this rolling, resilient robot

Popular Science

HERMES is more energy efficient than a solid sphere. A robot inspired by desert tumbleweeds may be the first of a new generation of energy-efficient explorers rolling into future disaster zones. While the Hybrid Energy-efficient Rover Mechanism for Exploration Systems (HERMES), described in a recent journal study, recalls the desert ramblers, its creator initially envisioned the idea while watching humans enjoy the wind simply for the thrill of it. "The inspiration struck on a windy winter afternoon along the shores of Lake Neuchâtel [in western Switzerland]," said Sanjay Manoharan, a study co-author and researcher at the École Polytechnique Fédérale de Lausanne (EPFL).


Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO

Blagoev, Nikolay, Ersoy, Oğuzhan, Chen, Lydia Yiyu

arXiv.org Artificial Intelligence

Group Relative Policy Optimization (GRPO) has proven highly effective in the post-training of Large Language Models (LLMs). In GRPO, prompts are answered by the model and, through reinforcement learning, preferred completions are learnt. Owing to its small communication volume, GRPO is inherently suitable for decentralised training, as prompts can be answered concurrently by multiple nodes and the completions then exchanged in the form of strings. In this work, we present the first adversarial attack on decentralised GRPO. We demonstrate that malicious parties can poison such systems by injecting arbitrary malicious tokens into benign models in both out-of-context and in-context attacks. Using empirical examples from math and coding tasks, we show that adversarial attacks can easily poison the benign nodes, polluting their local LLM post-training and achieving attack success rates of up to 100% in as few as 50 iterations. We propose two ways to defend against these attacks, depending on whether all users train the same model or different models, and show that these defenses achieve stop rates of up to 100%, rendering the attack ineffective.
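The group-relative update at the heart of GRPO can be sketched in a few lines; the function name and the plain per-group standardisation below are our own illustrative choices, not the paper's implementation.

```python
def group_relative_advantages(rewards):
    """Standardise each completion's reward against its own group's statistics.

    In GRPO, one prompt yields a group of completions; each completion's
    advantage is its reward relative to the group mean, scaled by the
    group's standard deviation.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    if std == 0.0:  # identical rewards carry no learning signal
        return [0.0] * n
    return [(r - mean) / std for r in rewards]
```

Completions scoring above their group's mean receive positive advantage and are reinforced, which is why the reward assigned to exchanged completions is a natural poisoning target in the decentralised setting.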


Diffusion annealed Langevin dynamics: a theoretical study

Cattiaux, Patrick, Cordero-Encinar, Paula, Guillin, Arnaud

arXiv.org Machine Learning

The aim of this paper is to give a rigorous presentation of the recently introduced diffusion annealed Langevin dynamics [39]. This stochastic process is a score-based generative model and provides an alternative to the well-known overdamped Langevin process and its time-reversed version, commonly used for sampling purposes. In particular, we fill some gaps in the main arguments used for building the annealed Langevin dynamics discussed in [39, 30, 24]. We do not discuss its practical efficiency or its numerical counterparts; that is, we neither introduce nor discuss the corresponding discrete algorithms, presented in [24] by the second author and the references therein. However, some quantitative aspects, useful for discretization schemes or important from the statistical point of view, are discussed in detail. Also, for distributions like the Gaussian, an important idea introduced in the papers on diffusion annealed Langevin dynamics consists in using a functional inequality (namely the Poincaré inequality) to control certain covariances. This inequality is crucial in [24] for proving that the score of the intermediate distributions is Lipschitz continuous, which, as we explain in Section 2, ensures the existence and uniqueness of strong solutions for the annealed Langevin diffusion. As a matter of fact, heavy-tailed base distributions are also particularly well suited to the model, as we will see in an example.
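For reference, the classical overdamped Langevin process mentioned above can be simulated with a basic Euler-Maruyama scheme; this is a purely illustrative sketch for a standard Gaussian target (score s(x) = -x), not one of the annealed schemes the paper analyses.

```python
import math
import random

def overdamped_langevin(score, x0, step, n_steps, rng):
    """Euler-Maruyama discretisation of dX_t = score(X_t) dt + sqrt(2) dW_t."""
    x = x0
    for _ in range(n_steps):
        x += step * score(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return x

# Target N(0, 1): the score is s(x) = -x, so the chain forgets its start x0
# and produces approximate samples from the standard Gaussian.
samples = [overdamped_langevin(lambda x: -x, 5.0, 0.05, 400, random.Random(seed))
           for seed in range(2000)]
```

The discretisation introduces a small bias in the stationary variance (of order the step size), which is exactly the kind of quantitative effect the paper's non-discrete analysis sets aside.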


Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design

Svensson, Hampus Gummesson, Engkvist, Ola, Janet, Jon Paul, Tyrchan, Christian, Chehreghani, Morteza Haghir

arXiv.org Artificial Intelligence

In many real-world applications, evaluating the quality of instances is costly and time-consuming, e.g., with human feedback and physics simulations, in contrast to proposing new instances. This is even more critical in reinforcement learning, which relies on interactions with the environment (i.e., new instances) that must be evaluated to provide a reward signal for learning. At the same time, sufficient exploration is crucial in reinforcement learning for finding high-rewarding solutions: the agent should observe and learn from a diverse set of experiences to find different solutions. Thus, we argue that learning from a diverse mini-batch of experiences can have a large impact on exploration and help mitigate mode collapse. In this paper, we introduce mini-batch diversification for reinforcement learning and study this framework in the context of a real-world problem, namely drug discovery. We extensively evaluate how our proposed framework can enhance the effectiveness of chemical exploration in de novo drug design, where finding diverse and high-quality solutions is crucial. Our experiments demonstrate that the proposed diverse mini-batch selection framework can substantially enhance the diversity of solutions while maintaining their quality. In drug discovery, such an outcome can potentially lead to fulfilling unmet medical needs faster.
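One standard way to select a diverse mini-batch, sketched here under our own assumptions (the paper may use a different diversification criterion), is greedy farthest-point selection over a distance between experiences:

```python
def diverse_minibatch(items, distance, k):
    """Greedy max-min (farthest-point) selection of k diverse items.

    Start from the first item, then repeatedly add the item whose minimum
    distance to the already-chosen set is largest.
    """
    chosen = [items[0]]
    while len(chosen) < k:
        remaining = [x for x in items if x not in chosen]
        chosen.append(max(remaining,
                          key=lambda x: min(distance(x, c) for c in chosen)))
    return chosen

# Example with scalar "experiences": clustered values yield spread-out picks.
batch = diverse_minibatch([0.0, 0.1, 0.2, 5.0, 5.1, 10.0],
                          lambda a, b: abs(a - b), k=3)
```

In de novo drug design, the distance would typically be computed between molecular fingerprints rather than scalars, so that the mini-batch covers distinct regions of chemical space.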


Rawlsian many-to-one matching with non-linear utility

Nana, Hortence, Athanasopoulos, Andreas, Dimitrakakis, Christos

arXiv.org Artificial Intelligence

We study a many-to-one matching problem, such as the college admission problem, where each college can admit multiple students. Unlike classical models, colleges evaluate sets of students through non-linear utility functions that capture diversity among them. In this setting, we show that classical stable matchings may fail to exist. To address this, we propose alternative solution concepts based on Rawlsian fairness, aiming to maximize the minimum utility across colleges. We design both deterministic and stochastic algorithms that iteratively improve the outcome of the worst-off college, offering a practical approach to fair allocation when stability cannot be guaranteed.
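The maximin idea can be illustrated with a toy greedy heuristic: always hand the next student to the college whose current set has the lowest utility. This is our own simplification, not the paper's deterministic or stochastic algorithm, and it ignores student preferences entirely.

```python
def rawlsian_greedy(students, colleges, utility, capacity):
    """Assign each student to the currently worst-off college with room left.

    `utility(college, student_set)` may be non-linear, e.g. rewarding
    diversity within the admitted set.
    """
    match = {c: set() for c in colleges}
    for s in students:
        open_colleges = [c for c in colleges if len(match[c]) < capacity[c]]
        worst = min(open_colleges, key=lambda c: utility(c, match[c]))
        match[worst].add(s)
    return match

# Toy run: with utility = class size, the greedy rule just balances cohorts.
match = rawlsian_greedy(students=range(4),
                        colleges=["A", "B"],
                        utility=lambda c, ss: len(ss),
                        capacity={"A": 2, "B": 2})
```

With a genuinely non-linear utility (e.g. one with diminishing returns for similar students), this single forward pass is no longer enough, which is why iterative improvement of the worst-off college is needed.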


Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings

Bean, Andrew M., Seedat, Nabeel, Chen, Shengzhuang, Schwarz, Jonathan Richard

arXiv.org Artificial Intelligence

The prohibitive cost of evaluating large language models (LLMs) on comprehensive benchmarks necessitates the creation of small yet representative data subsets (i.e., tiny benchmarks) that enable efficient assessment while retaining predictive fidelity. Current methods for this task operate under a model-centric paradigm, selecting benchmark items based on the collective performance of existing models. Such approaches are limited by large upfront costs, an inability to immediately handle new benchmarks ("cold start"), and the fragile assumption that future models will share the failure patterns of their predecessors. In this work, we challenge this paradigm and propose an item-centric approach to benchmark subset selection, arguing that selection should be based on the intrinsic properties of the task items themselves rather than on model-specific failure patterns. We instantiate this item-centric efficient benchmarking approach via a novel method, Scales++, where data selection is based on the cognitive demands of the benchmark samples. Empirically, we show that Scales++ reduces the upfront selection cost by over 18x while achieving competitive predictive fidelity. On the Open LLM Leaderboard, using just a 0.5% data subset, we predict full benchmark scores with a 2.9% mean absolute error. We demonstrate that this item-centric approach enables more efficient model evaluation without significant fidelity degradation, while also providing better cold-start performance and more interpretable benchmarking.
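To make the tiny-benchmark idea concrete: once a representative subset of items is chosen, the full score can be estimated by weighting each selected item by how many items it stands for. This sketch is our own simplification (the names, the 1-D embeddings, and the 1-nearest-representative weighting are assumptions, not the Scales++ method).

```python
def weighted_subset_estimate(item_embs, rep_idx, rep_scores):
    """Estimate the full-benchmark score from scores on a representative subset.

    Each item is attributed to its nearest representative (scalar embeddings
    here for simplicity); each representative's score is weighted by the
    number of items it represents.
    """
    counts = [0] * len(rep_idx)
    for e in item_embs:
        nearest = min(range(len(rep_idx)),
                      key=lambda j: abs(e - item_embs[rep_idx[j]]))
        counts[nearest] += 1
    return sum(c * s for c, s in zip(counts, rep_scores)) / len(item_embs)

# Five items, two representatives: three items cluster near rep 0 (score 1.0),
# two near rep 1 (score 0.0), so the estimated full score is 3/5.
est = weighted_subset_estimate([0.0, 0.1, 0.2, 5.0, 5.1],
                               rep_idx=[0, 3],
                               rep_scores=[1.0, 0.0])
```

The quality of such an estimate hinges entirely on how well the embedding captures what makes items hard, which is where the cognitive-demand embeddings come in.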

