AITopics | Ammar, Haitham Bou

Collaborating Authors

Ammar, Haitham Bou

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Almost Surely Safe Alignment of Large Language Models at Inference-Time

Ji, Xiaotong, Ramesh, Shyam Sundhar, Zimmer, Matthieu, Bogunovic, Ilija, Wang, Jun, Ammar, Haitham Bou

arXiv.org Artificial IntelligenceFeb-5-2025

Even highly capable large language models (LLMs) can produce biased or unsafe responses, and alignment techniques, such as RLHF, aimed at mitigating this issue, are expensive and prone to overfitting as they retrain the LLM. This paper introduces a novel inference-time alignment approach that ensures LLMs generate safe responses almost surely, i.e., with a probability approaching one. We achieve this by framing the safe generation of inference-time responses as a constrained Markov decision process within the LLM's latent space. Crucially, we augment a safety state that tracks the evolution of safety constraints and enables us to demonstrate formal safety guarantees upon solving the MDP in the latent space. Building on this foundation, we propose InferenceGuard, a practical implementation that safely aligns LLMs without modifying the model weights. Empirically, we demonstrate InferenceGuard effectively balances safety and task performance, outperforming existing inference-time alignment methods in generating safe and aligned responses.

large language model, machine learning, preprint arxiv, (16 more...)

arXiv.org Artificial Intelligence

2502.01208

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Government (0.68)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

Efficient Reinforcement Learning with Large Language Model Priors

Yan, Xue, Song, Yan, Feng, Xidong, Yang, Mengyue, Zhang, Haifeng, Ammar, Haitham Bou, Wang, Jun

arXiv.org Artificial IntelligenceOct-10-2024

In sequential decision-making (SDM) tasks, methods like reinforcement learning (RL) and heuristic search have made notable advances in specific cases. However, they often require extensive exploration and face challenges in generalizing across diverse environments due to their limited grasp of the underlying decision dynamics. In contrast, large language models (LLMs) have recently emerged as powerful general-purpose tools, due to their capacity to maintain vast amounts of domainspecific knowledge. To harness this rich prior knowledge for efficiently solving complex SDM tasks, we propose treating LLMs as prior action distributions and integrating them into RL frameworks through Bayesian inference methods, making use of variational inference and direct posterior sampling. The proposed approaches facilitate the seamless incorporation of fixed LLM priors into both policy-based and value-based RL frameworks. Our experiments show that incorporating LLMbased action priors significantly reduces exploration and optimization complexity, substantially improving sample efficiency compared to traditional RL techniques, e.g., using LLM priors decreases the number of required samples by over 90% in offline learning scenarios. Traditional approaches to SDM, such as optimal control (Garcia et al., 1989), heuristic search (Świechowski et al., 2023) and reinforcement learning (RL) (Mnih, 2013), have seen substantial success. Notably, AlphaGo (Silver et al., 2016) and AlphaStar (Vinyals et al., 2019), both based on deep reinforcement learning (DRL), have achieved human-level proficiency in the games of Go and StarCraft II, respectively. However, these methods still suffer from high computational complexity, along with poor generalizability and limited applicability across diverse domains (Dulac-Arnold et al., 2015; Cobbe et al., 2019). Recently, Large Language Models (LLMs) have emerged as effective tools for tackling diverse general-purpose tasks, such as in dialogue systems (Brooks et al., 2023), decision-making (Zhao et al., 2024a), and mathematical reasoning (Imani et al., 2023).

large language model, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2410.07927

Country:

Asia > China (0.14)
Europe > Sweden (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Go (0.54)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Mixture of Attentions For Speculative Decoding

Zimmer, Matthieu, Gritta, Milan, Lampouras, Gerasimos, Ammar, Haitham Bou, Wang, Jun

arXiv.org Artificial IntelligenceOct-4-2024

The growth in the number of parameters of Large Language Models (LLMs) has led to a significant surge in computational requirements, making them challenging and costly to deploy. Speculative decoding (SD) leverages smaller models to efficiently propose future tokens, which are then verified by the LLM in parallel. Small models that utilise activations from the LLM currently achieve the fastest decoding speeds. However, we identify several limitations of SD models including the lack of on-policyness during training and partial observability. To address these shortcomings, we propose a more grounded architecture for small models by introducing a Mixture of Attentions for SD. Our novel architecture can be applied in two scenarios: a conventional single device deployment and a novel client-server deployment where the small model is hosted on a consumer device and the LLM on a server. In a single-device scenario, we demonstrate state-of-the-art speedups improving EAGLE-2 by 9.5% and its acceptance length by 25%. In a client-server setting, our experiments demonstrate: 1) state-of-the-art latencies with minimal calls to the server for different network conditions, and 2) in the event of a complete disconnection, our approach can maintain higher accuracy compared to other SD methods and demonstrates advantages over API calls to LLMs, which would otherwise be unable to continue the generation process.

arxiv preprint arxiv, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2410.03804

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Group Robust Preference Optimization in Reward-free RLHF

Ramesh, Shyam Sundhar, Hu, Yifan, Chaimalas, Iason, Mehta, Viraj, Sessa, Pier Giuseppe, Ammar, Haitham Bou, Bogunovic, Ilija

arXiv.org Artificial IntelligenceMay-30-2024

Adapting large language models (LLMs) for specific tasks usually involves fine-tuning through reinforcement learning with human feedback (RLHF) on preference data. While these data often come from diverse labelers' groups (e.g., different demographics, ethnicities, company teams, etc.), traditional RLHF approaches adopt a "one-size-fits-all" approach, i.e., they indiscriminately assume and optimize a single preference model, thus not being robust to unique characteristics and needs of the various groups. To address this limitation, we propose a novel Group Robust Preference Optimization (GRPO) method to align LLMs to individual groups' preferences robustly. Our approach builds upon reward-free direct preference optimization methods, but unlike previous approaches, it seeks a robust policy which maximizes the worst-case group performance. To achieve this, GRPO adaptively and sequentially weights the importance of different groups, prioritizing groups with worse cumulative loss. We theoretically study the feasibility of GRPO and analyze its convergence for the log-linear policy class. By fine-tuning LLMs with GRPO using diverse group-based global opinion data, we significantly improved performance for the worst-performing groups, reduced loss imbalances across groups, and improved probability accuracies compared to non-robust baselines.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2405.20304

Country:

Asia (0.28)
Europe > Switzerland (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Add feedback

End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes

Maraval, Alexandre, Zimmer, Matthieu, Grosnit, Antoine, Ammar, Haitham Bou

arXiv.org Artificial IntelligenceDec-22-2023

Meta-Bayesian optimisation (meta-BO) aims to improve the sample efficiency of Bayesian optimisation by leveraging data from related tasks. While previous methods successfully meta-learn either a surrogate model or an acquisition function independently, joint training of both components remains an open challenge. This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data. Early on, we notice that training transformer-based neural processes from scratch with RL is challenging due to insufficient supervision, especially when rewards are sparse. We formalise this claim with a combinatorial analysis showing that the widely used notion of regret as a reward signal exhibits a logarithmic sparsity pattern in trajectory lengths. To tackle this problem, we augment the RL objective with an auxiliary task that guides part of the architecture to learn a valid probabilistic model as an inductive bias. We demonstrate that our method achieves state-of-the-art regret results against various baselines in experiments on standard hyperparameter optimisation tasks and also outperforms others in the real-world problems of mixed-integer programming tuning, antibody design, and logic synthesis for electronic design automation.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2305.1593

Country:

Europe (1.00)
North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Add feedback

Framework and Benchmarks for Combinatorial and Mixed-variable Bayesian Optimization

Dreczkowski, Kamil, Grosnit, Antoine, Ammar, Haitham Bou

arXiv.org Artificial IntelligenceDec-11-2023

This paper introduces a modular framework for Mixed-variable and Combinatorial Bayesian Optimization (MCBO) to address the lack of systematic benchmarking and standardized evaluation in the field. Current MCBO papers often introduce non-diverse or non-standard benchmarks to evaluate their methods, impeding the proper assessment of different MCBO primitives and their combinations. Additionally, papers introducing a solution for a single MCBO primitive often omit benchmarking against baselines that utilize the same methods for the remaining primitives [1-4]. This omission is primarily due to the significant implementation overhead involved, resulting in a lack of controlled assessments and an inability to showcase the merits of a contribution effectively. To overcome these challenges, our proposed framework enables an effortless combination of Bayesian Optimization components, and provides a diverse set of synthetic and real-world benchmarking tasks. Leveraging this flexibility, we implement 47 novel MCBO algorithms and benchmark them against seven existing MCBO solvers and five standard black-box optimization algorithms on ten tasks, conducting over 4000 experiments. Our findings reveal a superior combination of MCBO primitives outperforming existing approaches and illustrate the significance of model fit and the use of a trust region.

artificial intelligence, evolutionary algorithm, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2306.09803

Country:

Europe (0.93)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Health & Medicine > Therapeutic Area (0.48)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Reinforcement Learning for Safe Robot Control using Control Lyapunov Barrier Functions

Du, Desong, Han, Shaohang, Qi, Naiming, Ammar, Haitham Bou, Wang, Jun, Pan, Wei

arXiv.org Artificial IntelligenceMay-16-2023

Reinforcement learning (RL) exhibits impressive performance when managing complicated control tasks for robots. However, its wide application to physical robots is limited by the absence of strong safety guarantees. To overcome this challenge, this paper explores the control Lyapunov barrier function (CLBF) to analyze the safety and reachability solely based on data without explicitly employing a dynamic model. We also proposed the Lyapunov barrier actor-critic (LBAC), a model-free RL algorithm, to search for a controller that satisfies the data-based approximation of the safety and reachability conditions. The proposed approach is demonstrated through simulation and real-world robot control experiments, i.e., a 2D quadrotor navigation task. The experimental findings reveal this approach's effectiveness in reachability and safety, surpassing other model-free RL methods.

controller, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2305.09793

Country:

Europe > United Kingdom (0.28)
North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

BOiLS: Bayesian Optimisation for Logic Synthesis

Grosnit, Antoine, Malherbe, Cedric, Tutunov, Rasul, Wan, Xingchen, Wang, Jun, Ammar, Haitham Bou

arXiv.org Machine LearningNov-11-2021

Optimising the quality-of-results (QoR) of circuits during logic synthesis is a formidable challenge necessitating the exploration of exponentially sized search spaces. While expert-designed operations aid in uncovering effective sequences, the increase in complexity of logic circuits favours automated procedures. Inspired by the successes of machine learning, researchers adapted deep learning and reinforcement learning to logic synthesis applications. However successful, those techniques suffer from high sample complexities preventing widespread adoption. To enable efficient and scalable solutions, we propose BOiLS, the first algorithm adapting modern Bayesian optimisation to navigate the space of synthesis operations. BOiLS requires no human intervention and effectively trades-off exploration versus exploitation through novel Gaussian process kernels and trust-region constrained acquisitions. In a set of experiments on EPFL benchmarks, we demonstrate BOiLS's superior performance compared to state-of-the-art in terms of both sample efficiency and QoR values.

artificial intelligence, boils, machine learning, (16 more...)

arXiv.org Machine Learning

2111.06178

Genre: Research Report (1.00)

Industry: Semiconductors & Electronics (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Implicit Variational Conditional Sampling with Normalizing Flows

Moens, Vincent, Sootla, Aivar, Ammar, Haitham Bou, Wang, Jun

arXiv.org Machine LearningJul-6-2021

We present a method for conditional sampling with normalizing flows when only part of an observation is available. We rely on the following fact: if the flow's domain can be partitioned in such a way that the flow restrictions to subdomains keep the bijectivity property, a lower bound to the conditioning variable log-probability can be derived. Simulation from the variational conditional flow then amends to solving an equality constraint. Our contribution is three-fold: a) we provide detailed insights on the choice of variational distributions; b) we propose how to partition the input space of the flow to preserve bijectivity property; c) we propose a set of methods to optimise the variational distribution in specific cases. Through extensive experiments, we show that our sampling method can be applied with success to invertible residual networks for inference and classification.

artificial intelligence, arxiv, neural network, (18 more...)

arXiv.org Machine Learning

2107.02474

Country:

North America > Canada (0.28)
Asia > Middle East > Israel (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Data Science (0.68)

Add feedback

Online Double Oracle

Dinh, Le Cong, Yang, Yaodong, Tian, Zheng, Nieves, Nicolas Perez, Slumbers, Oliver, Mguni, David Henry, Ammar, Haitham Bou, Wang, Jun

arXiv.org Artificial IntelligenceMar-16-2021

Solving strategic games whose action space is prohibitively large is a critical yet under-explored topic in economics, computer science and artificial intelligence. This paper proposes new learning algorithms in two-player zero-sum games where the number of pure strategies is huge or even infinite. Specifically, we combine no-regret analysis from online learning with double oracle methods from game theory. Our method -- \emph{Online Double Oracle (ODO)} -- achieves the regret bound of $\mathcal{O}(\sqrt{T k \log(k)})$ in self-play setting where $k$ is NOT the size of the game, but rather the size of \emph{effective strategy set} that is linearly dependent on the support size of the Nash equilibrium. On tens of different real-world games, including Leduc Poker that contains $3^{936}$ pure strategies, our methods outperform no-regret algorithms and double oracle methods by a large margin, both in convergence rate to Nash equilibrium and average payoff against strategic adversary.

algorithm, artificial intelligence, game theory, (19 more...)

arXiv.org Artificial Intelligence

2103.0778

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Add feedback