Bibi, Adel
Attacking Multimodal OS Agents with Malicious Image Patches
Aichberger, Lukas, Paren, Alasdair, Gal, Yarin, Torr, Philip, Bibi, Adel
Recent advances in operating system (OS) agents enable vision-language models to interact directly with the graphical user interface of an OS. These multimodal OS agents autonomously perform computer-based tasks in response to a single prompt via application programming interfaces (APIs). Such APIs typically support low-level operations, including mouse clicks, keyboard inputs, and screenshot captures. We introduce a novel attack vector: malicious image patches (MIPs) that have been adversarially perturbed so that, when captured in a screenshot, they cause an OS agent to perform harmful actions by exploiting specific APIs. For instance, MIPs embedded in desktop backgrounds or shared on social media can redirect an agent to a malicious website, enabling further exploitation. These MIPs generalise across different user requests and screen layouts, and remain effective for multiple OS agents. The existence of such attacks highlights critical security vulnerabilities in OS agents, which should be carefully addressed before their widespread adoption.
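The abstract does not disclose the attack pipeline, but the core ingredient of adversarial image patches is projected-gradient optimisation of a screen-region perturbation against a differentiable surrogate objective. The sketch below illustrates only that generic recipe; `surrogate_loss` and `patch_box` are hypothetical placeholders, and the paper's actual method for attacking OS agents may differ substantially.

```python
import torch

def optimize_patch(screenshot, patch_box, surrogate_loss, steps=500, eps=16/255, lr=2/255):
    """Generic projected-gradient sketch: perturb only the pixels inside
    `patch_box` so that `surrogate_loss` (a differentiable proxy for the
    probability of the attacker's target action) is minimised.

    screenshot: float tensor in [0, 1] of shape (3, H, W)
    patch_box:  (top, left, height, width) region the attacker controls
    """
    top, left, h, w = patch_box
    patch = screenshot[:, top:top + h, left:left + w].clone()
    delta = torch.zeros_like(patch, requires_grad=True)

    for _ in range(steps):
        adv = screenshot.clone()
        adv[:, top:top + h, left:left + w] = (patch + delta).clamp(0, 1)
        loss = surrogate_loss(adv)           # e.g. -log p(target action | adv)
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()  # signed gradient descent step
            delta.clamp_(-eps, eps)          # keep the perturbation small
            delta.grad = None
    return (patch + delta).detach().clamp(0, 1)
```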
Shh, don't say that! Domain Certification in LLMs
Emde, Cornelius, Paren, Alasdair, Arvind, Preetham, Kayser, Maxime, Rainforth, Tom, Lukasiewicz, Thomas, Ghanem, Bernard, Torr, Philip H. S., Bibi, Adel
Large language models (LLMs) are often deployed to perform constrained tasks with narrow domains. For example, customer support bots can be built on top of LLMs, relying on their broad language understanding and capabilities to enhance performance. However, these LLMs are adversarially susceptible, potentially generating outputs outside the intended domain. To formalize, assess, and mitigate this risk, we introduce domain certification: a guarantee that accurately characterizes the out-of-domain behavior of language models. We then propose a simple yet effective approach, VALID, which provides adversarial bounds as a certificate. Finally, we evaluate our method across a diverse set of datasets, demonstrating that it yields meaningful certificates that tightly bound the probability of out-of-domain samples with minimal penalty to refusal behavior.
Mixture of Experts Made Intrinsically Interpretable
Yang, Xingyi, Venhoff, Constantin, Khakzar, Ashkan, de Witt, Christian Schroeder, Dokania, Puneet K., Bibi, Adel, Torr, Philip
Neurons in large language models often exhibit \emph{polysemanticity}, simultaneously encoding multiple unrelated concepts and obscuring interpretability. Instead of relying on post-hoc methods, we present \textbf{MoE-X}, a Mixture-of-Experts (MoE) language model designed to be \emph{intrinsically} interpretable. Our approach is motivated by the observation that, in language models, wider networks with sparse activations are more likely to capture interpretable factors. However, directly training such large sparse networks is computationally prohibitive. MoE architectures offer a scalable alternative by activating only a subset of experts for any given input, inherently aligning with interpretability objectives. In MoE-X, we establish this connection by rewriting the MoE layer as an equivalent sparse, large MLP. This approach enables efficient scaling of the hidden size while maintaining sparsity. To further enhance interpretability, we enforce sparse activation within each expert and redesign the routing mechanism to prioritize experts with the highest activation sparsity. These designs ensure that only the most salient features are routed and processed by the experts. We evaluate MoE-X on chess and natural language tasks, showing that it achieves performance comparable to dense models while significantly improving interpretability. MoE-X achieves a perplexity better than GPT-2, with interpretability surpassing even sparse autoencoder (SAE)-based approaches.
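As a rough illustration of the sparsity-aware routing described above, the sketch below scores every expert by how sparse its ReLU activations are for a given token and routes to the sparsest ones. It is deliberately naive: it evaluates all experts just to measure sparsity, whereas an efficient implementation, and the authors' actual design, would avoid that cost; all class and variable names here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparsityRoutedMoE(nn.Module):
    """Illustrative sketch (not the authors' code): each expert is a small ReLU
    MLP, and tokens are routed to the k experts whose hidden activations are
    sparsest for that token."""

    def __init__(self, d_model, d_hidden, n_experts, k=2):
        super().__init__()
        self.k = k
        self.w_in = nn.Parameter(torch.randn(n_experts, d_model, d_hidden) * 0.02)
        self.w_out = nn.Parameter(torch.randn(n_experts, d_hidden, d_model) * 0.02)

    def forward(self, x):                                      # x: (tokens, d_model)
        h = F.relu(torch.einsum('td,edh->teh', x, self.w_in))  # (tokens, experts, d_hidden)
        sparsity = (h <= 1e-6).float().mean(dim=-1)            # fraction of inactive units
        topk = sparsity.topk(self.k, dim=-1).indices           # sparsest experts per token
        gate = torch.zeros_like(sparsity).scatter_(1, topk, 1.0 / self.k)
        y = torch.einsum('teh,ehd->ted', h, self.w_out)        # per-expert outputs
        return (gate.unsqueeze(-1) * y).sum(dim=1)             # combine selected experts
```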
On the Coexistence and Ensembling of Watermarks
Petrov, Aleksandar, Agarwal, Shruti, Torr, Philip H. S., Bibi, Adel, Collomosse, John
Watermarking, the practice of embedding imperceptible information into media such as images, videos, audio, and text, is essential for intellectual property protection, content provenance and attribution. The growing complexity of digital ecosystems necessitates watermarks for different uses to be embedded in the same media. However, to detect and decode all watermarks, they need to coexist well with one another. We perform the first study of coexistence of deep image watermarking methods and, contrary to intuition, we find that various open-source watermarks can coexist with only minor impacts on image quality and decoding robustness. The coexistence of watermarks also opens the avenue for ensembling watermarking methods. We show how ensembling can increase the overall message capacity and enable new trade-offs between capacity, accuracy, robustness and image quality, without needing to retrain the base models.
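Ensembling in this sense simply means embedding several watermarks in the same image and decoding each one independently. The sketch below shows the idea with hypothetical `encoder_*` / `decoder_*` callables standing in for open-source watermarking models; it is not the paper's evaluation code.

```python
import numpy as np

def ensemble_encode(image, msg_a, msg_b, encoder_a, encoder_b):
    """Embed two watermarks sequentially; the second is applied on top of the
    first, so the combined capacity is len(msg_a) + len(msg_b) bits."""
    return encoder_b(encoder_a(image, msg_a), msg_b)

def ensemble_decode(image, decoder_a, decoder_b):
    """Decode both watermarks independently from the same image."""
    return decoder_a(image), decoder_b(image)

def bit_accuracy(msg, decoded):
    """Fraction of correctly recovered bits, a common robustness metric."""
    return float(np.mean(np.asarray(msg) == np.asarray(decoded)))
```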
Open Problems in Machine Unlearning for AI Safety
Barez, Fazl, Fu, Tingchen, Prabhu, Ameya, Casper, Stephen, Sanyal, Amartya, Bibi, Adel, O'Gara, Aidan, Kirk, Robert, Bucknall, Ben, Fist, Tim, Ong, Luke, Torr, Philip, Lam, Kwok-Yan, Trager, Robert, Krueger, David, Mindermann, Sören, Hernandez-Orallo, José, Geva, Mor, Gal, Yarin
As AI systems become more capable, widely deployed, and increasingly autonomous in critical areas such as cybersecurity, biological research, and healthcare, ensuring their safety and alignment with human values is paramount. Machine unlearning -- the ability to selectively forget or suppress specific types of knowledge -- has shown promise for privacy and data removal tasks, which has been the primary focus of existing research. More recently, its potential application to AI safety has gained attention. In this paper, we identify key limitations that prevent unlearning from serving as a comprehensive solution for AI safety, particularly in managing dual-use knowledge in sensitive domains like cybersecurity and chemical, biological, radiological, and nuclear (CBRN) safety. In these contexts, information can be both beneficial and harmful, and models may combine seemingly harmless information for harmful purposes -- unlearning this information could strongly affect beneficial uses. We provide an overview of inherent constraints and open problems, including the broader side effects of unlearning dangerous knowledge, as well as previously unexplored tensions between unlearning and existing safety mechanisms. Finally, we investigate challenges related to evaluation, robustness, and the preservation of safety features during unlearning. By mapping these limitations and open challenges, we aim to guide future research toward realistic applications of unlearning within a broader AI safety framework, acknowledging its limitations and highlighting areas where alternative approaches may be required.
Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Unanswerable Questions and Ambiguous Prompts
Kim, Hazel, Bibi, Adel, Torr, Philip, Gal, Yarin
Large language models (LLMs) frequently generate confident yet inaccurate responses, introducing significant risks for deployment in safety-critical domains. We present a novel approach to detecting model hallucination through systematic analysis of information flow across model layers when processing inputs with insufficient or ambiguous context. Our investigation reveals that hallucination manifests as deficiencies of usable information in inter-layer transmissions. While existing approaches primarily focus on final-layer output analysis, we demonstrate that tracking cross-layer information dynamics ($\mathcal{L}$I) provides robust indicators of model reliability, accounting for both information gain and loss during computation. Moreover, $\mathcal{L}$I integrates immediately with off-the-shelf LLMs, without additional training or architectural modifications, to improve model reliability.
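The paper's exact $\mathcal{L}$I metric is not spelled out in the abstract; the sketch below shows one simple way to inspect cross-layer dynamics in an off-the-shelf model, by reading each layer's hidden state through the unembedding ("logit lens") and tracking the entropy of the induced next-token distribution. It is an illustrative proxy under that assumption, not the authors' measure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def layerwise_entropy(model_name, prompt):
    """Project each layer's hidden state of the final token through the
    unembedding and record the entropy of the resulting next-token
    distribution (illustrative proxy for a cross-layer information signal)."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
    model.eval()
    with torch.no_grad():
        out = model(**tok(prompt, return_tensors="pt"))
        entropies = []
        for h in out.hidden_states:                 # one tensor per layer
            logits = model.lm_head(h[:, -1, :])     # unembed the last token
            p = torch.softmax(logits, dim=-1)
            entropies.append(-(p * (p + 1e-12).log()).sum().item())
    return entropies
```

For instance, `layerwise_entropy("gpt2", "...")` returns one entropy value per layer; under the sketch's assumption, a curve that stays flat or high across layers would suggest little usable information is being accumulated for the query.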
FedMedICL: Towards Holistic Evaluation of Distribution Shifts in Federated Medical Imaging
Alhamoud, Kumail, Ghunaim, Yasir, Alfarra, Motasem, Hartvigsen, Thomas, Torr, Philip, Ghanem, Bernard, Bibi, Adel, Ghassemi, Marzyeh
For medical imaging AI models to be clinically impactful, they must generalize. However, this goal is hindered by (i) diverse types of distribution shifts, such as temporal, demographic, and label shifts, and (ii) limited diversity in datasets that are siloed within single medical institutions. While these limitations have spurred interest in federated learning, current evaluation benchmarks assess these shifts only in isolation. In real healthcare settings, however, multiple types of shift co-exist, and their combined impact on medical imaging performance remains unstudied. In response, we introduce FedMedICL, a unified framework and benchmark for holistically evaluating federated medical imaging challenges, simultaneously capturing label, demographic, and temporal distribution shifts. We comprehensively evaluate several popular methods on six diverse medical imaging datasets (totaling 550 GPU hours). Furthermore, we use FedMedICL to simulate COVID-19 propagation across hospitals and evaluate whether methods can adapt to pandemic-driven changes in disease prevalence. We find that a simple batch balancing technique surpasses advanced methods in average performance across FedMedICL experiments. This finding questions the applicability of results from previous, narrower benchmarks to real-world medical settings.
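For concreteness, the batch-balancing baseline referred to above can be approximated with a class-weighted sampler; the sketch below is a generic PyTorch version under that assumption, not FedMedICL's exact implementation.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def balanced_loader(dataset, labels, batch_size=32):
    """Minimal sketch of batch balancing: sample each example with probability
    inversely proportional to its class frequency, so every minibatch is
    approximately class-balanced."""
    labels = torch.as_tensor(labels)
    counts = torch.bincount(labels).float()
    weights = 1.0 / counts[labels]        # rare classes receive larger weights
    sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```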
Mimicking User Data: On Mitigating Fine-Tuning Risks in Closed Large Language Models
Eiras, Francisco, Petrov, Aleksandar, Torr, Philip H. S., Kumar, M. Pawan, Bibi, Adel
Fine-tuning large language models on small, high-quality datasets can enhance their performance on specific downstream tasks. Recent research shows that fine-tuning on benign, instruction-following data can inadvertently undo the safety alignment process and increase a model's propensity to comply with harmful queries. Although critical, understanding and mitigating safety risks in well-defined tasks remains distinct from the instruction-following context due to structural differences in the data. Our work addresses this gap in the understanding of such risks across diverse types of data in closed models, where providers control how user data is utilized in the fine-tuning process. We demonstrate how malicious actors can subtly manipulate the structure of almost any task-specific dataset to foster significantly more dangerous model behaviors, while maintaining an appearance of innocuity and reasonable downstream task performance. To address this issue, we propose a novel mitigation strategy that mixes in safety data mimicking the task format and prompting style of the user data, and show that this is more effective than existing baselines at re-establishing safety alignment while maintaining similar task performance.
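The mitigation described above amounts to reformatting refusal examples in the same style as the user's task data before mixing them into the fine-tuning set. The snippet below sketches that data-preparation step; the template, field names, and mixing ratio are hypothetical choices, not the paper's.

```python
import random

def mix_in_safety_data(task_examples, safety_pairs, ratio=0.05, template=None):
    """Reformat safety examples (harmful prompt -> refusal) with the same
    prompt template as the task data, then mix a small fraction into the
    fine-tuning set. `task_examples` is assumed to be a list of dicts that
    already carry a formatted "text" field."""
    template = template or "### Instruction:\n{prompt}\n\n### Response:\n{response}"
    formatted_safety = [
        {"text": template.format(prompt=p, response=r)} for p, r in safety_pairs
    ]
    n_safety = max(1, int(ratio * len(task_examples)))
    mixed = task_examples + random.sample(formatted_safety, min(n_safety, len(formatted_safety)))
    random.shuffle(mixed)
    return mixed
```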
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch
Hammoud, Hasan Abed Al Kader, Michieli, Umberto, Pizzati, Fabio, Torr, Philip, Bibi, Adel, Ghanem, Bernard, Ozay, Mete
Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model that retains the expertise of the originals. However, current approaches often overlook the importance of safety alignment during merging, leading to highly misaligned models. This work investigates the effects of model merging on alignment. We evaluate several popular model merging techniques, demonstrating that existing methods not only transfer domain expertise but also propagate misalignment. We propose a simple two-step approach to address this problem: (i) generating synthetic safety and domain-specific data, and (ii) incorporating these generated data into the optimization process of existing data-aware model merging techniques. This allows us to treat alignment as a skill that can be maximized in the resulting merged LLM. Our experiments illustrate the effectiveness of integrating alignment-related data during merging, resulting in models that excel in both domain expertise and alignment.
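One minimal way to read the proposed two-step recipe: generate held-out safety and domain evaluation data, then choose merge coefficients that score well on both. The sketch below implements that idea with plain linear weight averaging and a grid search; it is a simplification under stated assumptions (hypothetical `eval_domain` / `eval_safety` callables), not the data-aware merging procedure used in the paper.

```python
import copy
import itertools

def merge_state_dicts(models, coeffs):
    """Coefficient-weighted average of parameters (plain linear merging)."""
    states = [m.state_dict() for m in models]
    return {k: sum(c * s[k] for c, s in zip(coeffs, states)) for k in states[0]}

def alignment_aware_merge(base_model, expert_models, eval_domain, eval_safety, step=0.25):
    """Search convex merge coefficients and keep the candidate with the best
    combined domain + safety score on held-out (possibly synthetic) data.
    Assumes all models share the same architecture as `base_model`."""
    best, best_score = None, float("-inf")
    grid = [i * step for i in range(int(1 / step) + 1)]
    for coeffs in itertools.product(grid, repeat=len(expert_models)):
        if abs(sum(coeffs) - 1.0) > 1e-6:
            continue                                  # keep a convex combination
        candidate = copy.deepcopy(base_model)
        candidate.load_state_dict(merge_state_dicts(expert_models, coeffs))
        score = eval_domain(candidate) + eval_safety(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best
```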
Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation
Zhang, Wenxuan, Mohamed, Youssef, Ghanem, Bernard, Torr, Philip H. S., Bibi, Adel, Elhoseiny, Mohamed
We propose and study a realistic Continual Learning (CL) setting in which learning algorithms are granted a restricted computational budget per time step while training. We apply this setting to large-scale semi-supervised continual learning scenarios with sparse label rates. Previous proficient CL methods perform very poorly in this challenging setting; overfitting to the sparse labeled data and the insufficient computational budget are the two main culprits. Our new setting encourages learning methods to effectively and efficiently utilize the unlabeled data during training. To that end, we propose a simple but highly effective baseline, DietCL, which utilizes unlabeled and labeled data jointly. DietCL outperforms, by a large margin, all existing supervised CL algorithms as well as more recent continual semi-supervised methods. Our extensive analysis and ablations demonstrate that DietCL is stable across the full spectrum of label sparsity, computational budgets, and various other ablations.

In the era of abundant information, data is not revealed in its entirety but arrives sequentially from a non-stationary environment. For example, social media platforms such as YouTube, Snapchat, and Facebook receive huge amounts of data every day. The content of this data and its distribution depend greatly on social trends and the focus of each platform, and thus shift over time. For instance, in 2017 Snapchat reported an influx of over 3.5 billion short videos daily from users across the globe (Snap, 2017). These videos had to be processed instantly for various tasks, from image rating and recommendation to hate speech and misinformation detection. Continual learning attempts to address such challenges by designing training algorithms that accommodate new data streams while preserving previously acquired knowledge. Diverse solutions have emerged, spanning regularization-based (Kirkpatrick et al., 2017), architecture-based (Ebrahimi et al., 2020), and memory-based methods (Chaudhry et al., 2019b).
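As a concrete picture of the computational-budget constraint, the sketch below runs a fixed number of updates per incoming data chunk and combines a supervised loss on the few labeled samples with a confidence-thresholded pseudo-label loss on the unlabeled ones. The loss combination and threshold are illustrative assumptions; DietCL's actual objective differs.

```python
import torch
import torch.nn.functional as F

def train_one_timestep(model, opt, labeled_loader, unlabeled_loader, budget_steps, lam=1.0):
    """Run exactly `budget_steps` updates for the current data chunk, mixing a
    supervised loss with a pseudo-label loss on unlabeled data. Assumes both
    loaders yield at least `budget_steps` batches."""
    labeled_iter, unlabeled_iter = iter(labeled_loader), iter(unlabeled_loader)
    model.train()
    for _ in range(budget_steps):              # hard compute budget per time step
        x_l, y_l = next(labeled_iter)
        x_u = next(unlabeled_iter)             # unlabeled batches carry inputs only
        logits_l = model(x_l)
        with torch.no_grad():                  # pseudo-labels from the current model
            probs = torch.softmax(model(x_u), dim=-1)
            conf, pseudo = probs.max(dim=-1)
            mask = conf > 0.95                 # keep only confident predictions
        loss = F.cross_entropy(logits_l, y_l)
        if mask.any():
            loss = loss + lam * F.cross_entropy(model(x_u)[mask], pseudo[mask])
        opt.zero_grad()
        loss.backward()
        opt.step()
```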