AITopics | Bibi, Adel

Collaborating Authors

Bibi, Adel

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation

Yang, Yibo, Li, Xiaojie, Alfarra, Motasem, Hammoud, Hasan, Bibi, Adel, Torr, Philip, Ghanem, Bernard

arXiv.org Artificial IntelligenceJun-7-2024

Relieving the reliance of neural network training on a global back-propagation (BP) has emerged as a notable research topic due to the biological implausibility and huge memory consumption caused by BP. Among the existing solutions, local learning optimizes gradient-isolated modules of a neural network with local errors and has been proved to be effective even on large-scale datasets. However, the reconciliation among local errors has never been investigated. In this paper, we first theoretically study non-greedy layer-wise training and show that the convergence cannot be assured when the local gradient in a module w.r.t. its input is not reconciled with the local gradient in the previous module w.r.t. its output. Inspired by the theoretical result, we further propose a local training strategy that successively regularizes the gradient reconciliation between neighboring modules without breaking gradient isolation or introducing any learnable parameters. Our method can be integrated into both local-BP and BP-free settings. In experiments, we achieve significant performance improvements compared to previous methods. Particularly, our method for CNN and Transformer architectures on ImageNet is able to attain a competitive performance with global BP, saving more than 40% memory consumption.

artificial intelligence, interpretable deep local learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2406.05222

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Universal In-Context Approximation By Prompting Fully Recurrent Models

Petrov, Aleksandar, Lamb, Tom A., Paren, Alasdair, Torr, Philip H. S., Bibi, Adel

arXiv.org Artificial IntelligenceJun-3-2024

Zero-shot and in-context learning enable solving tasks without model fine-tuning, making them essential for developing generative model solutions. Therefore, it is crucial to understand whether a pretrained model can be prompted to approximate any function, i.e., whether it is a universal in-context approximator. While it was recently shown that transformer models do possess this property, these results rely on their attention mechanism. Hence, these findings do not apply to fully recurrent architectures like RNNs, LSTMs, and the increasingly popular SSMs. We demonstrate that RNNs, LSTMs, GRUs, Linear RNNs, and linear gated architectures such as Mamba and Hawk/Griffin can also serve as universal in-context approximators. To streamline our argument, we introduce a programming language called LSRL that compiles to these fully recurrent architectures. LSRL may be of independent interest for further studies of fully recurrent models, such as constructing interpretability benchmarks. We also study the role of multiplicative gating and observe that architectures incorporating such gating (e.g., LSTMs, GRUs, Hawk/Griffin) can implement certain operations more stably, making them more viable candidates for practical in-context universal approximation.

machine learning, natural language, node, (19 more...)

arXiv.org Artificial Intelligence

2406.01424

Country:

Europe > Greece (0.14)
Europe > Austria (0.14)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Risks and Opportunities of Open-Source Generative AI

Eiras, Francisco, Petrov, Aleksandar, Vidgen, Bertie, Schroeder, Christian, Pizzati, Fabio, Elkins, Katherine, Mukhopadhyay, Supratik, Bibi, Adel, Purewal, Aaron, Botos, Csaba, Steibel, Fabro, Keshtkar, Fazel, Barez, Fazl, Smith, Genevieve, Guadagni, Gianluca, Chun, Jon, Cabot, Jordi, Imperial, Joseph, Nolazco, Juan Arturo, Landay, Lori, Jackson, Matthew, Torr, Phillip H. S., Darrell, Trevor, Lee, Yong, Foerster, Jakob

arXiv.org Artificial IntelligenceMay-29-2024

Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source generative AI. Using a three-stage framework for Gen AI development (near, mid and long-term), we analyze the risks and opportunities of open-source generative AI models with similar capabilities to the ones currently available (near to mid-term) and with greater capabilities (long-term). We argue that, overall, the benefits of open-source Gen AI outweigh its risks. As such, we encourage the open sourcing of models, training and evaluation data, and provide a set of recommendations and best practices for managing risks associated with open-source generative AI.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2405.08597

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > Saudi Arabia (0.28)
Asia > Middle East > UAE (0.28)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.45)

Industry:

Social Sector (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
(12 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Add feedback

Near to Mid-term Risks and Opportunities of Open-Source Generative AI

Eiras, Francisco, Petrov, Aleksandar, Vidgen, Bertie, de Witt, Christian Schroeder, Pizzati, Fabio, Elkins, Katherine, Mukhopadhyay, Supratik, Bibi, Adel, Csaba, Botos, Steibel, Fabro, Barez, Fazl, Smith, Genevieve, Guadagni, Gianluca, Chun, Jon, Cabot, Jordi, Imperial, Joseph Marvin, Nolazco-Flores, Juan A., Landay, Lori, Jackson, Matthew, Röttger, Paul, Torr, Philip H. S., Darrell, Trevor, Lee, Yong Suk, Foerster, Jakob

arXiv.org Artificial IntelligenceMay-24-2024

In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source Generative AI. We argue for the responsible open sourcing of generative AI models in the near and medium term. To set the stage, we first introduce an AI openness taxonomy system and apply it to 40 current large language models. We then outline differential benefits and risks of open versus closed source AI and present potential risk mitigation, ranging from best practices to calls for technical and scientific contributions. We hope that this report will add a much needed missing voice to the current public discourse on near to mid-term AI safety and other societal impact.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2404.17047

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > UAE (0.28)

Genre: Research Report (0.63)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Add feedback

Towards Certification of Uncertainty Calibration under Adversarial Attacks

Emde, Cornelius, Pinto, Francesco, Lukasiewicz, Thomas, Torr, Philip H. S., Bibi, Adel

arXiv.org Machine LearningMay-22-2024

Since neural classifiers are known to be sensitive to adversarial perturbations that alter their accuracy, \textit{certification methods} have been developed to provide provable guarantees on the insensitivity of their predictions to such perturbations. Furthermore, in safety-critical applications, the frequentist interpretation of the confidence of a classifier (also known as model calibration) can be of utmost importance. This property can be measured via the Brier score or the expected calibration error. We show that attacks can significantly harm calibration, and thus propose certified calibration as worst-case bounds on calibration under adversarial perturbations. Specifically, we produce analytic bounds for the Brier score and approximate bounds via the solution of a mixed-integer program on the expected calibration error. Finally, we propose novel calibration attacks and demonstrate how they can improve model calibration through \textit{adversarial calibration training}.

artificial intelligence, calibration, machine learning, (19 more...)

arXiv.org Machine Learning

2405.13922

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (0.50)
Government > Military (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

Udandarao, Vishaal, Prabhu, Ameya, Ghosh, Adhiraj, Sharma, Yash, Torr, Philip H. S., Bibi, Adel, Albanie, Samuel, Bethge, Matthias

arXiv.org Artificial IntelligenceApr-8-2024

Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation performance of multimodal models, such as CLIP for classification/retrieval and Stable-Diffusion for image generation. However, it is unclear how meaningful the notion of "zero-shot" generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream concepts targeted for during "zero-shot" evaluation. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets? We comprehensively investigate this question across 34 models and five standard pretraining datasets (CC-3M, CC-12M, YFCC-15M, LAION-400M, LAION-Aesthetics), generating over 300GB of data artifacts. We consistently find that, far from exhibiting "zero-shot" generalization, multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance, following a sample inefficient log-linear scaling trend. This trend persists even when controlling for sample-level similarity between pretraining and downstream datasets, and testing on purely synthetic data distributions. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly. We contribute this long-tail test set as the "Let it Wag!" benchmark to further research in this direction. Taken together, our study reveals an exponential need for training data which implies that the key to "zero-shot" generalization capabilities under large-scale training paradigms remains to be found.

large language model, machine learning, natural language, (12 more...)

arXiv.org Artificial Intelligence

2404.04125

Country:

North America > United States > California (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On Pretraining Data Diversity for Self-Supervised Learning

Hammoud, Hasan Abed Al Kader, Das, Tuhin, Pizzati, Fabio, Torr, Philip, Bibi, Adel, Ghanem, Bernard

arXiv.org Artificial IntelligenceApr-5-2024

We explore the impact of training with more diverse datasets, characterized by the number of unique samples, on the performance of self-supervised learning (SSL) under a fixed computational budget. Our findings consistently demonstrate that increasing pretraining data diversity enhances SSL performance, albeit only when the distribution distance to the downstream data is minimal. Notably, even with an exceptionally large pretraining data diversity achieved through methods like web crawling or diffusion-generated data, among other ways, the distribution shift remains a challenge. Our experiments are comprehensive with seven SSL methods using large-scale datasets such as ImageNet and YFCC100M amounting to over 200 GPU days.

artificial intelligence, inductive learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2403.13808

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)

Add feedback

Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress

Prabhu, Ameya, Udandarao, Vishaal, Torr, Philip, Bethge, Matthias, Bibi, Adel, Albanie, Samuel

arXiv.org Artificial IntelligenceFeb-29-2024

Standardized benchmarks drive progress in machine learning. However, with repeated testing, the risk of overfitting grows as algorithms over-exploit benchmark idiosyncrasies. In our work, we seek to mitigate this challenge by compiling ever-expanding large-scale benchmarks called Lifelong Benchmarks. As exemplars of our approach, we create Lifelong-CIFAR10 and Lifelong-ImageNet, containing (for now) 1.69M and 1.98M test samples, respectively. While reducing overfitting, lifelong benchmarks introduce a key challenge: the high cost of evaluating a growing number of models across an ever-expanding sample set. To address this challenge, we also introduce an efficient evaluation framework: Sort \& Search (S&S), which reuses previously evaluated models by leveraging dynamic programming algorithms to selectively rank and sub-select test samples, enabling cost-effective lifelong benchmarking. Extensive empirical evaluations across 31,000 models demonstrate that S&S achieves highly-efficient approximate accuracy measurement, reducing compute cost from 180 GPU days to 5 GPU hours (1000x reduction) on a single A100 GPU, with low approximation error. As such, lifelong benchmarks offer a robust, practical solution to the "benchmark exhaustion" problem.

artificial intelligence, conference, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2402.19472

Country:

Europe > Germany (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Prompting a Pretrained Transformer Can Be a Universal Approximator

Petrov, Aleksandar, Torr, Philip H. S., Bibi, Adel

arXiv.org Artificial IntelligenceFeb-22-2024

Despite the widespread adoption of prompting, prompt tuning and prefix-tuning of transformer models, our theoretical understanding of these fine-tuning methods remains limited. A key question is whether one can arbitrarily modify the behavior of pretrained model by prompting or prefix-tuning it. Formally, whether prompting and prefix-tuning a pretrained model can universally approximate sequence-to-sequence functions. This paper answers in the affirmative and demonstrates that much smaller pretrained models than previously thought can be universal approximators when prefixed. In fact, the attention mechanism is uniquely suited for universal approximation with prefix-tuning a single attention head being sufficient to approximate any continuous function. Moreover, any sequence-to-sequence function can be approximated by prefixing a transformer with depth linear in the sequence length. Beyond these density-type results, we also offer Jackson-type bounds on the length of the prefix needed to approximate a function to a desired precision.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2402.14753

Country: Europe (0.14)

Genre:

Overview (0.67)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Can Large Language Model Agents Simulate Human Trust Behaviors?

Xie, Chengxing, Chen, Canyu, Jia, Feiran, Ye, Ziyu, Shu, Kai, Bibi, Adel, Hu, Ziniu, Torr, Philip, Ghanem, Bernard, Li, Guohao

arXiv.org Artificial IntelligenceFeb-6-2024

Large Language Model (LLM) agents have been increasingly adopted as simulation tools to model humans in applications such as social science. However, one fundamental question remains: can LLM agents really simulate human behaviors? In this paper, we focus on one of the most critical behaviors in human interactions, trust, and aim to investigate whether or not LLM agents can simulate human trust behaviors. We first find that LLM agents generally exhibit trust behaviors, referred to as agent trust, under the framework of Trust Games, which are widely recognized in behavioral economics. Then, we discover that LLM agents can have high behavioral alignment with humans regarding trust behaviors, indicating the feasibility to simulate human trust behaviors with LLM agents. In addition, we probe into the biases in agent trust and the differences in agent trust towards agents and humans. We also explore the intrinsic properties of agent trust under conditions including advanced reasoning strategies and external manipulations. We further offer important implications for various scenarios where trust is paramount. Our study represents a significant step in understanding the behaviors of LLM agents and the LLM-human analogy.

amount sent returned 3xsent, large language model, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2402.04559

Country:

North America > United States > Illinois (0.14)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Leisure & Entertainment > Games (0.46)
Health & Medicine > Therapeutic Area (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback