Collaborating Authors

Louizos, Christos


On Sampling Strategies for Spectral Model Sharding

arXiv.org Artificial Intelligence

The problem of heterogeneous clients in federated learning has recently drawn a lot of attention. Spectral model sharding, i.e., partitioning the model parameters into low-rank matrices based on the singular value decomposition, has been one of the proposed solutions for more efficient on-device training in such settings. In this work, we present two sampling strategies for such sharding, obtained as solutions to specific optimization problems. The first produces unbiased estimators of the original weights, while the second aims to minimize the squared approximation error. We discuss how both of these estimators can be incorporated in the federated learning loop and practical considerations that arise during local training. Empirically, we demonstrate that both of these methods can lead to improved performance on various commonly used datasets.
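
To make the first strategy concrete, here is a minimal sketch of an unbiased spectral shard: each rank-1 SVD term is kept with some probability and rescaled by its inverse, so the shard equals the original weights in expectation. The keep-probabilities (proportional to the singular values, with an expected rank of about 8) are illustrative assumptions, not necessarily the paper's scheme.

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))                # a weight matrix to shard

U, s, Vt = np.linalg.svd(W, full_matrices=False)
p = np.minimum(1.0, 8 * s / s.sum())             # keep-probabilities, expected rank ~8
keep = rng.random(len(s)) < p                    # Bernoulli mask over rank-1 terms

# Rescaling kept terms by 1/p_i gives E[W_hat] = sum_i p_i (s_i/p_i) u_i v_i^T = W.
W_hat = (U[:, keep] * (s[keep] / p[keep])) @ Vt[keep]
print(np.linalg.norm(W - W_hat) / np.linalg.norm(W))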


Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits

arXiv.org Artificial Intelligence

We consider multi-draft speculative sampling, where the proposal sequences are sampled independently from different draft models. At each step, a token-level draft selection scheme takes a list of valid tokens as input and produces an output token whose distribution matches that of the target model. Previous works have demonstrated that the optimal scheme (which maximizes the probability of accepting one of the input tokens) can be cast as a solution to a linear program. In this work we show that the optimal scheme can be decomposed into a two-step solution: in the first step an importance sampling (IS) type scheme is used to select one intermediate token; in the second step (single-draft) speculative sampling is applied to generate the output token. For the case of two identical draft models we further 1) establish a necessary and sufficient condition on the distributions of the target and draft models for the acceptance probability to equal one and 2) provide an explicit expression for the optimal acceptance probability. Our theoretical analysis also motivates a new class of token-level selection schemes based on weighted importance sampling. Our experimental results demonstrate consistent improvements in the achievable block efficiency and token rates over baseline schemes in a number of scenarios.

The transformer architecture (Vaswani et al., 2017) has revolutionized the field of natural language processing and deep learning. One of the key factors behind the success of transformers, as opposed to prior recurrent architectures (Hochreiter and Schmidhuber, 1997; Chung et al., 2014), is their inherent train-time parallelization due to the attention mechanism. This allows for massive scaling and led to the development of state-of-the-art Large Language Models (LLMs) (Touvron et al., 2023; Achiam et al., 2023; Brown et al., 2020; Chowdhery et al., 2023), which have demonstrated remarkable performance across a wide range of tasks.
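
The single-draft building block that the decomposition reduces to is the standard accept/reject step of speculative sampling, sketched below on a toy vocabulary; the multi-draft importance-sampling selection stage is omitted.

import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p_target, q_draft, token):
    # Accept the drafted token with probability min(1, p/q); the output token
    # is then exactly distributed according to p_target.
    if rng.random() < min(1.0, p_target[token] / q_draft[token]):
        return token
    residual = np.maximum(p_target - q_draft, 0.0)   # resample on rejection
    return rng.choice(len(p_target), p=residual / residual.sum())

p = np.array([0.4, 0.3, 0.2, 0.1])               # target distribution (toy vocab)
q = np.array([0.25, 0.25, 0.25, 0.25])           # draft distribution
draft_token = rng.choice(4, p=q)
print(speculative_step(p, q, draft_token))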


An Information Theoretic Perspective on Conformal Prediction

arXiv.org Machine Learning

Conformal Prediction (CP) is a distribution-free uncertainty estimation framework that constructs prediction sets guaranteed to contain the true answer with a user-specified probability. Intuitively, the size of the prediction set encodes a general notion of uncertainty, with larger sets associated with higher degrees of uncertainty. In this work, we leverage information theory to connect conformal prediction to other notions of uncertainty. More precisely, we prove three different ways to upper bound the intrinsic uncertainty, as described by the conditional entropy of the target variable given the inputs, by combining CP with information theoretical inequalities. Moreover, we demonstrate two direct and useful applications of such connection between conformal prediction and information theory: (i) more principled and effective conformal training objectives that generalize previous approaches and enable end-to-end training of machine learning models from scratch, and (ii) a natural mechanism to incorporate side information into conformal prediction. We empirically validate both applications in centralized and federated learning settings, showing our theoretical results translate to lower inefficiency (average prediction set size) for popular CP methods.
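
As background for how CP produces sets with the stated guarantee, here is a minimal sketch of standard split conformal prediction (not the paper's information-theoretic training objectives), using 1 minus the true-class probability as the score.

import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1                                      # target miscoverage rate

# Calibration scores s_i = 1 - model probability of the true class (toy values).
cal_scores = rng.uniform(size=1000)
n = len(cal_scores)
qhat = np.quantile(cal_scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Prediction set for a new input: every class whose score falls below qhat.
probs = np.array([0.5, 0.3, 0.15, 0.05])         # hypothetical softmax output
print(np.where(1.0 - probs <= qhat)[0])          # covers the truth w.p. >= 1 - alpha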


Stable Diffusion-based Data Augmentation for Federated Learning with Non-IID Data

arXiv.org Artificial Intelligence

The proliferation of edge devices has brought Federated Learning (FL) to the forefront as a promising paradigm for decentralized and collaborative model training while preserving the privacy of clients' data. However, FL struggles with a significant performance reduction and poor convergence when confronted with Non-Independent and Identically Distributed (Non-IID) data distributions among participating clients. While previous efforts, such as client drift mitigation and advanced server-side model fusion techniques, have shown some success in addressing this challenge, they often overlook the root cause of the performance reduction - the absence of identical data accurately mirroring the global data distribution among clients. In this paper, we introduce Gen-FedSD, a novel approach that harnesses the powerful capability of state-of-the-art text-to-image foundation models to bridge the significant Non-IID performance gaps in FL. In Gen-FedSD, each client constructs textual prompts for each class label and leverages an off-the-shelf state-of-the-art pre-trained Stable Diffusion model to synthesize high-quality data samples. The generated synthetic data is tailored to each client's unique local data gaps and distribution disparities, effectively making the final augmented local data IID. Through extensive experimentation, we demonstrate that Gen-FedSD achieves state-of-the-art performance and significant communication cost savings across various datasets and Non-IID settings.
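
A minimal sketch of the synthesis step, assuming the Hugging Face diffusers API with an illustrative model id and prompt template; the paper's exact prompts and gap-estimation logic are not reproduced here.

import torch
from diffusers import StableDiffusionPipeline

# Illustrative model id; any off-the-shelf pre-trained checkpoint would do.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A client missing "dog" and "ship" examples fills its local distribution gaps.
for label in ["dog", "ship"]:
    images = pipe(f"a photo of a {label}", num_images_per_prompt=8).images
    for i, img in enumerate(images):
        img.save(f"synthetic_{label}_{i}.png")   # merged into the local training set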


A Mutual Information Perspective on Federated Contrastive Learning

arXiv.org Artificial Intelligence

We investigate contrastive learning in the federated setting through the lens of SimCLR and multi-view mutual information maximization. In doing so, we uncover a connection between contrastive representation learning and user verification; by adding a user verification loss to each client's local SimCLR loss we recover a lower bound to the global multi-view mutual information. To accommodate the case where some labelled data are available at the clients, we extend our SimCLR variant to the federated semi-supervised setting. We see that a supervised SimCLR objective can be obtained with two changes: a) the contrastive loss is computed between datapoints that share the same label and b) we require an additional auxiliary head that predicts the correct labels from either of the two views. Along with the proposed SimCLR extensions, we also study how different sources of non-i.i.d.-ness can impact the performance of federated unsupervised learning through global mutual information maximization; we find that a global objective is beneficial for some sources of non-i.i.d.-ness but can be detrimental for others. We empirically evaluate our proposed extensions in various tasks to validate our claims and furthermore demonstrate that our proposed modifications generalize to other pretraining methods.
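
Change (a) can be sketched as a label-aware variant of the NT-Xent loss, where positives are all other datapoints sharing a label; the user-verification and auxiliary-head terms are omitted. A minimal sketch, not the paper's exact objective:

import torch
import torch.nn.functional as F

def supervised_ntxent(z, labels, tau=0.5):
    # Contrastive loss where positives are other datapoints with the same label.
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float("-inf"))            # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos = (labels[:, None] == labels[None, :]).float().fill_diagonal_(0.0)
    masked = torch.where(pos.bool(), log_prob, torch.zeros_like(log_prob))
    return -(masked.sum(1) / pos.sum(1).clamp(min=1)).mean()

z = torch.randn(8, 16)                           # projection-head outputs for a batch
labels = torch.randint(0, 3, (8,))
print(supervised_ntxent(z, labels))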


DNA: Differentially private Neural Augmentation for contact tracing

arXiv.org Artificial Intelligence

The COVID19 pandemic had enormous economic and societal consequences. Contact tracing is an effective way to reduce infection rates by detecting potential virus carriers early. However, it was not widely adopted during the recent pandemic, with privacy concerns cited as the most important reason. We substantially improve the privacy guarantees of the current state of the art in decentralized contact tracing. Whereas previous work was based on statistical inference only, we augment the inference with a learned neural network and ensure that this neural augmentation satisfies differential privacy. In a COVID19 simulator, even at ε = 1 per message, this significantly improves the detection of potentially infected individuals and, through targeted testing, reduces infection rates.

The COVID19 pandemic had enormous consequences (Kim et al., 2022; Kaye et al., 2021; Boden et al., 2021; Vindegaard & Benros, 2020). Contact-tracing algorithms could make early predictions of virus carriers, signaling individuals to get tested and thereby reducing the spread of the virus (Baker et al., 2021).
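
A minimal sketch of one way to privatize a neural network's output message: clip the message norm to bound sensitivity, then add Gaussian noise calibrated to (ε, δ)-DP. The clipping bound and δ are illustrative assumptions; the paper's actual mechanism may differ.

import numpy as np

rng = np.random.default_rng(0)

def dp_message(msg, clip=1.0, epsilon=1.0, delta=1e-5):
    # Bound the sensitivity by clipping the message norm, then add Gaussian
    # noise calibrated for (epsilon, delta)-DP (valid for epsilon <= 1).
    msg = msg * min(1.0, clip / np.linalg.norm(msg))
    sigma = clip * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return msg + rng.normal(0.0, sigma, size=msg.shape)

net_output = rng.standard_normal(8)              # the neural augmentation's message
print(dp_message(net_output))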


InterroGate: Learning to Share, Specialize, and Prune Representations for Multi-task Learning

arXiv.org Artificial Intelligence

Jointly learning multiple tasks with a unified model can improve accuracy and data efficiency, but it faces the challenge of task interference, where optimizing one task objective may inadvertently compromise the performance of another. A solution to mitigate this issue is to allocate task-specific parameters, free from interference, on top of shared features. However, manually designing such architectures is cumbersome, as practitioners need to balance the overall performance across all tasks against the higher computational cost induced by the newly added parameters. In this work, we propose InterroGate, a novel multi-task learning (MTL) architecture designed to mitigate task interference while optimizing inference computational efficiency. We employ a learnable gating mechanism to automatically balance the shared and task-specific representations while preserving the performance of all tasks. Crucially, the patterns of parameter sharing and specialization learned dynamically during training become fixed at inference, resulting in a static, optimized MTL architecture. Through extensive empirical evaluations, we demonstrate SoTA results on three MTL benchmarks using convolutional as well as transformer-based backbones on CelebA, NYUD-v2, and PASCAL-Context.
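
A minimal sketch of a learnable gate mixing shared and task-specific features per channel (an illustrative construction, not InterroGate's exact parameterization); after training, the sigmoid gates can be thresholded and the unused parameters pruned, yielding the static inference-time architecture.

import torch
import torch.nn as nn

class GatedTaskBranch(nn.Module):
    # Mixes shared backbone features with task-specific ones via learned
    # per-channel gates in [0, 1].
    def __init__(self, dim):
        super().__init__()
        self.task_layer = nn.Linear(dim, dim)    # task-specific parameters
        self.gate_logits = nn.Parameter(torch.zeros(dim))

    def forward(self, shared):
        g = torch.sigmoid(self.gate_logits)
        return g * self.task_layer(shared) + (1.0 - g) * shared

shared_features = torch.randn(4, 32)             # from the shared backbone
print(GatedTaskBranch(32)(shared_features).shape)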


Protect Your Score: Contact Tracing With Differential Privacy Guarantees

arXiv.org Artificial Intelligence

The pandemic in 2020 and 2021 had enormous economic and societal consequences, and studies show that contact tracing algorithms can be key in the early containment of the virus. While large strides have been made towards more effective contact tracing algorithms, we argue that privacy concerns currently hold deployment back. The essence of a contact tracing algorithm is the communication of a risk score. Yet, it is precisely the communication and release of this score to a user that an adversary can leverage to gauge the private health status of an individual. We pinpoint a realistic attack scenario and propose a contact tracing algorithm with differential privacy guarantees against this attack. The algorithm is tested on the two most widely used agent-based COVID19 simulators and demonstrates superior performance in a wide range of settings. Especially for realistic test scenarios, and while releasing each risk score with ε = 1 differential privacy, we achieve a two- to ten-fold reduction in the infection rate of the virus. To the best of our knowledge, this presents the first contact tracing algorithm with differential privacy guarantees when revealing risk scores for COVID19.
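
The release step can be sketched with the Laplace mechanism: noise scaled to sensitivity/ε makes each published risk score ε-differentially private. The sensitivity bound below is an assumed value for illustration, not the paper's analysis.

import numpy as np

rng = np.random.default_rng(0)

def release_score(score, sensitivity, epsilon):
    # Laplace mechanism: noise with scale sensitivity/epsilon gives epsilon-DP.
    return score + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Assume one contact changing health status shifts the score by at most 0.1.
print(release_score(score=0.37, sensitivity=0.1, epsilon=1.0))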


Hyperparameter Optimization through Neural Network Partitioning

arXiv.org Artificial Intelligence

Well-tuned hyperparameters are crucial for obtaining good generalization behavior in neural networks. They can enforce appropriate inductive biases, regularize the model and improve performance -- especially in the presence of limited data. In this work, we propose a simple and efficient way of optimizing hyperparameters inspired by the marginal likelihood, an optimization objective that requires no validation data. Our method partitions the training data and the neural network into data shards and parameter partitions, respectively; each partition is associated with and optimized only on specific data shards. Combining these partitions into subnetworks allows us to define the "out-of-training-sample" loss of a subnetwork, i.e., the loss on data shards unseen by the subnetwork, as the objective for hyperparameter optimization. We demonstrate that we can apply this objective to optimize a variety of different hyperparameters in a single training run while being significantly computationally cheaper than alternative methods aiming to optimize the marginal likelihood for neural networks. Lastly, we also focus on optimizing hyperparameters in federated learning, where retraining and cross-validation are particularly challenging.

Due to their remarkable generalization capabilities, deep neural networks have become the de-facto models for a wide range of complex tasks. Combining large models, large-enough datasets, and sufficient computing capabilities enables researchers to train powerful models through gradient descent. Regardless of the data regime, however, the choice of hyperparameters -- such as the neural architecture, data augmentation strategies, regularization, or the choice of optimizer -- plays a crucial role in the final model's generalization capabilities. Hyperparameters encode good inductive biases that effectively constrain the model's hypothesis space (e.g., convolutions for vision tasks), speed up learning, or prevent overfitting when data are limited. Whereas gradient descent enables the tuning of model parameters, accessing hyperparameter gradients is more complicated, and traditional tuning instead trains multiple models, spending resources on models that will be discarded. Furthermore, traditional tuning requires a validation set, since optimizing the hyperparameters on the training set alone cannot identify the right inductive biases. A canonical example is data augmentations: they are not expected to improve training set performance, but they greatly help with generalization. In the low-data regime, setting aside a validation set that cannot be used for tuning model parameters is undesirable, and picking the right amount of validation data is a hyperparameter in itself. The conventional rule of thumb of using 10% of all data can result in significant overfitting, as pointed out by Lorraine et al. (2019), when one has a sufficiently large number of hyperparameters to tune.
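
The objective can be illustrated with a deliberately small analogue: a linear model with K partitions, each fit on its own shard, scored on the shards it never saw, and the hyperparameter chosen to minimize that out-of-training-sample loss. This toy refits partitions per hyperparameter value, whereas the paper optimizes everything in a single training run.

import numpy as np

rng = np.random.default_rng(0)
K = 4
X = rng.standard_normal((200, 8))
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(200)
shards = np.array_split(np.arange(200), K)       # K data shards

def fit_partition(idx, ridge):
    # One parameter partition, trained only on its own shard (ridge regression).
    A = X[idx].T @ X[idx] + ridge * np.eye(8)
    return np.linalg.solve(A, X[idx].T @ y[idx])

def oos_loss(ridge):
    # Out-of-training-sample loss: score each partition on shards it never saw.
    losses = []
    for k in range(K):
        w = fit_partition(shards[k], ridge)
        unseen = np.concatenate([shards[j] for j in range(K) if j != k])
        losses.append(np.mean((X[unseen] @ w - y[unseen]) ** 2))
    return np.mean(losses)

# Choose the regularization hyperparameter without any validation split.
print(min([0.01, 0.1, 1.0, 10.0], key=oos_loss))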


DP-REC: Private & Communication-Efficient Federated Learning

arXiv.org Machine Learning

Privacy and communication efficiency are important challenges in federated training of neural networks, and combining them is still an open problem. In this work, we develop a method that unifies highly compressed communication and differential privacy (DP). We introduce a compression technique based on Relative Entropy Coding (REC) to the federated setting. With a minor modification to REC, we obtain a provably differentially private learning algorithm, DP-REC, and show how to compute its privacy guarantees. Our experiments demonstrate that DP-REC drastically reduces communication costs while providing privacy guarantees comparable to the state-of-the-art.

The performance of modern neural-network-based machine learning models scales exceptionally well with the amount of data that they are trained on (Kaplan et al., 2020; Henighan et al., 2020). At the same time, industry (Xiao & Karlin), legislators (Dwork, 2019; Voigt & Von dem Bussche, 2017) and consumers (Laziuk, 2021) have become more conscious about the need to protect the privacy of data that might be used in training such models. Federated learning (FL) is a machine learning principle that enables learning on decentralized data by computing updates on-device: instead of sending its data to a central location, a "client" in a federation of devices sends model updates computed on its data to the central server. Such an approach promises to unlock the computing capabilities of billions of edge devices and to enable personalized models and new applications. On the other hand, the federated paradigm brings challenges along many dimensions, such as learning from non-i.i.d. data. Neural network training requires many passes over the data, resulting in repeated transfer of the model and updates between the server and the clients, potentially making communication a primary bottleneck (Kairouz et al., 2019; Wang et al., 2021). Compressing updates is an active area of research in FL and an essential step in "untethering" edge devices from WiFi.
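
The compression idea behind REC can be sketched as minimal random coding: server and client share a seed, and the client scores shared prior samples under its local posterior, transmitting only the index of a sampled codeword. This is an illustrative importance-sampling variant; the DP modification and exact bit accounting are omitted.

import numpy as np

d, K = 16, 2 ** 10                               # codeword dim, codebook size (~10 bits)
shared = np.random.default_rng(42)               # seed known to server and client
codebook = shared.standard_normal((K, d))        # samples from the prior N(0, I)

mu, sigma = 0.1 * np.ones(d), 0.5                # client's local posterior N(mu, sigma^2 I)

# Importance weights: density ratio posterior/prior, up to a constant that
# cancels in the normalization below.
logw = (-0.5 * np.sum((codebook - mu) ** 2, axis=1) / sigma ** 2
        + 0.5 * np.sum(codebook ** 2, axis=1))
w = np.exp(logw - logw.max())
idx = np.random.default_rng(0).choice(K, p=w / w.sum())

# The client sends only idx; the server regenerates the codebook from the
# shared seed and looks the codeword up.
print(idx, codebook[idx][:4])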