AITopics | Davari, MohammadReza

Collaborating Authors

Davari, MohammadReza

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks

Davari, MohammadReza, Belilovsky, Eugene

arXiv.org Artificial IntelligenceDec-11-2023

The rapid development of AI systems has been greatly influenced by the emergence of foundation models. A common approach for targeted problems involves fine-tuning these pre-trained foundation models for specific target tasks, resulting in a rapid spread of models fine-tuned across a diverse array of tasks. This work focuses on the problem of merging multiple fine-tunings of the same foundation model derived from a spectrum of auxiliary tasks. We introduce a new simple method, Model Breadcrumbs, which consists of a sparsely defined set of weights that carve out a trajectory within the weight space of a pre-trained model, enhancing task performance when traversed. These breadcrumbs are constructed by subtracting the weights from a pre-trained model before and after fine-tuning, followed by a sparsification process that eliminates weight outliers and negligible perturbations. Our experiments demonstrate the effectiveness of Model Breadcrumbs to simultaneously improve performance across multiple tasks. This contribution aligns with the evolving paradigm of updatable machine learning, reminiscent of the collaborative principles underlying open-source software development, fostering a community-driven effort to reliably update machine learning models. Our method is shown to be more efficient and unlike previous proposals does not require hyperparameter tuning for each new task added. Through extensive experimentation involving various models, tasks, and modalities we establish that integrating Model Breadcrumbs offers a simple, efficient, and highly effective approach for constructing multi-task models and facilitating updates to foundation models.

machine learning, model breadcrumb, natural language, (16 more...)

arXiv.org Artificial Intelligence

2312.06795

Country:

North America > United States (0.14)
Europe > Belgium (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Prototype-Sample Relation Distillation: Towards Replay-Free Continual Learning

Asadi, Nader, Davari, MohammadReza, Mudur, Sudhir, Aljundi, Rahaf, Belilovsky, Eugene

arXiv.org Artificial IntelligenceJun-6-2023

In Continual learning (CL) balancing effective adaptation while combating catastrophic forgetting is a central challenge. Many of the recent best-performing methods utilize various forms of prior task data, e.g. a replay buffer, to tackle the catastrophic forgetting problem. Having access to previous task data can be restrictive in many real-world scenarios, for example when task data is sensitive or proprietary. To overcome the necessity of using previous tasks' data, in this work, we start with strong representation learning methods that have been shown to be less prone to forgetting. We propose a holistic approach to jointly learn the representation and class prototypes while maintaining the relevance of old class prototypes and their embedded similarities. Specifically, samples are mapped to an embedding space where the representations are learned using a supervised contrastive loss. Class prototypes are evolved continually in the same latent space, enabling learning and prediction at any point. To continually adapt the prototypes without keeping any prior task data, we propose a novel distillation loss that constrains class prototypes to maintain relative similarities as compared to new task data. This method yields state-of-the-art performance in the task-incremental setting, outperforming methods relying on large amounts of data, and provides strong performance in the class-incremental setting without using any stored data points.

artificial intelligence, machine learning, prototype, (13 more...)

arXiv.org Artificial Intelligence

2303.14771

Country:

North America > Canada (0.28)
North America > United States > Hawaii (0.14)

Genre:

Research Report (0.82)
Instructional Material (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

ToxBuster: In-game Chat Toxicity Buster with BERT

Yang, Zachary, Maricar, Yasmine, Davari, MohammadReza, Grenon-Godbout, Nicolas, Rabbany, Reihaneh

arXiv.org Artificial IntelligenceMay-21-2023

Detecting toxicity in online spaces is challenging and an ever more pressing problem given the increase in social media and gaming consumption. We introduce ToxBuster, a simple and scalable model trained on a relatively large dataset of 194k lines of game chat from Rainbow Six Siege and For Honor, carefully annotated for different kinds of toxicity. Compared to the existing state-of-the-art, ToxBuster achieves 82.95% (+7) in precision and 83.56% (+57) in recall. This improvement is obtained by leveraging past chat history and metadata. We also study the implication towards real-time and post-game moderation as well as the model transferability from one game to another.

chat line, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.12542

Country:

North America > United States (0.46)
North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Reliability of CKA as a Similarity Measure in Deep Learning

Davari, MohammadReza, Horoi, Stefan, Natik, Amine, Lajoie, Guillaume, Wolf, Guy, Belilovsky, Eugene

arXiv.org Artificial IntelligenceNov-16-2022

Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different ways. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained differently, or of models with different architectures trained on the same data. A wide variety of conclusions about similarity and dissimilarity of these various representations have been made using CKA. In this work we present analysis that formally characterizes CKA sensitivity to a large class of simple transformations, which can naturally occur in the context of modern machine learning. This provides a concrete explanation of CKA sensitivity to outliers, which has been observed in past works, and to transformations that preserve the linear separability of the data, an important generalization attribute. We empirically investigate several weaknesses of the CKA similarity metric, demonstrating situations in which it gives unexpected or counter-intuitive results. Finally we study approaches for modifying representations to maintain functional behaviour while changing the CKA value. Our results illustrate that, in many cases, the CKA value can be easily manipulated without substantial changes to the functional behaviour of the models, and call for caution when leveraging activation alignment metrics. In the last decade, increasingly complex deep learning models have dominated machine learning and have helped us solve, with remarkable accuracy, a multitude of tasks across a wide array of domains. Due to the size and flexibility of these models it has been challenging to study and understand exactly how they solve the tasks we use them on. A helpful framework for thinking about these models is that of representation learning, where we view artificial neural networks (ANNs) as learning increasingly complex internal representations as we go deeper through their layers.

artificial intelligence, machine learning, representation, (19 more...)

arXiv.org Artificial Intelligence

2210.16156

Country:

North America > Canada (0.46)
Europe (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback