
Collaborating Authors

 Janson, Paul


Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training

arXiv.org Artificial Intelligence

The ever-growing availability of unlabeled data presents both opportunities and challenges for training artificial intelligence systems. While self-supervised learning (SSL) has emerged as a powerful paradigm for extracting meaningful representations from vast amounts of unlabeled data, existing methods still struggle to adapt to the non-stationary, non-IID nature of real-world data streams without forgetting previously learned knowledge. Recent works have adopted a repeated cosine annealing schedule for large-scale continual pre-training; however, these schedules (1) inherently cause forgetting during the re-warming phase and (2) have not been systematically compared to existing continual SSL methods. In this work, we systematically compare the widely used cosine schedule with the recently proposed infinite learning rate schedule and empirically find the latter to be a more effective alternative. Our extensive empirical evaluation across diverse image and language datasets demonstrates that the infinite learning rate schedule consistently enhances continual pre-training performance compared to a repeated cosine decay without being restricted to a fixed iteration budget. For instance, in a small-scale MAE pre-training setup, it outperforms several strong baselines from the literature. Our results show that the infinite learning rate schedule remains effective at scale, surpassing repeated cosine decay for both MAE pre-training and zero-shot LM benchmarks.

Foundation models are known for their massive parameter counts and extensive training on vast amounts of data, often developing impressive general-purpose capabilities unexpectedly during pre-training (Brown et al., 2020; Wei et al., 2022). While these models have demonstrated remarkable success on static tasks, adapting them to evolving data--such as the continuous influx of new textual information (Soldaini et al., 2024; Li et al., 2024; Abadji et al., 2022; Kocetkov et al., 2022) and the emergence of novel visual concepts (Prabhu et al., 2023; Seo et al., 2024)--remains a major challenge.
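To make the schedule comparison concrete, the following is a minimal Python sketch of an infinite learning rate schedule of the kind discussed above: linear warmup, cosine decay to a constant plateau that can be held for an unbounded number of steps, and a short annealing phase applied only when a checkpoint is finalized. The phase boundaries and learning-rate values are illustrative placeholders, not the paper's settings.

import math

def infinite_lr(step, warmup_steps, decay_steps, anneal_start, anneal_steps,
                max_lr=3e-4, const_lr=1.5e-4, min_lr=3e-5):
    """Illustrative 'infinite' schedule with four phases:
    (1) linear warmup to max_lr,
    (2) cosine decay from max_lr down to a constant plateau const_lr,
    (3) a constant phase that can continue indefinitely as new data arrives,
    (4) a short annealing phase to min_lr when a checkpoint is finalized.
    All hyperparameter values here are made-up examples."""
    if step < warmup_steps:                                # phase 1: warmup
        return max_lr * (step + 1) / warmup_steps
    if step < warmup_steps + decay_steps:                  # phase 2: decay to plateau
        t = (step - warmup_steps) / decay_steps
        return const_lr + 0.5 * (max_lr - const_lr) * (1 + math.cos(math.pi * t))
    if step < anneal_start:                                # phase 3: constant plateau
        return const_lr
    t = min((step - anneal_start) / anneal_steps, 1.0)     # phase 4: final annealing
    return const_lr + (min_lr - const_lr) * t

# Peek at the schedule around each phase boundary.
for s in (0, 500, 1000, 3500, 6000, 15000, 21000, 23000):
    print(s, round(infinite_lr(s, warmup_steps=1000, decay_steps=5000,
                               anneal_start=20000, anneal_steps=2000), 6))

Unlike repeated cosine decay, continuing training on new data during phase 3 requires no re-warming from a low learning rate, which is the property the abstract associates with reduced forgetting.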


Domain-Aware Continual Zero-Shot Learning

arXiv.org Artificial Intelligence

Continual zero-shot learning involves learning seen classes incrementally while improving the ability to recognize unseen or yet-to-be-seen classes. It has a broad range of potential applications in real-world vision tasks, such as accelerating species discovery. However, in these scenarios, changes in environmental conditions cause shifts in the presentation of captured images, which we refer to as domain shift, and add complexity to the task. In this paper, we introduce Domain-Aware Continual Zero-Shot Learning (DACZSL), a task that involves visually recognizing images of unseen categories in unseen domains continually. To address the challenges of DACZSL, we propose a Domain-Invariant Network (DIN). We employ a dual network structure to learn factorized features that alleviate forgetting: a global shared net for domain-invariant and task-invariant features, and per-task private nets for task-specific features. Furthermore, we introduce a class-wise learnable prompt to obtain better class-level text representations, which enables zero-shot prediction of future unseen classes. To evaluate DACZSL, we introduce two benchmarks: DomainNet-CZSL and iWildCam-CZSL. Our results show that DIN significantly outperforms existing baselines and achieves a new state of the art.
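The dual-network design can be sketched roughly as follows. This is an illustrative approximation of the factorized-feature idea rather than the authors' exact DIN architecture: the backbone, feature dimensions, fusion step, and prompt handling below are all placeholder assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualNetSketch(nn.Module):
    """Rough sketch of the dual-network idea: a global shared encoder for
    domain- and task-invariant features, per-task private heads for
    task-specific features, and class-wise learnable prompt vectors that
    stand in for class-level text representations. Not the paper's exact
    model; every dimension and layer choice here is a placeholder."""

    def __init__(self, num_tasks, num_classes, feat_dim=512, prompt_dim=512):
        super().__init__()
        # Global shared feature extractor (placeholder; the real backbone differs).
        self.shared = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))
        # One lightweight private head per task for task-specific features.
        self.private = nn.ModuleList(
            nn.Linear(feat_dim, feat_dim) for _ in range(num_tasks))
        # Class-wise learnable prompts acting as class-level text embeddings.
        self.class_prompts = nn.Parameter(torch.randn(num_classes, prompt_dim))
        self.proj = nn.Linear(2 * feat_dim, prompt_dim)

    def forward(self, images, task_id):
        shared_feat = self.shared(images)                  # invariant features
        private_feat = self.private[task_id](shared_feat)  # task-specific features
        fused = self.proj(torch.cat([shared_feat, private_feat], dim=-1))
        # Zero-shot-style prediction: cosine similarity to every class prompt,
        # including classes not seen during training.
        return F.normalize(fused, dim=-1) @ F.normalize(self.class_prompts, dim=-1).T

model = DualNetSketch(num_tasks=4, num_classes=100)
scores = model(torch.randn(8, 3, 64, 64), task_id=0)  # (8, 100) class similarities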


A Simple Baseline that Questions the Use of Pretrained-Models in Continual Learning

arXiv.org Artificial Intelligence

With the success of pretraining techniques in representation learning, a number of continual learning methods based on pretrained models have been proposed. Some of these methods design continual learning mechanisms on top of the pretrained representations and allow only minimal updates, or even no updates, of the backbone models during continual learning. In this paper, we question whether the complexity of these models is needed to achieve good performance by comparing them to a simple baseline that we designed. We argue that the pretrained feature extractor itself can be strong enough to achieve competitive or even better continual learning performance on the Split-CIFAR-100 and CoRe50 benchmarks. To validate this, we evaluate a very simple baseline that 1) uses the frozen pretrained model to extract image features for every class encountered during the continual learning stage and computes their corresponding mean features on the training data, and 2) predicts the class of the input based on the nearest distance between test samples and the mean features of the classes, i.e., a Nearest Mean Classifier (NMC). This baseline is single-headed, exemplar-free, and can be task-free (by updating the means continually). It achieves 88.53% on 10-Split-CIFAR-100, surpassing most state-of-the-art continual learning methods that are initialized with the same pretrained transformer model. We hope our baseline may encourage future progress in designing learning systems that can continually add quality to the learned representations even when they start from pretrained weights.
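The baseline reduces to a few lines. Below is a minimal sketch under the assumption of some frozen pretrained encoder with a known feature dimension (both unspecified placeholders here); it keeps per-class running means, which also covers the task-free variant in which the means are updated continually.

import torch

class NearestMeanClassifier:
    """Sketch of the nearest-mean baseline: a frozen pretrained encoder
    extracts features, per-class running means are updated as data arrives,
    and a test sample is assigned to the class with the closest mean.
    `encoder` is any frozen feature extractor (e.g. a pretrained ViT) and is
    an assumption of this sketch, not part of the paper's released code."""

    def __init__(self, encoder, feat_dim):
        self.encoder = encoder.eval()   # backbone stays frozen throughout
        self.feat_dim = feat_dim
        self.sums = {}                  # class id -> running feature sum
        self.counts = {}                # class id -> number of samples seen

    @torch.no_grad()
    def update(self, images, labels):
        feats = self.encoder(images).cpu()
        for f, y in zip(feats, labels.tolist()):
            if y not in self.sums:
                self.sums[y] = torch.zeros(self.feat_dim)
                self.counts[y] = 0
            self.sums[y] += f           # class mean = sums[y] / counts[y]
            self.counts[y] += 1

    @torch.no_grad()
    def predict(self, images):
        feats = self.encoder(images).cpu()                     # (B, D)
        classes = sorted(self.sums)
        means = torch.stack([self.sums[c] / self.counts[c] for c in classes])
        nearest = torch.cdist(feats, means).argmin(dim=1)      # (B,)
        return torch.tensor([classes[i] for i in nearest.tolist()])

The single-headed, exemplar-free properties follow directly from the structure: only class means are stored, and prediction never needs a task identity.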