AITopics | directcopy

directcopy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The Edge of Orthogonality: A Simple View of What Makes BYOL Tick

Richemond, Pierre H.; Tam, Allison; Tang, Yunhao; Strub, Florian; Piot, Bilal; Hill, Felix

arXiv.org Artificial Intelligence, Feb-9-2023

Self-predictive unsupervised learning methods such as BYOL or SimSiam have shown impressive results and, counter-intuitively, do not collapse to trivial representations. In this work, we aim to explore the simplest possible mathematical arguments that explain the mechanisms behind self-predictive unsupervised learning. We start with the observation that these methods crucially rely on the presence of a predictor network (and a stop-gradient). Using simple linear algebra, we show that when a linear predictor is used, the optimal predictor is close to an orthogonal projection, and we propose a general framework based on orthonormalization that helps interpret, and give intuition on, why BYOL works. In addition, this framework demonstrates the crucial role of the exponential moving average and the stop-gradient operator in BYOL as an efficient orthonormalization mechanism. We use these insights to propose four new closed-form predictor variants of BYOL to support our analysis. Our closed-form predictors outperform standard BYOL with a trainable linear predictor at 100 and 300 epochs (top-1 linear accuracy on ImageNet).
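The linear-algebra intuition that an optimal linear predictor behaves like an orthogonal projection can be illustrated with a toy case. This is a minimal sketch under our own assumptions, not the paper's derivation or any of its four closed-form predictors. It assumes the target representations equal the online ones and the batch of representations Z spans only an r-dimensional subspace of R^d. Under those assumptions, the minimum-norm least-squares predictor W = pinv(Z) Z is exactly an orthogonal projection: symmetric, idempotent, and with trace equal to its rank. The function name least_squares_predictor and the sizes n, d, r are hypothetical.

```python
import numpy as np

def least_squares_predictor(z_online, z_target):
    """Minimum-norm W minimizing ||z_online @ W - z_target||_F^2 (hypothetical helper)."""
    # rcond cuts off round-off singular values of a rank-deficient batch.
    return np.linalg.pinv(z_online, rcond=1e-10) @ z_target

rng = np.random.default_rng(0)
n, d, r = 256, 8, 3
# Assumed toy batch: n representations in R^d lying in an r-dimensional subspace.
z = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))

# Assumption: the target equals the online representation (as if the EMA had converged).
W = least_squares_predictor(z, z)

is_symmetric = np.allclose(W, W.T, atol=1e-8)    # W == W^T
is_idempotent = np.allclose(W @ W, W, atol=1e-8)  # W @ W == W
# The trace of an orthogonal projector equals the dimension it projects onto.
print(is_symmetric, is_idempotent, round(np.trace(W)))
```

The minimum-norm solution is used because, with a rank-deficient Z, many predictors fit equally well. The minimum-norm one is the projector onto the row space of Z, which keeps the example free of arbitrary components.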

artificial intelligence, machine learning, predictor

arXiv.org Artificial Intelligence

2302.04817

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

Towards Demystifying Representation Learning with Non-contrastive Self-supervision

Wang, Xiang; Chen, Xinlei; Du, Simon S.; Tian, Yuandong

arXiv.org Machine Learning, Oct-10-2021

Self-supervised learning has recently emerged as a promising direction for learning representations without manual labels. Contrastive learning (Oord et al., 2018; Tian et al., 2019; Bachman et al., 2019; He et al., 2020; Chen et al., 2020a) minimizes the distance between representations of positive pairs and maximizes it between negative pairs. In contrast, non-contrastive self-supervised learning (abbreviated as nc-SSL) has recently been shown to learn nontrivial representations from positive pairs alone, using an extra predictor and a stop-gradient operation. Furthermore, the learned representations show comparable (or even better) performance on downstream tasks such as image classification (Grill et al., 2020; Chen & He, 2020). This raises two fundamental questions: (1) why the learned representation does not collapse to trivial (i.e., constant) solutions, and (2) without negative pairs, what representation nc-SSL learns during training and how that representation reduces the sample complexity of downstream tasks. While many theoretical results on contrastive SSL exist (Arora et al., 2019; Lee et al., 2020; Tosh et al., 2020; Wen & Li, 2021), similar studies of nc-SSL have been very rare. As one of the first works in this direction, Tian et al. (2021) show that while the global optimum of the non-contrastive loss is indeed a trivial one, following the gradient direction in nc-SSL can lead to a local optimum that admits a nontrivial representation. Based on their theoretical findings on gradient-based methods, they proposed a new approach, DirectPred, which directly sets the predictor using the eigendecomposition of the correlation matrix of the predictor's input, rather than updating it with gradient methods.
As a method for nc-SSL, DirectPred shows comparable or better performance on multiple datasets, including CIFAR-10 (Krizhevsky et al., 2009), STL-10 (Coates et al., 2011) and ImageNet (Deng et al., 2009), compared to BYOL (Grill et al., 2020) and SimSiam (Chen & He, 2020), which optimize the predictor using gradient descent.
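The idea of setting the predictor from an eigendecomposition rather than training it by gradient descent can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The correlation matrix of the predictor's input is tracked with an exponential moving average. The predictor keeps that matrix's eigenvectors and maps each eigenvalue s_j to sqrt(s_j) / sqrt(max s) + eps. That eigenvalue mapping, the constants rho = 0.3 and eps = 0.1, and the function names update_correlation and direct_predictor are assumptions for illustration.

```python
import numpy as np

def update_correlation(F_hat, f, rho=0.3):
    """EMA estimate of the correlation matrix E[f f^T] of predictor inputs.

    f has shape (batch, d); rho is an assumed smoothing constant.
    """
    batch_corr = f.T @ f / f.shape[0]
    return rho * F_hat + (1.0 - rho) * batch_corr

def direct_predictor(F_hat, eps=0.1):
    """Build a linear predictor from F_hat's eigendecomposition (no gradients).

    Eigenvectors are kept; eigenvalue s_j becomes sqrt(s_j) / sqrt(max s) + eps
    (an assumed mapping).
    """
    s, U = np.linalg.eigh(F_hat)   # F_hat is symmetric positive semi-definite
    s = np.clip(s, 0.0, None)      # drop tiny negative values caused by round-off
    p = np.sqrt(s) / np.sqrt(s.max()) + eps
    return U @ np.diag(p) @ U.T

rng = np.random.default_rng(0)
d = 16
F_hat = np.zeros((d, d))
for _ in range(10):                     # stand-in for training steps
    f = rng.standard_normal((128, d))   # hypothetical predictor inputs for one batch
    F_hat = update_correlation(F_hat, f)

W_p = direct_predictor(F_hat)
# W_p is built from F_hat's eigenbasis, so the two matrices commute.
print(np.allclose(W_p @ F_hat, F_hat @ W_p))
```

In a training loop built on this sketch, W_p would be recomputed from the running F_hat at each step (or every few steps) and used in place of a trainable predictor. The predictor's eigenbasis then follows the correlation structure of the representations without any predictor gradients.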

converge, directcopy, probability

arXiv.org Machine Learning

2110.04947

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
