AITopics | cosine loss

Asterisk*: Keep it Simple

Semenov, Andrew

arXiv.org Artificial Intelligence, Nov-8-2024

This paper describes Asterisk, a compact GPT-based model for generating text embeddings. The model uses a minimalist architecture with two layers, two attention heads, and 256 embedding dimensions. By applying knowledge distillation from larger pretrained models, we explore the trade-offs between model size and performance while minimizing computational and memory requirements. The model is primarily evaluated and optimized for classification tasks, with experimental results showing its moderate performance in zero-shot classification across various downstream applications. With additional configuration, the model performance can approach or even surpass that of larger architectures on specific classification tasks.

artificial intelligence, machine learning, natural language, (20 more...)

2411.05691

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

The Unreasonable Effectiveness of Random Target Embeddings for Continuous-Output Neural Machine Translation

Tokarchuk, Evgeniia, Niculae, Vlad

arXiv.org Artificial Intelligence, Oct-31-2023

Continuous-output neural machine translation (CoNMT) replaces the discrete next-word prediction problem with an embedding prediction. The semantic structure of the target embedding space (i.e., closeness of related words) is intuitively believed to be crucial. We challenge this assumption and show that completely random output embeddings can outperform laboriously pretrained ones, especially on larger datasets. Further investigation shows this surprising effect is strongest for rare words, due to the geometry of their embeddings. We shed further light on this finding by designing a mixed strategy that combines random and pre-trained embeddings for different tokens.
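
The idea of predicting an embedding instead of a discrete word can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' method: the abstract does not say how the random embeddings are drawn or how a predicted vector is mapped back to a word, so the Gaussian sampling, the cosine nearest-neighbour decoding, and the helper names `random_embeddings`, `cosine`, and `decode` are all hypothetical.

```python
import math
import random


def random_embeddings(vocab, dim, seed=0):
    """Assign each target word a fixed random vector (assumption: i.i.d. Gaussian entries)."""
    rng = random.Random(seed)
    return {word: [rng.gauss(0.0, 1.0) for _ in range(dim)] for word in vocab}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def decode(predicted, table):
    """Map a predicted embedding to the word whose target embedding is most similar."""
    return max(table, key=lambda word: cosine(predicted, table[word]))


# Toy usage: a hypothetical model output that points in the direction of "dog".
table = random_embeddings(["cat", "dog", "fish"], dim=16)
prediction = [2.0 * x for x in table["dog"]]
print(decode(prediction, table))
```

Because decoding here depends only on direction, the random table needs no semantic structure to be usable; it only needs the target vectors to be well separated.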

computational linguistic, proceedings, translation, (15 more...)

2310.2062

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(5 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics

Bao, Han

arXiv.org Machine Learning, Sep-27-2023

Contrastive learning is a self-supervised representation learning framework, where two positive views generated through data augmentation are made similar by an attraction force in a data representation space, while a repulsive force pushes them away from negative examples. Non-contrastive learning, represented by BYOL and SimSiam, further gets rid of negative examples and improves computational efficiency. Although at first sight the learned representations may collapse into a single point due to the lack of the repulsive force, Tian et al. (2021) revealed through a learning dynamics analysis that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account the commonly used feature normalization, a normalizer applied before measuring the similarity of representations, and hence excessively strong regularization may collapse the dynamics, which is unnatural behavior in the presence of feature normalization. Therefore, we extend the previous theory based on the L2 loss by considering the cosine loss, which involves feature normalization. We show that the cosine loss induces sixth-order dynamics (while the L2 loss induces third-order dynamics), in which a stable equilibrium dynamically emerges even if there are only collapsed solutions with given initial parameters. Thus, we offer a new understanding that feature normalization plays an important role in robustly preventing the dynamics collapse.
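
For orientation, the two losses contrasted in this abstract can be written in their generic textbook form. This is an assumed sketch: the symbols $u$ and $v$ (the representations of the two views) are introduced here for illustration, and the paper's exact formulation may include further components that the abstract does not describe.

```latex
\[
\mathcal{L}_{\ell_2}(u, v) = \tfrac{1}{2}\,\lVert u - v \rVert_2^2,
\qquad
\mathcal{L}_{\cos}(u, v) = -\,\frac{\langle u, v \rangle}{\lVert u \rVert_2 \, \lVert v \rVert_2}.
\]
% For unit-norm vectors the two differ only by a constant:
\[
\tfrac{1}{2}\,\Bigl\lVert \tfrac{u}{\lVert u \rVert_2} - \tfrac{v}{\lVert v \rVert_2} \Bigr\rVert_2^2
= 1 - \frac{\langle u, v \rangle}{\lVert u \rVert_2 \, \lVert v \rVert_2}.
\]
```

The second identity shows that the cosine loss is the L2 loss applied after feature normalization, so rescaling $u$ or $v$ leaves it unchanged, whereas the plain L2 loss depends on their norms.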

artificial intelligence, inductive learning, machine learning, (17 more...)

2309.16109

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.54)

Holographic-(V)AE: an end-to-end SO(3)-Equivariant (Variational) Autoencoder in Fourier Space

Visani, Gian Marco, Pun, Michael N., Angaji, Arman, Nourmohammad, Armita

arXiv.org Artificial Intelligence, Jun-11-2023

Group-equivariant neural networks have emerged as a data-efficient approach to solve classification and regression tasks, while respecting the relevant symmetries of the data. However, little work has been done to extend this paradigm to the unsupervised and generative domains. Here, we present Holographic-(Variational) Auto Encoder (H-(V)AE), a fully end-to-end SO(3)-equivariant (variational) autoencoder in Fourier space, suitable for unsupervised learning and generation of data distributed around a specified origin in 3D. H-(V)AE is trained to reconstruct the spherical Fourier encoding of data, learning in the process a low-dimensional representation of the data (i.e., a latent space) with a maximally informative rotationally invariant embedding alongside an equivariant frame describing the orientation of the data. We extensively test the performance of H-(V)AE on diverse datasets. We show that the learned latent space efficiently encodes the categorical features of spherical images. Moreover, H-(V)AE's latent space can be used to extract compact embeddings for protein structure microenvironments, and when paired with a Random Forest Regressor, it enables state-of-the-art predictions of protein-ligand binding affinity.

artificial intelligence, latent space, machine learning, (18 more...)

2209.15567

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Coarse-to-Fine Covid-19 Segmentation via Vision-Language Alignment

Shan, Dandan, Li, Zihan, Chen, Wentao, Li, Qingde, Tian, Jie, Hong, Qingqi

arXiv.org Artificial Intelligence, Mar-1-2023

Segmentation of COVID-19 lesions can assist physicians in better diagnosis and treatment of COVID-19. However, there are few relevant studies due to the lack of detailed information and high-quality annotation in the COVID-19 dataset. To solve the above problem, we propose C2FVL, a Coarse-to-Fine segmentation framework via Vision-Language alignment to merge text information containing the number of lesions and specific locations of image information. The introduction of text information allows the network to achieve better prediction results on challenging datasets. We conduct extensive experiments on two COVID-19 datasets including chest X-ray and CT, and the results demonstrate that our proposed method outperforms other state-of-the-art segmentation methods.

artificial intelligence, information, machine learning, (15 more...)

2303.00279

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Fujian Province > Xiamen (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.97)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Deep Learning -- not only for the big ones

#artificialintelligence, Aug-30-2020, 04:15:03 GMT

It may sound a bit obvious, but sometimes we miss the power of this kind of solution to small data problems: I'm talking about Data Augmentation. The idea behind Data Augmentation is that points near a given point (in hyperspace) represent similar behavior. For example, an image of a dog with increased contrast or brightness is still an image of a dog. Let's talk about SMOTE in more detail. To simplify, we can define the concept behind SMOTE as "birds of a feather flock together", which in data terms means that data points close to each other in the hyperspace represent similar behavior, so new data points for the dataset can be approximated from them.
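
The interpolation idea behind SMOTE can be sketched as follows. This is a minimal illustration of the commonly described procedure (pick a sample, pick one of its nearest neighbours, and draw a point on the segment between them), not code from the article; the function name `smote_sample` and its parameters are hypothetical, and a real project would more likely rely on an existing library implementation.

```python
import math
import random


def smote_sample(points, k=2, rng=None):
    """Create one synthetic point between a random sample and one of its k nearest neighbours."""
    rng = rng or random.Random()
    base = rng.choice(points)
    # Rank the other points by Euclidean distance to the chosen sample.
    others = [p for p in points if p is not base]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbour = rng.choice(neighbours)
    # Interpolate: lam = 0 reproduces the sample, lam close to 1 approaches the neighbour.
    lam = rng.random()
    return [b + lam * (n - b) for b, n in zip(base, neighbour)]


# Toy usage on a hypothetical minority class with three 2-D samples.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(smote_sample(minority, k=2, rng=random.Random(0)))
```

Every synthetic point lies on a line segment between two existing samples, which is the "nearby points behave similarly" assumption made concrete.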

artificial intelligence, deep learning, machine learning, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

A Close Look at Deep Learning with Small Data

Brigato, L., Iocchi, L.

arXiv.org Machine Learning, Mar-31-2020

In this work, we perform a wide variety of experiments with different Deep Learning architectures in small data conditions. We show that model complexity is a critical factor when only a few samples per class are available. In contrast to the literature, we improve the state of the art using low complexity models. We show that standard convolutional neural networks with relatively few parameters are effective in this scenario. In many of our experiments, low complexity models outperform state-of-the-art architectures. Moreover, we propose a novel network that uses an unsupervised loss to regularize its training. Such an architecture either improves the results or performs comparably well to low capacity networks. Surprisingly, experiments show that the dynamic data augmentation pipeline is not beneficial in this particular domain. Statically augmenting the dataset might be a promising research direction while dropout maintains its role as a good regularizer.

architecture, augmentation, dataset, (15 more...)

2003.12843

Country: Europe > Italy (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

How To Use Deep Learning Even with Small Data

#artificialintelligence, Nov-12-2019, 11:01:45 GMT

It promises to solve your most complicated problems for the small price of an enormous amount of data. The only problem is that you are not working at Google or Facebook, and data are scarce. So what are you to do? Can you still leverage the power of deep learning or are you out of luck? Let's take a look at how you might be able to leverage deep learning even with limited data and why I think this might be one of the most exciting areas of future research. Before we discuss methods for leveraging deep learning for your limited data, please step back from the neural networks and build a simple baseline.

data augmentation, fine-tuning, neural network, (13 more...)

Industry: Information Technology (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Deep Learning on Small Datasets without Pre-Training using Cosine Loss

Barz, Björn, Denzler, Joachim

arXiv.org Machine Learning, Jan-25-2019

Two things seem to be indisputable in the contemporary deep learning discourse: 1. The categorical cross-entropy loss after softmax activation is the method of choice for classification. 2. Training a CNN classifier from scratch on small datasets does not work well. In contrast to this, we show that the cosine loss function provides significantly better performance than cross-entropy on datasets with only a handful of samples per class. For example, the accuracy achieved on the CUB-200-2011 dataset without pre-training is 30% higher than with the cross-entropy loss. Further experiments on four other popular datasets confirm our findings. Moreover, we show that the classification performance can be improved further by integrating prior knowledge in the form of class hierarchies, which is straightforward with the cosine loss.
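
To make the comparison concrete, here is a small sketch of a cosine loss for classification next to softmax cross-entropy. It assumes the loss is one minus the cosine similarity between the network output and the one-hot vector of the true class; the abstract does not spell out the formula, so treat this as an illustration rather than the authors' implementation. The function names are hypothetical.

```python
import math


def cosine_loss(features, label):
    """1 - cos(features, one_hot(label)); only the direction of the output matters."""
    norm = math.sqrt(sum(v * v for v in features))
    if norm == 0.0:
        return 1.0  # convention chosen here: a zero vector has similarity 0
    # The dot product with a one-hot target is just the entry at the label index.
    return 1.0 - features[label] / norm


def cross_entropy(logits, label):
    """Softmax cross-entropy, computed with the usual max-shift for numerical stability."""
    shift = max(logits)
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return log_sum - logits[label]


# Scaling the output by 10 leaves the cosine loss unchanged (about 0.2 in both cases),
# while cross-entropy keeps shrinking as the logits grow.
for output in ([3.0, 4.0], [30.0, 40.0]):
    print(cosine_loss(output, 1), cross_entropy(output, 1))
```

The cosine loss is bounded and cannot be lowered by inflating the output norm, whereas cross-entropy can, which is one plausible intuition for why the two behave differently when only a few samples per class are available.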

cosine loss, dataset, loss function, (9 more...)

1901.09054

Country:

North America > Canada > Ontario > Toronto (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
