AITopics

2402.16569

Country: Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

arXiv.org Artificial IntelligenceFeb-23-2024

Representing Online Handwriting for Recognition in Large Vision-Language Models

Fadeeva, Anastasiia, Schlattner, Philippe, Maksai, Andrii, Collier, Mark, Kokiopoulou, Efi, Berent, Jesse, Musat, Claudiu

The adoption of tablets with touchscreens and styluses is increasing, and a key feature is converting handwriting to text, enabling search, indexing, and AI assistance. Meanwhile, vision-language models (VLMs) are now the go-to solution for image understanding, thanks to both their state-of-the-art performance across a variety of tasks and the simplicity of a unified approach to training, fine-tuning, and inference. While VLMs obtain high performance on image-based tasks, they perform poorly on handwriting recognition when applied naively, i.e., by rendering handwriting as an image and performing optical character recognition (OCR). In this paper, we study online handwriting recognition with VLMs, going beyond naive OCR. We propose a novel tokenized representation of digital ink (online handwriting) that includes both a time-ordered sequence of strokes as text, and as image. We show that this representation yields results comparable to or better than state-of-the-art online handwriting recognizers. Wide applicability is shown through results with two different VLM families, on multiple public datasets. Our approach can be applied to off-the-shelf VLMs, does not require any changes in their architecture, and can be used in both fine-tuning and parameter-efficient tuning. We perform a detailed ablation study to identify the key elements of the proposed representation.

large language model, machine learning, natural language, (18 more...)

2402.15307

Country:

North America > United States (0.14)
Europe > Italy (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

arXiv.org Artificial IntelligenceOct-30-2023

Three Towers: Flexible Contrastive Learning with Pretrained Image Models

Kossen, Jannik, Collier, Mark, Mustafa, Basil, Wang, Xiao, Zhai, Xiaohua, Beyer, Lucas, Steiner, Andreas, Berent, Jesse, Jenatton, Rodolphe, Kokiopoulou, Efi

We introduce Three Towers (3T), a flexible method to improve the contrastive learning of vision-language models by incorporating pretrained image classifiers. While contrastive models are usually trained from scratch, LiT (Zhai et al., 2022) has recently shown performance gains from using pretrained classifier embeddings. However, LiT directly replaces the image tower with the frozen embeddings, excluding any potential benefits from training the image tower contrastively. With 3T, we propose a more flexible strategy that allows the image tower to benefit from both pretrained embeddings and contrastive training. To achieve this, we introduce a third tower that contains the frozen pretrained embeddings, and we encourage alignment between this third tower and the main image-text towers. Empirically, 3T consistently improves over LiT and the CLIP-style from-scratch baseline for retrieval tasks. For classification, 3T reliably improves over the from-scratch baseline, and while it underperforms relative to LiT for JFT-pretrained models, it outperforms LiT for ImageNet-21k and Places365 pretraining.

artificial intelligence, machine learning, natural language, (20 more...)

2305.16999

Country: Europe (0.29)

Genre: Research Report (0.85)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

arXiv.org Artificial IntelligenceOct-10-2023

Pi-DUAL: Using Privileged Information to Distinguish Clean from Noisy Labels

Wang, Ke, Ortiz-Jimenez, Guillermo, Jenatton, Rodolphe, Collier, Mark, Kokiopoulou, Efi, Frossard, Pascal

Label noise is a pervasive problem in deep learning that often compromises the generalization performance of trained models. Recently, leveraging privileged information (PI) -- information available only during training but not at test time -- has emerged as an effective approach to mitigate this issue. Yet, existing PI-based methods have failed to consistently outperform their no-PI counterparts in terms of preventing overfitting to label noise. To address this deficiency, we introduce Pi-DUAL, an architecture designed to harness PI to distinguish clean from wrong labels. Pi-DUAL decomposes the output logits into a prediction term, based on conventional input features, and a noise-fitting term influenced solely by PI. A gating mechanism steered by PI adaptively shifts focus between these terms, allowing the model to implicitly separate the learning paths of clean and wrong labels. Empirically, Pi-DUAL achieves significant performance improvements on key PI benchmarks (e.g., +6.8% on ImageNet-PI), establishing a new state-of-the-art test set accuracy. Additionally, Pi-DUAL is a potent method for identifying noisy samples post-training, outperforming other strong methods at this task. Overall, Pi-DUAL is a simple, scalable and practical approach for mitigating the effects of label noise in a variety of real-world scenarios with PI.

artificial intelligence, deep learning, machine learning, (20 more...)

2310.066

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

arXiv.org Artificial IntelligenceJun-1-2023

When does Privileged Information Explain Away Label Noise?

Ortiz-Jimenez, Guillermo, Collier, Mark, Nawalgaria, Anant, D'Amour, Alexander, Berent, Jesse, Jenatton, Rodolphe, Kokiopoulou, Effrosyni

Leveraging privileged information (PI), or features available during training but not at test time, has recently been shown to be an effective method for addressing label noise. However, the reasons for its effectiveness are not well understood. In this study, we investigate the role played by different properties of the PI in explaining away label noise. Through experiments on multiple datasets with real PI (CIFAR-N/H) and a new large-scale benchmark ImageNet-PI, we find that PI is most helpful when it allows networks to easily distinguish clean from noisy data, while enabling a learning shortcut to memorize the noisy examples. Interestingly, when PI becomes too predictive of the target label, PI methods often perform worse than their no-PI baselines. Based on these findings, we propose several enhancements to the state-of-the-art PI methods and demonstrate the potential of PI as a means of tackling label noise. Finally, we show how we can easily combine the resulting PI approaches with existing no-PI techniques designed to deal with label noise.

accuracy, artificial intelligence, machine learning, (16 more...)

2303.01806

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

arXiv.org Artificial IntelligenceJan-30-2023

Massively Scaling Heteroscedastic Classifiers

Collier, Mark, Jenatton, Rodolphe, Mustafa, Basil, Houlsby, Neil, Berent, Jesse, Kokiopoulou, Effrosyni

Heteroscedastic classifiers, which learn a multivariate Gaussian distribution over prediction logits, have been shown to perform well on image classification problems with hundreds to thousands of classes. However, compared to standard classifiers, they introduce extra parameters that scale linearly with the number of classes. This makes them infeasible to apply to larger-scale problems. In addition heteroscedastic classifiers introduce a critical temperature hyperparameter which must be tuned. We propose HET-XL, a heteroscedastic classifier whose parameter count when compared to a standard classifier scales independently of the number of classes. In our large-scale settings, we show that we can remove the need to tune the temperature hyperparameter, by directly learning it on the training data. On large image classification datasets with up to 4B images and 30k classes our method requires 14X fewer additional parameters, does not require tuning the temperature on a held-out set and performs consistently better than the baseline heteroscedastic classifier. HET-XL improves ImageNet 0-shot classification in a multimodal contrastive learning setup which can be viewed as a 3.5 billion class classification problem.

artificial intelligence, het-xl, machine learning, (15 more...)

2301.1286

Country:

North America (0.28)
Europe (0.28)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

arXiv.org Machine LearningOct-6-2021

Deep Classifiers with Label Noise Modeling and Distance Awareness

Fortuin, Vincent, Collier, Mark, Wenzel, Florian, Allingham, James, Liu, Jeremiah, Tran, Dustin, Lakshminarayanan, Balaji, Berent, Jesse, Jenatton, Rodolphe, Kokiopoulou, Effrosyni

Uncertainty estimation in deep learning has recently emerged as a crucial area of interest to advance reliability and robustness in safety-critical applications. While there have been many proposed methods that either focus on distance-aware model uncertainties for out-of-distribution detection or on input-dependent label uncertainties for in-distribution calibration, both of these types of uncertainty are often necessary. In this work, we propose the HetSNGP method for jointly modeling the model and data uncertainty. We show that our proposed model affords a favorable combination between these two complementary types of uncertainty and thus outperforms the baseline methods on some challenging out-of-distribution datasets, including CIFAR-100C, Imagenet-C, and Imagenet-A. Moreover, we propose HetSNGP Ensemble, an ensembled version of our method which adds an additional type of uncertainty and also outperforms other ensemble baselines.

artificial intelligence, machine learning, neural network, (16 more...)

2110.02609

Country:

Europe (0.28)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

arXiv.org Machine LearningSep-9-2020

Routing Networks with Co-training for Continual Learning

Collier, Mark, Kokiopoulou, Efi, Gesmundo, Andrea, Berent, Jesse

The core challenge with continual learning is catastrophic forgetting, the phenomenon that when neural networks are trained on a sequence of tasks they rapidly forget previously learned tasks. It has been observed that catastrophic forgetting is most severe when tasks are dissimilar to each other. We propose the use of sparse routing networks for continual learning. For each input, these network architectures activate a different path through a network of experts. Routing networks have been shown to learn to route similar tasks to overlapping sets of experts and dissimilar tasks to disjoint sets of experts. In the continual learning context this behaviour is desirable as it minimizes interference between dissimilar tasks while allowing positive transfer between related tasks. In practice, we find it is necessary to develop a new training method for routing networks, which we call co-training which avoids poorly initialized experts when new tasks are presented. When combined with a small episodic memory replay buffer, sparse routing networks with co-training outperform densely connected networks on the MNIST-Permutations and MNIST-Rotations benchmarks.

artificial intelligence, learning, neural network, (15 more...)

2009.04381

Genre: Research Report (0.40)

Industry: Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Communications > Networks (0.92)

arXiv.org Machine LearningJul-13-2020

VAEs in the Presence of Missing Data

Collier, Mark, Nazabal, Alfredo, Williams, Christopher K. I.

Real world datasets often contain entries with Existing approaches which adapt VAEs to datasets with missing elements e.g. in a medical dataset, a patient missing data (Vedantam et al., 2017; Nazabal et al., 2018; is unlikely to have taken all possible diagnostic Mattei & Frellsen, 2019; Ma et al., 2019) suffer from a number tests. Variational Autoencoders (VAEs) are of significant disadvantages, including 1) not handling popular generative models often used for unsupervised missing not at random (MNAR) data, 2) replacing missing learning. Despite their widespread use elements with zeros with no way to distinguish an observed it is unclear how best to apply VAEs to datasets data element with value zero from a missing element, 3) with missing data. We develop a novel latent not scaling to high dimensional inputs and/or 4) restricting variable model of a corruption process which the types of neural network architectures permitted, these generates missing data, and derive a corresponding issues are discussed in detail below. We aim to improve tractable evidence lower bound (ELBO). Our upon the handling of missing data by VAEs by addressing model is straightforward to implement, can handle the disadvantages of the existing approaches. In particular both missing completely at random (MCAR) and we propose a novel latent variable probabilistic model of missing not at random (MNAR) data, scales to missing data as the result of a corruption process, and derive high dimensional inputs and gives both the VAE a tractable ELBO for our proposed model.

artificial intelligence, dataset, neural network, (18 more...)

2006.05301

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Machine LearningSep-18-2019

Memory-Augmented Neural Networks for Machine Translation

Collier, Mark, Beel, Joeran

Memory-augmented neural networks (MANNs) have been shown to outperform other recurrent neural network architectures on a series of artificial sequence learning tasks, yet they have had limited application to real-world tasks. We evaluate direct application of Neural Turing Machines (NTM) and Differentiable Neural Computers (DNC) to machine translation. We further propose and evaluate two models which extend the attentional encoder-decoder with capabilities inspired by memory augmented neural networks. We evaluate our proposed models on IWSLT Vietnamese to English and ACL Romanian to English datasets. Our proposed models and the memory augmented neural networks perform similarly to the attentional encoder-decoder on the Vietnamese to English translation task while have a 0.3-1.9 lower BLEU score for the Romanian to English task. Interestingly, our analysis shows that despite being equipped with additional flexibility and being randomly initialized memory augmented neural networks learn an algorithm for machine translation almost identical to the attentional encoder-decoder.

attentional encoder-decoder, deep learning, neural network, (17 more...)

1909.08314

Country: Europe > Ireland > Leinster > County Dublin > Dublin (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)