
Collaborating Authors

 Phan, Buu


On Self-Adaptive Perception Loss Function for Sequential Lossy Compression

arXiv.org Artificial Intelligence

We consider causal, low-latency, sequential lossy compression, with mean squared error (MSE) as the distortion loss and a perception loss function (PLF) to enhance the realism of reconstructions. As the main contribution, we propose and analyze a new PLF that considers the joint distribution between the current source frame and the previous reconstructions. We establish the theoretical rate-distortion-perception function for first-order Markov sources and analyze the Gaussian model in detail. Qualitatively, the proposed metric simultaneously avoids the error-permanence phenomenon and better exploits the temporal correlation between high-quality reconstructions. The proposed metric is referred to as the self-adaptive perception loss function (PLF-SA), as its behavior adapts to the quality of the reconstructed frames. We provide a detailed comparison of the proposed perception loss function with previous approaches through both information-theoretic analysis and experiments on the moving MNIST and UVG datasets.
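The following is a minimal, illustrative sketch of a per-frame training loss in the spirit of PLF-SA, not the paper's implementation: the distortion term is the frame MSE, and the perception term compares samples of the joint distribution of (current source frame, previous reconstruction) against samples of (current reconstruction, previous reconstruction). The RBF-MMD divergence, the bandwidth `sigma`, the weight `lam`, and all function names are assumptions made for illustration.

```python
import torch

def rbf_mmd2(a, b, sigma=1.0):
    """Biased squared-MMD estimate between sample sets a and b (rows are samples), RBF kernel."""
    k = lambda x, y: torch.exp(-torch.cdist(x, y) ** 2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

def plf_sa_frame_loss(x_t, xhat_t, xhat_prev, lam=0.1):
    """x_t, xhat_t, xhat_prev: (batch, dim) flattened frames at times t and t-1."""
    distortion = torch.mean((x_t - xhat_t) ** 2)            # MSE distortion term
    joint_src = torch.cat([x_t, xhat_prev], dim=1)          # samples of (X_t, Xhat_{t-1})
    joint_rec = torch.cat([xhat_t, xhat_prev], dim=1)       # samples of (Xhat_t, Xhat_{t-1})
    perception = rbf_mmd2(joint_src, joint_rec)             # joint-distribution perception term
    return distortion + lam * perception
```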


Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles

arXiv.org Artificial Intelligence

Tokenization is associated with many poorly understood shortcomings in language models (LMs), yet it remains an important component for scaling to long sequences. We discover that, even when a tokenized LM and a byte-level LM are statistically equivalent, their predictive distributions over the next byte can be substantially different, a phenomenon we term "tokenization bias". To fully characterize this phenomenon, we introduce the Byte-Token Representation Lemma, a framework that establishes a mapping between the learned token distribution and its equivalent byte-level distribution. From this result, we develop a next-byte sampling algorithm that eliminates tokenization bias without requiring further training or optimization. In other words, this enables zero-shot conversion of tokenized LMs into statistically equivalent token-free ones. We demonstrate its broad applicability with two use cases: fill-in-the-middle (FIM) tasks and model ensembles. In FIM tasks, where input prompts may terminate mid-token and thus lead to out-of-distribution tokenization, our method mitigates performance degradation and achieves an approximately 18% improvement on FIM coding benchmarks, consistently outperforming the standard token-healing fix. For model ensembles, where each model employs a distinct vocabulary, our approach enables seamless integration, resulting in improved performance (up to 3.7%) over individual models across various standard baselines in reasoning, knowledge, and coding. Transformers form the backbone of all widely used state-of-the-art language models (LMs) such as GPTs (Brown et al., 2020), Llama (Touvron et al., 2023), and Mistral (Jiang et al., 2023a).
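As a simplified illustration of the underlying marginalization (not the paper's exact algorithm, which additionally accounts for which tokenizations are actually reachable), a next-byte probability can be read off a next-token distribution by summing the probabilities of all tokens whose byte expansion begins with that byte. The toy vocabulary and the `next_token_probs` values below are assumptions.

```python
from collections import defaultdict

token_bytes = {0: b"th", 1: b"the", 2: b"a", 3: b"an", 4: b" "}   # toy vocabulary: token id -> bytes
next_token_probs = {0: 0.3, 1: 0.4, 2: 0.1, 3: 0.1, 4: 0.1}       # assumed P(token | context)

def next_byte_distribution(next_token_probs, token_bytes):
    """Aggregate token probabilities by the first byte of each token's expansion."""
    byte_probs = defaultdict(float)
    for tok, p in next_token_probs.items():
        byte_probs[token_bytes[tok][:1]] += p
    return dict(byte_probs)

print(next_byte_distribution(next_token_probs, token_bytes))
# approximately {b't': 0.7, b'a': 0.2, b' ': 0.1}
```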


Understanding and Mitigating Tokenization Bias in Language Models

arXiv.org Artificial Intelligence

State-of-the-art language models are autoregressive and operate on subword units known as tokens. Specifically, one must encode the conditioning string into a list of tokens before passing it to the language model for next-token prediction. We show that popular encoding schemes, such as maximum prefix encoding (MPE) and byte-pair encoding (BPE), induce a sampling bias that cannot be mitigated with more training or data. To counter this universal problem, we propose, for each encoding scheme above, a novel algorithm to obtain unbiased estimates from any language model trained on tokenized data. Our methods do not require finetuning the model, and their complexity, defined as the number of model runs, scales linearly with the sequence length in the case of MPE. As a result, we show that one can simulate token-free behavior from a tokenized language model. We empirically verify the correctness of our method in a Markov-chain setup, where it accurately recovers the transition probabilities, unlike the conventional method of directly prompting the language model with tokens.
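The bias itself is easy to reproduce in a toy Markov-chain setting similar to the one mentioned above (this is our own illustration, not the paper's code): bytes are drawn from a chain over {a, b} and encoded with maximum prefix encoding over the assumed vocabulary {"ab", "a", "b"}; in the resulting token stream the token "a" is never followed by the token "b", even though the byte-level transition probability P(b | a) is 0.5, so a model trained on such tokens underestimates that continuation no matter how much data it sees.

```python
import random
from collections import Counter

random.seed(0)
P = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.5, "b": 0.5}}   # true byte-level transition probabilities
vocab = ["ab", "a", "b"]                                      # longest tokens first

def sample_chain(n, start="a"):
    out, cur = [], start
    for _ in range(n):
        cur = random.choices(list(P[cur]), weights=list(P[cur].values()))[0]
        out.append(cur)
    return "".join(out)

def mpe_encode(s):
    """Maximum prefix encoding: greedily match the longest token at each position."""
    toks, i = [], 0
    while i < len(s):
        tok = next(v for v in vocab if s.startswith(v, i))
        toks.append(tok)
        i += len(tok)
    return toks

tokens = mpe_encode(sample_chain(100_000))
bigrams = Counter(zip(tokens, tokens[1:]))
print({t: c for (prev, t), c in bigrams.items() if prev == "a"})
# the token "b" never follows the token "a" in the encoded data
```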


On the Choice of Perception Loss Function for Learned Video Compression

arXiv.org Artificial Intelligence

We study causal, low-latency, sequential video compression when the output is subject to both a mean squared error (MSE) distortion loss and a perception loss that targets realism. Motivated by prior approaches, we consider two different perception loss functions (PLFs). The first, PLF-JD, considers the joint distribution (JD) of all the video frames up to the current one, while the second, PLF-FMD, considers the framewise marginal distributions (FMD) between the source and reconstruction. Using information-theoretic analysis and deep-learning-based experiments, we demonstrate that the choice of PLF can have a significant effect on the reconstruction, especially at low bit rates. In particular, while the reconstruction based on PLF-JD can better preserve the temporal correlation across frames, it also imposes a significant penalty in distortion compared to PLF-FMD and makes it more difficult to recover from errors made in the earlier output frames. Although the choice of PLF decisively affects reconstruction quality, we also demonstrate that it may not be essential to commit to a particular PLF during encoding, and the choice of PLF can be delegated to the decoder. In particular, encoded representations generated by training a system to minimize the MSE (without requiring either PLF) can be near universal and can generate close-to-optimal reconstructions for either choice of PLF at the decoder. We validate our results using (one-shot) information-theoretic analysis, a detailed study of the rate-distortion-perception tradeoff of the Gauss-Markov source model, as well as deep-learning-based experiments on the moving MNIST and KTH datasets.
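A minimal sketch of the structural difference between the two metrics (illustrative only, not the trained systems used in the paper): PLF-FMD penalizes a divergence between each frame's marginal distributions, while PLF-JD penalizes a divergence between the joint distributions of all frames up to the current one. The RBF-MMD divergence below is an assumed stand-in for whichever divergence estimator is used in practice.

```python
import torch

def rbf_mmd2(a, b, sigma=1.0):
    k = lambda x, y: torch.exp(-torch.cdist(x, y) ** 2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

def plf_fmd(x, xhat):
    """x, xhat: (T, batch, dim). Sum of per-frame marginal divergences."""
    return sum(rbf_mmd2(x[t], xhat[t]) for t in range(x.shape[0]))

def plf_jd(x, xhat):
    """Divergence between the joint distributions over all frames up to the current one."""
    T, B, D = x.shape
    return rbf_mmd2(x.permute(1, 0, 2).reshape(B, T * D),
                    xhat.permute(1, 0, 2).reshape(B, T * D))
```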


Detecting Out-of-Distribution Inputs in Deep Neural Networks Using an Early-Layer Output

arXiv.org Machine Learning

Deep neural networks achieve superior performance in challenging tasks such as image classification. However, deep classifiers tend to incorrectly classify out-of-distribution (OOD) inputs, i.e., inputs that do not belong to the classifier's training distribution. Several approaches have been proposed to detect OOD inputs, but the detection task remains an ongoing challenge. In this paper, we propose a new OOD detection approach that can be easily applied to an existing classifier and does not require access to OOD samples. The detector is a one-class classifier trained on the output of an early layer of the original classifier when fed its original training set. We apply our approach to several low- and high-dimensional datasets and compare it to state-of-the-art detection approaches. Our approach achieves substantially better results across multiple metrics.
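A minimal sketch of this recipe (not the paper's exact pipeline): register a forward hook on an early layer of a pretrained classifier, collect the (spatially pooled) activations on the classifier's own training data, fit a one-class classifier on them, and score test inputs with it. The choice of ResNet-18 and `layer1`, the use of a OneClassSVM with `nu=0.1`, and the `train_loader`/`test_loader` names are assumptions for illustration.

```python
import torch
import torchvision
from sklearn.svm import OneClassSVM

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
features = []
hook = model.layer1.register_forward_hook(
    lambda mod, inp, out: features.append(out.mean(dim=(2, 3)).detach()))  # pooled early-layer features

def extract(loader):
    features.clear()
    with torch.no_grad():
        for x, _ in loader:
            model(x)                        # the hook stores the early-layer output as a side effect
    return torch.cat(features).numpy()

# train_loader: the classifier's own training set; test_loader: inputs to score (both assumed)
# detector = OneClassSVM(nu=0.1).fit(extract(train_loader))
# scores = detector.decision_function(extract(test_loader))    # lower score -> more likely OOD
```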


Analysis of Confident-Classifiers for Out-of-distribution Detection

arXiv.org Machine Learning

Discriminatively trained neural classifiers can be trusted only when the input data comes from the training distribution (in-distribution). Therefore, detecting out-of-distribution (OOD) samples is very important to avoid classification errors. In the context of OOD detection for image classification, one recent approach proposes training a classifier, called a "confident classifier", by minimizing the standard cross-entropy loss on in-distribution samples and minimizing the KL divergence between the predictive distribution on OOD samples (drawn from low-density regions around the in-distribution) and the uniform distribution, i.e., maximizing the entropy of the outputs. Samples can then be detected as OOD if they have low confidence or high entropy. In this paper, we analyze this setting both theoretically and experimentally. We conclude that the resulting confident classifier still yields arbitrarily high confidence for OOD samples far away from the in-distribution. We instead suggest training a classifier with an explicit "reject" class for OOD samples.
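For concreteness, here is a minimal sketch (illustrative, not the released code) of the two objectives discussed above: the confident-classifier loss, which adds a KL term pushing OOD predictions toward the uniform distribution, and the suggested alternative of training with an explicit (K+1)-th "reject" class for OOD samples. The weight `lam` is an assumption, and the KL direction below (uniform vs. predictive) is interchangeable for the purpose of the sketch, since both directions are minimized by a uniform output.

```python
import torch
import torch.nn.functional as F

def confident_classifier_loss(logits_in, labels_in, logits_ood, lam=1.0):
    ce = F.cross_entropy(logits_in, labels_in)
    log_p_ood = F.log_softmax(logits_ood, dim=1)
    uniform = torch.full_like(log_p_ood, 1.0 / log_p_ood.shape[1])
    kl = F.kl_div(log_p_ood, uniform, reduction="batchmean")   # KL(uniform || p_ood)
    return ce + lam * kl

def reject_class_loss(logits_in, labels_in, logits_ood):
    # logits have K+1 outputs; the last index is the explicit "reject" class for OOD samples
    reject = torch.full((logits_ood.shape[0],), logits_ood.shape[1] - 1, dtype=torch.long)
    return F.cross_entropy(logits_in, labels_in) + F.cross_entropy(logits_ood, reject)
```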


Improving Reconstruction Autoencoder Out-of-distribution Detection with Mahalanobis Distance

arXiv.org Machine Learning

There is an increasingly apparent need for validating the classifications made by deep learning systems in safety-critical applications like autonomous vehicle systems. A number of recent papers have proposed methods for detecting anomalous image data that appear different from known inlier data samples, including reconstruction-based autoencoders. Autoencoders compress input data to a latent space of lower dimensionality than the input and attempt to accurately reconstruct the input from that compressed representation. Since the latent vector is optimized to capture the salient features of the inlier class only, it is commonly assumed that images of objects from outside the training class cannot be effectively compressed and reconstructed. Some thus consider reconstruction error as a kind of novelty measure. Here we suggest that reconstruction-based approaches fail to detect particular anomalies that lie far from known inlier samples in latent space but near the latent dimension manifold defined by the parameters of the model. We propose incorporating the Mahalanobis distance in latent space to better capture these out-of-distribution samples, and our results show that this method often improves performance over the baseline approach.
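A minimal sketch of the proposed score (not the paper's implementation): fit the mean and covariance of the autoencoder's latent codes on inlier training data, then combine the Mahalanobis distance in latent space with the usual reconstruction error. The weight `alpha`, the covariance regularizer, and the function names are assumptions.

```python
import numpy as np

def fit_latent_gaussian(z_train):
    """z_train: (N, d) latent codes of inlier training samples."""
    mu = z_train.mean(axis=0)
    cov = np.cov(z_train, rowvar=False) + 1e-6 * np.eye(z_train.shape[1])   # regularized covariance
    return mu, np.linalg.inv(cov)

def novelty_score(x, x_recon, z, mu, cov_inv, alpha=1.0):
    """Higher score -> more likely out-of-distribution."""
    recon_err = np.mean((x - x_recon) ** 2, axis=tuple(range(1, x.ndim)))   # per-sample MSE
    diff = z - mu
    mahalanobis = np.sqrt(np.einsum("ni,ij,nj->n", diff, cov_inv, diff))    # latent-space distance
    return recon_err + alpha * mahalanobis
```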


Calibrating Uncertainties in Object Localization Task

arXiv.org Machine Learning

In many safety-critical applications such as autonomous driving and surgical robots, it is desirable to obtain prediction uncertainties from object detection modules to help support safe decision-making. Specifically, such modules need to estimate the probability of each predicted object in a given region and the confidence interval for its bounding box. While recent Bayesian deep learning methods provide a principled way to estimate this uncertainty, the estimates for the bounding boxes obtained using these methods are uncalibrated. In this paper, we address this problem for the single-object localization task by adapting an existing technique for calibrating regression models. We show, experimentally, that the resulting calibrated model obtains more reliable uncertainty estimates.
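As a sketch of what such recalibration can look like when applied per bounding-box coordinate (our own illustration under a Gaussian predictive-distribution assumption, not necessarily the exact technique adapted in the paper): on a held-out calibration set, map each observed coordinate through the model's predictive CDF and fit an isotonic regression from predicted quantile levels to their empirical frequencies; calibrated intervals are then read off by inverting that map. All variable names are assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.isotonic import IsotonicRegression

def fit_recalibrator(y_true, pred_mean, pred_std):
    """y_true, pred_mean, pred_std: (N,) arrays for one box coordinate on a calibration set."""
    p_pred = norm.cdf(y_true, loc=pred_mean, scale=pred_std)       # predicted CDF values
    p_emp = np.array([(p_pred <= p).mean() for p in p_pred])       # empirical frequencies
    return IsotonicRegression(out_of_bounds="clip").fit(p_pred, p_emp)

def calibrated_interval(recal, mean, std, level=0.9):
    """Approximate `level` credible interval for one coordinate after recalibration."""
    lo, hi = (1 - level) / 2, 1 - (1 - level) / 2
    grid = np.linspace(1e-3, 1 - 1e-3, 999)
    mapped = recal.predict(grid)                                   # non-decreasing by construction
    idx_lo = min(np.searchsorted(mapped, lo), len(grid) - 1)
    idx_hi = min(np.searchsorted(mapped, hi), len(grid) - 1)
    return (norm.ppf(grid[idx_lo], loc=mean, scale=std),
            norm.ppf(grid[idx_hi], loc=mean, scale=std))
```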


Improve Uncertainty Estimation for Unknown Classes in Bayesian Neural Networks with Semi-Supervised/One Set Classification

arXiv.org Machine Learning

Although deep neural networks (DNNs) have achieved many state-of-the-art results, estimating the uncertainty present in the DNN model and the data is a challenging task. Uncertainty-related problems, such as classifying data from unknown classes (classes that do not appear in the training data) as a known class with high confidence, are a critical concern in safety-critical domains (e.g., autonomous driving, medical diagnosis). In this paper, we show that applying current Bayesian neural network (BNN) techniques alone does not effectively capture this uncertainty. To tackle the problem, we introduce a simple way to improve the BNN using one-class classification (in this paper, we use the term "set classification" instead). We empirically demonstrate our method in an experiment involving three datasets: MNIST, notMNIST, and FMNIST.
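A minimal sketch of the idea (illustrative, not the paper's code): combine a Monte Carlo dropout approximation of a BNN with a one-class ("set") classifier fit on features of the known classes, and flag a test input as an unknown class whenever the one-class classifier rejects its features, regardless of how confident the averaged prediction looks. The network architecture, `n_samples`, and `nu` are assumptions.

```python
import torch
import torch.nn as nn
from sklearn.svm import OneClassSVM

class SmallNet(nn.Module):
    def __init__(self, in_dim=784, hidden=128, n_classes=10):
        super().__init__()
        self.feat = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(0.5))
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        return self.head(self.feat(x))

def mc_dropout_probs(model, x, n_samples=20):
    model.train()                                   # keep dropout active to sample predictions
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(n_samples)])
    return probs.mean(0)                            # averaged predictive distribution

def fit_set_classifier(model, x_known):
    """Fit the one-class ('set') classifier on features of the known classes."""
    model.eval()
    with torch.no_grad():
        feats = model.feat(x_known).numpy()
    return OneClassSVM(nu=0.1).fit(feats)

def predict_with_unknown(model, set_clf, x):
    probs = mc_dropout_probs(model, x)
    model.eval()
    with torch.no_grad():
        known = set_clf.predict(model.feat(x).numpy()) == 1   # the SVM returns -1 for rejected inputs
    preds = probs.argmax(dim=1)
    preds[~torch.from_numpy(known)] = -1                      # -1 marks "unknown class"
    return preds
```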