Note that, given access to the populations of positives and unlabeled data, $\alpha$ can be estimated as $\min_x p_u(x)/p_p(x)$. To make a prediction on a test point from the unlabeled data, we can then use Bayes' rule to obtain the following transformation of the probabilistic output of the domain discriminator: $\tilde f(x) = \alpha\, f(x) / (1 - f(x))$. In particular, for each class $j \in \mathcal{Y}_s$, we can first estimate its prevalence $\hat{\alpha}_j$ in the unlabeled target. For classification, we can train $k$ PU learning classifiers $f_i$, where $f_i$ is trained to classify source class $i$ versus the others in the target. Assuming that each $f_j$ returns a score in $[0, 1]$, at test time an example $x$ is classified as $f(x)$, given by $f(x) = \arg\max_{j \in \mathcal{Y}_s} f_j(x)$ if $\max_j f_j(x) \ge 1/2$, and $f(x) = k + 1$ otherwise. Note that, mathematically, any OSLS problem can be thought of as $k$ PU problems as per (10). Put simply, for the individual PU problems defined for source classes $j \in \mathcal{Y}_s$, we need the existence of a sub-domain $\mathcal{X}_j$ such that we only observe examples of class $j$ in $\mathcal{X}_j$. The error incurred due to this bias can be mild for a single mixture proportion estimation task, but it accumulates with an increasing number of classes (i.e., $k$). Assume that there exists a unique solution $p_t(y)$. Without loss of generality, we assume that $|\mathcal{X}_{wp}| = k$.
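The $k$-PU decision rule above is easy to state in code. The following is a minimal NumPy sketch under stated assumptions: `estimate_alpha` and `k_pu_predict` are hypothetical helper names, and the density evaluations and per-class PU scores are assumed to be given.

```python
import numpy as np

def estimate_alpha(p_u, p_p, eps=1e-12):
    """Estimate alpha = min_x p_u(x) / p_p(x), approximated over a
    finite sample of density evaluations (arrays of equal length)."""
    return float(np.min(p_u / np.maximum(p_p, eps)))

def k_pu_predict(scores, threshold=0.5):
    """Combine k per-class PU scores f_1(x), ..., f_k(x) in [0, 1] into
    one prediction: the best-scoring source class, or the novel class
    k + 1 when every score falls below the threshold.

    scores: array of shape (n, k) with scores[i, j] = f_{j+1}(x_i).
    Returns labels in {1, ..., k + 1}.
    """
    n, k = scores.shape
    best = np.argmax(scores, axis=1) + 1        # best source class, 1-indexed
    novel = np.max(scores, axis=1) < threshold  # no classifier claims x
    return np.where(novel, k + 1, best)
```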
Gaussian Weight Sampling for Scalable, Efficient and Stable Pseudo-Quantization Training
The ever-growing scale of large language models (LLMs) is pushing for improved efficiency, favoring fully quantized training (FQT) over BF16. While FQT accelerates training, it faces consistency challenges and requires searching over an exponential number of cases, each needing over 200B tokens to ensure stability. Pseudo-quantization training (PQT) addresses the issues of FQT, although it is not well studied. We explore the practical implications of PQT in detail and propose a noise distribution $R$ that is floating-point (FP)-friendly, with ideal properties including stochastic precision annealing. As a result, the proposed method serves as an effective theoretical foundation for low-precision FP parameters through PQT, using efficient fake quantization via an addition and a subsequent FP cast. We demonstrate that Gaussian weight sampling is (1) scalable: it supports low-precision FP parameters down to FP6 and high-precision noise up to 9 bits with the BF16 operator; (2) efficient: it incurs computational overhead as low as 1.40\% on the A100 GPU, measured in Llama2 training tokens per second, and requires 2 bytes per parameter of GPU memory; and (3) stable: PQT with Gaussian weight sampling closely follows or even surpasses the BF16 baseline when pre-training GPT2 and Llama2 models with up to 1B parameters and 300B tokens.
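As a concrete illustration of the "addition and subsequent FP cast" mechanism, here is a minimal PyTorch sketch of one fake-quantization step. The noise scale below is a generic uniform-quantizer heuristic, not the paper's FP-friendly distribution $R$; bfloat16 stands in for the FP6 target, since PyTorch has no native FP6 dtype; and `gaussian_fake_quant` is a hypothetical name.

```python
import torch

def gaussian_fake_quant(w: torch.Tensor, num_bits: int = 6) -> torch.Tensor:
    """Illustrative pseudo-quantization step: perturb weights with
    Gaussian noise on the order of one low-precision quantization step,
    then cast to a lower-precision FP format and back."""
    # Per-row step size proportional to the weight scale and bit width
    # (a simple heuristic, not the paper's derived scheme).
    step = w.abs().amax(dim=-1, keepdim=True) / (2 ** (num_bits - 1))
    noise = torch.randn_like(w) * step            # the "addition" part
    # The "FP cast" part: round-trip through a low-precision FP dtype.
    return (w + noise).to(torch.bfloat16).to(w.dtype)
```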
Efficient Inference and Computation of Optimal Alternatives for Preference Languages Based On Lexicographic Models
Wilson, Nic | George, Anne-Marie
We analyse preference inference, through consistency, for general preference languages based on lexicographic models. We identify a property, which we call strong compositionality, that applies for many natural kinds of preference statement, and that allows a greedy algorithm for determining consistency of a set of preference statements. We also consider different natural definitions of optimality, and their relations to each other, for general preference languages based on lexicographic models. Based on our framework, we show that testing consistency, and thus inference, is polynomial for a specific preference language LpqT, which allows strict and non-strict statements, comparisons between outcomes and between partial tuples, both ceteris paribus and strong statements, and their combination. Computing different kinds of optimal sets is also shown to be polynomial; this is backed up by our experimental results.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Ireland > Munster > County Cork > Cork (0.04)
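The greedy consistency test that strong compositionality enables in the abstract above can be sketched as a Python skeleton. This is a sketch under assumed oracles: `can_extend` and `satisfied` are hypothetical callables standing in for the language-specific satisfiability tests; the point it illustrates is that no backtracking over greedy choices is needed.

```python
def greedy_consistent(candidates, statements, can_extend, satisfied):
    """Greedy consistency test sketch for a strongly compositional
    preference language over lexicographic models.

    candidates:  (variable, value-ordering) pairs usable in the model.
    can_extend:  hypothetical oracle; can_extend(prefix, cand, stmts)
                 is True if appending cand keeps every statement
                 satisfiable by some completion of the prefix.
    satisfied:   hypothetical oracle; satisfied(model, stmt) is True
                 if the finished lexicographic model satisfies stmt.
    """
    model, pool = [], list(candidates)
    extended = True
    while pool and extended:
        extended = False
        for cand in pool:
            if can_extend(model, cand, statements):
                model.append(cand)   # safe greedy step: strong
                pool.remove(cand)    # compositionality means we never
                extended = True      # need to undo this choice
                break
    return all(satisfied(model, s) for s in statements)
```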
Product Quantized Translation for Fast Nearest Neighbor Search
Hwang, Yoonho (Pohang University of Science and Technology (POSTECH)) | Baek, Mooyeol (Pohang University of Science and Technology (POSTECH)) | Kim, Saehoon (Pohang University of Science and Technology (POSTECH)) | Han, Bohyung (Pohang University of Science and Technology (POSTECH)) | Ahn, Hee-Kap (Pohang University of Science and Technology (POSTECH))
This paper proposes a simple nearest neighbor search algorithm that efficiently provides the exact solution in terms of the Euclidean distance. In particular, we present an approach that improves the speed of nearest neighbor search through proper translations of the data and query, even though the task is inherently invariant to Euclidean transformations. The proposed algorithm eliminates nearest neighbor candidates effectively using distance lower bounds in nonlinearly embedded spaces, and further tightens those lower bounds by transforming the data and query through product quantized translations. Although our framework is composed of simple operations only, it achieves state-of-the-art performance compared to existing nearest neighbor search techniques, which we illustrate quantitatively on various large-scale benchmark datasets of different sizes and dimensions.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
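To make the candidate-elimination idea in the abstract above concrete, here is a minimal NumPy sketch of exact nearest-neighbor search driven by a distance lower bound. The pivot-based bound shown uses the plain triangle inequality rather than the paper's product quantized translations, and `exact_nn_with_pruning` and `make_pivot_bound` are hypothetical names.

```python
import numpy as np

def make_pivot_bound(data, pivot):
    """One valid lower bound from the triangle inequality:
    |d(x, p) - d(q, p)| <= d(x, q) for any fixed pivot p."""
    d_to_pivot = np.linalg.norm(data - pivot, axis=1)  # precomputed
    def lb(i, query):
        return abs(d_to_pivot[i] - np.linalg.norm(query - pivot))
    return lb

def exact_nn_with_pruning(data, query, lower_bound):
    """Exact nearest neighbor via lower-bound pruning: a candidate is
    only compared exactly when its cheap bound beats the best distance
    found so far; tighter bounds mean fewer exact comparisons."""
    bounds = np.array([lower_bound(i, query) for i in range(len(data))])
    order = np.argsort(bounds)
    best_i, best_d = -1, np.inf
    for i in order:
        if bounds[i] >= best_d:
            break  # bounds only grow from here, so the rest are pruned
        d = np.linalg.norm(data[i] - query)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d
```

For example, `exact_nn_with_pruning(X, q, make_pivot_bound(X, X.mean(axis=0)))` returns the exact nearest neighbor of `q` in `X`; the result is identical for any valid lower bound, only the number of exact distance computations changes.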