SKIM: Any-bit Quantization Pushing The Limits of Post-Training Quantization
Runsheng Bai, Bo Liu, Qiang Liu
Large Language Models (LLMs) exhibit impressive performance across various tasks, but deploying them for inference is challenging. Their high resource demands often necessitate complex, costly multi-GPU pipelines or the use of smaller, less capable models. While quantization offers a promising solution by using lower precision for model storage, existing methods frequently suffer significant performance drops at low bit widths. Additionally, they typically provide solutions only at a limited set of specific bit levels, many of which are extensively hand-tuned. To address these challenges, we propose a new method called SKIM: Scaled K-means clustering wIth Mixed precision. Our approach introduces two novel techniques: 1. a greedy algorithm that finds an approximately optimal bit allocation across weight channels, and 2. a trainable scaling vector for non-differentiable K-means clustering. These techniques substantially improve performance and can be adapted to any given bit width. Notably, in terms of model perplexity, our method narrows the gap between 3-bit quantized LLaMA models and their full-precision counterparts by 16.3% on average.
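To make the greedy bit-allocation idea concrete, here is a minimal sketch in Python. It assumes that per-channel quantization error is measured with plain 1-D k-means (Lloyd's algorithm) using 2^b centroids, and that bits are handed out one at a time to whichever channel's error would drop the most. The names `kmeans_1d` and `greedy_bit_allocation`, the quantile initialisation and the `min_bits`/`max_bits` bounds are hypothetical; the trainable scaling vector is not modelled, and this is not SKIM's actual implementation.

```python
import heapq


def kmeans_1d(values, k, iters=20):
    """Cluster scalar weights into k centroids with Lloyd's algorithm.

    Returns (centroids, sum of squared quantization error).
    """
    uniq = sorted(set(values))
    if k >= len(uniq):
        return uniq, 0.0  # every distinct value can be its own centroid
    s = sorted(values)
    # Initialise centroids at evenly spaced quantiles of the sorted weights.
    cents = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            j = min(range(k), key=lambda c: (v - cents[c]) ** 2)
            buckets[j].append(v)
        # Move each centroid to its bucket mean; keep it if the bucket is empty.
        cents = [sum(b) / len(b) if b else cents[j] for j, b in enumerate(buckets)]
    err = sum(min((v - c) ** 2 for c in cents) for v in values)
    return cents, err


def greedy_bit_allocation(channels, avg_bits, min_bits=1, max_bits=4):
    """Give every channel min_bits, then repeatedly add one bit to the channel
    whose k-means error would drop the most, until the total budget
    round(avg_bits * len(channels)) is spent."""
    n = len(channels)
    budget = round(avg_bits * n)
    if budget < min_bits * n:
        raise ValueError("budget is below min_bits per channel")
    # err[i][b]: quantization error of channel i with 2**b centroids.
    err = [{b: kmeans_1d(ch, 2 ** b)[1] for b in range(min_bits, max_bits + 1)}
           for ch in channels]
    bits = [min_bits] * n
    used = min_bits * n
    # Max-heap (negated gains) of each channel's next marginal error reduction.
    heap = []
    if max_bits > min_bits:
        heap = [(-(err[i][min_bits] - err[i][min_bits + 1]), i) for i in range(n)]
    heapq.heapify(heap)
    while used < budget and heap:
        _, i = heapq.heappop(heap)
        bits[i] += 1
        used += 1
        if bits[i] < max_bits:
            gain = err[i][bits[i]] - err[i][bits[i] + 1]
            heapq.heappush(heap, (-gain, i))
    return bits


if __name__ == "__main__":
    # A constant channel gains nothing from extra bits; a spread-out one does.
    channels = [[0.0] * 4, [-8, -4, -2, -1, 1, 2, 4, 8]]
    print(greedy_bit_allocation(channels, avg_bits=2, min_bits=1, max_bits=3))  # [1, 3]
```

Because the budget is just a total bit count, a fractional average such as `avg_bits=2.5` still yields a valid integer allocation per channel. The greedy choice is exactly optimal only when each channel's error curve is convex in the number of bits, which is why it is best described as approximate.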
Improved selective background Monte Carlo simulation at Belle II with graph attention networks and weighted events
Boyang Yu, Nikolai Hartmann, Luca Schinnerl, Thomas Kuhr
Measuring rare processes at Belle II requires a huge luminosity, so a large number of simulated events is needed to determine signal efficiencies and background contributions. However, this simulation is computationally expensive, and most of the simulated data, particularly background, are discarded by the event selection. Filters based on graph neural networks are therefore applied at an early stage, saving the resources otherwise spent on detector simulation and reconstruction of events that would be discarded at analysis level. In our work, we improved the performance of these filters using graph attention and investigated statistical methods, including sampling and reweighting, to deal with the biases introduced by the filtering.
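The sampling-and-reweighting idea can be illustrated with inverse-probability weighting, a standard way to keep estimates unbiased after a stochastic filter. This is a minimal sketch, assuming the filter's per-event score in [0, 1] is used directly as a keep probability, floored at a hypothetical `q_min` so that no event is ever dropped with certainty; each kept event then carries weight 1/q. The function names, the floor value and the toy data are illustrative assumptions, not the authors' exact scheme.

```python
import random


def sample_and_weight(scores, q_min=0.05, rng=None):
    """Keep event i with probability q_i = clamp(score_i, q_min, 1).

    Returns (index, weight) pairs with weight = 1 / q_i, so weighted sums over
    the kept events are unbiased estimates of sums over all events.
    """
    rng = rng or random.Random()
    kept = []
    for i, s in enumerate(scores):
        q = min(1.0, max(s, q_min))  # floor avoids a zero keep probability
        if rng.random() < q:
            kept.append((i, 1.0 / q))
    return kept


def weighted_count(kept, passes):
    """Estimate how many of ALL events pass the analysis selection."""
    return sum(w for i, w in kept if passes[i])


if __name__ == "__main__":
    N = 20000
    passes = [i % 10 == 0 for i in range(N)]  # toy "true" selection: 2000 pass
    # Toy filter: confident on most passing events, but misses those with i % 50 == 0.
    scores = [0.9 if p and i % 50 else 0.05 for i, p in enumerate(passes)]
    kept = sample_and_weight(scores, rng=random.Random(1))
    print(len(kept), weighted_count(kept, passes))
```

With these toy numbers, roughly 2360 of the 20000 events are expected to survive, and the weighted count should land close to the true 2000. An unweighted count of kept passing events would instead be biased low (around 1460 on average), because the filter under-samples the events it misjudges.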
Using GitHub as Artifactory for Machine Learning Model Artifacts · Omkar Prabhu
Note: This blog post is part of my ongoing work on experiments with model training, deployment and monitoring in the repository bitbeast. If you liked this blog post, please upvote it on Hacker News. Last year, I launched Skim with my friends. It is a platform to find, manage and read research papers, powered by machine learning models for use cases such as finding related papers and classifying a paper's research areas and tasks.
ESOMAR Fusion 2019 - Can machines be emotional? SKIM
We're looking forward to ESOMAR Fusion 2019, where we'll share our journey with Audeering – a German start-up that develops machine learning technology to detect emotions in voice – in analyzing 'how' people communicate their needs, attitudes and interests. We all know the importance of identifying both the rational and emotional needs of consumers and the drivers of their decision-making, and this is particularly true in new product development. However, whilst we have techniques to uncover emotions qualitatively, what about when we need to size the unmet need or the opportunity for a new product innovation? Together with Audeering, our goal was to access consumers' underlying emotions by analyzing their voices, so that we could explore an opportunity or evaluate a new product with greater validity.