QUEST: Quadruple Multimodal Contrastive Learning with Constraints and Self-Penalization
Multimodal contrastive learning (MCL) has recently demonstrated significant success across various tasks. However, existing MCL methods treat all negative samples equally and ignore their potential semantic association with positive samples, which limits the model's ability to achieve fine-grained alignment. In multi-view scenarios, MCL tends to prioritize shared information while neglecting modality-specific unique information across different views, leading to feature suppression and suboptimal performance in downstream tasks. To address these limitations, we propose a novel contrastive framework named QUEST: Quadruple Multimodal Contrastive Learning with Constraints and Self-Penalization. In the QUEST framework, we propose quaternion contrastive objectives and orthogonal constraints to extract sufficient unique information.
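The abstract's two ingredients — a contrastive objective and an orthogonality constraint separating shared from modality-unique features — can be illustrated with a minimal NumPy sketch. This is not the paper's quaternion formulation; `info_nce`, `orthogonality_penalty`, and the 0.1 penalty weight are illustrative stand-ins.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Standard InfoNCE: matched rows of z_a/z_b are positives,
    all other rows in the batch act as negatives."""
    # L2-normalize so dot products are cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature           # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives on the diagonal

def orthogonality_penalty(shared, unique):
    """Penalize overlap between shared and modality-unique projections:
    drives shared^T unique toward zero (squared Frobenius norm)."""
    shared = shared / np.linalg.norm(shared, axis=1, keepdims=True)
    unique = unique / np.linalg.norm(unique, axis=1, keepdims=True)
    return np.linalg.norm(shared.T @ unique, 'fro') ** 2 / shared.shape[0]

rng = np.random.default_rng(0)
z_img, z_txt = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
loss = info_nce(z_img, z_txt) + 0.1 * orthogonality_penalty(z_img, z_txt)
```

The contrastive term pulls matched pairs together while the penalty term keeps the "unique" subspace from collapsing into the shared one.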
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations
Panferov, Andrei, Chen, Jiale, Tabesh, Soroush, Castro, Roberto L., Nikdan, Mahdi, Alistarh, Dan
One approach to reducing the massive costs of large language models (LLMs) is the use of quantized or sparse representations for training or deployment. While post-training compression methods are very popular, the question of obtaining even more accurate compressed models by directly training over such representations, i.e., Quantization-Aware Training (QAT), is still open: for example, a recent study (arXiv:2411.04330v2) put the "optimal" bit-width at which models can be trained using QAT, while staying accuracy-competitive with standard FP16/BF16 precision, at 8-bit weights and activations. We advance this state-of-the-art via a new method called QuEST, which is Pareto-competitive with FP16, i.e., it provides better accuracy at lower model size, while training models with weights and activations in 4 bits or less. Moreover, QuEST allows stable training with 1-bit weights and activations. QuEST achieves this by improving two key aspects of QAT methods: (1) accurate and fast quantization of the (continuous) distributions of weights and activations via Hadamard normalization and MSE-optimal fitting; (2) a new trust gradient estimator based on the idea of explicitly minimizing the error between the noisy gradient computed over quantized states and the "true" (but unknown) full-precision gradient. Experiments on Llama-type architectures show that QuEST induces stable scaling laws across the entire range of hardware-supported precisions, and can be extended to sparse representations. We provide GPU kernel support showing that models produced by QuEST can be executed efficiently. Our code is available at https://github.com/IST-DASLab/QuEST.
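To make the "MSE-optimal fitting" idea concrete, here is a hedged NumPy sketch that grid-searches a symmetric quantization scale minimizing mean-squared error. It is not the paper's actual algorithm (which also applies a Hadamard transform before quantizing, omitted here); `quantize_mse` and the grid bounds are assumptions for illustration.

```python
import numpy as np

def quantize_mse(w, bits=4, n_grid=100):
    """Symmetric uniform quantization with the scale chosen by grid
    search over clipping ranges, minimizing mean-squared error."""
    levels = 2 ** (bits - 1) - 1            # e.g. 7 for signed 4-bit
    max_abs = np.abs(w).max()
    best_err, best_q = np.inf, None
    for frac in np.linspace(0.2, 1.0, n_grid):
        scale = frac * max_abs / levels     # candidate step size
        q = np.clip(np.round(w / scale), -levels, levels) * scale
        err = np.mean((w - q) ** 2)
        if err < best_err:
            best_err, best_q = err, q
    return best_q, best_err

rng = np.random.default_rng(0)
w = rng.normal(size=4096)                   # stand-in weight tensor
q4, err4 = quantize_mse(w, bits=4)
q8, err8 = quantize_mse(w, bits=8)
```

Searching over the clipping fraction, rather than always clipping at the maximum, trades a little range for much less rounding error on the bulk of a bell-shaped weight distribution.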
Reviews: Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model
The main contributions of this work are pulling these ideas together into a practical framework that works on a real large-scale simulator. The original challenges that are addressed include how to apply a probabilistic programming language (PPL) to an existing code base. The other strength of the paper is the sheer depth of related work that is considered and explained, while remaining smooth to read. Ideally, we would have had more detail on the specific contributions of this paper, particularly on the "prior inflation" scheme and the protocol. The limitations of the writing come mainly from needing further explanation and discussion of why various ideas are used, e.g., why do you consider LMH, RMH, IC? Why would you "like to employ deep neural networks" in this context?
Reviews: Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model
The paper presents a new probabilistic programming framework that makes Bayesian inference applicable to simulation code at scale, demonstrated on a large-scale high-energy physics application. Probabilistic inference can be applied to an existing simulation code base, allowing for 'plug-and-play' inference. On the downside, the involved inference approaches themselves have already been published before.
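The inference engines the reviews mention (LMH, RMH) are Metropolis-Hastings variants that only require evaluating a model's log-density, which is what lets a PPL treat an existing simulator as a black box. A minimal random-walk MH sketch on a toy one-dimensional posterior, purely for illustration:

```python
import numpy as np

def random_walk_mh(log_post, x0, n_steps=5000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings: proposes a Gaussian step and
    accepts with probability min(1, post(prop)/post(current))."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_steps):
        prop = x + step * rng.normal()
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
            x, lp = prop, lp_prop
        samples.append(x)
    return np.array(samples)

# Toy target: prior N(0, 1) on a mean, one observation y = 2.0 with
# unit noise; the exact posterior is N(1.0, 0.5).
log_post = lambda m: -0.5 * m**2 - 0.5 * (2.0 - m)**2
samples = random_walk_mh(log_post, x0=0.0)
```

The same loop works unchanged if `log_post` internally calls a large simulator, which is the "plug-and-play" property the review highlights.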
QueST: Self-Supervised Skill Abstractions for Learning Continuous Control
Mete, Atharva, Xue, Haotian, Wilcox, Albert, Chen, Yongxin, Garg, Animesh
Generalization capabilities, or rather a lack thereof, is one of the most important unsolved problems in the field of robot learning, and while several large scale efforts have set out to tackle this problem, unsolved it remains. In this paper, we hypothesize that learning temporal action abstractions using latent variable models (LVMs), which learn to map data to a compressed latent space and back, is a promising direction towards low-level skills that can readily be used for new tasks. Although several works have attempted to show this, they have generally been limited by architectures that do not faithfully capture shareable representations. To address this we present Quantized Skill Transformer (QueST), which learns a larger and more flexible latent encoding that is more capable of modeling the breadth of low-level skills necessary for a variety of tasks. To make use of this extra flexibility, QueST imparts causal inductive bias from the action sequence data into the latent space, leading to more semantically useful and transferable representations. We compare to state-of-the-art imitation learning and LVM baselines and see that QueST's architecture leads to strong performance on several multitask and few-shot learning benchmarks. Further results and videos are available at https://quest-model.github.io/
- Health & Medicine > Consumer Health (0.50)
- Government > Regional Government (0.31)
- Food & Agriculture > Agriculture (0.31)
- Information Technology > Communications > Social Media (0.51)
- Information Technology > Communications > Mobile (0.51)
- Information Technology > Artificial Intelligence > Machine Learning (0.31)
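The quantization step in a VQ-style latent variable model, as described in the QueST abstract above (mapping continuous action latents to a discrete codebook of "skill" tokens), can be sketched in a few lines. This is a generic nearest-neighbor codebook lookup, not QueST's actual architecture; `vq_encode` and the sizes are assumptions.

```python
import numpy as np

def vq_encode(z, codebook):
    """Nearest-neighbor codebook lookup: maps each continuous latent
    to the index (and value) of its closest code vector."""
    # squared distances between every latent and every code: (N, K)
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 8))      # 32 discrete "skill" codes
# latents near codes 3, 3, 17, perturbed with small noise
latents = codebook[[3, 3, 17]] + 0.01 * rng.normal(size=(3, 8))
tokens, quantized = vq_encode(latents, codebook)
```

The discrete token sequence is what downstream models can predict autoregressively, which is where causal inductive bias over action sequences enters the picture.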
Amazon's Quest for the 'Holy Grail' of Robotics
For decades, one of the hardest problems for robot developers to crack has been something seemingly mundane: how to replicate the human hand's ability to pick up stuff. The tech giant last month unveiled a collection of new robots, one of which is suited to replacing humans in the most common job at Amazon – picking up items and placing them elsewhere. The linchpin of this new kind of automation is a robot arm – appropriately named Sparrow after the tenacious, pervasive bird – that combines advanced artificial intelligence, a variety of grippers, and the speed and precision that is now standard in off-the-shelf industrial robotic arms. The announcement was easy to miss, coming as it did amid a run of news that, in part, illustrated some of the challenges Amazon is trying to tackle with its automation effort. The company began layoffs of corporate employees in mid-November, part of a sweeping cost-cutting effort to deal with the aftereffects of its rapid expansion during the pandemic. The company's workforce more than doubled during that period, to exceed 1.6 million as of early this year.
- North America > United States > New York (0.05)
- Asia > Japan (0.05)
The Quest to Save the Most Precious Voices on Earth
"My whole world is the human voice," says Harry Yeff. Yeff is also a digital artist, and he has traveled the world meeting experts and artists who share his obsession. He's spent the past five years collecting, he explains, the most precious voices on Earth. The motivation for his project is a simple fact: Every day, voices that could be preserved go extinct--whether that be the call of a critically endangered bird or a digital voice note lost in a phone update. That's why Yeff and his collaborator Trung Bao created Voice Gems: a project that uses AI to shape iconic and endangered voices into digital gemstones and physical sculptures.
Meta's VR Headset Harvests Personal Data Right Off Your Face
In November 2021, Facebook announced it would delete face recognition data extracted from images of more than 1 billion people and stop offering to automatically tag people in photos and videos. Luke Stark, an assistant professor at Western University, in Canada, told WIRED at the time that he considered the policy change a PR tactic because the company's VR push would likely lead to the expanded collection of physiological data and raise new privacy concerns. This week, Stark's prediction proved right. Meta, as the company that built Facebook is now called, introduced its latest VR headset, the Quest Pro. The new model adds a set of five inward-facing cameras that watch a person's face to track eye movements and facial expressions, allowing an avatar to reflect their expressions, smiling, winking, or raising an eyebrow in real time.
- Information Technology > Services (0.93)
- Information Technology > Security & Privacy (0.71)