CES 2026 Day 1: The biggest tech news and gadgets you missed from the first official day of the show
Gaming tech, foldables, wearables and AI gadgets from the likes of NVIDIA, Samsung, Pebble, Lenovo, Meta and Razer dominated the first day of CES 2026. With its XD Rollable concept, Lenovo took the ThinkBook Plus Gen 6's basic design and made it even more futuristic by letting the flexible display wrap around onto the lid. The first official show day of CES 2026 kept the pace up with a mix of near-term gaming upgrades, ambitious new form factors and a few reminders that not every gadget needs to do everything. NVIDIA announced important gaming news, we caught up with Samsung's tri-fold phone and Lenovo marched out an army of impressive-looking gaming laptops and concept tech. Here are the biggest stories from January 6.
- South America > Peru (0.04)
- North America > United States (0.04)
- Asia > South Korea (0.04)
- Information Technology > Hardware (1.00)
- Information Technology > Artificial Intelligence (1.00)
Pebble is making a weird little smart ring for recording thoughts
It's basically just a loop with a microphone, and I want one. The Index 01 is almost anti-tech in its simplicity: there's no needless AI component shoehorned in beyond speech-to-text. It's a ring with a microphone that you whisper ideas into. You get an idea while walking down the street, so you quietly whisper it into the ring, and the ring sends the idea to a notes app or saves it for later review.
- Information Technology > Communications > Mobile (1.00)
- Information Technology > Artificial Intelligence (0.76)
A Reward Net Algorithm
In this section, we present the detailed procedure of MRN in Algorithm 1. In Section 4.2, the implicit derivative at iteration k is bounded using the Cauchy-Schwarz inequality, and the last inequality holds by the definition of Lipschitz smoothness. Lemma 2 assumes a condition on the outer loss and concludes that the corresponding gradient is Lipschitz continuous; Theorems 1 and 2 start from the same assumption on the outer loss. Even worse, it might be difficult for human experts to give preferences over trajectory pairs (e.g., a pair of poor trajectories), which significantly reduces the efficiency of feedback in the initial stage.
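The fragment above refers to an implicit derivative at iteration k together with a Cauchy-Schwarz and Lipschitz-smoothness argument, but the paper's own equations are not reproduced here. The display below is therefore only a generic one-step meta-gradient sketch under assumed notation (ψ: reward-network parameters, θ: inner policy/critic parameters, α: inner step size, L^in/L^out: inner and outer losses), not MRN's exact derivation; bounding the norm of the outer gradient from such an expression is where Cauchy-Schwarz and the smoothness constant typically enter.

```latex
% Generic one-step meta-gradient sketch (assumed notation, not MRN's exact equations).
% Inner update of \theta given reward parameters \psi:
\theta_{k+1}(\psi) \;=\; \theta_k \;-\; \alpha\,\nabla_{\theta}\mathcal{L}^{\mathrm{in}}(\theta_k,\psi)
% Outer gradient via the chain rule through the inner update:
\nabla_{\psi}\mathcal{L}^{\mathrm{out}}\bigl(\theta_{k+1}(\psi)\bigr)
  \;=\; \Bigl(\tfrac{\partial \theta_{k+1}}{\partial \psi}\Bigr)^{\!\top}
        \nabla_{\theta}\mathcal{L}^{\mathrm{out}}(\theta_{k+1})
% Implicit derivative of the one-step update (matrix of mixed second partials):
\Bigl(\tfrac{\partial \theta_{k+1}}{\partial \psi}\Bigr)^{\!\top}
  \;=\; -\,\alpha\,\nabla^{2}_{\psi\theta}\mathcal{L}^{\mathrm{in}}(\theta_k,\psi)
```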
Residual Reward Models for Preference-based Reinforcement Learning
Cao, Chenyang, Rogel-García, Miguel, Nabail, Mohamed, Wang, Xueqian, Rhinehart, Nicholas
Preference-based Reinforcement Learning (PbRL) provides a way to learn high-performance policies in environments where the reward signal is hard to specify, avoiding heuristic and time-consuming reward design. However, PbRL can suffer from slow convergence since it requires training a reward model. Prior work has proposed learning a reward model from demonstrations and fine-tuning it using preferences. However, when the model is a neural network, using different loss functions for pre-training and fine-tuning can pose challenges to reliable optimization. In this paper, we propose a method to effectively leverage prior knowledge with a Residual Reward Model (RRM). An RRM assumes that the true reward of the environment can be split into a sum of two parts: a prior reward and a learned reward. The prior reward is a term available before training, for example, a user's "best guess" reward function or a reward function learned from inverse reinforcement learning (IRL), and the learned reward is trained with preferences. We introduce state-based and image-based versions of RRM and evaluate them on several tasks in the Meta-World environment suite. Experimental results show that our method substantially improves the performance of a common PbRL method. Our method achieves performance improvements for a variety of different types of prior rewards, including proxy rewards, a reward obtained from IRL, and even a negated version of the proxy reward. We also conduct experiments with a Franka Panda robot to show that our method leads to superior performance on real hardware. It significantly accelerates policy learning for different tasks, achieving success in fewer steps than the baseline. Videos are available at https://sunlighted.github.io/RRM-web/.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
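A minimal sketch of the residual decomposition the RRM abstract describes: the total reward is the sum of a fixed prior reward and a learned term trained from preferences with a Bradley-Terry style loss. Names such as ResidualReward and prior_reward_fn, the network sizes, and the toy prior are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ResidualReward(nn.Module):
    """Total reward = fixed prior reward + learned residual (assumed structure)."""

    def __init__(self, obs_dim: int, prior_reward_fn, hidden: int = 64):
        super().__init__()
        self.prior_reward_fn = prior_reward_fn  # e.g. a user's "best guess" or an IRL reward
        self.residual = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.prior_reward_fn(obs) + self.residual(obs).squeeze(-1)

def preference_loss(model, seg_a, seg_b, pref_b: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry cross-entropy on segment returns; pref_b=1 means segment B is preferred."""
    ret_a = model(seg_a).sum(dim=1)   # (batch, T, obs_dim) -> (batch,)
    ret_b = model(seg_b).sum(dim=1)
    logits = ret_b - ret_a            # P(B preferred) = sigmoid(ret_b - ret_a)
    return nn.functional.binary_cross_entropy_with_logits(logits, pref_b)

# Toy usage with a stand-in proxy prior (distance-to-origin style term).
prior = lambda obs: -obs.norm(dim=-1)
model = ResidualReward(obs_dim=4, prior_reward_fn=prior)
seg_a, seg_b = torch.randn(8, 25, 4), torch.randn(8, 25, 4)
loss = preference_loss(model, seg_a, seg_b, pref_b=torch.ones(8))
loss.backward()
```

Only the residual term receives gradients, so even a crude or partially wrong prior only shifts the starting point of learning rather than constraining what the preferences can ultimately express.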
Pearls from Pebbles: Improved Confidence Functions for Auto-labeling
Auto-labeling is an important family of techniques that produce labeled training sets with minimal manual annotation. A prominent variant, threshold-based auto-labeling (TBAL), works by finding thresholds on a model's confidence scores above which it can accurately and automatically label unlabeled data. However, many models are known to produce overconfident scores, leading to poor TBAL performance. While a natural idea is to apply off-the-shelf calibration methods to alleviate the overconfidence issue, we show that such methods fall short. Rather than experimenting with ad-hoc choices of confidence functions, we propose a framework for studying the optimal TBAL confidence function.
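A minimal sketch of the TBAL workflow described above: on a held-out validation set, find the lowest confidence threshold at which predictions meet a target accuracy, then auto-label only the unlabeled points scoring above it. The confidence function here is a plain max-probability score, which is exactly the overconfident baseline the abstract argues against; the function names and the toy data are illustrative assumptions.

```python
import numpy as np

def pick_threshold(val_conf: np.ndarray, val_correct: np.ndarray,
                   target_acc: float = 0.95) -> float:
    """Lowest threshold t such that accuracy among validation points with conf >= t meets target_acc."""
    for t in np.sort(np.unique(val_conf)):           # ascending candidate thresholds
        mask = val_conf >= t
        if mask.any() and val_correct[mask].mean() >= target_acc:
            return float(t)
    return float("inf")                               # no threshold is good enough: auto-label nothing

def auto_label(unlab_conf: np.ndarray, unlab_pred: np.ndarray, threshold: float):
    """Return indices and predicted labels for points confident enough to auto-label."""
    idx = np.where(unlab_conf >= threshold)[0]
    return idx, unlab_pred[idx]

# Toy usage: when scores are overconfident, accuracy lags confidence, the selected
# threshold climbs, and few points get auto-labeled -- the failure mode that
# better confidence functions are meant to fix.
rng = np.random.default_rng(0)
val_conf = rng.uniform(0.5, 1.0, size=1000)
val_correct = rng.random(1000) < val_conf * 0.9
t = pick_threshold(val_conf, val_correct)
idx, labels = auto_label(rng.uniform(0.5, 1.0, 500), rng.integers(0, 10, 500), t)
```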
Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning
Ghosh, Udita, Raychaudhuri, Dripta S., Li, Jiachen, Karydis, Konstantinos, Roy-Chowdhury, Amit
Preference-based reinforcement learning (RL) offers a promising approach for aligning policies with human intent but is often constrained by the high cost of human feedback. In this work, we introduce PrefVLM, a framework that integrates Vision-Language Models (VLMs) with selective human feedback to significantly reduce annotation requirements while maintaining performance. Our method leverages VLMs to generate initial preference labels, which are then filtered to identify uncertain cases for targeted human annotation. Additionally, we adapt VLMs using a self-supervised inverse dynamics loss to improve alignment with evolving policies. Experiments on Meta-World manipulation tasks demonstrate that PrefVLM achieves comparable or superior success rates to state-of-the-art methods while using up to 2× fewer human annotations. Furthermore, we show that adapted VLMs enable efficient knowledge transfer across tasks, further minimizing feedback needs. Our results highlight the potential of combining VLMs with selective human supervision to make preference-based RL more scalable and practical.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > Riverside County > Riverside (0.04)
- Research Report > Promising Solution (0.68)
- Research Report > New Finding (0.48)
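A minimal sketch of the selective-feedback routing the PrefVLM abstract describes: query a VLM for a preference over two clips, keep confident labels, and spend human queries only on uncertain pairs. query_vlm_preference and query_human are hypothetical stand-in callables, and the 0.5-centered uncertainty band is an assumed heuristic, not the paper's actual criterion.

```python
from typing import Callable, List, Tuple

def collect_preferences(
    pairs: List[Tuple[object, object]],
    query_vlm_preference: Callable[[object, object], float],  # returns P(second clip preferred)
    query_human: Callable[[object, object], int],              # returns 0 or 1
    uncertainty_band: float = 0.2,
) -> List[Tuple[int, int, str]]:
    """Label each pair with the VLM when confident; ask a human only for uncertain pairs."""
    labels = []
    for i, (clip_a, clip_b) in enumerate(pairs):
        p_b = query_vlm_preference(clip_a, clip_b)
        if abs(p_b - 0.5) >= uncertainty_band:        # VLM is confident enough to label alone
            labels.append((i, int(p_b > 0.5), "vlm"))
        else:                                         # uncertain: spend a human query here
            labels.append((i, query_human(clip_a, clip_b), "human"))
    return labels

# Toy usage with stand-in annotators.
labels = collect_preferences(
    pairs=[("clip_a1", "clip_b1"), ("clip_a2", "clip_b2")],
    query_vlm_preference=lambda a, b: 0.9,   # pretend the VLM is confident B is better
    query_human=lambda a, b: 1,
)
```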
Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity
Muslimani, Calarina, Grooten, Bram, Mamillapalli, Deepak Ranganatha Sastry, Pechenizkiy, Mykola, Mocanu, Decebal Constantin, Taylor, Matthew E.
For autonomous agents to successfully integrate into human-centered environments, agents should be able to learn from and adapt to humans in their native settings. Preference-based reinforcement learning (PbRL) is a promising approach that learns reward functions from human preferences. This enables RL agents to adapt their behavior based on human desires. However, humans live in a world full of diverse information, most of which is not relevant to completing a particular task. It becomes essential that agents learn to focus on the subset of task-relevant environment features. Unfortunately, prior work has largely ignored this aspect, focusing primarily on improving PbRL algorithms in standard RL environments that are carefully constructed to contain only task-relevant features. This can result in algorithms that may not effectively transfer to a more noisy real-world setting. To that end, this work proposes R2N (Robust-to-Noise), the first PbRL algorithm that leverages principles of dynamic sparse training to learn robust reward models that can focus on task-relevant features. We study the effectiveness of R2N in the Extremely Noisy Environment setting, an RL problem setting where up to 95% of the state features are irrelevant distractions. In experiments with a simulated teacher, we demonstrate that R2N can adapt the sparse connectivity of its neural networks to focus on task-relevant features, enabling R2N to significantly outperform several state-of-the-art PbRL algorithms in multiple locomotion and control environments.
- North America > Canada > Alberta (0.14)
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
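A minimal sketch of the dynamic sparse training idea R2N builds on: keep a binary mask over the reward model's input layer, and periodically prune the smallest-magnitude active connections and regrow the same number at random, so the network can drift toward task-relevant features. This is a generic prune-and-regrow loop under assumed hyperparameters, not the authors' code.

```python
import torch
import torch.nn as nn

class SparseRewardNet(nn.Module):
    """Reward model whose first layer is masked so it can drop irrelevant state features."""

    def __init__(self, obs_dim: int, hidden: int = 64, density: float = 0.2):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden)
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(hidden, 1))
        self.register_buffer("mask", (torch.rand_like(self.fc1.weight) < density).float())

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = nn.functional.linear(obs, self.fc1.weight * self.mask, self.fc1.bias)
        return self.head(h).squeeze(-1)

    @torch.no_grad()
    def prune_and_regrow(self, fraction: float = 0.3):
        """Drop the weakest active connections, then regrow as many inactive ones at random."""
        w = (self.fc1.weight * self.mask).abs()
        active = self.mask.bool()
        k = int(fraction * active.sum().item())
        if k == 0:
            return
        # Prune: zero out (roughly) the k smallest-magnitude active connections.
        weakest = torch.topk(w[active], k, largest=False).values.max()
        self.mask[(w <= weakest) & active] = 0.0
        # Regrow: activate k random currently-inactive connections.
        inactive_idx = (~self.mask.bool()).nonzero()
        regrow = inactive_idx[torch.randperm(len(inactive_idx))[:k]]
        self.mask[regrow[:, 0], regrow[:, 1]] = 1.0

# Usage: call prune_and_regrow() every few reward-model updates during preference learning.
net = SparseRewardNet(obs_dim=200)      # e.g. a state padded with many distractor features
r = net(torch.randn(32, 200))
net.prune_and_regrow()
```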
RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences
Cheng, Jie, Xiong, Gang, Dai, Xingyuan, Miao, Qinghai, Lv, Yisheng, Wang, Fei-Yue
Preference-based Reinforcement Learning (PbRL) circumvents the need for reward engineering by harnessing human preferences as the reward signal. However, current PbRL methods excessively depend on high-quality feedback from domain experts, which results in a lack of robustness. In this paper, we present RIME, a robust PbRL algorithm for effective reward learning from noisy preferences. Our method utilizes a sample selection-based discriminator to dynamically filter out noise and ensure robust training. To counteract the cumulative error stemming from incorrect selection, we suggest a warm start for the reward model, which additionally bridges the performance gap during the transition from pre-training to online training in PbRL. Our experiments on robotic manipulation and locomotion tasks demonstrate that RIME significantly enhances the robustness of the state-of-the-art PbRL method. Code is available at https://github.com/CJReinforce/RIME_ICML2024.
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
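A minimal sketch of the sample-selection idea the RIME abstract describes: score each labeled pair by its preference loss under the current reward model and keep only pairs whose loss falls below a threshold, treating high-loss pairs as likely label noise. The specific rule here (a fixed multiple of the mean loss) is an assumed stand-in, not RIME's actual bound, and the warm start it mentions is a separate, complementary component.

```python
import torch

def filter_noisy_preferences(per_pair_loss: torch.Tensor, tolerance: float = 2.0) -> torch.Tensor:
    """Return a boolean mask over preference pairs kept for reward training.

    per_pair_loss: (N,) cross-entropy of each labeled pair under the current reward model.
    Pairs whose loss exceeds `tolerance` times the mean are treated as likely mislabeled.
    """
    threshold = tolerance * per_pair_loss.mean()
    return per_pair_loss <= threshold

# Toy usage: most pairs have modest loss, while a few corrupted labels have very large loss.
losses = torch.tensor([0.3, 0.5, 0.4, 3.5, 0.2, 4.1])
keep = filter_noisy_preferences(losses)   # tensor([True, True, True, False, True, False])
clean_losses = losses[keep]
```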
Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
Muslimani, Calarina, Taylor, Matthew E.
To create useful reinforcement learning (RL) agents, step zero is to design a suitable reward function that captures the nuances of the task. However, reward engineering can be a difficult and time-consuming process. Instead, human-in-the-loop (HitL) RL allows agents to learn reward functions from human feedback. Despite recent successes, many HitL RL methods still require numerous human interactions to learn successful reward functions. To improve the feedback efficiency of HitL RL methods (i.e., require less feedback), this paper introduces Sub-optimal Data Pre-training (SDP), an approach that leverages reward-free, sub-optimal data to improve scalar- and preference-based HitL RL algorithms. In SDP, we start by pseudo-labeling all low-quality data with rewards of zero. Through this process, we obtain free reward labels to pre-train our reward model. This pre-training phase gives the reward model a head start in learning, whereby it can identify that low-quality transitions should have a low reward, all without any actual feedback. Through extensive experiments with a simulated teacher, we demonstrate that SDP can significantly improve upon, or achieve performance competitive with, state-of-the-art (SOTA) HitL RL algorithms across nine robotic manipulation and locomotion tasks.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
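A minimal sketch of the pseudo-labeling step SDP describes: assign reward zero to every transition in a reward-free, sub-optimal dataset and pre-train the reward model on those free labels before any human feedback arrives. The network shape, regression loss, and training-loop details are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn as nn

def pretrain_on_suboptimal(reward_net: nn.Module, suboptimal_obs: torch.Tensor,
                           epochs: int = 5, lr: float = 1e-3) -> nn.Module:
    """Pseudo-label sub-optimal transitions with reward 0 and regress the reward model onto them."""
    optim = torch.optim.Adam(reward_net.parameters(), lr=lr)
    targets = torch.zeros(len(suboptimal_obs))       # free labels: low-quality data -> reward 0
    for _ in range(epochs):
        pred = reward_net(suboptimal_obs).squeeze(-1)
        loss = nn.functional.mse_loss(pred, targets)
        optim.zero_grad()
        loss.backward()
        optim.step()
    return reward_net                                 # then fine-tune with scalar or preference feedback

# Toy usage with an assumed two-layer reward model over 10-dimensional observations.
reward_net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
reward_net = pretrain_on_suboptimal(reward_net, suboptimal_obs=torch.randn(256, 10))
```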