CES 2026 Day 1: The biggest tech news and gadgets you missed from the first official day of the show
Gaming tech, foldables, wearables and AI gadgets from the likes of NVIDIA, Samsung, Pebble, Lenovo, Meta and Razer dominated the first day of CES 2026. With its XD Rollable concept, Lenovo took the ThinkBook Plus Gen 6's basic design and made it even more futuristic by letting the flexible display wrap around onto the lid. The first official show day of CES 2026 kept the pace up with a mix of near-term gaming upgrades, ambitious new form factors and a few reminders that not every gadget needs to do everything. NVIDIA announced important gaming news, we caught up with Samsung's tri-fold phone and Lenovo marched out an army of impressive-looking gaming laptops and concept tech. Here are the biggest stories from January 6.
- South America > Peru (0.04)
- North America > United States (0.04)
- Asia > South Korea (0.04)
- Information Technology > Hardware (1.00)
- Information Technology > Artificial Intelligence (1.00)
Pebble is making a weird little smart ring for recording thoughts
It's basically just a loop with a microphone, and I want one. The Index 01 is almost anti-tech in its simplicity: there's no needless AI component shoehorned in beyond speech-to-text. It's a ring with a microphone that you whisper ideas into. You get an idea while walking down the street, so you quietly whisper it into the ring, and the ring sends the idea to a notes app or saves it for later review.
- Information Technology > Communications > Mobile (1.00)
- Information Technology > Artificial Intelligence (0.76)
A Reward Net Algorithm
In this section, we present the detailed procedure of MRN in Algorithm 1. In Section 4.2, the implicit derivative at iteration k is bounded using the Cauchy-Schwarz inequality, and the last inequality holds by the definition of Lipschitz smoothness. Lemma 2 assumes a condition on the outer loss and concludes that the corresponding gradient is Lipschitz continuous; Theorems 1 and 2 start from the same assumption on the outer loss. Even worse, it might be difficult for human experts to give preferences over trajectory pairs (e.g., a pair of poor trajectories), which significantly reduces the efficiency of feedback in the initial stage.
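The fragment above refers to an implicit derivative at iteration k together with a Cauchy-Schwarz and Lipschitz-smoothness argument, but the paper's own equations are not reproduced here. The display below is therefore only a generic one-step meta-gradient sketch under assumed notation (ψ: reward-network parameters, θ: inner policy/critic parameters, α: inner step size, L^in/L^out: inner and outer losses), not MRN's exact derivation; bounding the norm of the outer gradient from such an expression is where Cauchy-Schwarz and the smoothness constant typically enter.

```latex
% Generic one-step meta-gradient sketch (assumed notation, not MRN's exact equations).
% Inner update of \theta given reward parameters \psi:
\theta_{k+1}(\psi) \;=\; \theta_k \;-\; \alpha\,\nabla_{\theta}\mathcal{L}^{\mathrm{in}}(\theta_k,\psi)
% Outer gradient via the chain rule through the inner update:
\nabla_{\psi}\mathcal{L}^{\mathrm{out}}\bigl(\theta_{k+1}(\psi)\bigr)
  \;=\; \Bigl(\tfrac{\partial \theta_{k+1}}{\partial \psi}\Bigr)^{\!\top}
        \nabla_{\theta}\mathcal{L}^{\mathrm{out}}(\theta_{k+1})
% Implicit derivative of the one-step update (matrix of mixed second partials):
\Bigl(\tfrac{\partial \theta_{k+1}}{\partial \psi}\Bigr)^{\!\top}
  \;=\; -\,\alpha\,\nabla^{2}_{\psi\theta}\mathcal{L}^{\mathrm{in}}(\theta_k,\psi)
```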
Residual Reward Models for Preference-based Reinforcement Learning
Cao, Chenyang, Rogel-García, Miguel, Nabail, Mohamed, Wang, Xueqian, Rhinehart, Nicholas
Preference-based Reinforcement Learning (PbRL) provides a way to learn high-performance policies in environments where the reward signal is hard to specify, avoiding heuristic and time-consuming reward design. However, PbRL can suffer from slow convergence since it requires training a reward model. Prior work has proposed learning a reward model from demonstrations and fine-tuning it using preferences. However, when the model is a neural network, using different loss functions for pre-training and fine-tuning can pose challenges to reliable optimization. In this paper, we propose a method to effectively leverage prior knowledge with a Residual Reward Model (RRM). An RRM assumes that the true reward of the environment can be split into a sum of two parts: a prior reward and a learned reward. The prior reward is a term available before training, for example, a user's "best guess" reward function or a reward function learned from inverse reinforcement learning (IRL), and the learned reward is trained with preferences. We introduce state-based and image-based versions of RRM and evaluate them on several tasks in the Meta-World environment suite. Experimental results show that our method substantially improves the performance of a common PbRL method. Our method achieves performance improvements for a variety of different types of prior rewards, including proxy rewards, a reward obtained from IRL, and even a negated version of the proxy reward. We also conduct experiments with a Franka Panda robot to show that our method leads to superior performance on real hardware. It significantly accelerates policy learning for different tasks, achieving success in fewer steps than the baseline. Videos are available at https://sunlighted.github.io/RRM-web/.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
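A minimal sketch of the residual decomposition the RRM abstract describes: the total reward is the sum of a fixed prior reward and a learned term trained from preferences with a Bradley-Terry style loss. Names such as ResidualReward and prior_reward_fn, the network sizes, and the toy prior are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ResidualReward(nn.Module):
    """Total reward = fixed prior reward + learned residual (assumed structure)."""

    def __init__(self, obs_dim: int, prior_reward_fn, hidden: int = 64):
        super().__init__()
        self.prior_reward_fn = prior_reward_fn  # e.g. a user's "best guess" or an IRL reward
        self.residual = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.prior_reward_fn(obs) + self.residual(obs).squeeze(-1)

def preference_loss(model, seg_a, seg_b, pref_b: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry cross-entropy on segment returns; pref_b=1 means segment B is preferred."""
    ret_a = model(seg_a).sum(dim=1)   # (batch, T, obs_dim) -> (batch,)
    ret_b = model(seg_b).sum(dim=1)
    logits = ret_b - ret_a            # P(B preferred) = sigmoid(ret_b - ret_a)
    return nn.functional.binary_cross_entropy_with_logits(logits, pref_b)

# Toy usage with a stand-in proxy prior (distance-to-origin style term).
prior = lambda obs: -obs.norm(dim=-1)
model = ResidualReward(obs_dim=4, prior_reward_fn=prior)
seg_a, seg_b = torch.randn(8, 25, 4), torch.randn(8, 25, 4)
loss = preference_loss(model, seg_a, seg_b, pref_b=torch.ones(8))
loss.backward()
```

Only the residual term receives gradients, so even a crude or partially wrong prior only shifts the starting point of learning rather than constraining what the preferences can ultimately express.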
Pearls from Pebbles: Improved Confidence Functions for Auto-labeling
Auto-labeling is an important family of techniques that produce labeled training sets with minimal manual annotation. A prominent variant, threshold-based auto-labeling (TBAL), works by finding thresholds on a model's confidence scores above which it can accurately and automatically label unlabeled data. However, many models are known to produce overconfident scores, leading to poor TBAL performance. While a natural idea is to apply off-the-shelf calibration methods to alleviate the overconfidence issue, we show that such methods fall short. Rather than experimenting with ad-hoc choices of confidence functions, we propose a framework for studying the optimal TBAL confidence function.
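A minimal sketch of the TBAL workflow described above: on a held-out validation set, find the lowest confidence threshold at which predictions meet a target accuracy, then auto-label only the unlabeled points scoring above it. The confidence function here is a plain max-probability score, which is exactly the overconfident baseline the abstract argues against; the function names and the toy data are illustrative assumptions.

```python
import numpy as np

def pick_threshold(val_conf: np.ndarray, val_correct: np.ndarray,
                   target_acc: float = 0.95) -> float:
    """Lowest threshold t such that accuracy among validation points with conf >= t meets target_acc."""
    for t in np.sort(np.unique(val_conf)):           # ascending candidate thresholds
        mask = val_conf >= t
        if mask.any() and val_correct[mask].mean() >= target_acc:
            return float(t)
    return float("inf")                               # no threshold is good enough: auto-label nothing

def auto_label(unlab_conf: np.ndarray, unlab_pred: np.ndarray, threshold: float):
    """Return indices and predicted labels for points confident enough to auto-label."""
    idx = np.where(unlab_conf >= threshold)[0]
    return idx, unlab_pred[idx]

# Toy usage: when scores are overconfident, accuracy lags confidence, the selected
# threshold climbs, and few points get auto-labeled -- the failure mode that
# better confidence functions are meant to fix.
rng = np.random.default_rng(0)
val_conf = rng.uniform(0.5, 1.0, size=1000)
val_correct = rng.random(1000) < val_conf * 0.9
t = pick_threshold(val_conf, val_correct)
idx, labels = auto_label(rng.uniform(0.5, 1.0, 500), rng.integers(0, 10, 500), t)
```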
Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning
Ghosh, Udita, Raychaudhuri, Dripta S., Li, Jiachen, Karydis, Konstantinos, Roy-Chowdhury, Amit
Preference-based reinforcement learning (RL) offers a promising approach for aligning policies with human intent but is often constrained by the high cost of human feedback. In this work, we introduce PrefVLM, a framework that integrates Vision-Language Models (VLMs) with selective human feedback to significantly reduce annotation requirements while maintaining performance. Our method leverages VLMs to generate initial preference labels, which are then filtered to identify uncertain cases for targeted human annotation. Additionally, we adapt VLMs using a self-supervised inverse dynamics loss to improve alignment with evolving policies. Experiments on Meta-World manipulation tasks demonstrate that PrefVLM achieves comparable or superior success rates to state-of-the-art methods while using up to 2× fewer human annotations. Furthermore, we show that adapted VLMs enable efficient knowledge transfer across tasks, further minimizing feedback needs. Our results highlight the potential of combining VLMs with selective human supervision to make preference-based RL more scalable and practical.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > Riverside County > Riverside (0.04)
- Research Report > Promising Solution (0.68)
- Research Report > New Finding (0.48)
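A minimal sketch of the selective-feedback routing the PrefVLM abstract describes: query a VLM for a preference over two clips, keep confident labels, and spend human queries only on uncertain pairs. query_vlm_preference and query_human are hypothetical stand-in callables, and the 0.5-centered uncertainty band is an assumed heuristic, not the paper's actual criterion.

```python
from typing import Callable, List, Tuple

def collect_preferences(
    pairs: List[Tuple[object, object]],
    query_vlm_preference: Callable[[object, object], float],  # returns P(second clip preferred)
    query_human: Callable[[object, object], int],              # returns 0 or 1
    uncertainty_band: float = 0.2,
) -> List[Tuple[int, int, str]]:
    """Label each pair with the VLM when confident; ask a human only for uncertain pairs."""
    labels = []
    for i, (clip_a, clip_b) in enumerate(pairs):
        p_b = query_vlm_preference(clip_a, clip_b)
        if abs(p_b - 0.5) >= uncertainty_band:        # VLM is confident enough to label alone
            labels.append((i, int(p_b > 0.5), "vlm"))
        else:                                         # uncertain: spend a human query here
            labels.append((i, query_human(clip_a, clip_b), "human"))
    return labels

# Toy usage with stand-in annotators.
labels = collect_preferences(
    pairs=[("clip_a1", "clip_b1"), ("clip_a2", "clip_b2")],
    query_vlm_preference=lambda a, b: 0.9,   # pretend the VLM is confident B is better
    query_human=lambda a, b: 1,
)
```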
Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity
Muslimani, Calarina, Grooten, Bram, Mamillapalli, Deepak Ranganatha Sastry, Pechenizkiy, Mykola, Mocanu, Decebal Constantin, Taylor, Matthew E.
For autonomous agents to successfully integrate into human-centered environments, agents should be able to learn from and adapt to humans in their native settings. Preference-based reinforcement learning (PbRL) is a promising approach that learns reward functions from human preferences. This enables RL agents to adapt their behavior based on human desires. However, humans live in a world full of diverse information, most of which is not relevant to completing a particular task. It becomes essential that agents learn to focus on the subset of task-relevant environment features. Unfortunately, prior work has largely ignored this aspect, focusing primarily on improving PbRL algorithms in standard RL environments that are carefully constructed to contain only task-relevant features. This can result in algorithms that may not effectively transfer to a more noisy real-world setting. To that end, this work proposes R2N (Robust-to-Noise), the first PbRL algorithm that leverages principles of dynamic sparse training to learn robust reward models that can focus on task-relevant features. We study the effectiveness of R2N in the Extremely Noisy Environment setting, an RL problem setting where up to 95% of the state features are irrelevant distractions. In experiments with a simulated teacher, we demonstrate that R2N can adapt the sparse connectivity of its neural networks to focus on task-relevant features, enabling R2N to significantly outperform several state-of-the-art PbRL algorithms in multiple locomotion and control environments.
- North America > Canada > Alberta (0.14)
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
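A minimal sketch of the dynamic sparse training idea R2N builds on: keep a binary mask over the reward model's input layer, and periodically prune the smallest-magnitude active connections and regrow the same number at random, so the network can drift toward task-relevant features. This is a generic prune-and-regrow loop under assumed hyperparameters, not the authors' code.

```python
import torch
import torch.nn as nn

class SparseRewardNet(nn.Module):
    """Reward model whose first layer is masked so it can drop irrelevant state features."""

    def __init__(self, obs_dim: int, hidden: int = 64, density: float = 0.2):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden)
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(hidden, 1))
        self.register_buffer("mask", (torch.rand_like(self.fc1.weight) < density).float())

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = nn.functional.linear(obs, self.fc1.weight * self.mask, self.fc1.bias)
        return self.head(h).squeeze(-1)

    @torch.no_grad()
    def prune_and_regrow(self, fraction: float = 0.3):
        """Drop the weakest active connections, then regrow as many inactive ones at random."""
        w = (self.fc1.weight * self.mask).abs()
        active = self.mask.bool()
        k = int(fraction * active.sum().item())
        if k == 0:
            return
        # Prune: zero out (roughly) the k smallest-magnitude active connections.
        weakest = torch.topk(w[active], k, largest=False).values.max()
        self.mask[(w <= weakest) & active] = 0.0
        # Regrow: activate k random currently-inactive connections.
        inactive_idx = (~self.mask.bool()).nonzero()
        regrow = inactive_idx[torch.randperm(len(inactive_idx))[:k]]
        self.mask[regrow[:, 0], regrow[:, 1]] = 1.0

# Usage: call prune_and_regrow() every few reward-model updates during preference learning.
net = SparseRewardNet(obs_dim=200)      # e.g. a state padded with many distractor features
r = net(torch.randn(32, 200))
net.prune_and_regrow()
```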
RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences
Cheng, Jie, Xiong, Gang, Dai, Xingyuan, Miao, Qinghai, Lv, Yisheng, Wang, Fei-Yue
Preference-based Reinforcement Learning (PbRL) circumvents the need for reward engineering by harnessing human preferences as the reward signal. However, current PbRL methods excessively depend on high-quality feedback from domain experts, which results in a lack of robustness. In this paper, we present RIME, a robust PbRL algorithm for effective reward learning from noisy preferences. Our method utilizes a sample selection-based discriminator to dynamically filter out noise and ensure robust training. To counteract the cumulative error stemming from incorrect selection, we suggest a warm start for the reward model, which additionally bridges the performance gap during the transition from pre-training to online training in PbRL. Our experiments on robotic manipulation and locomotion tasks demonstrate that RIME significantly enhances the robustness of the state-of-the-art PbRL method. Code is available at https://github.com/CJReinforce/RIME_ICML2024.
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
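A minimal sketch of the sample-selection idea the RIME abstract describes: score each labeled pair by its preference loss under the current reward model and keep only pairs whose loss falls below a threshold, treating high-loss pairs as likely label noise. The specific rule here (a fixed multiple of the mean loss) is an assumed stand-in, not RIME's actual bound, and the warm start it mentions is a separate, complementary component.

```python
import torch

def filter_noisy_preferences(per_pair_loss: torch.Tensor, tolerance: float = 2.0) -> torch.Tensor:
    """Return a boolean mask over preference pairs kept for reward training.

    per_pair_loss: (N,) cross-entropy of each labeled pair under the current reward model.
    Pairs whose loss exceeds `tolerance` times the mean are treated as likely mislabeled.
    """
    threshold = tolerance * per_pair_loss.mean()
    return per_pair_loss <= threshold

# Toy usage: most pairs have modest loss, while a few corrupted labels have very large loss.
losses = torch.tensor([0.3, 0.5, 0.4, 3.5, 0.2, 4.1])
keep = filter_noisy_preferences(losses)   # tensor([True, True, True, False, True, False])
clean_losses = losses[keep]
```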
Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
Muslimani, Calarina, Taylor, Matthew E.
To create useful reinforcement learning (RL) agents, step zero is to design a suitable reward function that captures the nuances of the task. However, reward engineering can be a difficult and time-consuming process. Instead, human-in-the-loop (HitL) RL allows agents to learn reward functions from human feedback. Despite recent successes, many HitL RL methods still require numerous human interactions to learn successful reward functions. To improve the feedback efficiency of HitL RL methods (i.e., require less feedback), this paper introduces Sub-optimal Data Pre-training (SDP), an approach that leverages reward-free, sub-optimal data to improve scalar- and preference-based HitL RL algorithms. In SDP, we start by pseudo-labeling all low-quality data with rewards of zero. Through this process, we obtain free reward labels to pre-train our reward model. This pre-training phase gives the reward model a head start in learning, whereby it can identify that low-quality transitions should have a low reward, all without any actual feedback. Through extensive experiments with a simulated teacher, we demonstrate that SDP can significantly improve upon, or achieve performance competitive with, state-of-the-art (SOTA) HitL RL algorithms across nine robotic manipulation and locomotion tasks.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
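A minimal sketch of the pseudo-labeling step SDP describes: assign reward zero to every transition in a reward-free, sub-optimal dataset and pre-train the reward model on those free labels before any human feedback arrives. The network shape, regression loss, and training-loop details are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn as nn

def pretrain_on_suboptimal(reward_net: nn.Module, suboptimal_obs: torch.Tensor,
                           epochs: int = 5, lr: float = 1e-3) -> nn.Module:
    """Pseudo-label sub-optimal transitions with reward 0 and regress the reward model onto them."""
    optim = torch.optim.Adam(reward_net.parameters(), lr=lr)
    targets = torch.zeros(len(suboptimal_obs))       # free labels: low-quality data -> reward 0
    for _ in range(epochs):
        pred = reward_net(suboptimal_obs).squeeze(-1)
        loss = nn.functional.mse_loss(pred, targets)
        optim.zero_grad()
        loss.backward()
        optim.step()
    return reward_net                                 # then fine-tune with scalar or preference feedback

# Toy usage with an assumed two-layer reward model over 10-dimensional observations.
reward_net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
reward_net = pretrain_on_suboptimal(reward_net, suboptimal_obs=torch.randn(256, 10))
```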