Goto

Collaborating Authors

 eureka


Leveraging LLMs for reward function design in reinforcement learning control tasks

Cardenoso, Franklin, Caarls, Wouter

arXiv.org Artificial Intelligence

The challenge of designing effective reward functions in reinforcement learning (RL) represents a significant bottleneck, often requiring extensive human expertise and being time-consuming. Previous work and recent advancements in large language models (LLMs) have demonstrated their potential for automating the generation of reward functions. However, existing methodologies often require preliminary evaluation metrics, human-engineered feedback for the refinement process, or the use of environmental source code as context. To address these limitations, this paper introduces LEARN-Opt (LLM-based Evaluator and Analyzer for Reward functioN Optimization). This LLM-based, fully autonomous, and model-agnostic framework eliminates the need for preliminary metrics and environmental source code as context to generate, execute, and evaluate reward function candidates from textual descriptions of systems and task objectives. LEARN-Opt's main contribution lies in its ability to autonomously derive performance metrics directly from the system description and the task objective, enabling unsupervised evaluation and selection of reward functions. Our experiments indicate that LEARN-Opt achieves performance comparable to or better to that of state-of-the-art methods, such as EUREKA, while requiring less prior knowledge. We find that automated reward design is a high-variance problem, where the average-case candidate fails, requiring a multi-run approach to find the best candidates. Finally, we show that LEARN-Opt can unlock the potential of low-cost LLMs to find high-performing candidates that are comparable to, or even better than, those of larger models. This demonstrated performance affirms its potential to generate high-quality reward functions without requiring any preliminary human-defined metrics, thereby reducing engineering overhead and enhancing generalizability.


Interestingness First Classifiers

Sato, Ryoma

arXiv.org Machine Learning

Most machine learning models are designed to maximize predictive accuracy. In this work, we explore a different goal: building classifiers that are interesting. An ``interesting classifier'' is one that uses unusual or unexpected features, even if its accuracy is lower than the best possible model. For example, predicting room congestion from CO2 levels achieves near-perfect accuracy but is unsurprising. In contrast, predicting room congestion from humidity is less accurate yet more nuanced and intriguing. We introduce EUREKA, a simple framework that selects features according to their perceived interestingness. Our method leverages large language models to rank features by their interestingness and then builds interpretable classifiers using only the selected interesting features. Across several benchmark datasets, EUREKA consistently identifies features that are non-obvious yet still predictive. For example, in the Occupancy Detection dataset, our method favors humidity over CO2 levels and light intensity, producing classifiers that achieve meaningful accuracy while offering insights. In the Twin Papers dataset, our method discovers the rule that papers with a colon in the title are more likely to be cited in the future. We argue that such models can support new ways of knowledge discovery and communication, especially in settings where moderate accuracy is sufficient but novelty and interpretability are valued.


Your 'Eureka!' moments can be seen in brain scans

Popular Science

Breakthroughs, discoveries, and DIY tips sent every weekday. That euphoric feeling when a great idea strikes or a challenging puzzle piece fits into place is electric–and also helps our brains. Now, a team of researchers from the United States and Germany have taken a peek inside the brain to see what those so-called aha, lightbulb, or eureka moments look like. The new brain imaging shows that these flashes of insights reshape how the brain represents information and helps burn it into our memory. According to Maxi Becker, a study co-author and cognitive neuroscientist at Humboldt University in Berlin, if you have one of these aha moments when solving a problem, "you're actually more likely to remember the solution.'" The findings are detailed in a study published May 9 in the journal Nature Communications.


The Clear Sky Corridor: Insights Towards Aerosol Formation in Exoplanets Using An AI-based Survey of Exoplanet Atmospheres

Ashtari, Reza, Stevenson, Kevin B., Sing, David, Lopez-Morales, Mercedes, Alam, Munazza K., Nikolov, Nikolay K., Evans-Soma, Thomas M.

arXiv.org Artificial Intelligence

Producing optimized and accurate transmission spectra of exoplanets from telescope data has traditionally been a manual and labor-intensive procedure. Here we present the results of the first attempt to improve and standardize this procedure using artificial intelligence (AI) based processing of light curves and spectroscopic data from transiting exoplanets observed with the Hubble Space Telescope's (HST) Wide Field Camera 3 (WFC3) instrument. We implement an AI-based parameter optimizer that autonomously operates the Eureka pipeline to produce homogeneous transmission spectra of publicly available HST WFC3 datasets, spanning exoplanet types from hot Jupiters to sub-Neptunes. Surveying 42 exoplanets with temperatures between 280 and 2580 Kelvin, we confirm modeled relationships between the amplitude of the water band at 1.4um in hot Jupiters and their equilibrium temperatures. We also identify a similar, novel trend in Neptune/sub-Neptune atmospheres, but shifted to cooler temperatures. Excitingly, a planet mass versus equilibrium temperature diagram reveals a "Clear Sky Corridor," where planets between 700 and 1700 Kelvin (depending on the mass) show stronger 1.4um H2O band measurements. This novel trend points to metallicity as a potentially important driver of aerosol formation. As we unveil and include these new discoveries into our understanding of aerosol formation, we enter a thrilling future for the study of exoplanet atmospheres. With HST sculpting this foundational understanding for aerosol formation in various exoplanet types, ranging from Jupiters to sub-Neptunes, we present a compelling platform for the James Webb Space Telescope (JWST) to discover similar atmospheric trends for more planets across a broader wavelength range.


Efficient Language-instructed Skill Acquisition via Reward-Policy Co-Evolution

Huang, Changxin, Chang, Yanbin, Lin, Junfan, Liang, Junyang, Zeng, Runhao, Li, Jianqiang

arXiv.org Artificial Intelligence

The ability to autonomously explore and resolve tasks with minimal human guidance is crucial for the self-development of embodied intelligence. Although reinforcement learning methods can largely ease human effort, it's challenging to design reward functions for real-world tasks, especially for high-dimensional robotic control, due to complex relationships among joints and tasks. Recent advancements large language models (LLMs) enable automatic reward function design. However, approaches evaluate reward functions by re-training policies from scratch placing an undue burden on the reward function, expecting it to be effective throughout the whole policy improvement process. We argue for a more practical strategy in robotic autonomy, focusing on refining existing policies with policy-dependent reward functions rather than a universal one. To this end, we propose a novel reward-policy co-evolution framework where the reward function and the learned policy benefit from each other's progressive on-the-fly improvements, resulting in more efficient and higher-performing skill acquisition. Specifically, the reward evolution process translates the robot's previous best reward function, descriptions of tasks and environment into text inputs. These inputs are used to query LLMs to generate a dynamic amount of reward function candidates, ensuring continuous improvement at each round of evolution. For policy evolution, our method generates new policy populations by hybridizing historically optimal and random policies. Through an improved Bayesian optimization, our approach efficiently and robustly identifies the most capable and plastic reward-policy combination, which then proceeds to the next round of co-evolution. Despite using less data, our approach demonstrates an average normalized improvement of 95.3% across various high-dimensional robotic skill learning tasks.


Video2Reward: Generating Reward Function from Videos for Legged Robot Behavior Learning

Zeng, Runhao, Zhou, Dingjie, Liang, Qiwei, Liu, Junlin, Li, Hui, Huang, Changxin, Li, Jianqiang, Hu, Xiping, Sun, Fuchun

arXiv.org Artificial Intelligence

Learning behavior in legged robots presents a significant challenge due to its inherent instability and complex constraints. Recent research has proposed the use of a large language model (LLM) to generate reward functions in reinforcement learning, thereby replacing the need for manually designed rewards by experts. However, this approach, which relies on textual descriptions to define learning objectives, fails to achieve controllable and precise behavior learning with clear directionality. In this paper, we introduce a new video2reward method, which directly generates reward functions from videos depicting the behaviors to be mimicked and learned. Specifically, we first process videos containing the target behaviors, converting the motion information of individuals in the videos into keypoint trajectories represented as coordinates through a video2text transforming module. These trajectories are then fed into an LLM to generate the reward function, which in turn is used to train the policy. To enhance the quality of the reward function, we develop a video-assisted iterative reward refinement scheme that visually assesses the learned behaviors and provides textual feedback to the LLM. This feedback guides the LLM to continually refine the reward function, ultimately facilitating more efficient behavior learning. Experimental results on tasks involving bipedal and quadrupedal robot motion control demonstrate that our method surpasses the performance of state-of-the-art LLM-based reward generation methods by over 37.6% in terms of human normalized score. More importantly, by switching video inputs, we find our method can rapidly learn diverse motion behaviors such as walking and running.


ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics

Chen, Letian, Gombolay, Matthew

arXiv.org Artificial Intelligence

Reinforcement learning (RL) has demonstrated compelling performance in robotic tasks, but its success often hinges on the design of complex, ad hoc reward functions. Researchers have explored how Large Language Models (LLMs) could enable non-expert users to specify reward functions more easily. However, LLMs struggle to balance the importance of different features, generalize poorly to out-of-distribution robotic tasks, and cannot represent the problem properly with only text-based descriptions. To address these challenges, we propose ELEMENTAL (intEractive LEarning froM dEmoNstraTion And Language), a novel framework that combines natural language guidance with visual user demonstrations to align robot behavior with user intentions better. By incorporating visual inputs, ELEMENTAL overcomes the limitations of text-only task specifications, while leveraging inverse reinforcement learning (IRL) to balance feature weights and match the demonstrated behaviors optimally. ELEMENTAL also introduces an iterative feedback-loop through self-reflection to improve feature, reward, and policy learning. Our experiment results demonstrate that ELEMENTAL outperforms prior work by 42.3% on task success, and achieves 41.3% better generalization in out-of-distribution tasks, highlighting its robustness in LfD.


The best robot vacuum I've tested this year is 330 off for Black Friday

Popular Science

We may earn revenue from the products available on this page and participate in affiliate programs. Nobody likes cleaning their floors, but it's a chore that has to be done regularly. The Eureka J20 is a robot vacuum that will do the job for you--even when you're not home--so you don't have to pull out a vacuum yourself. We've tested the J20, and it's left our floors spotless while being smart enough to make its way around plenty of obstacles, from chair legs to shoes and boxes. An early Black Friday deal brings the Eureka J20 down to its lowest price ever.


Automated Rewards via LLM-Generated Progress Functions

Sarukkai, Vishnu, Shacklett, Brennan, Majercik, Zander, Bhatia, Kush, Ré, Christopher, Fatahalian, Kayvon

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have the potential to automate reward engineering by leveraging their broad domain knowledge across various tasks. However, they often need many iterations of trial-and-error to generate effective reward functions. This process is costly because evaluating every sampled reward function requires completing the full policy optimization process for each function. In this paper, we introduce an LLM-driven reward generation framework that is able to produce state-of-the-art policies on the challenging Bi-DexHands benchmark with 20x fewer reward function samples than the prior state-of-the-art work. Our key insight is that we reduce the problem of generating task-specific rewards to the problem of coarsely estimating task progress. Our two-step solution leverages the task domain knowledge and the code synthesis abilities of LLMs to author progress functions that estimate task progress from a given state. Then, we use this notion of progress to discretize states, and generate count-based intrinsic rewards using the low-dimensional state space. We show that the combination of LLM-generated progress functions and count-based intrinsic rewards is essential for our performance gains, while alternatives such as generic hash-based counts or using progress directly as a reward function fall short.


Eureka's J20 robot vacuum is my favorite smart cleaning tool--and it's 170 cheaper than usual on Prime Day

Popular Science

Nobody wants to sweep or mop their floors, and with Eureka's J20, you don't have to. This smart-home vacuum, which allows you to monitor and schedule cleanings using Wi-Fi, is a massive improvement over previous generations of similar hardware. The robot vacuum is also currently 170 cheaper than usual thanks to a deal during Amazon's October Prime Day (aka Prime Big Deal Days). Remember, if you don't have an active Amazon Prime subscription, you can sign up for a free 30-day trial here, which is the perfect price to save you time, trouble, and almost 200. The premium smart-home cleaning device has every feature you'd need, from a maximum suction of 8000Pa (pascals), the ability to clean both hardwood and carpeted surfaces, and, critically, the intelligence to return to its dock to recharge and clean its brushes. Robot vacuums have become commoditized over the past few years, but the J20 stands out due to its raw performance.