sera
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Information Technology > Security & Privacy (1.00)
- Government (0.67)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Communications (1.00)
- (3 more...)
From Failures to Fixes: LLM-Driven Scenario Repair for Self-Evolving Autonomous Driving
Xia, Xinyu, Ma, Xingjun, Hu, Yunfeng, Qu, Ting, Chen, Hong, Gong, Xun
Ensuring robust and generalizable autonomous driving requires not only broad scenario coverage but also efficient repair of failure cases, particularly those related to challenging and safety-critical scenarios. However, existing scenario generation and selection methods often lack adaptivity and semantic relevance, limiting their impact on performance improvement. In this paper, we propose SERA, an LLM-powered framework that enables autonomous driving systems to self-evolve by repairing failure cases through targeted scenario recommendation. By analyzing performance logs, SERA identifies failure patterns and dynamically retrieves semantically aligned scenarios from a structured bank. An LLM-based reflection mechanism further refines these recommendations to maximize relevance and diversity. The selected scenarios are used for few-shot fine-tuning, enabling targeted adaptation with minimal data. Experiments on the benchmark show that SERA consistently improves key metrics across multiple autonomous driving baselines, demonstrating its effectiveness and generalizability under safety-critical conditions.
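A minimal sketch of the retrieve-then-reflect loop the abstract describes, assuming a caller-supplied embedding function and LLM client; the function names, scenario-bank schema, and prompts are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Scenario:
    scenario_id: str
    description: str        # natural-language summary used for semantic matching
    embedding: List[float]

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb + 1e-12)

def recommend_repair_scenarios(
    failure_log: str,
    scenario_bank: List[Scenario],
    embed: Callable[[str], List[float]],   # assumed text-embedding function
    llm: Callable[[str], str],             # assumed LLM completion function
    k: int = 8,
) -> List[Scenario]:
    """Sketch of failure-driven scenario recommendation:
    1) distill failure patterns from logs, 2) retrieve semantically similar
    scenarios, 3) let the LLM reflect and keep a relevant, diverse subset
    for few-shot fine-tuning."""
    # 1) Summarize recurring failure patterns from the raw performance log.
    failure_summary = llm(f"Summarize the recurring failure patterns in:\n{failure_log}")

    # 2) Retrieve the top-k scenarios whose descriptions are closest to the summary.
    query = embed(failure_summary)
    ranked = sorted(scenario_bank, key=lambda s: cosine(query, s.embedding), reverse=True)
    candidates = ranked[:k]

    # 3) Reflection pass: ask the LLM which candidates to keep for relevance and diversity.
    listing = "\n".join(f"{s.scenario_id}: {s.description}" for s in candidates)
    keep = llm(
        "Given these failure patterns:\n" + failure_summary +
        "\nSelect a relevant but diverse subset of scenario ids from:\n" + listing
    )
    return [s for s in candidates if s.scenario_id in keep]
```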
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (1.00)
Local distribution-based adaptive oversampling for imbalanced regression
Alahyari, Shayan, Domaratzki, Mike
Imbalanced regression occurs when continuous target variables have skewed distributions, creating sparse regions that are difficult for machine learning models to predict accurately. This issue particularly affects neural networks, which often struggle with imbalanced data. While class imbalance in classification has been extensively studied, imbalanced regression remains relatively unexplored, with few effective solutions. Existing approaches often rely on arbitrary thresholds to categorize samples as rare or frequent, ignoring the continuous nature of target distributions. These methods can produce synthetic samples that fail to improve model performance and may discard valuable information through undersampling. To address these limitations, we propose LDAO (Local Distribution-based Adaptive Oversampling), a novel data-level approach that avoids categorizing individual samples as rare or frequent. Instead, LDAO learns the global distribution structure by decomposing the dataset into a mixture of local distributions, each preserving its statistical characteristics. LDAO then models and samples from each local distribution independently before merging them into a balanced training set. LDAO achieves a balanced representation across the entire target range while preserving the inherent statistical structure within each local distribution. In extensive evaluations on 45 imbalanced datasets, LDAO outperforms state-of-the-art oversampling methods on both frequent and rare target values, demonstrating its effectiveness for addressing the challenge of imbalanced regression.
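A minimal sketch of the local-distribution idea using a Gaussian mixture over the joint (X, y) space; the component count, equal per-component sampling budget, and the use of scikit-learn are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def ldao_style_oversample(X: np.ndarray, y: np.ndarray,
                          n_components: int = 10,
                          random_state: int = 0):
    """Sketch: decompose the joint (X, y) distribution into local Gaussian
    components, then draw an equal number of synthetic samples from each
    component so sparse target regions are no longer under-represented."""
    joint = np.column_stack([X, y])
    gmm = GaussianMixture(n_components=n_components, random_state=random_state)
    gmm.fit(joint)

    # Equal budget per local distribution balances coverage of the target range.
    per_component = max(len(y) // n_components, 1)
    rng = np.random.default_rng(random_state)
    synthetic = np.vstack([
        rng.multivariate_normal(mean, cov, size=per_component)
        for mean, cov in zip(gmm.means_, gmm.covariances_)
    ])

    # Merge original and synthetic samples into one training set.
    X_new = np.vstack([X, synthetic[:, :-1]])
    y_new = np.concatenate([y, synthetic[:, -1]])
    return X_new, y_new
```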
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- North America > United States > California (0.04)
- North America > Canada > Ontario > Middlesex County > London (0.04)
SeRA: Self-Reviewing and Alignment of Large Language Models using Implicit Reward Margins
Ko, Jongwoo, Dingliwal, Saket, Ganesh, Bhavana, Sengupta, Sailik, Bodapati, Sravan, Galstyan, Aram
Direct alignment algorithms (DAAs), such as direct preference optimization (DPO), have become popular alternatives to Reinforcement Learning from Human Feedback (RLHF) due to their simplicity, efficiency, and stability. However, the preferences used in DAAs are usually collected before the alignment training begins and remain unchanged (off-policy). This can lead to two problems, where the policy model (1) picks up on spurious correlations in the dataset (as opposed to learning the intended alignment expressed in the human preference labels), and (2) overfits to feedback on off-policy trajectories that are less likely to be generated by an updated policy model. To address these issues, we introduce Self-Reviewing and Alignment (SeRA), a cost-efficient and effective method that can be readily combined with existing DAAs. SeRA comprises two components: (1) sample selection using implicit reward margins, which helps alleviate overfitting to undesired features, and (2) preference bootstrapping using implicit rewards to augment preference data with updated policy models in a cost-efficient manner. Extensive experimentation, including on instruction-following tasks, demonstrates the effectiveness and generality of SeRA in training LLMs on offline preference datasets with DAAs.
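A short sketch of the implicit reward margin the abstract refers to, using the standard DPO implicit reward r(x, y) = beta * (log pi(y|x) - log pi_ref(y|x)); the top-fraction selection rule and function names below are illustrative assumptions, not necessarily the paper's exact criterion.

```python
import torch

def implicit_reward_margin(policy_logp_chosen: torch.Tensor,
                           policy_logp_rejected: torch.Tensor,
                           ref_logp_chosen: torch.Tensor,
                           ref_logp_rejected: torch.Tensor,
                           beta: float = 0.1) -> torch.Tensor:
    """DPO-style implicit rewards: r(x, y) = beta * (log pi(y|x) - log pi_ref(y|x)).
    The margin is the implicit reward of the chosen response minus that of the
    rejected one; a large margin suggests the preference is captured by the
    policy itself rather than by spurious surface features."""
    r_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
    r_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
    return r_chosen - r_rejected

def select_by_margin(margins: torch.Tensor, keep_fraction: float = 0.5) -> torch.Tensor:
    """Keep the preference pairs with the largest implicit reward margins
    (an illustrative selection rule)."""
    k = max(int(keep_fraction * margins.numel()), 1)
    return torch.topk(margins, k).indices
```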
Sample Efficient Reward Augmentation in offline-to-online Reinforcement Learning
Zhang, Ziqi, Xiong, Xiao, Zhuang, Zifeng, Liu, Jinxin, Wang, Donglin
Offline-to-online RL can make full use of pre-collected offline datasets to initialize policies, resulting in higher sample efficiency and better performance than using online algorithms alone for policy training. However, directly fine-tuning the pre-trained policy tends to result in sub-optimal performance. A primary reason is that conservative offline RL methods diminish the agent's capability for exploration, thereby impacting online fine-tuning performance. To encourage exploration during online fine-tuning and enhance overall fine-tuning performance, we propose a generalized reward augmentation method called Sample Efficient Reward Augmentation (SERA). Specifically, SERA encourages the agent to explore by computing a Q-conditioned entropy as an intrinsic reward. The advantage of SERA is that it can extensively utilize the offline pre-trained Q-function to encourage the agent to cover the state space uniformly while accounting for the imbalance between the distributions of high-value and low-value states. Additionally, SERA can be effortlessly plugged into various RL algorithms to improve online fine-tuning and ensure sustained asymptotic improvement. Moreover, extensive experimental results demonstrate that, on offline-to-online problems, SERA consistently and effectively enhances the performance of various offline algorithms.
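A minimal sketch of reward augmentation with an entropy-style intrinsic bonus, in the generic form r' = r_env + alpha * bonus. The k-nearest-neighbour state-entropy proxy below is a common illustrative choice and does not reproduce the paper's Q-conditioning, which additionally weights exploration by the offline pre-trained Q-function; names and constants are assumptions.

```python
import numpy as np

def knn_state_entropy_bonus(state: np.ndarray,
                            replay_states: np.ndarray,
                            k: int = 12) -> float:
    """Particle-based entropy proxy: the distance to the k-th nearest
    neighbour in the replay buffer is larger in rarely visited regions,
    so using it as a bonus rewards exploration of sparse states."""
    dists = np.linalg.norm(replay_states - state, axis=1)
    idx = min(k, len(dists) - 1)
    kth = np.partition(dists, idx)[idx]
    return float(np.log(kth + 1.0))

def augmented_reward(env_reward: float,
                     state: np.ndarray,
                     replay_states: np.ndarray,
                     alpha: float = 0.1) -> float:
    """Sketch of reward augmentation for online fine-tuning:
    r' = r_env + alpha * intrinsic_bonus."""
    return env_reward + alpha * knn_state_entropy_bonus(state, replay_states)
```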
- North America > United States > Montana (0.04)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Using Interpretable Machine Learning to Massively Increase the Number of Antibody-Virus Interactions Across Studies
Department of Statistics, Stanford University, Stanford, California, United States of America *Authors contributed equally to this work Correspondence should be addressed to teinav@fredhutch.org Abstract A central challenge in every field of biology is to use existing measurements to predict the outcomes of future experiments. In this work, we consider the wealth of antibody inhibition data against variants of the influenza virus. Due to this virus's genetic diversity and evolvability, the variants examined in one study will often have little-to-no overlap with other studies, making it difficult to discern common patterns or unify datasets for further analysis. To that end, we develop a computational framework that predicts how an antibody or serum would inhibit any variant from any other study. We use this framework to greatly expand seven influenza datasets utilizing hemagglutination inhibition, validating our method upon 200,000 existing measurements and predicting 2,000,000 new values uncertainties. With these new values, we quantify the transferability between seven vaccination and infection studies in humans and ferrets, show that the serum potency is negatively correlated with breadth, and present a tool for pandemic preparedness. This data-driven approach does not require any information beyond each virus's name and measurements, and even datasets with as few as 5 viruses can be expanded, making this approach widely applicable. Future influenza studies using hemagglutination inhibition can directly utilize our curated datasets to predict newly measured antibody responses against 80 H3N2 influenza viruses from 1968-2011, whereas immunological studies utilizing other viruses or a different assay only need a single partially-overlapping dataset to extend their work. In essence, this approach enables a shift in perspective when analyzing data from "what you see is what you get" into "what anyone sees is what everyone gets." Introduction Our understanding of how antibody-mediated immunity drives viral evolution and escape relies upon painstaking measurements of antibody binding, inhibition, or neutralization against variants of concern (Petrova and Russell, 2017). Every interaction is unique because: (1) the antibody response (serum) changes even in the absence of viral exposure and (2) for rapidly evolving viruses such as influenza, the specific variants examined in one study will often have little-to-no overlap with other studies (Figure 1).
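The task of filling in unmeasured serum-virus pairs from partially overlapping studies can be illustrated with a simple low-rank completion of a log-titer matrix; this is only a generic sketch of that idea under assumed names and hyperparameters, and the authors' actual framework may use a different model.

```python
import numpy as np

def complete_titer_matrix(titers: np.ndarray, rank: int = 3,
                          n_iters: int = 500, lr: float = 0.01,
                          seed: int = 0) -> np.ndarray:
    """Low-rank completion of a (serum x virus) log-titer matrix with NaNs
    marking unmeasured pairs: fit serum and virus factors on the observed
    entries only, then fill the gaps with their product."""
    rng = np.random.default_rng(seed)
    n_sera, n_viruses = titers.shape
    S = 0.1 * rng.standard_normal((n_sera, rank))     # serum factors
    V = 0.1 * rng.standard_normal((n_viruses, rank))  # virus factors
    mask = ~np.isnan(titers)
    target = np.where(mask, titers, 0.0)

    for _ in range(n_iters):
        pred = S @ V.T
        err = np.where(mask, pred - target, 0.0)   # gradient only on observed entries
        S -= lr * (err @ V)
        V -= lr * (err.T @ S)
    return S @ V.T   # predicted values for every serum-virus pair
```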
- North America > United States > California > Santa Clara County > Stanford (0.24)
- Asia > Vietnam > Hanoi > Hanoi (0.04)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- (8 more...)
Model Optimization in Imbalanced Regression
Silva, Aníbal, Ribeiro, Rita P., Moniz, Nuno
Imbalanced domain learning aims to produce accurate models in predicting instances that, though underrepresented, are of utmost importance for the domain. Research in this field has mainly focused on classification tasks. Comparatively, the number of studies carried out in the context of regression tasks is negligible. One of the main reasons for this is the lack of loss functions capable of focusing on minimizing the errors of extreme (rare) values. Recently, an evaluation metric was introduced: the Squared Error Relevance Area (SERA). This metric places greater emphasis on the errors committed at extreme values while also accounting for performance over the whole domain of the target variable, thus preventing severe bias. However, its effectiveness as an optimization metric is unknown. In this paper, our goal is to study the impact of using SERA as an optimization criterion in imbalanced regression tasks. Using gradient boosting algorithms as a proof of concept, we perform an experimental study with 36 data sets of different domains and sizes. Results show that models optimized with SERA as the objective function perform better in practice at predicting extreme values than the models produced by their respective standard boosting algorithms. This confirms that SERA can be embedded as a loss function in optimization-based learning algorithms for imbalanced regression scenarios.
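A short sketch of how the SERA metric can be computed from predictions, true values, and a user-supplied relevance function phi(y) in [0, 1]: SER_t sums the squared errors of samples with relevance at least t, and SERA is the area under SER_t over t in [0, 1]. The grid-based trapezoidal approximation and the requirement that the caller provides the relevance values are assumptions of this sketch.

```python
import numpy as np

def sera(y_true: np.ndarray, y_pred: np.ndarray,
         relevance: np.ndarray, steps: int = 1000) -> float:
    """Squared Error Relevance Area.
    For each relevance cutoff t in [0, 1], SER_t sums the squared errors of
    the samples whose relevance phi(y) >= t; SERA is the area under the
    SER_t curve, so errors on highly relevant (rare/extreme) targets are
    counted at many cutoffs and therefore weigh more."""
    thresholds = np.linspace(0.0, 1.0, steps + 1)
    sq_err = (y_true - y_pred) ** 2
    ser_t = np.array([sq_err[relevance >= t].sum() for t in thresholds])
    return float(np.trapz(ser_t, thresholds))   # numerical integral over t
```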
- Asia > India (0.05)
- North America > United States > Texas (0.05)
- North America > United States > Pennsylvania (0.05)
- (6 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.72)