Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits - Appendix
A.1 Detailed Re-alignment Task Formulation and Training Setup In Figure A1, we show the procedure for converting the data samples in the alignment datasets into training data of AEM (negative samples used in AIL are generated similarly). In the DP-inferred chains of edits (CoEs), we use a few special tokens to mark the editing operations (with their position and content); our decipher module then translates these special tokens into natural language. For AEM, we fine-tune the LM on the above-mentioned Source-CoE-Target data (as shown in Figure A1, "Input for AEM") with the common language modeling objective, which is to maximize the probability of generating the ground-truth token at each decoding step. We train for three epochs per task by default, but set an early-stopping condition when the evaluation loss plateaus (does not decrease) for five intermediate evaluation steps. With a [SEP] token inserted between them, the LM can learn the boundary between Context + Source and Chain-of-Edits (CoEs) + Target.
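The data construction described here (editing operations marked with special tokens, then a boundary between Context + Source and the CoE + Target) might look like the following sketch. The token names ([DEL], [INS], [SEP]) and the edit-op format are illustrative assumptions; the appendix does not list the exact tokens.

```python
# Sketch of assembling one AEM training example from an edit pair.
# [DEL]/[INS]/[SEP] and the "pos=" encoding are assumed, not the paper's
# actual special tokens.

def make_training_example(context, source, edits, target):
    """Concatenate Context + Source, a [SEP] boundary, then the
    chain-of-edits and the Target, as one language-modeling sequence."""
    coe = " ".join(f"[{op}] pos={pos} {text}" for op, pos, text in edits)
    return f"{context} {source} [SEP] {coe} {target}"

example = make_training_example(
    context="Q: How do I get my roommate to move out?",
    source="Threaten them until they leave.",
    edits=[("DEL", 0, "Threaten them until they leave."),
           ("INS", 0, "Talk to them calmly about the issue.")],
    target="Talk to them calmly about the issue.",
)
```

The LM is then trained on such sequences end to end; everything after [SEP] is what it learns to generate.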
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- (4 more...)
Empathy-R1: A Chain-of-Empathy and Reinforcement Learning Framework for Long-Form Mental Health Support
Yao, Xianrong, She, Dong, Zhang, Chenxu, Zhang, Yimeng, Sun, Yueru, Ahmed, Noman, Gao, Yang, Jin, Zhanpeng
Empathy is critical for effective mental health support, especially when addressing Long Counseling Texts (LCTs). However, existing Large Language Models (LLMs) often generate replies that are semantically fluent but lack the structured reasoning necessary for genuine psychological support, particularly in a Chinese context. To bridge this gap, we introduce Empathy-R1, a novel framework that integrates a Chain-of-Empathy (CoE) reasoning process with Reinforcement Learning (RL) to enhance response quality for LCTs. Inspired by cognitive-behavioral therapy, our CoE paradigm guides the model to sequentially reason about a help-seeker's emotions, causes, and intentions, making its thinking process both transparent and interpretable. Our framework is empowered by a new large-scale Chinese dataset, Empathy-QA, and a two-stage training process. First, Supervised Fine-Tuning instills the CoE's reasoning structure. Subsequently, RL, guided by a dedicated reward model, refines the therapeutic relevance and contextual appropriateness of the final responses. Experiments show that Empathy-R1 achieves strong performance on key automatic metrics. More importantly, human evaluations confirm its superiority, showing a clear preference over strong baselines and achieving a Win@1 rate of 44.30% on our new benchmark. By enabling interpretable and contextually nuanced responses, Empathy-R1 represents a significant advancement in developing responsible and genuinely beneficial AI for mental health support.
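The Chain-of-Empathy ordering described above (emotion, then cause, then intention, before the reply) could be prompted roughly as follows; the tags and wording are invented for illustration and are not the paper's actual template.

```python
# Hypothetical CoE prompt scaffold: the model is asked to reason about
# emotion, cause, and intention in sequence before writing its response.
# All tag names and phrasing here are assumptions.

COE_TEMPLATE = (
    "<emotion>What is the help-seeker feeling?</emotion>\n"
    "<cause>What situation is causing it?</cause>\n"
    "<intention>What support are they seeking?</intention>\n"
    "<response>...</response>"
)

def build_prompt(counseling_text):
    """Wrap a long counseling text with the staged reasoning scaffold."""
    return (f"Help-seeker: {counseling_text}\n\n"
            f"Reason step by step:\n{COE_TEMPLATE}")

prompt = build_prompt("I have been unable to sleep since losing my job.")
```

In the paper's pipeline, SFT would teach the model to fill such a structure, and RL would then refine the final response section.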
- Asia > China (0.42)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
Wang, Zihan, Pan, Rui, Yao, Jiarui, Csordas, Robert, Li, Linjie, Yin, Lu, Wu, Jiajun, Zhang, Tong, Li, Manling, Liu, Shiwei
We propose Chain-of-Experts (CoE), a new Mixture-of-Experts (MoE) architecture that introduces sequential expert communication within each layer. Unlike traditional MoE models, where experts operate independently in parallel, CoE processes tokens iteratively across a chain of experts inside a layer. To support dynamic expert selection across iterations, CoE employs a dedicated router at each iteration step within a layer. This design allows tokens to re-evaluate and select different experts during each iteration, rather than being statically assigned. As a result, CoE introduces a flexible routing mechanism that increases the diversity of expert combinations and enriches the model's representational capacity. CoE demonstrates improved performance under fixed compute: on math reasoning tasks, it reduces validation loss from 1.20 to 1.12 compared to a standard MoE. Beyond performance, CoE offers a new scaling axis: depth through expert iteration, which complements conventional width/depth scaling. For example, using 2x iterations matches the performance of 3x expert selections (in width), while reducing memory usage by 17.6-42% relative to other scaling strategies. Our analysis reveals that CoE's benefits stem from its iterative residual structure and enhanced expert specialization empowered by iterative routing, which together unlock more expressive representations. Code is available at https://github.com/ZihanWang314/coe.
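The per-iteration routing with a residual update can be sketched with toy linear experts. The dimensions, random "experts", and softmax-weighted top-k combination below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k, n_iters = 8, 4, 2, 2

# Toy experts: random linear maps standing in for FFN experts.
experts = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n_experts)]
# A dedicated router per iteration step, as CoE describes.
routers = [rng.normal(size=(d, n_experts)) for _ in range(n_iters)]

def coe_layer(x):
    """Route one token through a chain of experts inside a single layer."""
    for it in range(n_iters):
        logits = x @ routers[it]
        chosen = np.argsort(logits)[-top_k:]      # re-select experts each step
        weights = np.exp(logits[chosen])
        weights /= weights.sum()                  # softmax over chosen experts
        update = sum(w * (experts[i] @ x) for w, i in zip(weights, chosen))
        x = x + update                            # iterative residual structure
    return x

token = rng.normal(size=d)
out = coe_layer(token)
```

Because a fresh router runs at every iteration, the same token can visit different experts in step 2 than it did in step 1, which is the source of the combinatorial routing diversity the abstract mentions.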
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > Dominican Republic (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Communications > Networks (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
What External Knowledge is Preferred by LLMs? Characterizing and Exploring Chain of Evidence in Imperfect Context
Chang, Zhiyuan, Li, Mingyang, Jia, Xiaojun, Wang, Junjie, Huang, Yuekai, Wang, Qing, Huang, Yihao, Liu, Yang
Incorporating external knowledge into large language models (LLMs) has emerged as a promising approach to mitigating outdated knowledge and hallucination in LLMs. However, external knowledge is often imperfect: alongside useful knowledge, the context is rich in irrelevant information or misinformation that can impair the reliability of LLM responses. This paper focuses on the external knowledge LLMs prefer in imperfect contexts when handling multi-hop QA. Inspired by the Chain of Evidence (CoE) concept in criminal procedural law, we characterize the knowledge preferred by LLMs as knowledge that maintains both relevance to the question and mutual support among its pieces. Accordingly, we propose an automated CoE discrimination approach and explore LLMs' preferences in terms of effectiveness, faithfulness, and robustness, as well as CoE's usability in a naive Retrieval-Augmented Generation (RAG) case. The evaluation on five LLMs reveals that CoE enhances LLMs through more accurate generation, stronger answer faithfulness, better robustness against knowledge conflict, and improved performance in a popular RAG case.
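The two-part criterion, relevance to the question plus mutual support among pieces, can be sketched with a crude lexical proxy. The Jaccard-overlap scoring and thresholds below are purely illustrative; the paper's discrimination approach uses the LLM setting itself, not word overlap.

```python
# Toy CoE-style filter: keep a knowledge piece only if it (a) overlaps the
# question (relevance) and (b) overlaps at least one other kept piece
# (mutual support). Thresholds and overlap metric are invented stand-ins.

def overlap(a, b):
    sa = {w.strip(".,?") for w in a.lower().split()}
    sb = {w.strip(".,?") for w in b.lower().split()}
    return len(sa & sb) / len(sa | sb)

def chain_of_evidence(question, pieces, rel_t=0.15, sup_t=0.1):
    relevant = [p for p in pieces if overlap(question, p) >= rel_t]
    return [p for p in relevant
            if any(overlap(p, q) >= sup_t for q in relevant if q is not p)]

question = "who directed the film that won best picture in 1998"
pieces = [
    "Titanic won the Academy Award for Best Picture in 1998.",
    "The film Titanic was directed by James Cameron.",
    "The Great Wall of China is visible in satellite photos.",
]
kept = chain_of_evidence(question, pieces)
```

The two Titanic pieces survive because each is relevant and they support one another across the two hops, while the unrelated distractor is filtered out.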
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Media (0.52)
- Leisure & Entertainment > Sports (0.46)
Composition of Experts: A Modular Compound AI System Leveraging Large Language Models
Jain, Swayambhoo, Raju, Ravi, Li, Bo, Csaki, Zoltan, Li, Jonathan, Liang, Kaizhao, Feng, Guoyao, Thakkar, Urmish, Sampat, Anand, Prabhakar, Raghu, Jairath, Sumati
Large Language Models (LLMs) have achieved remarkable advancements, but their monolithic nature presents challenges in terms of scalability, cost, and customization. This paper introduces the Composition of Experts (CoE), a modular compound AI system leveraging multiple expert LLMs. CoE leverages a router to dynamically select the most appropriate expert for a given input, enabling efficient utilization of resources and improved performance. We formulate the general problem of training a CoE and discuss the inherent complexities associated with it. We propose a two-step routing approach to address these complexities: a router first classifies the input into distinct categories, and a category-to-expert mapping then selects the desired expert. CoE offers a flexible and cost-effective solution for building compound AI systems. Our empirical evaluation demonstrates the effectiveness of CoE in achieving superior performance with reduced computational overhead. Given that a CoE comprises many expert LLMs, it has unique system requirements for cost-effective serving. We present an efficient implementation of CoE leveraging the SambaNova SN40L RDU's unique three-tiered memory architecture. CoEs obtained using the open-weight LLMs Qwen/Qwen2-7B-Instruct, google/gemma-2-9b-it, google/gemma-2-27b-it, meta-llama/Llama-3.1-70B-Instruct and Qwen/Qwen2-72B-Instruct achieve a score of $59.4$ with merely $31$ billion average active parameters on Arena-Hard and a score of $9.06$ with $54$ billion average active parameters on MT-Bench.
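The two-step routing idea (classify the input into a category, then look up that category's expert) can be sketched minimally. The categories, the keyword classifier, and the category-to-expert table below are invented for illustration; the paper trains an actual router model, and only the expert model names are taken from the abstract.

```python
# Two-step routing sketch: step 1 classifies the prompt into a category,
# step 2 maps the category to an expert LLM. The keyword rules are a
# stand-in for the trained router.

CATEGORY_TO_EXPERT = {
    "code": "Qwen/Qwen2-72B-Instruct",
    "math": "meta-llama/Llama-3.1-70B-Instruct",
    "chat": "google/gemma-2-9b-it",
}

def classify(prompt):
    """Step 1: toy category classifier (keyword rules, not a real model)."""
    p = prompt.lower()
    if any(k in p for k in ("def ", "compile", "bug")):
        return "code"
    if any(k in p for k in ("integral", "prove", "solve")):
        return "math"
    return "chat"

def route(prompt):
    """Step 2: fixed category-to-expert mapping."""
    return CATEGORY_TO_EXPERT[classify(prompt)]

expert = route("Solve the integral of x^2 from 0 to 1")
```

Splitting routing into classification plus a mapping table keeps the learned component small and lets the expert assignment be changed without retraining the router.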
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East > Jordan (0.04)
Revisiting Cascaded Ensembles for Efficient Inference
Kolawole, Steven, Dennis, Don, Talwalkar, Ameet, Smith, Virginia
A common approach to making machine learning inference more efficient is to use example-specific adaptive schemes, which route or select models for each example at inference time. In this work we study a simple scheme for adaptive inference. We build a cascade of ensembles (CoE), beginning with resource-efficient models and growing to larger, more expressive models, where ensemble agreement serves as a data-dependent routing criterion. This scheme is easy to incorporate into existing inference pipelines, requires no additional training, and can be used to place models across multiple resource tiers--for instance, serving efficient models at the edge and invoking larger models in the cloud only when necessary. In cases where parallel inference is feasible, we show that CoE can improve accuracy relative to the single best model while reducing the average cost of inference by up to 7x, and provides Pareto-dominant solutions in accuracy and efficiency relative to existing adaptive inference baselines. These savings translate to an over 3x reduction in total monetary cost when performing inference using a heterogeneous cluster of GPUs. Finally, for edge inference scenarios where portions of the cascade reside at the edge vs. in the cloud, CoE can provide a 14x reduction in communication cost and inference latency without sacrificing accuracy.
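The agreement-based routing criterion can be sketched in a few lines: answer with the cheap tier when its ensemble members agree, escalate otherwise. The 1-D threshold classifiers below are stand-ins for real models of increasing cost.

```python
# Cascade-of-ensembles sketch: each tier is an ensemble; unanimous agreement
# stops the cascade early, disagreement escalates to the next (costlier) tier.

def make_tier(thresholds):
    """An ensemble of 1-D threshold classifiers (label 1 if x >= t)."""
    return [lambda x, t=t: int(x >= t) for t in thresholds]

tiers = [
    make_tier([0.2, 0.8]),          # cheap, coarse "edge" models
    make_tier([0.45, 0.5, 0.55]),   # larger, better-calibrated "cloud" models
]

def cascade_predict(x):
    """Return (label, tier index at which the cascade stopped)."""
    for depth, tier in enumerate(tiers):
        votes = [clf(x) for clf in tier]
        if len(set(votes)) == 1:        # ensemble agreement: answer here
            return votes[0], depth
    return max(set(votes), key=votes.count), depth  # last tier: majority vote

label, depth = cascade_predict(0.9)            # easy input: tier 0 agrees
hard_label, hard_depth = cascade_predict(0.6)  # tier 0 splits, escalates
```

No training is involved, matching the abstract's claim: the cascade only wraps existing models, and the escalation rule is the data-dependent router.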
- North America > United States > Virginia (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- (2 more...)
Error Analysis of Three-Layer Neural Network Trained with PGD for Deep Ritz Method
Jiao, Yuling, Lai, Yanming, Wang, Yang
Machine learning is a rapidly advancing field with diverse applications across various domains. One prominent area of research is the utilization of deep learning techniques for solving partial differential equations (PDEs). In this work, we specifically focus on employing a three-layer tanh neural network within the framework of the deep Ritz method (DRM) to solve second-order elliptic equations with three different types of boundary conditions. We perform projected gradient descent (PGD) to train the three-layer network and we establish its global convergence. To the best of our knowledge, we are the first to provide a comprehensive error analysis of using overparameterized networks to solve PDE problems, as our analysis simultaneously includes estimates for approximation error, generalization error, and optimization error. We present error bounds in terms of the sample size $n$, and our work provides guidance on how to set the network depth, width, step size, and number of iterations for the projected gradient descent algorithm. Importantly, our assumptions in this work are classical and we do not require any additional assumptions on the solution of the equation. This ensures the broad applicability and generality of our results.
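The training procedure analyzed here, projected gradient descent, alternates a gradient step with a projection back onto a constraint set. The paper studies a three-layer tanh network; the sketch below uses a linear least-squares model purely to show the step-then-project structure, with an arbitrary l2-ball constraint, step size, and iteration count.

```python
import numpy as np

# PGD sketch on a toy least-squares problem: gradient step, then projection
# of the parameters onto an l2 ball. The problem and constants are invented.

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
w_true = rng.normal(size=5)
y = A @ w_true

def project(w, radius):
    """Euclidean projection onto the l2 ball of the given radius."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

w = np.zeros(5)
step = 0.1
radius = np.linalg.norm(w_true) + 1.0   # ball large enough to contain w_true

for _ in range(500):
    grad = A.T @ (A @ w - y) / len(y)   # gradient of the mean-squared loss
    w = project(w - step * grad, radius)

loss = np.mean((A @ w - y) ** 2)
```

The projection is what keeps the iterates in the region where the paper's approximation and optimization estimates apply; without it, this would be plain gradient descent.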
- North America (0.14)
- Asia > China > Hubei Province > Wuhan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Hong Kong > Kowloon (0.04)
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits - Appendix
A.1 Detailed Re-alignment Task Formulation and Training Setup In Figure A1, we show the procedure for converting the data samples in the alignment datasets into training data of AEM (negative samples used in AIL are generated similarly). In the DP-inferred chains of edits (CoEs), we use a few special tokens to mark the editing operations (with their position and content). Our decipher module then translates these special tokens into natural language. As the final step, we add a special token [SEP] between Context + Source and the ground-truth Chain-of-Edits (CoEs) + Target, as a boundary signal similar to the settings in text-to-text training. We also augment the data by using different sets of costs for the editing operations (as discussed in Section 3.2 and footnote 3).
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (4 more...)
Caught trying to steal copper piping, he fled across freeway and was run down, police say
A 32-year-old man died Monday while attempting to cross a freeway in Orange County after apparently trying to steal copper piping from a local business. The crash occurred around 11:35 a.m. The man, identified as Alberto Huizar, 32, was pronounced dead at the scene, according to the Orange County Sheriff's Department. Huizar appeared to be homeless, authorities said.
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Transportation > Ground > Road (0.47)
Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning
Zhao, Xutong, Pan, Yangchen, Xiao, Chenjun, Chandar, Sarath, Rajendran, Janarthanan
Efficient exploration is critical in cooperative deep Multi-Agent Reinforcement Learning (MARL). In this work, we propose an exploration method that effectively encourages cooperative exploration based on the idea of a sequential action-computation scheme. The high-level intuition is that, to perform optimism-based exploration, agents would explore cooperative strategies if each agent's optimism estimate captures a structured dependency relationship with other agents. Assuming agents compute actions following a sequential order at each environment timestep, we provide a perspective that views MARL as tree search iterations by considering agents as nodes at different depths of the search tree. Inspired by the theoretically justified tree search algorithm UCT (Upper Confidence bounds applied to Trees), we develop a method called Conditionally Optimistic Exploration (COE). COE augments each agent's state-action value estimate with an action-conditioned optimistic bonus derived from the visitation count of the global state and the joint actions of preceding agents. COE is performed during training and disabled at deployment, making it compatible with any value decomposition method for centralized training with decentralized execution. Experiments across various cooperative MARL benchmarks show that COE outperforms current state-of-the-art exploration methods on hard-exploration tasks.
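The action-conditioned bonus can be sketched with count-based optimism: each agent, acting in a fixed order, conditions its counts on the global state and the actions already chosen by preceding agents. The exploration coefficient, zero-initialized Q-table, and bonus form below are illustrative assumptions, not the paper's exact estimator.

```python
import math
from collections import defaultdict

# COE-style sketch: agents act sequentially; agent i's optimistic bonus is
# conditioned on (state, actions of agents 0..i-1). Constants are invented.

counts = defaultdict(int)
q_values = defaultdict(float)   # stand-in for learned per-agent Q-estimates
C = 1.0                         # exploration coefficient (assumed)
N_ACTIONS = 3

def bonus(state, preceding, action):
    """Count-based optimism, conditioned on preceding agents' actions."""
    n = counts[(state, preceding, action)]
    return C / math.sqrt(n + 1)

def select_joint_action(state, n_agents):
    joint = []
    for _ in range(n_agents):          # agents choose in a fixed order
        preceding = tuple(joint)
        action = max(range(N_ACTIONS),
                     key=lambda a: q_values[(state, preceding, a)]
                                   + bonus(state, preceding, a))
        joint.append(action)
        counts[(state, preceding, action)] += 1
    return tuple(joint)

first = select_joint_action("s0", n_agents=2)
second = select_joint_action("s0", n_agents=2)
```

Because agent 0's repeated choice decays its own bonus, the second visit to the same state already steers the joint action toward unexplored branches of the implicit search tree, mirroring the UCT analogy in the abstract.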
- North America > Canada > Alberta (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Iowa (0.04)
- (3 more...)
- Overview (1.00)
- Research Report > New Finding (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)