total time
Cluster-Based Client Selection for Dependent Multi-Task Federated Learning in Edge Computing
Luo, Jieping, Li, Qiyue, Liu, Zhizhang, Qi, Hang, Yin, Jiaying, Wu, Jingjin
We study the client selection problem in Federated Learning (FL) within mobile edge computing (MEC) environments, particularly under the dependent multi-task settings, to reduce the total time required to complete various learning tasks. We propose CoDa-FL, a Cluster-oriented and Dependency-aware framework designed to reduce the total required time via cluster-based client selection and dependent task assignment. Our approach considers Earth Mover's Distance (EMD) for client clustering based on their local data distributions to lower computational cost and improve communication efficiency. We derive a direct and explicit relationship between intra-cluster EMD and the number of training rounds required for convergence, thereby simplifying the otherwise complex process of obtaining the optimal solution. Additionally, we incorporate a directed acyclic graph-based task scheduling mechanism to effectively manage task dependencies. Through numerical experiments, we validate that our proposed CoDa-FL outperforms existing benchmarks by achieving faster convergence, lower communication and computational costs, and higher learning accuracy under heterogeneous MEC settings.
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training
Park, Yein, Jeong, Minbyul, Kang, Jaewoo
The remarkable capabilities of modern large reasoning models are largely unlocked through post-training techniques such as supervised fine-tuning and reinforcement learning. However, the architectural mechanisms behind such improvements remain largely opaque. In this work, we use circuit analysis to demonstrate that post-training for complex reasoning sparks the emergence of novel, functionally specialized attention heads. These heads collectively support structured reasoning and computation. Our comparative analysis across Qwen families and DeepSeek-distilled model reveals that these emergent heads evolve differently under different training regimes. Distillation and SFT foster a cumulative addition of stable reasoning heads. In contrast, group relative policy optimization operates in a dynamic search mode: relatively few attention heads are iteratively activated, evaluated, and pruned, with their survival closely tracking fluctuations in the task reward signal. Furthermore, we find that controllable think on/off models do not possess dedicated thinking heads. Instead, turning off explicit reasoning triggers a broader-but less efficient-set of compensatory heads. Through ablation and qualitative analyses, we connect these circuit-level dynamics to a crucial performance trade-off: strengthened heads enable sophisticated problem-solving strategies for difficult problems but can also introduce over-thinking failure modes, such as calculation errors or logical loops on simpler tasks. These findings connect circuit-level dynamics to macro-level performance, identifying an inherent tension where complex reasoning comes at the cost of elementary computations. More broadly, our work points to future directions for training policy design, emphasizing the need to balance the development of effective reasoning strategies with the assurance of reliable, flawless execution.
- Europe > Austria > Vienna (0.14)
- North America > Canada (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Logic Gate Neural Networks are Good for Verification
Kresse, Fabian, Yu, Emily, Lampert, Christoph H., Henzinger, Thomas A.
Learning-based systems are increasingly deployed across various domains, yet the complexity of traditional neural networks poses significant challenges for formal verification. Unlike conventional neural networks, learned Logic Gate Networks (LGNs) replace multiplications with Boolean logic gates, yielding a sparse, netlist-like architecture that is inherently more amenable to symbolic verification, while still delivering promising performance. In this paper, we introduce a SA T encoding for verifying global robustness and fairness in LGNs. We evaluate our method on five benchmark datasets, including a newly constructed 5-class variant, and find that LGNs are both verification-friendly and maintain strong predictive performance.
- Europe > Austria (0.04)
- North America > United States > California (0.04)
- Europe > France (0.04)
fea16e782bc1b1240e4b3c797012e289-AuthorFeedback.pdf
We thank all the Reviewers for their time and raising several interesting questions. Please see our responses below. Reviewer #1: We will try to reduce dependence on the Supplement. The name V ol in 3.3 refers to V olume, which for the ellipsoid We will add this definition. Reviewer #2: We will add a comment comparing the convergence rate of LSGD to other distributed methods.
A Proof of Lemma
Recall that the algorithm of Section 3.2 uses two black-boxes that are impractical and have hidden To implement this algorithm for edge-generation, we must be able to compute a bidirectional mapping between an input point and the grid cell that contains the point. Using this, we will show the following lemma. Proof: We begin by proving (i) and (ii) and then use it to prove the lemma. No weights are assigned in the HK algorithm. The LR has a preprocessing step that computes the maximum cardinality matching for each piece.
Decomposing the Entropy-Performance Exchange: The Missing Keys to Unlocking Effective Reinforcement Learning
Deng, Jia, Chen, Jie, Chen, Zhipeng, Zhao, Wayne Xin, Wen, Ji-Rong
Recently, reinforcement learning with verifiable rewards (RL VR) has been widely used for enhancing the reasoning abilities of large language models (LLMs). A core challenge in RL VR involves managing the exchange between entropy and performance of policies. Despite the importance of this exchange, a fine-grained understanding of when and how this exchange operates most effectively remains limited. To bridge this gap, we conduct a systematic empirical analysis of the entropy-performance exchange mechanism of RL VR across different levels of granularity. Specifically, we first divide the training process into two distinct stages based on entropy dynamics, i.e., rising stage and plateau stage, and then systematically investigate how this mechanism varies across stage-level, instance-level, and token-level granularitiess. Our analysis reveals that, in the rising stage, entropy reduction in negative samples facilitates the learning of effective reasoning patterns, which in turn drives rapid performance gains. Moreover, in the plateau stage, learning efficiency strongly correlates with high-entropy tokens present in low-perplexity samples and those located at the end of sequences. Motivated by these findings, we propose two methods that dynamically adjust the reward signal using perplexity and positional information to focus RL updates on tokens that exhibit high learning potential, achieving improvements compared to the baseline methods on various LLMs.
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Netherlands > South Holland > Rotterdam (0.04)
- Asia > China (0.04)
Improved Wake-Up Time For Euclidean Freeze-Tag Problem
Alipour, Sharareh, Ahadi, Arash, Baghestani, Kajal
The Freeze-Tag Problem (FTP) involves activating a set of initially asleep robots as quickly as possible, starting from a single awake robot. Once activated, a robot can assist in waking up other robots. Each active robot moves at unit speed. The objective is to minimize the makespan, i.e., the time required to activate the last robot. A key performance measure is the wake-up ratio, defined as the maximum time needed to activate any number of robots in any primary positions. This work focuses on the geometric (Euclidean) version of FTP in $\mathbb{R}^d$ under the $\ell_p$ norm, where the initial distance between each asleep robot and the single active robot is at most 1. For $(\mathbb{R}^2, \ell_2)$, we improve the previous upper bound of 4.62 ([7], CCCG 2024) to 4.31. Note that it is known that 3.82 is a lower bound for the wake-up ratio. In $\mathbb{R}^3$, we propose a new strategy that achieves a wake-up ratio of 12 for $(\mathbb{R}^3, \ell_1)$ and 12.76 for $(\mathbb{R}^3, \ell_2)$, improving upon the previous bounds of 13 and $13\sqrt{3}$, respectively, reported in [2].
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (3 more...)
ProgCo: Program Helps Self-Correction of Large Language Models
Song, Xiaoshuai, Wu, Yanan, Wang, Weixun, Liu, Jiaheng, Su, Wenbo, Zheng, Bo
Self-Correction aims to enable large language models (LLMs) to self-verify and self-refine their initial responses without external feedback. However, LLMs often fail to effectively self-verify and generate correct feedback, further misleading refinement and leading to the failure of self-correction, especially in complex reasoning tasks. In this paper, we propose Program-driven Self-Correction (ProgCo). First, program-driven verification (ProgVe) achieves complex verification logic and extensive validation through self-generated, self-executing verification pseudo-programs. Then, program-driven refinement (ProgRe) receives feedback from ProgVe, conducts dual reflection and refinement on both responses and verification programs to mitigate misleading of incorrect feedback in complex reasoning tasks. Experiments on three instruction-following and mathematical benchmarks indicate that ProgCo achieves effective self-correction, and can be further enhance performance when combined with real program tools.
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Research Report (0.50)
- Workflow (0.46)
Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems
Min, Yingqian, Chen, Zhipeng, Jiang, Jinhao, Chen, Jie, Deng, Jia, Hu, Yiwen, Tang, Yiru, Wang, Jiapeng, Cheng, Xiaoxue, Song, Huatong, Zhao, Wayne Xin, Liu, Zheng, Wang, Zhongyuan, Wen, Ji-Rong
Recently, slow-thinking reasoning systems, such as o1, have demonstrated remarkable capabilities in solving complex reasoning tasks. These systems typically engage in an extended thinking process before responding to a query, allowing them to generate more thorough, accurate, and well-reasoned solutions. These systems are primarily developed and maintained by industry, with their core techniques not publicly disclosed. In response, an increasing number of studies from the research community aim to explore the technical foundations underlying these powerful reasoning systems. Building on these prior efforts, this paper presents a reproduction report on implementing o1-like reasoning systems. We introduce an ``imitate, explore, and self-improve'' framework, denoted as \textbf{STILL-2}, as our primary technical approach to train the reasoning model. In the initial phase, we use distilled long-form thought data to fine-tune the reasoning model, enabling it to invoke a slow-thinking mode. The model is then encouraged to explore challenging problems by generating multiple rollouts, which can result in increasingly more high-quality trajectories that lead to correct answers. Furthermore, the model undergoes self-improvement by iteratively refining its training dataset. To verify the effectiveness of this approach, we conduct extensive experiments on three challenging benchmarks. The experimental results demonstrate that our approach achieves competitive performance compared to industry-level reasoning systems on these benchmarks.
Revisiting Space Mission Planning: A Reinforcement Learning-Guided Approach for Multi-Debris Rendezvous
Bandyopadhyay, Agni, Waxenegger-Wilfing, Guenther
This research introduces a novel application of a masked Proximal Policy Optimization (PPO) algorithm from the field of deep reinforcement learning (RL), for determining the most efficient sequence of space debris visitation, utilizing the Lambert solver as per Izzo's adaptation for individual rendezvous. The aim is to optimize the sequence in which all the given debris should be visited to get the least total time for rendezvous for the entire mission. A neural network (NN) policy is developed, trained on simulated space missions with varying debris fields. After training, the neural network calculates approximately optimal paths using Izzo's adaptation of Lambert maneuvers. Performance is evaluated against standard heuristics in mission planning. The reinforcement learning approach demonstrates a significant improvement in planning efficiency by optimizing the sequence for debris rendezvous, reducing the total mission time by an average of approximately {10.96\%} and {13.66\%} compared to the Genetic and Greedy algorithms, respectively. The model on average identifies the most time-efficient sequence for debris visitation across various simulated scenarios with the fastest computational speed. This approach signifies a step forward in enhancing mission planning strategies for space debris clearance.
- Europe > Germany > Bavaria > Lower Franconia > Würzburg (0.05)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)