Goto

Collaborating Authors

 Chen, Xiaokai


DCatalyst: A Unified Accelerated Framework for Decentralized Optimization

arXiv.org Artificial Intelligence

We study decentralized optimization over a network of agents, modeled as graphs, with no central server. The goal is to minimize $f+r$, where $f$ represents a (strongly) convex function averaging the local agents' losses, and $r$ is a convex, extended-value function. We introduce DCatalyst, a unified black-box framework that integrates Nesterov acceleration into decentralized optimization algorithms. %, enhancing their performance. At its core, DCatalyst operates as an \textit{inexact}, \textit{momentum-accelerated} proximal method (forming the outer loop) that seamlessly incorporates any selected decentralized algorithm (as the inner loop). We demonstrate that DCatalyst achieves optimal communication and computational complexity (up to log-factors) across various decentralized algorithms and problem instances. Notably, it extends acceleration capabilities to problem classes previously lacking accelerated solution methods, thereby broadening the effectiveness of decentralized methods. On the technical side, our framework introduce the {\it inexact estimating sequences}--a novel extension of the well-known Nesterov's estimating sequences, tailored for the minimization of composite losses in decentralized settings. This method adeptly handles consensus errors and inexact solutions of agents' subproblems, challenges not addressed by existing models.


Multi-Party Supervised Fine-tuning of Language Models for Multi-Party Dialogue Generation

arXiv.org Artificial Intelligence

Large Language Models (LLM) are usually fine-tuned to participate in dyadic or two-party dialogues, which can not adapt well to multi-party dialogues (MPD), which hinders their applications in such scenarios including multi-personal meetings, discussions and daily communication. Previous LLM-based researches mainly focus on the multi-agent framework, while their base LLMs are still pairwisely fine-tuned. In this work, we design a multi-party fine-tuning framework (MuPaS) for LLMs on the multi-party dialogue datasets, and prove such a straightforward framework can let the LLM align with the multi-party conversation style efficiently and effectively. We also design two training strategies which can convert MuPaS into the MPD simulator. Substantial experiments show that MuPaS can achieve state-of-the-art multi-party response, higher accuracy of the-next-speaker prediction, higher human and automatic evaluated utterance qualities, and can even generate reasonably with out-of-distribution scene, topic and role descriptions. The MuPaS framework bridges the LLM training with more complicated multi-party applications, such as conversation generation, virtual rehearsal or meta-universe.


Enhancing Convergence of Decentralized Gradient Tracking under the KL Property

arXiv.org Machine Learning

We study decentralized multiagent optimization over networks, modeled as undirected graphs. The optimization problem consists of minimizing a nonconvex smooth function plus a convex extended-value function, which enforces constraints or extra structure on the solution (e.g., sparsity, low-rank). We further assume that the objective function satisfies the Kurdyka-{\L}ojasiewicz (KL) property, with given exponent $\theta\in [0,1)$. The KL property is satisfied by several (nonconvex) functions of practical interest, e.g., arising from machine learning applications; in the centralized setting, it permits to achieve strong convergence guarantees. Here we establish convergence of the same type for the notorious decentralized gradient-tracking-based algorithm SONATA. Specifically, $\textbf{(i)}$ when $\theta\in (0,1/2]$, the sequence generated by SONATA converges to a stationary solution of the problem at R-linear rate;$ \textbf{(ii)} $when $\theta\in (1/2,1)$, sublinear rate is certified; and finally $\textbf{(iii)}$ when $\theta=0$, the iterates will either converge in a finite number of steps or converges at R-linear rate. This matches the convergence behavior of centralized proximal-gradient algorithms except when $\theta=0$. Numerical results validate our theoretical findings.


Boosting share routing for multi-task learning

arXiv.org Machine Learning

Multi-task learning (MTL) aims to make full use of the knowledge contained in multi-task supervision signals to improve the overall performance. How to make the knowledge of multiple tasks shared appropriately is an open problem for MTL. Most existing deep MTL models are based on parameter sharing. However, suitable sharing mechanism is hard to design as the relationship among tasks is complicated. In this paper, we propose a general framework called Multi-Task Neural Architecture Search (MTNAS) to efficiently find a suitable sharing route for a given MTL problem. MTNAS modularizes the sharing part into multiple layers of sub-networks. It allows sparse connection among these sub-networks and soft sharing based on gating is enabled for a certain route. Benefiting from such setting, each candidate architecture in our search space defines a dynamic sparse sharing route which is more flexible compared with full-sharing in previous approaches. We show that existing typical sharing approaches are sub-graphs in our search space. Extensive experiments on three real-world recommendation datasets demonstrate MTANS achieves consistent improvement compared with single-task models and typical multi-task methods while maintaining high computation efficiency. Furthermore, in-depth experiments demonstrates that MTNAS can learn suitable sparse route to mitigate negative transfer.