AITopics | Xiao, Tesi

Collaborating Authors

Xiao, Tesi

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

HyperDPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework

Ren, Yinuo, Xiao, Tesi, Shavlovsky, Michael, Ying, Lexing, Rahmanian, Holakou

arXiv.org Artificial IntelligenceDec-3-2024

In LLM alignment and many other ML applications, one often faces the Multi-Objective Fine-Tuning (MOFT) problem, i.e. fine-tuning an existing model with datasets labeled w.r.t. different objectives simultaneously. To address the challenge, we propose the HyperDPO framework, a conditioned one-shot fine-tuning approach that extends the Direct Preference Optimization (DPO) technique, originally developed for efficient LLM alignment with preference data, to accommodate the MOFT settings. By substituting the Bradley-Terry-Luce model in DPO with the Plackett-Luce model, our framework is capable of handling a wide range of MOFT tasks that involve listwise ranking datasets. Compared with previous approaches, HyperDPO enjoys an efficient one-shot training process for profiling the Pareto front of auxiliary objectives, and offers post-training control over trade-offs. Additionally, we propose a novel Hyper Prompt Tuning design, that conveys continuous importance weight across objectives to transformer-based models without altering their architecture, and investigate the potential of temperature-conditioned networks for enhancing the flexibility of post-training control. We demonstrate the effectiveness and efficiency of the HyperDPO framework through its applications to various tasks, including Learning-to-Rank (LTR) and LLM alignment, highlighting its viability for large-scale ML deployments.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2410.08316

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Sinkhorn-type Algorithm for Constrained Optimal Transport

Tang, Xun, Rahmanian, Holakou, Shavlovsky, Michael, Thekumparampil, Kiran Koshy, Xiao, Tesi, Ying, Lexing

arXiv.org Artificial IntelligenceMar-8-2024

Entropic optimal transport (OT) and the Sinkhorn algorithm have made it practical for machine learning practitioners to perform the fundamental task of calculating transport distance between statistical distributions. In this work, we focus on a general class of OT problems under a combination of equality and inequality constraints. We derive the corresponding entropy regularization formulation and introduce a Sinkhorn-type algorithm for such constrained OT problems supported by theoretical guarantees. We first bound the approximation error when solving the problem through entropic regularization, which reduces exponentially with the increase of the regularization parameter. Furthermore, we prove a sublinear first-order convergence rate of the proposed Sinkhorn-type algorithm in the dual space by characterizing the optimization procedure with a Lyapunov function. To achieve fast and higher-order convergence under weak entropy regularization, we augment the Sinkhorn-type algorithm with dynamic regularization scheduling and second-order acceleration. Overall, this work systematically combines recent theoretical and numerical advances in entropic optimal transport with the constrained case, allowing practitioners to derive approximate transport plans in complex scenarios.

algorithm, artificial intelligence, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2403.05054

Country:

North America > United States (0.14)
Europe > North Macedonia (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Accelerating Sinkhorn Algorithm with Sparse Newton Iterations

Tang, Xun, Shavlovsky, Michael, Rahmanian, Holakou, Tardini, Elisa, Thekumparampil, Kiran Koshy, Xiao, Tesi, Ying, Lexing

arXiv.org Artificial IntelligenceJan-20-2024

Computing the optimal transport distance between statistical distributions is a fundamental task in machine learning. One remarkable recent advancement is entropic regularization and the Sinkhorn algorithm, which utilizes only matrix scaling and guarantees an approximated solution with near-linear runtime. Despite the success of the Sinkhorn algorithm, its runtime may still be slow due to the potentially large number of iterations needed for convergence. To achieve possibly super-exponential convergence, we present Sinkhorn-Newton-Sparse (SNS), an extension to the Sinkhorn algorithm, by introducing early stopping for the matrix scaling steps and a second stage featuring a Newton-type subroutine. Adopting the variational viewpoint that the Sinkhorn algorithm maximizes a concave Lyapunov potential, we offer the insight that the Hessian matrix of the potential function is approximately sparse. Sparsification of the Hessian results in a fast $O(n^2)$ per-iteration complexity, the same as the Sinkhorn algorithm. In terms of total iteration count, we observe that the SNS algorithm converges orders of magnitude faster across a wide range of practical cases, including optimal transportation between empirical distributions and calculating the Wasserstein $W_1, W_2$ distance of discretized densities. The empirical performance is corroborated by a rigorous bound on the approximate sparsity of the Hessian matrix.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2401.12253

Country:

North America > United States (0.14)
Europe > North Macedonia (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

Multi-Objective Optimization via Wasserstein-Fisher-Rao Gradient Flow

Ren, Yinuo, Xiao, Tesi, Gangwani, Tanmay, Rangi, Anshuka, Rahmanian, Holakou, Ying, Lexing, Sanyal, Subhajit

arXiv.org Machine LearningNov-21-2023

Multi-objective optimization (MOO) aims to optimize multiple, possibly conflicting objectives with widespread applications. We introduce a novel interacting particle method for MOO inspired by molecular dynamics simulations. Our approach combines overdamped Langevin and birth-death dynamics, incorporating a "dominance potential" to steer particles toward global Pareto optimality. In contrast to previous methods, our method is able to relocate dominated particles, making it particularly adept at managing Pareto fronts of complicated geometries. Our method is also theoretically grounded as a Wasserstein-Fisher-Rao gradient flow with convergence guarantees. Extensive experiments confirm that our approach outperforms state-of-the-art methods on challenging synthetic and real-world datasets.

artificial intelligence, machine learning, pareto front, (15 more...)

arXiv.org Machine Learning

2311.13159

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A One-Sample Decentralized Proximal Algorithm for Non-Convex Stochastic Composite Optimization

Xiao, Tesi, Chen, Xuxing, Balasubramanian, Krishnakumar, Ghadimi, Saeed

arXiv.org Artificial IntelligenceJun-22-2023

We focus on decentralized stochastic non-convex optimization, where $n$ agents work together to optimize a composite objective function which is a sum of a smooth term and a non-smooth convex term. To solve this problem, we propose two single-time scale algorithms: Prox-DASA and Prox-DASA-GT. These algorithms can find $\epsilon$-stationary points in $\mathcal{O}(n^{-1}\epsilon^{-2})$ iterations using constant batch sizes (i.e., $\mathcal{O}(1)$). Unlike prior work, our algorithms achieve comparable complexity without requiring large batch sizes, more complex per-iteration operations (such as double loops), or stronger assumptions. Our theoretical findings are supported by extensive numerical experiments, which demonstrate the superiority of our algorithms over previous approaches. Our code is available at https://github.com/xuxingc/ProxDASA.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2302.09766

Genre: Research Report (0.50)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
(2 more...)

Add feedback

Optimal Algorithms for Stochastic Bilevel Optimization under Relaxed Smoothness Conditions

Chen, Xuxing, Xiao, Tesi, Balasubramanian, Krishnakumar

arXiv.org Artificial IntelligenceJun-21-2023

Stochastic Bilevel optimization usually involves minimizing an upper-level (UL) function that is dependent on the arg-min of a strongly-convex lower-level (LL) function. Several algorithms utilize Neumann series to approximate certain matrix inverses involved in estimating the implicit gradient of the UL function (hypergradient). The state-of-the-art StOchastic Bilevel Algorithm (SOBA) [16] instead uses stochastic gradient descent steps to solve the linear system associated with the explicit matrix inversion. This modification enables SOBA to match the lower bound of sample complexity for the single-level counterpart in non-convex settings. Unfortunately, the current analysis of SOBA relies on the assumption of higher-order smoothness for the UL and LL functions to achieve optimality. In this paper, we introduce a novel fully single-loop and Hessian-inversion-free algorithmic framework for stochastic bilevel optimization and present a tighter analysis under standard smoothness assumptions (first-order Lipschitzness of the UL function and second-order Lipschitzness of the LL function). Furthermore, we show that by a slight modification of our approach, our algorithm can handle a more general multi-objective robust bilevel optimization problem. For this case, we obtain the state-of-the-art oracle complexity results demonstrating the generality of both the proposed algorithmic and analytic frameworks. Numerical experiments demonstrate the performance gain of the proposed algorithms over existing ones.

artificial intelligence, machine learning, null null 2, (16 more...)

arXiv.org Artificial Intelligence

2306.12067

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Add feedback

Statistical Inference for Polyak-Ruppert Averaged Zeroth-order Stochastic Gradient Algorithm

Jin, Yanhao, Xiao, Tesi, Balasubramanian, Krishnakumar

arXiv.org Machine LearningFeb-11-2021

As machine learning models are deployed in critical applications, it becomes important to not just provide point estimators of the model parameters (or subsequent predictions), but also quantify the uncertainty associated with estimating the model parameters via confidence sets. In the last decade, estimating or training in several machine learning models has become synonymous with running stochastic gradient algorithms. However, computing the stochastic gradients in several settings is highly expensive or even impossible at times. An important question which has thus far not been addressed sufficiently in the statistical machine learning literature is that of equipping zeroth-order stochastic gradient algorithms with practical yet rigorous inferential capabilities. Towards this, in this work, we first establish a central limit theorem for Polyak-Ruppert averaged stochastic gradient algorithm in the zeroth-order setting. We then provide online estimators of the asymptotic covariance matrix appearing in the central limit theorem, thereby providing a practical procedure for constructing asymptotically valid confidence sets (or intervals) for parameter estimation (or prediction) in the zeroth-order setting.

algorithm, artificial intelligence, survey article, (15 more...)

arXiv.org Machine Learning

2102.05198

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Improved Complexities for Stochastic Conditional Gradient Methods under Interpolation-like Conditions

Xiao, Tesi, Balasubramanian, Krishnakumar, Ghadimi, Saeed

arXiv.org Machine LearningJun-15-2020

In a machine learning setup, the function F could be interpreted as the loss function associated with a sample ξ and the function f could represent the risk, which is defined as the expected loss. Such constrained stochastic optimization problems arise frequently in statistical machine learning applications. Conditional gradient algorithm, also called as Frank-Wolfe algorithm, is an efficient method for solving constrained optimization problems of the form in (1) due to their projection-free nature [Jag13, HJN15, FGM17, LPZZ17, BZK18, RDLS18]. In each step of the conditional gradient method, it is only required to minimize a linear objective over the set Ω. This operation could be implemented efficiently for a variety of sets arising in statistical machine learning, compared to the operation of projecting on to the set Ω, which is required for example by the projected gradient method. Hence, conditional gradient method has regained popularity in the last decade in the optimization and machine learning community.

artificial intelligence, conditional gradient method, optimization problem, (14 more...)

arXiv.org Machine Learning

2006.08167

Country: North America > United States > California (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Add feedback

Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise

Liu, Xuanqing, Xiao, Tesi, Si, Si, Cao, Qin, Kumar, Sanjiv, Hsieh, Cho-Jui

arXiv.org Artificial IntelligenceJun-5-2019

Neural Ordinary Differential Equation (Neural ODE) has been proposed as a continuous approximation to the ResNet architecture. Some commonly used regularization mechanisms in discrete neural networks (e.g. dropout, Gaussian noise) are missing in current Neural ODE networks. In this paper, we propose a new continuous neural network framework called Neural Stochastic Differential Equation (Neural SDE) network, which naturally incorporates various commonly used regularization mechanisms based on random noise injection. Our framework can model various types of noise injection frequently used in discrete networks for regularization purpose, such as dropout and additive/multiplicative noise in each block. We provide theoretical analysis explaining the improved robustness of Neural SDE models against input perturbations/adversarial attacks. Furthermore, we demonstrate that the Neural SDE network can achieve better generalization than the Neural ODE and is more resistant to adversarial and non-adversarial input perturbations.

deep learning, neural network, noise, (15 more...)

arXiv.org Artificial Intelligence

1906.02355

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback