
Collaborating Authors

Lange




Convex Clustering

Chi, Eric C., Molstad, Aaron J., Gao, Zheming

arXiv.org Machine Learning

This survey reviews a clustering method based on solving a convex optimization problem. Despite the plethora of existing clustering methods, convex clustering has several uncommon features that distinguish it from prior art. The optimization problem is free of spurious local minima, and its unique global minimizer is stable with respect to all its inputs, including the data, a tuning parameter, and weight hyperparameters. Its single tuning parameter controls the number of clusters and can be chosen using standard techniques from penalized regression. We give intuition into the behavior and theory for convex clustering as well as practical guidance. We highlight important algorithms and give insight into how their computational costs scale with the problem size. Finally, we highlight the breadth of its uses and flexibility to be combined and integrated with other inferential methods.
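
To make the optimization problem concrete: given data points x_1, ..., x_n, convex clustering fits one centroid u_i per point by minimizing a squared-error fidelity term plus a weighted sum of pairwise centroid-difference norms; fused centroids define clusters. Below is a minimal sketch of this objective in CVXPY with uniform weights; the toy data and the tuning value gamma are illustrative choices, not the survey's.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.2, (5, 2)),     # two well-separated blobs
               rng.normal(3.0, 0.2, (5, 2))])
n, gamma = X.shape[0], 0.5

U = cp.Variable(X.shape)                         # one centroid per point
fidelity = 0.5 * cp.sum_squares(U - X)           # keep centroids near data
fusion = sum(cp.norm(U[i] - U[j], 2)             # uniform weights w_ij = 1
             for i in range(n) for j in range(i + 1, n))
cp.Problem(cp.Minimize(fidelity + gamma * fusion)).solve()

# Points whose fitted centroids (nearly) coincide belong to one cluster.
print(np.round(U.value, 2))
```

As gamma grows, more centroids fuse and the number of distinct rows of U shrinks, which is how the single tuning parameter controls the number of clusters.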


Stein Variational Evolution Strategies

Braun, Cornelius V., Lange, Robert T., Toussaint, Marc

arXiv.org Artificial Intelligence

Stein Variational Gradient Descent (SVGD) is a highly efficient method to sample from an unnormalized probability distribution. However, the SVGD update relies on gradients of the log-density, which may not always be available. Existing gradient-free versions of SVGD make use of simple Monte Carlo approximations or gradients from surrogate distributions, both with limitations. To improve gradient-free Stein variational inference, we combine SVGD steps with evolution strategy (ES) updates. Our results demonstrate that the resulting algorithm generates high-quality samples from unnormalized target densities without requiring gradient information. Compared to prior gradient-free SVGD methods, we find that the integration of the ES update in SVGD significantly improves the performance on multiple challenging benchmark problems.
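
One simple way to realize this idea is to replace the unavailable score, grad log p, in the SVGD update with an antithetic evolution-strategy estimate built from log-density evaluations only. The sketch below shows that substitution under illustrative hyperparameters (smoothing scale sigma, bandwidth h, step size); it is a minimal illustration of the gradient-free idea, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(x):
    """Unnormalized log-density of an example target (standard Gaussian)."""
    return -0.5 * np.sum(x**2, axis=-1)

def es_score(x, sigma=0.1, n_dir=32):
    """Antithetic ES estimate of grad log p(x) from log-density values only."""
    eps = rng.normal(size=(n_dir, x.shape[-1]))
    diff = log_p(x[:, None, :] + sigma * eps) - log_p(x[:, None, :] - sigma * eps)
    return (diff[..., None] * eps).mean(axis=1) / (2 * sigma)

def svgd_step(x, step=0.05, h=0.5):
    """One SVGD update with an RBF kernel, using the ES score estimate."""
    diffs = x[:, None, :] - x[None, :, :]          # diffs[i, j] = x_i - x_j
    k = np.exp(-np.sum(diffs**2, axis=-1) / (2 * h))
    g = es_score(x)
    attract = k[..., None] * g[None, :, :]         # k(x_j, x_i) * score(x_j)
    repulse = (diffs / h) * k[..., None]           # grad of k wrt x_j
    return x + step * (attract + repulse).mean(axis=1)

particles = rng.normal(size=(50, 2)) * 3.0
for _ in range(300):
    particles = svgd_step(particles)               # drifts toward the target
```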


Optimal lower bounds for logistic log-likelihoods

Anceschi, Niccolò, Rigon, Tommaso, Zanella, Giacomo, Durante, Daniele

arXiv.org Machine Learning

The logit transform is arguably the most widely employed link function beyond linear settings. This transformation routinely appears in regression models for binary data and provides, either explicitly or implicitly, a core building block within state-of-the-art methodologies for both classification and regression. Its widespread use, combined with the lack of analytical solutions for the optimization of general losses involving the logit transform, still motivates active research in computational statistics. Among the directions explored, a central one has focused on the design of tangent lower bounds for logistic log-likelihoods that can be tractably optimized, while providing a tight approximation of these log-likelihoods. Although progress along these lines has led to the development of effective minorize-maximize (MM) algorithms for point estimation and coordinate ascent variational inference schemes for approximate Bayesian inference under several logit models, the overarching focus in the literature has been on tangent quadratic minorizers. In fact, it is still unclear whether tangent lower bounds sharper than quadratic ones can be derived without undermining the tractability of the resulting minorizer. This article addresses this challenging question through the design and study of a novel piecewise quadratic lower bound that uniformly improves any tangent quadratic minorizer, including the sharpest ones, while admitting a direct interpretation in terms of the classical generalized lasso problem. As illustrated in a ridge logistic regression example, this unique connection facilitates more effective implementations than those provided by available piecewise bounds, while improving the convergence speed of quadratic ones.
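
For reference, a classical tangent quadratic minorizer of this kind is the Jaakkola-Jordan bound: log sigma(x) >= log sigma(xi) + (x - xi)/2 - lambda(xi) * (x^2 - xi^2), with lambda(xi) = tanh(xi/2) / (4 * xi), tight at x = +/- xi. The snippet below numerically verifies these minorizer properties; the article's piecewise quadratic bound uniformly improves bounds of this type.

```python
import numpy as np

def log_sigmoid(x):
    return -np.logaddexp(0.0, -x)                 # log sigma(x), stable

def jj_bound(x, xi):
    """Jaakkola-Jordan quadratic minorizer of log_sigmoid, tight at +/- xi."""
    lam = np.tanh(xi / 2.0) / (4.0 * xi)
    return log_sigmoid(xi) + (x - xi) / 2.0 - lam * (x**2 - xi**2)

x = np.linspace(-6.0, 6.0, 201)
xi = 1.5
assert np.all(jj_bound(x, xi) <= log_sigmoid(x) + 1e-12)   # global minorizer
assert np.isclose(jj_bound(xi, xi), log_sigmoid(xi))       # tangency at xi
assert np.isclose(jj_bound(-xi, xi), log_sigmoid(-xi))     # ... and at -xi
```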


Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity

Griggs, Tyler, Liu, Xiaoxuan, Yu, Jiaxiang, Kim, Doyoung, Chiang, Wei-Lin, Cheung, Alvin, Stoica, Ion

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly integrated into many online services, yet they remain cost-prohibitive to deploy due to the requirement of expensive GPU instances. Prior work has addressed the high cost of LLM serving by improving the inference engine, but less attention has been given to selecting the most cost-efficient GPU type(s) for a specific LLM service. There is a large and growing landscape of GPU types and, within these options, higher cost does not always lead to increased performance. Instead, through a comprehensive investigation, we find that three key LLM service characteristics (request size, request rate, SLO) strongly influence GPU cost efficiency, and differing GPU types are most cost efficient for differing LLM service settings. As a result, the most cost-efficient allocation for a given service is typically a mix of heterogeneous GPU types. Based on this analysis, we introduce Mélange, a GPU allocation framework that navigates these diverse LLM service characteristics and heterogeneous GPU option space to automatically and efficiently derive the minimal-cost GPU allocation for a given LLM service. We formulate the GPU allocation task as a cost-aware bin packing problem where GPUs are bins and items are slices of the service workload. Our formulation's constraints account for a service's unique characteristics, allowing Mélange to be flexible to support diverse service settings and heterogeneity-aware to adapt the GPU allocation to a specific service. Compared to using only a single GPU type, Mélange reduces deployment costs by up to 77% in conversational settings, 33% in document-based settings, and 51% in a mixed setting.
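
As a toy illustration of the bin-packing framing (GPUs as bins, workload slices as items), the sketch below greedily covers a slice demand with the best cost-per-slice GPU and patches the remainder with the cheapest GPU that fits. All GPU names, prices, and capacities here are invented, and Mélange's actual method solves an optimization formulation with SLO-aware constraints, not this heuristic.

```python
from dataclasses import dataclass

@dataclass
class GpuType:
    name: str
    hourly_cost: float     # $/hour (invented numbers)
    capacity: int          # workload slices servable within the SLO

def cheapest_allocation(gpu_types, total_slices):
    """Greedy: fill with the best cost-per-slice GPU, patch the remainder."""
    best = min(gpu_types, key=lambda g: g.hourly_cost / g.capacity)
    n_full, remainder = divmod(total_slices, best.capacity)
    allocation = {best.name: n_full} if n_full else {}
    if remainder:
        tail = min((g for g in gpu_types if g.capacity >= remainder),
                   key=lambda g: g.hourly_cost)
        allocation[tail.name] = allocation.get(tail.name, 0) + 1
    return allocation

fleet = [GpuType("A100", 4.10, 100),
         GpuType("A10G", 1.01, 30),
         GpuType("T4", 0.53, 12)]
print(cheapest_allocation(fleet, 130))   # mixed fleet: {'A10G': 4, 'T4': 1}
```

Even this crude heuristic already mixes GPU types; the gains the paper reports come from making that mix SLO- and workload-aware.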


A Generic Approach for Identification of Event Related Brain Potentials via a Competitive Neural Network Structure

Neural Information Processing Systems

We present a novel generic approach to the problem of Event-Related Potential identification and classification, based on a competitive Neural Net architecture. The network weights converge to the embedded signal patterns, resulting in the formation of a matched filter bank. The network performance is analyzed via a simulation study, exploring identification robustness under low SNR conditions and comparing it to the expected performance from an information theoretic perspective. The classifier is applied to real event-related potential data recorded during a classic oddball-type paradigm; for the first time, within-session variable signal patterns are automatically identified, dismissing the strong and limiting requirement of a priori stimulus-related selective grouping of the recorded data.
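
The core mechanism the abstract describes, competitive (winner-take-all) learning whose weights converge to the embedded signal patterns, can be sketched in a few lines; the waveforms and noise level below are illustrative stand-ins for ERP data, not the paper's recordings.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 64)
patterns = np.stack([np.sin(2 * np.pi * t),                 # two "embedded"
                     np.exp(-((t - 0.3) / 0.1)**2)])        # signal shapes

W = rng.normal(scale=0.1, size=patterns.shape)   # one competitive unit each
lr = 0.05
for _ in range(2000):
    x = patterns[rng.integers(2)] + rng.normal(scale=0.5, size=t.shape)
    winner = np.argmax(W @ x)                    # unit with largest response
    W[winner] += lr * (x - W[winner])            # move the winner toward input

# Each row of W now approximates one embedded signal: a matched filter bank.
```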


Why AI and machine learning are drifting away from the cloud

#artificialintelligence

A quick-service restaurant chain is running its AI models on machines inside its stores to localize delivery logistics. At the same time, a global pharma company is training its machine learning models on premises, using servers it manages by itself. Cloud computing isn't going anywhere, but some companies that use machine learning models and the tech vendors supplying the platforms to manage them say machine learning is having an on-premises moment. For many years, cloud providers have argued that the computing infrastructure required for machine learning would be far too expensive and cumbersome for companies to stand up on their own, but the field is maturing. "We still have a ton of customers who want to go on a cloud migration, but we're definitely now seeing -- at least in the past year or so -- a lot more customers who want to repatriate workloads back onto on-premise because of cost," said Thomas Robinson, vice president of strategic partnerships and corporate development at MLOps platform company Domino Data Lab.


Bregman Power k-Means for Clustering Exponential Family Data

Vellal, Adithya, Chakraborty, Saptarshi, Xu, Jason

arXiv.org Machine Learning

Recent progress in center-based clustering algorithms combats poor local minima by implicit annealing, using a family of generalized means. These methods are variations of Lloyd's celebrated $k$-means algorithm, and are most appropriate for spherical clusters such as those arising from Gaussian data. In this paper, we bridge these algorithmic advances to classical work on hard clustering under Bregman divergences, which enjoy a bijection to exponential family distributions and are thus well-suited for clustering objects arising from a breadth of data generating mechanisms. The elegant properties of Bregman divergences allow us to maintain closed form updates in a simple and transparent algorithm, and moreover lead to new theoretical arguments for establishing finite sample bounds that relax the bounded support assumption made in the existing state of the art. Additionally, we conduct thorough empirical analyses on simulated experiments and a case study on rainfall data, finding that the proposed method outperforms existing peer methods in a variety of non-Gaussian data settings.
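
The closed-form property the abstract leans on is that, for any Bregman divergence, the arithmetic mean of the assigned points is the exact centroid update, so Lloyd-style iterations stay simple. The sketch below shows plain Bregman hard clustering with the generalized KL divergence (the divergence matched to Poisson counts); it omits the paper's power-mean annealing, and the data and k are illustrative.

```python
import numpy as np

def gkl(x, mu, eps=1e-12):
    """Generalized KL divergence d(x, mu), summed over coordinates."""
    x, mu = x + eps, mu + eps
    return np.sum(x * np.log(x / mu) - x + mu, axis=-1)

def bregman_kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = np.argmin(np.stack([gkl(X, m) for m in mu], axis=1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                mu[j] = X[labels == j].mean(axis=0)   # mean update is exact
    return labels, mu

rng = np.random.default_rng(1)
X = np.vstack([rng.poisson(3.0, (40, 5)),    # two Poisson-rate groups
               rng.poisson(12.0, (40, 5))])
labels, centers = bregman_kmeans(X, k=2)
```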


How the metaverse will let you simulate everything

#artificialintelligence

This article is part of a VB special issue, "The metaverse: How close are we?" Defining the "metaverse" is a difficult task, but one commonly accepted definition is a digital space populated by representations of people, places, and things. Through a combination of technologies including virtual reality (VR), augmented reality (AR), and AI, the metaverse that some futurists envision is an extension of the real world -- albeit without the physical trappings. Companies like Rockstar and Roblox have pitched the metaverse as the ideal platform for gaming, but there's no limit to the potential applications in the enterprise.