Goto

Collaborating Authors

 Optimization


Sparks of Explainability: Recent Advancements in Explaining Large Vision Models

arXiv.org Artificial Intelligence

This thesis explores advanced approaches to improve explainability in computer vision by analyzing and modeling the features exploited by deep neural networks. Initially, it evaluates attribution methods, notably saliency maps, by introducing a metric based on algorithmic stability and an approach utilizing Sobol indices, which, through quasi-Monte Carlo sequences, allows a significant reduction in computation time. In addition, the EVA method offers a first formulation of attribution with formal guarantees via verified perturbation analysis. Experimental results indicate that in complex scenarios these methods do not provide sufficient understanding, particularly because they identify only "where" the model focuses without clarifying "what" it perceives. Two hypotheses are therefore examined: aligning models with human reasoning -- through the introduction of a training routine that integrates the imitation of human explanations and optimization within the space of 1-Lipschitz functions -- and adopting a conceptual explainability approach. The CRAFT method is proposed to automate the extraction of the concepts used by the model and to assess their importance, complemented by MACO, which enables their visualization. These works converge towards a unified framework, illustrated by an interactive demonstration applied to the 1000 ImageNet classes in a ResNet model.


Selective Response Strategies for GenAI

arXiv.org Artificial Intelligence

The rise of Generative AI (GenAI) has significantly impacted human-based forums like Stack Overflow, which are essential for generating high-quality data. This creates a negative feedback loop, hindering the development of GenAI systems, which rely on such data to provide accurate responses. In this paper, we provide a possible remedy: A novel strategy we call selective response. Selective response implies that GenAI could strategically provide inaccurate (or conservative) responses to queries involving emerging topics and novel technologies, thereby driving users to use human-based forums like Stack Overflow. We show that selective response can potentially have a compounding effect on the data generation process, increasing both GenAI's revenue and user welfare in the long term. From an algorithmic perspective, we propose an approximately optimal approach to maximize GenAI's revenue under social welfare constraints. From a regulatory perspective, we derive sufficient and necessary conditions for selective response to improve welfare improvements.


HASSLE-free: A unified Framework for Sparse plus Low-Rank Matrix Decomposition for LLMs

arXiv.org Machine Learning

The impressive capabilities of large foundation models come at a cost of substantial computing resources to serve them. Compressing these pre-trained models is of practical interest as it can democratize deploying them to the machine learning community at large by lowering the costs associated with inference. A promising compression scheme is to decompose foundation models' dense weights into a sum of sparse plus low-rank matrices. In this paper, we design a unified framework coined HASSLE-free for (semi-structured) sparse plus low-rank matrix decomposition of foundation models. Our framework introduces the local layer-wise reconstruction error objective for this decomposition, we demonstrate that prior work solves a relaxation of this optimization problem; and we provide efficient and scalable methods to minimize the exact introduced optimization problem. HASSLE-free substantially outperforms state-of-the-art methods in terms of the introduced objective and a wide range of LLM evaluation benchmarks. For the Llama3-8B model with a 2:4 sparsity component plus a 64-rank component decomposition, a compression scheme for which recent work shows important inference acceleration on GPUs, HASSLE-free reduces the test perplexity by 12% for the WikiText-2 dataset and reduces the gap (compared to the dense model) of the average of eight popular zero-shot tasks by 15% compared to existing methods.


Universal Post-Processing Networks for Joint Optimization of Modules in Task-Oriented Dialogue Systems

arXiv.org Artificial Intelligence

Post-processing networks (PPNs) are components that modify the outputs of arbitrary modules in task-oriented dialogue systems and are optimized using reinforcement learning (RL) to improve the overall task completion capability of the system. However, previous PPN-based approaches have been limited to handling only a subset of modules within a system, which poses a significant limitation in improving the system performance. In this study, we propose a joint optimization method for post-processing the outputs of all modules using universal post-processing networks (UniPPNs), which are language-model-based networks that can modify the outputs of arbitrary modules in a system as a sequence-transformation task. Moreover, our RL algorithm, which employs a module-level Markov decision process, enables fine-grained value and advantage estimation for each module, thereby stabilizing joint learning for post-processing the outputs of all modules. Through both simulation-based and human evaluation experiments using the MultiWOZ dataset, we demonstrated that UniPPN outperforms conventional PPNs in the task completion capability of task-oriented dialogue systems.


Orlicz-Sobolev Transport for Unbalanced Measures on a Graph

arXiv.org Machine Learning

Moving beyond $L^p$ geometric structure, Orlicz-Wasserstein (OW) leverages a specific class of convex functions for Orlicz geometric structure. While OW remarkably helps to advance certain machine learning approaches, it has a high computational complexity due to its two-level optimization formula. Recently, Le et al. (2024) exploits graph structure to propose generalized Sobolev transport (GST), i.e., a scalable variant for OW. However, GST assumes that input measures have the same mass. Unlike optimal transport (OT), it is nontrivial to incorporate a mass constraint to extend GST for measures on a graph, possibly having different total mass. In this work, we propose to take a step back by considering the entropy partial transport (EPT) for nonnegative measures on a graph. By leveraging Caffarelli & McCann (2010)'s observations, EPT can be reformulated as a standard complete OT between two corresponding balanced measures. Consequently, we develop a novel EPT with Orlicz geometric structure, namely Orlicz-EPT, for unbalanced measures on a graph. Especially, by exploiting the dual EPT formulation and geometric structures of the graph-based Orlicz-Sobolev space, we derive a novel regularization to propose Orlicz-Sobolev transport (OST). The resulting distance can be efficiently computed by simply solving a univariate optimization problem, unlike the high-computational two-level optimization problem for Orlicz-EPT. Additionally, we derive geometric structures for the OST and draw its relations to other transport distances. We empirically show that OST is several-order faster than Orlicz-EPT. We further illustrate preliminary evidences on the advantages of OST for document classification, and several tasks in topological data analysis.


ATA: Adaptive Task Allocation for Efficient Resource Management in Distributed Machine Learning

arXiv.org Machine Learning

Asynchronous methods are fundamental for parallelizing computations in distributed machine learning. They aim to accelerate training by fully utilizing all available resources. However, their greedy approach can lead to inefficiencies using more computation than required, especially when computation times vary across devices. If the computation times were known in advance, training could be fast and resource-efficient by assigning more tasks to faster workers. The challenge lies in achieving this optimal allocation without prior knowledge of the computation time distributions. In this paper, we propose ATA (Adaptive Task Allocation), a method that adapts to heterogeneous and random distributions of worker computation times. Through rigorous theoretical analysis, we show that ATA identifies the optimal task allocation and performs comparably to methods with prior knowledge of computation times. Experimental results further demonstrate that ATA is resource-efficient, significantly reducing costs compared to the greedy approach, which can be arbitrarily expensive depending on the number of workers.


OpenLiDARMap: Zero-Drift Point Cloud Mapping using Map Priors

arXiv.org Artificial Intelligence

Accurate localization is a critical component of mobile autonomous systems, especially in Global Navigation Satellite Systems (GNSS)-denied environments where traditional methods fail. In such scenarios, environmental sensing is essential for reliable operation. However, approaches such as LiDAR odometry and Simultaneous Localization and Mapping (SLAM) suffer from drift over long distances, especially in the absence of loop closures. Map-based localization offers a robust alternative, but the challenge lies in creating and georeferencing maps without GNSS support. To address this issue, we propose a method for creating georeferenced maps without GNSS by using publicly available data, such as building footprints and surface models derived from sparse aerial scans. Our approach integrates these data with onboard LiDAR scans to produce dense, accurate, georeferenced 3D point cloud maps. By combining an Iterative Closest Point (ICP) scan-to-scan and scan-to-map matching strategy, we achieve high local consistency without suffering from long-term drift. Thus, we eliminate the reliance on GNSS for the creation of georeferenced maps. The results demonstrate that LiDAR-only mapping can produce accurate georeferenced point cloud maps when augmented with existing map priors.


Addressing Delayed Feedback in Conversion Rate Prediction via Influence Functions

arXiv.org Artificial Intelligence

In the realm of online digital advertising, conversion rate (CVR) prediction plays a pivotal role in maximizing revenue under cost-per-conversion (CPA) models, where advertisers are charged only when users complete specific actions, such as making a purchase. A major challenge in CVR prediction lies in the delayed feedback problem-conversions may occur hours or even weeks after initial user interactions. This delay complicates model training, as recent data may be incomplete, leading to biases and diminished performance. Although existing methods attempt to address this issue, they often fall short in adapting to evolving user behaviors and depend on auxiliary models, which introduces computational inefficiencies and the risk of model inconsistency. In this work, we propose an Influence Function-empowered framework for Delayed Feedback Modeling (IF-DFM). IF-DFM leverages influence functions to estimate how newly acquired and delayed conversion data impact model parameters, enabling efficient parameter updates without the need for full retraining. Additionally, we present a scalable algorithm that efficiently computes parameter updates by reframing the inverse Hessian-vector product as an optimization problem, striking a balance between computational efficiency and effectiveness. Extensive experiments on benchmark datasets demonstrate that IF-DFM consistently surpasses state-of-the-art methods, significantly enhancing both prediction accuracy and model adaptability.


Distributed Primal-Dual Algorithms: Unification, Connections, and Insights

arXiv.org Machine Learning

We study primal-dual algorithms for general empirical risk minimization problems in distributed settings, focusing on two prominent classes of algorithms. The first class is the communication-efficient distributed dual coordinate ascent (CoCoA), derived from the coordinate ascent method for solving the dual problem. The second class is the alternating direction method of multipliers (ADMM), including consensus ADMM, linearized ADMM, and proximal ADMM. We demonstrate that both classes of algorithms can be transformed into a unified update form that involves only primal and dual variables. This discovery reveals key connections between the two classes of algorithms: CoCoA can be interpreted as a special case of proximal ADMM for solving the dual problem, while consensus ADMM is closely related to a proximal ADMM algorithm. This discovery provides the insight that by adjusting the augmented Lagrangian parameter, we can easily enable the ADMM variants to outperform the CoCoA variants. We further explore linearized versions of ADMM and analyze the effects of tuning parameters on these ADMM variants in the distributed setting. Our theoretical findings are supported by extensive simulation studies and real-world data analysis.


Uniform-in-time weak propagation of chaos for consensus-based optimization

arXiv.org Artificial Intelligence

We study the uniform-in-time weak propagation of chaos for the consensus-based optimization (CBO) method on a bounded searching domain. We apply the methodology for studying long-time behaviors of interacting particle systems developed in the work of Delarue and Tse (ArXiv:2104.14973). Our work shows that the weak error has order $O(N^{-1})$ uniformly in time, where $N$ denotes the number of particles. The main strategy behind the proofs are the decomposition of the weak errors using the linearized Fokker-Planck equations and the exponential decay of their Sobolev norms. Consequently, our result leads to the joint convergence of the empirical distribution of the CBO particle system to the Dirac-delta distribution at the global minimizer in population size and running time in Wasserstein-type metrics.