AITopics

Budd, Jeremy, Ideami, Javier, Rynne, Benjamin Macdowall, Duggar, Keith, Balestriero, Randall

SplInterp: Improving our Understanding and Training of Sparse Autoencoders

arXiv.org Artificial IntelligenceMay-20-2025

Sparse autoencoders (SAEs) have received considerable recent attention as tools for mechanistic interpretability, showing success at extracting interpretable features even from very large LLMs. However, this research has been largely empirical, and there have been recent doubts about the true utility of SAEs. In this work, we seek to enhance the theoretical understanding of SAEs, using the spline theory of deep learning. By situating SAEs in this framework: we discover that SAEs generalise ``$k$-means autoencoders'' to be piecewise affine, but sacrifice accuracy for interpretability vs. the optimal ``$k$-means-esque plus local principal component analysis (PCA)'' piecewise affine autoencoder. We characterise the underlying geometry of (TopK) SAEs using power diagrams. And we develop a novel proximal alternating method SGD (PAM-SGD) algorithm for training SAEs, with both solid theoretical foundations and promising empirical results in MNIST and LLM experiments, particularly in sample efficiency and (in the LLM setting) improved sparsity of codes. All code is available at: https://github.com/splInterp2025/splInterp

artificial intelligence, diagram, machine learning, (18 more...)

2505.11836

Country: Europe (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Artificial IntelligenceDec-10-2024

TTVD: Towards a Geometric Framework for Test-Time Adaptation Based on Voronoi Diagram

Lei, Mingxi, Ma, Chunwei, Ding, Meng, Zhou, Yufan, Huang, Ziyun, Xu, Jinhui

Deep learning models often struggle with generalization when deploying on real-world data, due to the common distributional shift to the training data. Test-time adaptation (TTA) is an emerging scheme used at inference time to address this issue. In TTA, models are adapted online at the same time when making predictions to test data. Neighbor-based approaches have gained attention recently, where prototype embeddings provide location information to alleviate the feature shift between training and testing data. However, due to their inherit limitation of simplicity, they often struggle to learn useful patterns and encounter performance degradation. To confront this challenge, we study the TTA problem from a geometric point of view. We first reveal that the underlying structure of neighbor-based methods aligns with the Voronoi Diagram, a classical computational geometry model for space partitioning. Building on this observation, we propose the Test-Time adjustment by Voronoi Diagram guidance (TTVD), a novel framework that leverages the benefits of this geometric property. Specifically, we explore two key structures: 1) Cluster-induced Voronoi Diagram (CIVD): This integrates the joint contribution of self-supervision and entropy-based methods to provide richer information. 2) Power Diagram (PD): A generalized version of the Voronoi Diagram that refines partitions by assigning weights to each Voronoi cell. Our experiments under rigid, peer-reviewed settings on CIFAR-10-C, CIFAR-100-C, ImageNet-C, and ImageNet-R shows that TTVD achieves remarkable improvements compared to state-of-the-art methods. Moreover, extensive experimental results also explore the effects of batch size and class imbalance, which are two scenarios commonly encountered in real-world applications. These analyses further validate the robustness and adaptability of our proposed framework.

artificial intelligence, machine learning, spatial reasoning, (19 more...)

2412.0798

Country:

North America > United States > New York (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Palacios-Gasós, José Manuel, Tardioli, Danilo, Montijano, Eduardo, Sagüés, Carlos

Equitable Persistent Coverage of Non-Convex Environments with Graph-Based Planning

arXiv.org Artificial IntelligenceJan-24-2024

In this paper we tackle the problem of persistently covering a complex non-convex environment with a team of robots. We consider scenarios where the coverage quality of the environment deteriorates with time, requiring to constantly revisit every point. As a first step, our solution finds a partition of the environment where the amount of work for each robot, weighted by the importance of each point, is equal. This is achieved using a power diagram and finding an equitable partition through a provably correct distributed control law on the power weights. Compared to other existing partitioning methods, our solution considers a continuous environment formulation with non-convex obstacles. In the second step, each robot computes a graph that gathers sweep-like paths and covers its entire partition. At each planning time, the coverage error at the graph vertices is assigned as weights of the corresponding edges. Then, our solution is capable of efficiently finding the optimal open coverage path through the graph with respect to the coverage error per distance traversed. Simulation and experimental results are presented to support our proposal.

algorithm, partition, robot, (16 more...)

doi: 10.1177/0278364919882082

2401.13614

Country:

Europe > Spain > Aragón > Zaragoza Province > Zaragoza (0.04)
North America > United States > Colorado (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)

Ghimire, Donipolo, Kia, Solmaz S.

Stein Coverage: a Variational Inference Approach to Distribution-matching Multisensor Deployment

arXiv.org Artificial IntelligenceDec-12-2023

This paper examines the spatial coverage optimization problem for multiple sensors in a known convex environment, where the coverage service of each sensor is heterogeneous and anisotropic. We introduce the Stein Coverage algorithm, a distribution-matching coverage approach that aims to place sensors at positions and orientations such that their collective coverage distribution is as close as possible to the event distribution. To select the most important representative points from the coverage event distribution, Stein Coverage utilizes the Stein Variational Gradient Descent (SVGD), a deterministic sampling method from the variational inference literature. An innovation in our work is the introduction of a repulsive force between the samples in the SVGD algorithm to spread the samples and avoid footprint overlap for the deployed sensors. After pinpointing the points of interest for deployment, Stein Coverage solves the multisensor assignment problem using a bipartite optimal matching process. Simulations demonstrate the advantages of the Stein Coverage method compared to conventional Voronoi partitioning multisensor deployment methods.

algorithm, deployment, sensor, (16 more...)

2312.07001

Country:

North America > United States > California > Orange County > Irvine (0.14)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (1.00)
Overview (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.96)
Information Technology > Communications > Networks (0.94)

Chen, Jun, Abugurain, Mohammed, Dames, Philip, Park, Shinkyu

Distributed Multi-Robot Multi-Target Tracking Using Heterogeneous Limited-Range Sensors

arXiv.org Artificial IntelligenceNov-3-2023

This paper presents a cooperative multi-robot multi-target tracking framework aimed at enhancing the efficiency of the heterogeneous sensor network and, consequently, improving overall target tracking accuracy. The concept of normalized unused sensing capacity is introduced to quantify the information a sensor is currently gathering relative to its theoretical maximum. This measurement can be computed using entirely local information and is applicable to various sensor models, distinguishing it from previous literature on the subject. It is then utilized to develop a distributed coverage control strategy for a heterogeneous sensor network, adaptively balancing the workload based on each sensor's current unused capacity. The algorithm is validated through a series of ROS and MATLAB simulations, demonstrating superior results compared to standard approaches that do not account for heterogeneity or current usage rates.

algorithm, robot, sensor, (17 more...)

2311.01707

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Asia > Middle East > Saudi Arabia (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Communications > Networks > Sensor Networks (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)

Finocchiaro, Jessie, Frongillo, Rafael, Waggoner, Bo

An Embedding Framework for Consistent Polyhedral Surrogates

arXiv.org Machine LearningJul-17-2019

We formalize and study the natural approach of designing convex surrogate loss functions via embeddings for problems such as classification or ranking. In this approach, one embeds each of the finitely many predictions (e.g. classes) as a point in R^d, assigns the original loss values to these points, and convexifies the loss in between to obtain a surrogate. We prove that this approach is equivalent, in a strong sense, to working with polyhedral (piecewise linear convex) losses. Moreover, given any polyhedral loss $L$, we give a construction of a link function through which $L$ is a consistent surrogate for the loss it embeds. We go on to illustrate the power of this embedding framework with succinct proofs of consistency or inconsistency of various polyhedral surrogates in the literature.

artificial intelligence, machine learning, polyhedral, (18 more...)

1907.0733

Country:

North America > United States > Colorado (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Balestriero, Randall, Cosentino, Romain, Aazhang, Behnaam, Baraniuk, Richard

The Geometry of Deep Networks: Power Diagram Subdivision

arXiv.org Machine LearningMay-21-2019

We study the geometry of deep (neural) networks (DNs) with piecewise affine and convex nonlinearities. The layers of such DNs have been shown to be {\em max-affine spline operators} (MASOs) that partition their input space and apply a region-dependent affine mapping to their input to produce their output. We demonstrate that each MASO layer's input space partitioning corresponds to a {\em power diagram} (an extension of the classical Voronoi tiling) with a number of regions that grows exponentially with respect to the number of units (neurons). We further show that a composition of MASO layers (e.g., the entire DN) produces a progressively subdivided power diagram and provide its analytical form. The subdivision process constrains the affine maps on the (exponentially many) power diagram regions to greatly reduce their complexity. For classification problems, we obtain a formula for a MASO DN's decision boundary in the input space plus a measure of its curvature that depends on the DN's nonlinearities, weights, and architecture. Numerous numerical experiments support and extend our theoretical results.

artificial intelligence, deep learning, machine learning, (17 more...)

1905.08443

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Machine LearningDec-18-2017

A Geometric View of Optimal Transportation and Generative Model

Lei, Na, Su, Kehua, Cui, Li, Yau, Shing-Tung, Gu, David Xianfeng

In this work, we show the intrinsic relations between optimal transportation and convex geometry, especially the variational approach to solve Alexandrov problem: constructing a convex polytope with prescribed face normals and volumes. This leads to a geometric interpretation to generative models, and leads to a novel framework for generative models. By using the optimal transportation view of GAN model, we show that the discriminator computes the Kantorovich potential, the generator calculates the transportation map. For a large class of transportation costs, the Kantorovich potential can give the optimal transportation map by a close-form formula. Therefore, it is sufficient to solely optimize the discriminator. This shows the adversarial competition can be avoided, and the computational architecture can be simplified. Preliminary experimental results show the geometric method outperforms WGAN for approximating probability measures with multiple clusters in low dimensional space.

arxiv preprint arxiv, machine learning, natural language, (18 more...)

1710.05488

Country:

Asia > China (0.46)
North America > United States (0.28)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Borgwardt, Steffen, Brieden, Andreas, Gritzmann, Peter

A balanced k-means algorithm for weighted point sets

arXiv.org Machine LearningAug-3-2016

The classical $k$-means algorithm for partitioning $n$ points in $\mathbb{R}^d$ into $k$ clusters is one of the most popular and widely spread clustering methods. The need to respect prescribed lower bounds on the cluster sizes has been observed in many scientific and business applications. In this paper, we present and analyze a generalization of $k$-means that is capable of handling weighted point sets and prescribed lower and upper bounds on the cluster sizes. We call it weight-balanced $k$-means. The key difference to existing models lies in the ability to handle the combination of weighted point sets with prescribed bounds on the cluster sizes. This imposes the need to perform partial membership clustering, and leads to significant differences. For example, while finite termination of all $k$-means variants for unweighted point sets is a simple consequence of the existence of only finitely many partitions of a given set of points, the situation is more involved for weighted point sets, as there are infinitely many partial membership clusterings. Using polyhedral theory, we show that the number of iterations of weight-balanced $k$-means is bounded above by $n^{O(dk)}$, so in particular it is polynomial for fixed $k$ and $d$. This is similar to the known worst-case upper bound for classical $k$-means for unweighted point sets and unrestricted cluster sizes, despite the much more general framework. We conclude with the discussion of some additional favorable properties of our method.

artificial intelligence, machine learning, power diagram, (18 more...)

1308.4004

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)