quiver
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
Kantamneni, Subhash, Engels, Joshua, Rajamanoharan, Senthooran, Tegmark, Max, Nanda, Neel
Sparse autoencoders (SAEs) are a popular method for interpreting concepts represented in large language model (LLM) activations. However, there is a lack of evidence regarding the validity of their interpretations due to the lack of a ground truth for the concepts used by an LLM, and a growing number of works have presented problems with current SAEs. One alternative source of evidence would be demonstrating that SAEs improve performance on downstream tasks beyond existing baselines. We test this by applying SAEs to the real-world task of LLM activation probing in four regimes: data scarcity, class imbalance, label noise, and covariate shift. Due to the difficulty of detecting concepts in these challenging settings, we hypothesize that SAEs' basis of interpretable, concept-level latents should provide a useful inductive bias. However, although SAEs occasionally perform better than baselines on individual datasets, we are unable to design ensemble methods combining SAEs with baselines that consistently outperform ensemble methods solely using baselines. Additionally, although SAEs initially appear promising for identifying spurious correlations, detecting poor dataset quality, and training multi-token probes, we are able to achieve similar results with simple non-SAE baselines as well. Though we cannot discount SAEs' utility on other tasks, our findings highlight the shortcomings of current SAEs and the need to rigorously evaluate interpretability methods on downstream tasks with strong baselines.
- Europe > United Kingdom (0.14)
- Europe > Italy (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Leisure & Entertainment (0.93)
- Government (0.67)
- Health & Medicine > Therapeutic Area > Oncology (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Machines and Mathematical Mutations: Using GNNs to Characterize Quiver Mutation Classes
He, Jesse, Jenne, Helen, Chau, Herman, Brown, Davis, Raugas, Mark, Billey, Sara, Kvinge, Henry
Machine learning is becoming an increasingly valuable tool in mathematics, enabling one to identify subtle patterns across collections of examples so vast that they would be impossible for a single researcher to feasibly review and analyze. In this work, we use graph neural networks to investigate quiver mutation -- an operation that transforms one quiver (or directed multigraph) into another -- which is central to the theory of cluster algebras with deep connections to geometry, topology, and physics. In the study of cluster algebras, the question of mutation equivalence is of fundamental concern: given two quivers, can one efficiently determine if one quiver can be transformed into the other through a sequence of mutations? Currently, this question has only been resolved in specific cases. In this paper, we use graph neural networks and AI explainability techniques to discover mutation equivalence criteria for the previously unknown case of quivers of type $\tilde{D}_n$. Along the way, we also show that even without explicit training to do so, our model captures structure within its hidden representation that allows us to reconstruct known criteria from type $D_n$, adding to the growing evidence that modern machine learning models are capable of learning abstract and general rules from mathematical data.
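For readers unfamiliar with the operation, quiver mutation can be sketched on the skew-symmetric exchange matrix of a quiver without loops or 2-cycles, using the standard Fomin-Zelevinsky matrix mutation rule. This is a minimal illustration rather than the paper's code: the function name `mutate` and the encoding (entry `B[i][j] > 0` meaning that many arrows from vertex `i` to vertex `j`) are assumptions made here.

```python
from typing import List

Matrix = List[List[int]]


def mutate(B: Matrix, k: int) -> Matrix:
    """Return the mutation of the skew-symmetric exchange matrix B at vertex k.

    Rule: entries in row k or column k change sign; every other entry b_ij
    becomes b_ij + (|b_ik| * b_kj + b_ik * |b_kj|) / 2.
    """
    n = len(B)
    out = [row[:] for row in B]
    for i in range(n):
        for j in range(n):
            if i == k or j == k:
                # Arrows touching vertex k are reversed.
                out[i][j] = -B[i][j]
            else:
                # The numerator is always even, so integer division is exact.
                out[i][j] = B[i][j] + (
                    abs(B[i][k]) * B[k][j] + B[i][k] * abs(B[k][j])
                ) // 2
    return out


# Hypothetical example: the path 0 -> 1 -> 2 mutated at its middle vertex.
path = [[0, 1, 0], [-1, 0, 1], [0, -1, 0]]
cycle = mutate(path, 1)
```

Mutating the path at vertex 1 reverses both arrows and adds an arrow 0 -> 2, giving an oriented 3-cycle; mutating again at the same vertex recovers the path, since mutation is an involution.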
Machine Learning Mutation-Acyclicity of Quivers
Armstrong-Williams, Kymani T. K., Hirst, Edward, Jackson, Blake, Lee, Kyu-Hwan
Machine learning (ML) has emerged as a powerful tool in mathematical research in recent years. This paper applies ML techniques to the study of quivers, a type of directed multigraph with significant relevance in algebra, combinatorics, computer science, and mathematical physics. Specifically, we focus on the challenging problem of determining the mutation-acyclicity of a quiver on 4 vertices, a property that is pivotal since mutation-acyclicity is often a necessary condition for theorems involving path algebras and cluster algebras. Although this classification is known for quivers with at most 3 vertices, little is known about quivers on more than 3 vertices. We give a computer-assisted proof that mutation-acyclicity is decidable for quivers on 4 vertices with edge weight at most 2. By leveraging neural networks (NNs) and support vector machines (SVMs), we then accurately classify more general 4-vertex quivers as mutation-acyclic or non-mutation-acyclic. Our results demonstrate that ML models can efficiently detect mutation-acyclicity, providing a promising computational approach to this combinatorial problem; the trained SVM equation provides a starting point to guide future theoretical development.
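A quiver is mutation-acyclic when some sequence of mutations turns it into a quiver with no oriented cycle. The sketch below is a naive bounded breadth-first search over the mutation class. It is an illustration, not the proof technique or the ML classifiers from the paper. The names `mutate`, `is_acyclic`, and `finds_acyclic_within`, the exchange-matrix encoding, and the depth bound are all assumptions made here. A `False` result only means that no acyclic quiver was found within the bound, not that none exists.

```python
from collections import deque
from typing import Tuple

Matrix = Tuple[Tuple[int, ...], ...]


def mutate(B: Matrix, k: int) -> Matrix:
    """Standard matrix mutation of a skew-symmetric exchange matrix at vertex k."""
    n = len(B)
    return tuple(
        tuple(
            -B[i][j] if i == k or j == k
            else B[i][j] + (abs(B[i][k]) * B[k][j] + B[i][k] * abs(B[k][j])) // 2
            for j in range(n)
        )
        for i in range(n)
    )


def is_acyclic(B: Matrix) -> bool:
    """Kahn's algorithm on the digraph with an arrow i -> j whenever B[i][j] > 0."""
    n = len(B)
    indegree = [sum(1 for i in range(n) if B[i][j] > 0) for j in range(n)]
    ready = [v for v in range(n) if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for w in range(n):
            if B[v][w] > 0:
                indegree[w] -= 1
                if indegree[w] == 0:
                    ready.append(w)
    return removed == n


def finds_acyclic_within(B, max_depth: int) -> bool:
    """Search all mutation sequences of length <= max_depth for an acyclic quiver."""
    start = tuple(map(tuple, B))
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        Q, depth = frontier.popleft()
        if is_acyclic(Q):
            return True
        if depth == max_depth:
            continue
        for k in range(len(Q)):
            nxt = mutate(Q, k)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


# Hypothetical examples on 3 vertices: an oriented 3-cycle with single arrows,
# and the Markov quiver (an oriented 3-cycle with double arrows).
three_cycle = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]
markov = [[0, 2, -2], [-2, 0, 2], [2, -2, 0]]
```

The single-arrow 3-cycle becomes acyclic after one mutation, whereas every mutation of the Markov quiver only reverses its arrows, so the search exhausts its two-element mutation class without finding an acyclic representative.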
- North America > United States > Connecticut > Tolland County > Storrs (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Optimal and Near-Optimal Adaptive Vector Quantization
Ben-Basat, Ran, Ben-Itzhak, Yaniv, Mitzenmacher, Michael, Vargaftik, Shay
Quantization is a fundamental optimization for many machine-learning use cases, including compressing gradients, model weights and activations, and datasets. The most accurate form of quantization is adaptive, where the error is minimized with respect to a given input, rather than optimizing for the worst case. However, optimal adaptive quantization methods are considered infeasible in terms of both their runtime and memory requirements. We revisit the Adaptive Vector Quantization (AVQ) problem and present algorithms that find optimal solutions with asymptotically improved time and space complexity. We also present an even faster near-optimal algorithm for large inputs. Our experiments show our algorithms may open the door to using AVQ more extensively in a variety of machine learning applications.
Quiver: Supporting GPUs for Low-Latency, High-Throughput GNN Serving with Workload Awareness
Tan, Zeyuan, Yuan, Xiulong, He, Congjie, Sit, Man-Kit, Li, Guo, Liu, Xiaoze, Ai, Baole, Zeng, Kai, Pietzuch, Peter, Mai, Luo
Systems for serving inference requests on graph neural networks (GNNs) must combine low latency with high throughput, but they face irregular computation due to skew in the number of sampled graph nodes and aggregated GNN features. This makes it challenging to exploit GPUs effectively: using GPUs to sample only a few graph nodes yields lower performance than CPU-based sampling, and aggregating many features incurs high data movement costs between GPUs and CPUs. Therefore, current GNN serving systems use CPUs for graph sampling and feature aggregation, limiting throughput. We describe Quiver, a distributed GPU-based GNN serving system with low latency and high throughput. Quiver's key idea is to exploit workload metrics for predicting the irregular computation of GNN requests and governing the use of GPUs for graph sampling and feature aggregation: (1) for graph sampling, Quiver calculates the probabilistic sampled graph size, a metric that predicts the degree of parallelism in graph sampling. Quiver uses this metric to assign sampling tasks to GPUs only when the performance gains surpass CPU-based sampling; and (2) for feature aggregation, Quiver relies on the feature access probability to decide which features to partition and replicate across a distributed GPU NUMA topology. We show that Quiver achieves up to 35 times lower latency and 8 times higher throughput compared to state-of-the-art GNN approaches (DGL and PyG).
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
Composition Machines: Programming Self-Organising Software Models for the Emergence of Sequential Program Spaces
We are entering a new era in which software systems are becoming ever larger and more complex, so composing such systems by manual means is becoming infeasible. To address this challenge, self-organising software models represent a promising direction since they allow the (bottom-up) emergence of complex computational structures from simple rules. In this paper, we propose an abstract machine, called the composition machine, which allows the definition and the execution of such models. Unlike typical abstract machines, our proposal does not compute individual programs but enables the emergence of multiple programs at once. In particular, we present the machine's semantics and provide examples to demonstrate its operation with well-known rules from the realm of Boolean logic and elementary cellular automata.
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > Massachusetts > Plymouth County > Hanover (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
A Weighted Quiver Kernel using Functor Homology
In many applications, vertices or edges of graphs and quivers are labeled and have costs associated with them, also called weights. In this paper, we are interested in edge-weighted quivers. These weights are not restricted to just scalar values, but can also represent much more complex and richer relations between the nodes of an edge by modeling them as label sets or a function of several variables. Such weighted quivers arise frequently when modeling real-world applications, especially where the relationships among objects play an important role. Below are a few applications of weighted quivers that cover wide and diverse fields: - Physics: weighted quivers are used to represent atomic structures, where an atom is depicted as a vertex and the interactive forces between the atoms (i.e., vertices) are shown as directed edges between pairs of vertices. The edge weights here can model the strength of interaction between two vertices. Note that such a weighted quiver also accepts multiple edges between the same pair of vertices, where each edge potentially represents a different type of interactive force.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Nevada (0.04)
- North America > United States > Maryland > Baltimore (0.04)
Quiver Mutations, Seiberg Duality and Machine Learning
Bao, Jiakang, Franco, Sebastián, He, Yang-Hui, Hirst, Edward, Musiker, Gregg, Xiao, Yan
We initiate the study of applications of machine learning to Seiberg duality, focusing on the case of quiver gauge theories, a problem also of interest in mathematics in the context of cluster algebras. Within the general theme of Seiberg duality, we define and explore a variety of interesting questions, broadly divided into the binary determination of whether a pair of theories picked from a series of duality classes are dual to each other, as well as the multi-class determination of the duality class to which a given theory belongs. We study how the performance of machine learning depends on several variables, including number of classes and mutation type (finite or infinite). In addition, we evaluate the relative advantages of Naive Bayes classifiers versus Convolutional Neural Networks. Finally, we also investigate how the results are affected by the inclusion of additional data, such as ranks of gauge/flavor groups and certain variables motivated by the existence of underlying Diophantine equations. In all questions considered, high accuracy and confidence can be achieved.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- North America > United States > New York > New York County > New York City (0.14)
- North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)