Goto

Collaborating Authors

 icb


Scalable iterative pruning of large language and vision models using block coordinate descent

Rosenberg, Gili, Brubaker, J. Kyle, Schuetz, Martin J. A., Zhu, Elton Yechao, Kadıoğlu, Serdar, Borujeni, Sima E., Katzgraber, Helmut G.

arXiv.org Artificial Intelligence

Pruning neural networks, which involves removing a fraction of their weights, can often maintain high accuracy while significantly reducing model complexity, at least up to a certain limit. We present a neural network pruning technique that builds upon the Combinatorial Brain Surgeon, but solves an optimization problem over a subset of the network weights in an iterative, block-wise manner using block coordinate descent. The iterative, block-based nature of this pruning technique, which we dub ``iterative Combinatorial Brain Surgeon'' (iCBS) allows for scalability to very large models, including large language models (LLMs), that may not be feasible with a one-shot combinatorial optimization approach. When applied to large models like Mistral and DeiT, iCBS achieves higher performance metrics at the same density levels compared to existing pruning methods such as Wanda. This demonstrates the effectiveness of this iterative, block-wise pruning method in compressing and optimizing the performance of large deep learning models, even while optimizing over only a small fraction of the weights. Moreover, our approach allows for a quality-time (or cost) tradeoff that is not available when using a one-shot pruning technique alone. The block-wise formulation of the optimization problem enables the use of hardware accelerators, potentially offsetting the increased computational costs compared to one-shot pruning methods like Wanda. In particular, the optimization problem solved for each block is quantum-amenable in that it could, in principle, be solved by a quantum computer.


KANICE: Kolmogorov-Arnold Networks with Interactive Convolutional Elements

Ferdaus, Md Meftahul, Abdelguerfi, Mahdi, Ioup, Elias, Dobson, David, Niles, Kendall N., Pathak, Ken, Sloan, Steven

arXiv.org Artificial Intelligence

We introduce KANICE (Kolmogorov-Arnold Networks with Interactive Convolutional Elements), a novel neural architecture that combines Convolutional Neural Networks (CNNs) with Kolmogorov-Arnold Network (KAN) principles. KANICE integrates Interactive Convolutional Blocks (ICBs) and KAN linear layers into a CNN framework. This leverages KANs' universal approximation capabilities and ICBs' adaptive feature learning. KANICE captures complex, non-linear data relationships while enabling dynamic, context-dependent feature extraction based on the Kolmogorov-Arnold representation theorem. We evaluated KANICE on four datasets: MNIST, Fashion-MNIST, EMNIST, and SVHN, comparing it against standard CNNs, CNN-KAN hybrids, and ICB variants. KANICE consistently outperformed baseline models, achieving 99.35% accuracy on MNIST and 90.05% on the SVHN dataset. Furthermore, we introduce KANICE-mini, a compact variant designed for efficiency. A comprehensive ablation study demonstrates that KANICE-mini achieves comparable performance to KANICE with significantly fewer parameters. KANICE-mini reached 90.00% accuracy on SVHN with 2,337,828 parameters, compared to KANICE's 25,432,000. This study highlights the potential of KAN-based architectures in balancing performance and computational efficiency in image classification tasks. Our work contributes to research in adaptive neural networks, integrates mathematical theorems into deep learning architectures, and explores the trade-offs between model complexity and performance, advancing computer vision and pattern recognition. The source code for this paper is publicly accessible through our GitHub repository (https://github.com/m-ferdaus/kanice).


Bounding generalization error with input compression: An empirical study with infinite-width networks

Galloway, Angus, Golubeva, Anna, Salem, Mahmoud, Nica, Mihai, Ioannou, Yani, Taylor, Graham W.

arXiv.org Artificial Intelligence

Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on availability of held-out data. The ability to better predict GE based on a single training set may yield overarching DNN design principles to reduce a reliance on trial-and-error, along with other performance assessment advantages. In search of a quantity relevant to GE, we investigate the Mutual Information (MI) between the input and final layer representations, using the infinite-width DNN limit to bound MI. An existing input compression-based GE bound is used to link MI and GE. To the best of our knowledge, this represents the first empirical study of this bound. In our attempt to empirically falsify the theoretical bound, we find that it is often tight for best-performing models. Furthermore, it detects randomization of training labels in many cases, reflects test-time perturbation robustness, and works well given only few training samples. These results are promising given that input compression is broadly applicable where MI can be estimated with confidence.


Inverse Contextual Bandits: Learning How Behavior Evolves over Time

Hüyük, Alihan, Jarrett, Daniel, van der Schaar, Mihaela

arXiv.org Machine Learning

Understanding an agent's priorities by observing their behavior is critical for transparency and accountability in decision processes, such as in healthcare. While conventional approaches to policy learning almost invariably assume stationarity in behavior, this is hardly true in practice: Medical practice is constantly evolving, and clinical professionals are constantly fine-tuning their priorities. We desire an approach to policy learning that provides (1) interpretable representations of decision-making, accounts for (2) non-stationarity in behavior, as well as operating in an (3) offline manner. First, we model the behavior of learning agents in terms of contextual bandits, and formalize the problem of inverse contextual bandits (ICB). Second, we propose two algorithms to tackle ICB, each making varying degrees of assumptions regarding the agent's learning strategy. Finally, through both real and simulated data for liver transplantations, we illustrate the applicability and explainability of our method, as well as validating its accuracy.


ICBS: Improved Conflict-Based Search Algorithm for Multi-Agent Pathfinding

Boyarski, Eli (Bar_Ilan University) | Felner, Ariel (Ben-Gurion University) | Stern, Roni (Ben-Gurion Univerity) | Sharon, Guni (Ben-Gurion University) | Tolpin, David (Ben-Gurion University) | Betzalel, Oded (Ben-Gurion University) | Shimony, Eyal (Ben-Gurion University)

AAAI Conferences

Conflict-Based Search (CBS) and its enhancements, Meta-Agent CBS and bypassing conflicts are amongst the strongest newly introduced algorithms for Multi-Agent Path Finding. This paper introduces two new improvements to CBS and incorporates them into a coherent, improved version of CBS, namely ICBS. Experimental results show that each of these improvements further reduces the runtime over the existing CBS-based approaches. When all improvements are combined, an even larger improvement is achieved, producing state-of-the art results for a number of domains.


ICBS: The Improved Conflict-Based Search Algorithm for Multi-Agent Pathfinding

Boyarski, Eli (Bar-Ilan University) | Felner, Ariel (Ben-Gurion University of the Negev) | Stern, Roni (Ben-Gurion University of the Negev) | Sharon, Guni (Ben-Gurion University of the Negev) | Betzalel, Oded (Ben-Gurion University of the Negev) | Tolpin, David (Ben-Gurion University of the Negev) | Shimony, Eyal (Ben-Gurion University of the Negev)

AAAI Conferences

Conflict-Based Search (CBS) and its generalization, Meta-Agent CBS are amongst the strongest newly introduced algorithms for Multi-Agent Path Finding. This paper introduces ICBS, an improved version of CBS. ICBS incorporates three orthogonal improvements to CBS which are systematically described and studied. Experimental results show that each of these improvements reduces the runtime over basic CBS by up to 20x in many cases. When all three improvements are combined, an even larger improvement is achieved, producing state-ofthe art results for a number of domains.