Regularizing Optimal Transport with f-Divergences

Neural Information Processing Systems

The primal and dual problems are related by the Lagrangian L(π, φ, ψ). We proceed to proofs of the theorems stated in Section 4. The constant depends on the NTK assumption and the regularization parameter, and may also depend indirectly on the bound R. Theorem 4.2 follows immediately from Lemmas B.1 and B.2. The following result follows from Propositions E.4 and E.5 of Luise et al. Interestingly, the rate of estimation of the Sinkhorn plan breaks the curse of dimensionality. B.2 Log-concavity of the Sinkhorn factor: the optimal entropy-regularized Sinkhorn plan is expressed in terms of the optimal potentials, which satisfy fixed-point equations. Using this result, one can prove the following lemma.
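The fixed-point equations for the optimal potentials mentioned above are what the classical Sinkhorn iteration solves. A minimal NumPy sketch of the diag(u) K diag(v) factorization of the entropy-regularized plan, on a toy problem with uniform marginals (variable names and the toy cost are our own, not the paper's):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=200):
    """Entropy-regularized OT: the optimal plan factors as
    P = diag(u) K diag(v) with K = exp(-C/eps), where u, v solve the
    fixed-point (Sinkhorn) equations u = a / (K v), v = b / (K^T u)."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)   # enforce column-marginal constraint
        u = a / (K @ v)     # enforce row-marginal constraint
    return u[:, None] * K * v[None, :]

# toy example: uniform marginals on 3 points, absolute-difference cost
a = np.ones(3) / 3
b = np.ones(3) / 3
C = np.abs(np.arange(3)[:, None] - np.arange(3)[None, :]).astype(float)
P = sinkhorn(a, b, C)
```

At convergence the row and column sums of `P` match the prescribed marginals `a` and `b`.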


Deep Neural Nets with Interpolating Function as Output Activation

Bao Wang, Xiyang Luo, Zhen Li, Wei Zhu, Zuoqiang Shi, Stanley Osher

Neural Information Processing Systems

We replace the output layer of deep neural nets, typically the softmax function, with a novel interpolating function, and we propose end-to-end training and testing algorithms for this new architecture. Compared with classical neural nets that use the softmax function as the output activation, the surrogate with an interpolating function as the output activation combines the advantages of both deep and manifold learning. The new framework demonstrates two major advantages: first, it is better suited to settings with insufficient training data; second, it significantly improves generalization accuracy on a wide variety of networks. The algorithm is implemented in PyTorch, and the code is available at https://github.com/
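The abstract does not spell out the interpolating function itself. As a rough illustration of the idea of interpolating training labels in the learned feature space rather than applying a softmax, here is a Gaussian-weighted label-interpolation sketch; the function name, the Gaussian weighting, and all parameters are our own assumptions, not the paper's construction:

```python
import numpy as np

def interpolating_output(feat_test, feat_train, y_train, n_classes, sigma=1.0):
    """Hypothetical stand-in for a softmax output layer: assign class scores
    to each test feature by interpolating one-hot training labels with
    Gaussian weights computed in the learned feature space."""
    Y = np.eye(n_classes)[y_train]                        # one-hot training labels
    d2 = ((feat_test[:, None, :] - feat_train[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))                    # Gaussian affinities
    W /= W.sum(axis=1, keepdims=True)                     # rows sum to one
    return W @ Y                                          # interpolated class scores

# toy feature space: two training points, two nearby test points
feat_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = [0, 1]
feat_test = np.array([[0.1, 0.0], [0.9, 1.0]])
scores = interpolating_output(feat_test, feat_train, y_train, n_classes=2)
```

Each score row is a convex combination of one-hot labels, so it sums to one and can be read as a class distribution, just like a softmax output.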




Channel Gating Neural Networks

Weizhe Hua, Yuan Zhou, Christopher M. De Sa, Zhiru Zhang, G. Edward Suh

Neural Information Processing Systems

Unlike static network pruning, channel gating optimizes CNN inference at run time by exploiting input-specific characteristics, substantially reducing compute cost with almost no accuracy loss. We experimentally show that applying channel gating in state-of-the-art networks achieves 2.7-8.0
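The core mechanism can be sketched in a simplified, dense form: compute a partial output from a "base" subset of input channels, then decide per position whether the remaining channels are worth computing. The function name, the threshold rule, and the channel split below are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def channel_gated_output(x, w_base, w_rest, threshold=0.2):
    """Simplified sketch of input-dependent channel gating: partial sums from
    the base channels gate whether the rest-channel contribution is added.
    (Computed densely here; actual hardware would skip the gated-off work.)"""
    n_base = w_base.shape[0]
    partial = x[:, :n_base] @ w_base          # base-channel partial sums
    gate = np.abs(partial) < threshold        # positions still ambiguous -> refine
    rest = x[:, n_base:] @ w_rest             # rest-channel contribution
    out = partial.copy()
    out[gate] += rest[gate]                   # refine only the gated-on positions
    return out, gate.mean()                   # output and fraction refined

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 6))                   # 5 inputs, 6 channels
w_base, w_rest = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
out, frac_on = channel_gated_output(x, w_base, w_rest)
```

Where the gate fires, the output equals the full (all-channel) computation; elsewhere it is the cheap partial sum, which is where the compute savings come from.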





Deriving Equivalent Symbol-Based Decision Models from Feedforward Neural Networks

Seidel, Sebastian, Borghoff, Uwe M.

arXiv.org Artificial Intelligence

Artificial intelligence (AI) has emerged as a transformative force across industries, driven by advances in deep learning and natural language processing, and fueled by large-scale data and computing resources. Despite its rapid adoption, the opacity of AI systems poses significant challenges to trust and acceptance. This work explores the intersection of connectionist and symbolic approaches to artificial intelligence, focusing on the derivation of interpretable symbolic models, such as decision trees, from feedforward neural networks (FNNs). Decision trees provide a transparent framework for elucidating the operations of neural networks while preserving their functionality. The derivation is presented in a step-by-step approach and illustrated with several examples. A systematic methodology is proposed to bridge neural and symbolic paradigms by exploiting distributed representations in FNNs to identify symbolic components, including fillers, roles, and their interrelationships. The process traces neuron activation values and input configurations across network layers, mapping activations and their underlying inputs to decision tree edges. The resulting symbolic structures effectively capture FNN decision processes and enable scalability to deeper networks through iterative refinement of subpaths for each hidden layer. To validate the theoretical framework, a prototype was developed using Keras .h5-data and emulating TensorFlow within the Java JDK/JavaFX environment. This prototype demonstrates the feasibility of extracting symbolic representations from neural networks, enhancing trust in AI systems, and promoting accountability.
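The activation-tracing step described above can be illustrated on a toy threshold network: enumerate input configurations, record the hidden on/off pattern each induces, and pair it with the output decision, yielding the raw material for decision-tree edges. The weights, the XOR task, and the `trace_paths` helper are purely illustrative, not the paper's prototype:

```python
import numpy as np
from itertools import product

def trace_paths(W1, b1, W2, b2):
    """For a tiny feedforward net with binary inputs and threshold units,
    map each input configuration to its hidden activation pattern and
    output class -- the (input, activations, decision) triples from which
    symbolic decision-tree edges can be assembled."""
    paths = []
    for bits in product([0, 1], repeat=W1.shape[1]):
        x = np.array(bits)
        h = (W1 @ x + b1 > 0).astype(int)     # hidden-layer on/off pattern
        y = int((W2 @ h + b2) > 0)            # output decision
        paths.append((bits, tuple(h), y))
    return paths

# hand-set XOR network: h1 = OR(x1, x2), h2 = AND(x1, x2), y = h1 AND NOT h2
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -2.0])
b2 = -0.5
paths = trace_paths(W1, b1, W2, b2)
```

Grouping the traced triples by shared activation prefixes is what turns them into tree edges; iterating the same trace per hidden layer is how the approach scales to deeper networks.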


ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning

Mi, Zhendong, Kong, Zhenglun, Yuan, Geng, Huang, Shaoyi

arXiv.org Artificial Intelligence

With the rapid expansion of large language models (LLMs), the demand for memory and computational resources has grown significantly. Recent advances in LLM pruning aim to reduce the size and computational cost of these models. However, existing methods often suffer from either suboptimal pruning performance or low time efficiency during the pruning process. In this work, we propose an efficient and effective pruning method that simultaneously achieves high pruning performance and fast pruning speed with improved calibration efficiency. Our approach introduces two key innovations: (1) An activation cosine similarity loss-guided pruning metric, which considers the angular deviation of the output activation between the dense and pruned models. (2) An activation variance-guided pruning metric, which helps preserve semantic distinctions in output activations after pruning, enabling effective pruning with shorter input sequences. These two components can be readily combined to enhance LLM pruning in both accuracy and efficiency. Experimental results show that our method achieves up to an 18% reduction in perplexity and up to 63% decrease in pruning time on prevalent LLMs such as LLaMA, LLaMA-2, and OPT.
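The first innovation, angular deviation of output activations, can be sketched as a scoring function: compare dense and pruned outputs on a small calibration batch via per-row cosine similarity. The function name, the magnitude-based candidate mask, and the toy dimensions are our own assumptions, not the paper's API:

```python
import numpy as np

def cosine_deviation(dense_out, pruned_out):
    """Score a candidate pruning by 1 minus the mean per-row cosine
    similarity between dense and pruned output activations
    (0 = directions identical, larger = more angular deviation)."""
    num = (dense_out * pruned_out).sum(-1)
    den = (np.linalg.norm(dense_out, axis=-1)
           * np.linalg.norm(pruned_out, axis=-1) + 1e-12)
    return 1.0 - (num / den).mean()

# toy: zero out the smallest-magnitude weight row and score the deviation
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))                   # calibration activations
W = rng.normal(size=(4, 4))
mask = np.ones_like(W)
mask[np.abs(W).sum(axis=1).argmin(), :] = 0.0 # prune least-salient row
score = cosine_deviation(X @ W, X @ (W * mask))
```

A pruning metric built this way prefers masks that preserve the direction of the output activations, which is the angular-deviation idea the abstract describes.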


Ensemble Kalman filter for uncertainty in human language comprehension

Bhandari, Diksha, Lopopolo, Alessandro, Rabovsky, Milena, Reich, Sebastian

arXiv.org Machine Learning

Artificial neural networks (ANNs) are widely used in modeling sentence processing but often exhibit deterministic behavior, contrasting with human sentence comprehension, which manages uncertainty during ambiguous or unexpected inputs. This is exemplified by reversal anomalies--sentences with unexpected role reversals that challenge syntax and semantics--highlighting the limitations of traditional ANN models, such as the Sentence Gestalt (SG) Model. To address these limitations, we propose a Bayesian framework for sentence comprehension, applying an extension of the ensemble Kalman filter (EnKF) for Bayesian inference to quantify uncertainty. By framing language comprehension as a Bayesian inverse problem, this approach enhances the SG model's ability to reflect human sentence processing with respect to the representation of uncertainty. Numerical experiments and comparisons with maximum likelihood estimation (MLE) demonstrate that Bayesian methods improve uncertainty representation, enabling the model to better approximate human cognitive processing when dealing with linguistic ambiguities. Introduction Artificial neural networks (ANNs) have become indispensable tools in modeling sentence processing within the field of natural language processing and cognitive science. These models are capable of handling complex linguistic structures, making accurate predictions, and resolving ambiguities with a notable degree of certainty, even when they are wrong (Guo et al., 2017; Hein et al., 2019). However, this behavior stands in contrast to human sentence comprehension, which often involves managing uncertainty, especially when faced with ambiguous or unexpected language inputs. The research has been funded by the Deutsche Forschungsgemeinschaft (DFG), Project-ID 318763901, SFB 1294.
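For reference, the EnKF analysis step the framework builds on can be written in a few lines. This is the textbook stochastic EnKF with perturbed observations, not the paper's specific extension; the toy scalar-state example and all variable names are our own:

```python
import numpy as np

def enkf_update(ensemble, obs, H, obs_noise_std, rng):
    """Stochastic EnKF analysis step: nudge each ensemble member toward a
    perturbed observation using the Kalman gain estimated from ensemble
    statistics (linear observation operator H assumed)."""
    X = ensemble                                  # shape (n_members, n_state)
    Y = X @ H.T                                   # predicted observations
    Xm, Ym = X - X.mean(0), Y - Y.mean(0)
    n = X.shape[0]
    Pxy = Xm.T @ Ym / (n - 1)                     # state-obs cross-covariance
    Pyy = Ym.T @ Ym / (n - 1) + obs_noise_std**2 * np.eye(H.shape[0])
    K = Pxy @ np.linalg.inv(Pyy)                  # ensemble Kalman gain
    obs_pert = obs + obs_noise_std * rng.normal(size=Y.shape)
    return X + (obs_pert - Y) @ K.T

# toy scalar state: broad prior, one observation at 1.0
rng = np.random.default_rng(0)
prior = rng.normal(0.0, 2.0, size=(500, 1))
post = enkf_update(prior, np.array([1.0]), np.eye(1), 0.5, rng)
```

The posterior ensemble contracts toward the observation while retaining spread, which is exactly the explicit uncertainty representation the abstract argues the MLE-trained SG model lacks.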