An Improved Empirical Fisher Approximation for Natural Gradient Descent

Neural Information Processing Systems

Approximate Natural Gradient Descent (NGD) methods are an important family of optimisers for deep learning models, which use approximate Fisher information matrices to pre-condition gradients during training. The empirical Fisher (EF) method approximates the Fisher information matrix empirically by reusing the per-sample gradients collected during back-propagation. Despite its ease of implementation, the EF approximation has theoretical and practical limitations. This paper investigates the inversely-scaled projection issue of EF, which is shown to be a major cause of its poor empirical approximation quality. An improved empirical Fisher (iEF) method is proposed to address this issue; it is motivated as a generalised NGD method from a loss-reduction perspective while retaining the practical convenience of EF.
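
For intuition, here is a minimal NumPy sketch of the plain EF preconditioner that the abstract builds on (not the proposed iEF correction); the damping and learning-rate values are illustrative assumptions:

    import numpy as np

    def ef_preconditioned_step(per_sample_grads, damping=1e-3, lr=0.1):
        # per_sample_grads: (N, D) array of per-sample gradients g_i,
        # as collected during back-propagation.
        G = np.asarray(per_sample_grads)
        n, d = G.shape
        g_mean = G.mean(axis=0)                 # ordinary mini-batch gradient
        fisher = G.T @ G / n                    # empirical Fisher: (1/N) sum_i g_i g_i^T
        precond = fisher + damping * np.eye(d)  # Tikhonov damping for invertibility
        direction = np.linalg.solve(precond, g_mean)
        return -lr * direction                  # pre-conditioned update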


FedNE: Surrogate-Assisted Federated Neighbor Embedding for Dimensionality Reduction

Neural Information Processing Systems

Federated learning (FL) has rapidly evolved as a promising paradigm that enables collaborative model training across distributed participants without exchanging their local data. Despite its broad applications in fields such as computer vision, graph learning, and natural language processing, the development of a data projection model that can be effectively used to visualize data in the context of FL is crucial yet remains heavily under-explored. Neighbor embedding (NE) is an essential technique for visualizing complex high-dimensional data, but collaboratively learning a joint NE model is difficult. The key challenge lies in the objective function: effective visualization algorithms like NE require computing loss terms over pairs of data points, and in FL such pairs may be held by different participants who cannot share them.
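
To make the pairwise coupling concrete, here is a toy centralized t-SNE-style loss (illustrative only, not FedNE's surrogate-assisted objective); every loss term ties together two embedded points:

    import numpy as np

    def ne_pairwise_loss(Y, P):
        # Y: (N, 2) low-dimensional embedding; P: (N, N) high-dimensional
        # affinities. Each term couples a PAIR of points, which in the
        # federated setting may belong to different clients.
        dist2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
        Q = 1.0 / (1.0 + dist2)          # Student-t similarities, as in t-SNE
        np.fill_diagonal(Q, 0.0)
        Q = Q / Q.sum()
        mask = P > 0
        return np.sum(P[mask] * np.log(P[mask] / Q[mask]))  # KL(P || Q)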


Energy-based Hopfield Boosting for Out-of-Distribution Detection

Neural Information Processing Systems

Out-of-distribution (OOD) detection is critical when deploying machine learning models in the real world. Outlier exposure methods, which incorporate auxiliary outlier data in the training process, can drastically improve OOD detection performance compared to approaches without advanced training strategies. We introduce Hopfield Boosting, a boosting approach that leverages modern Hopfield energy to sharpen the decision boundary between the in-distribution and OOD data. Hopfield Boosting encourages the model to focus on hard-to-distinguish auxiliary outlier examples that lie close to the decision boundary between in-distribution and auxiliary outlier data. Our method achieves a new state-of-the-art in OOD detection with outlier exposure, improving the FPR95 from 2.28 to 0.92 on CIFAR-10, from 11.76 to 7.94 on CIFAR-100, and from 50.74 to 36.60 on ImageNet-1K.
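
For reference, a minimal sketch of the modern Hopfield energy the abstract mentions, in its standard log-sum-exp form; the boosting weights and the paper's exact OOD score are omitted, and the two-bank comparison below is an assumption of this sketch:

    import numpy as np
    from scipy.special import logsumexp

    def hopfield_energy(query, memories, beta=1.0):
        # E(q) = -(1/beta) * logsumexp(beta * X q) + 0.5 * ||q||^2 (up to constants)
        sims = memories @ query          # dot products x_i^T q, shape (N,)
        return -logsumexp(beta * sims) / beta + 0.5 * (query @ query)

    def ood_score(query, in_dist_mem, outlier_mem, beta=1.0):
        # Lower energy w.r.t. a memory bank means the query is closer to that
        # bank's stored patterns; a positive score suggests an in-distribution
        # sample, a negative one an outlier.
        return (hopfield_energy(query, outlier_mem, beta)
                - hopfield_energy(query, in_dist_mem, beta))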


Clustering with Same-Cluster Queries

Neural Information Processing Systems

We propose a framework for Semi-Supervised Active Clustering (SSAC), where the learner is allowed to interact with a domain expert, asking whether two given instances belong to the same cluster or not. We study the query and computational complexity of clustering in this framework. We consider a setting where the expert conforms to a center-based clustering with a notion of margin. We show that there is a trade-off between computational complexity and query complexity: we prove that for the case of k-means clustering (i.e., when the expert conforms to a solution of k-means), having access to relatively few such queries allows efficient solutions to otherwise NP-hard problems.
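
As an illustration of the query interface (not the paper's margin-based algorithm), each point can be assigned using at most one same-cluster query per discovered cluster:

    def cluster_with_queries(points, same_cluster):
        # points: iterable of hashable items; same_cluster(a, b) -> bool is
        # the domain expert answering whether a and b share a cluster.
        reps, labels = [], {}
        for p in points:
            for c, r in enumerate(reps):
                if same_cluster(p, r):   # one query per existing cluster
                    labels[p] = c
                    break
            else:
                labels[p] = len(reps)    # no match: open a new cluster
                reps.append(p)
        return labels                    # at most n*k queries for k clusters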


On Differentially Private Graph Sparsification and Applications

Neural Information Processing Systems

In this paper, we study private sparsification of graphs. In particular, we give an algorithm that, given an input graph, returns a sparse graph which approximates the spectrum of the input graph while ensuring differential privacy. This allows one to solve many graph problems privately yet efficiently and accurately. We exemplify this by applying the proposed meta-algorithm to privately answering cut queries, as well as to practical algorithms for computing MAX-CUT and SPARSEST-CUT with better accuracy than previously known. We also give an efficient private algorithm to learn the Laplacian eigenmap of a graph.
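
For context, "approximates the spectrum" is usually formalized as the standard spectral-sparsification guarantee below (the paper's differential-privacy noise terms are omitted here):

    (1-\epsilon)\, x^{\top} L_{G}\, x \;\le\; x^{\top} L_{\tilde{G}}\, x \;\le\; (1+\epsilon)\, x^{\top} L_{G}\, x \qquad \forall x \in \mathbb{R}^{n}

where L_G and L_{\tilde G} are the Laplacians of the input graph and the sparsifier, respectively; cut queries follow by taking x to be the 0/1 indicator vector of one side of a cut.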


Graph-based Discriminators: Sample Complexity and Expressiveness

Neural Information Processing Systems

A basic question in learning theory is to identify whether two distributions are identical when we have access only to examples sampled from the distributions. This basic task is considered, for example, in the context of Generative Adversarial Networks (GANs), where a discriminator is trained to distinguish between a real-life distribution and a synthetic distribution. Classically, we use a hypothesis class H and claim that the two distributions are distinct if for some h ∈ H the expected value on the two distributions is (significantly) different. Our starting point is the following fundamental problem: "is having the hypothesis dependent on more than a single random example beneficial?" To address this challenge we define k-ary based discriminators, which are defined by a family G of Boolean k-ary functions.
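
A small sketch contrasts the classical (unary) discrimination statistic with its k-ary generalization on finite samples (illustrative; the paper's graph-based construction is richer):

    import itertools
    import numpy as np

    def unary_distance(xs_p, xs_q, hypotheses):
        # Classical test: sup over h in H of | E_P[h] - E_Q[h] |, from samples.
        return max(abs(np.mean([h(x) for x in xs_p]) - np.mean([h(x) for x in xs_q]))
                   for h in hypotheses)

    def kary_distance(xs_p, xs_q, k_ary_fns, k=2):
        # k-ary test: each Boolean g in G looks at k samples at once.
        def avg(g, xs):
            return np.mean([g(*t) for t in itertools.combinations(xs, k)])
        return max(abs(avg(g, xs_p) - avg(g, xs_q)) for g in k_ary_fns)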


Visual Pinwheel Centers Act as Geometric Saliency Detectors Mingyi Huang

Neural Information Processing Systems

During natural evolution, the primary visual cortex (V1) of lower mammals typically forms salt-and-pepper organizations, while higher mammals and primates develop pinwheel structures with distinct topological properties. Despite the general belief that V1 neurons primarily serve as edge detectors, the functional advantages of pinwheel structures over salt-and-pepper organizations are not well recognized. To address this, we propose a two-dimensional self-evolving spiking neural network that integrates Hebbian-like plasticity and empirical morphological data.
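
As a reminder of the kind of plasticity rule the abstract refers to, here is a minimal Hebbian-like update (the paper's full self-evolving spiking network is far richer; the weight normalization below is an assumption added for stability):

    import numpy as np

    def hebbian_update(W, pre, post, lr=0.01):
        # W: (n_post, n_pre) weights; pre/post: binary (0/1) spike vectors.
        W = W + lr * np.outer(post, pre)             # co-active pairs potentiate
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        return W / np.maximum(norms, 1e-8)           # bound each neuron's weight norm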


Gaussian-Based Pooling for Convolutional Neural Networks

Neural Information Processing Systems

Convolutional neural networks (CNNs) contain local pooling layers to effectively downsize feature maps, increasing computational efficiency as well as robustness to input variations. Local pooling methods are generally formulated as a convex combination of local neuron activations, retaining the characteristics of an input feature map in a manner similar to image downscaling. In this paper, to improve the performance of CNNs, we propose a novel local pooling method built on a Gaussian-based probabilistic model over local neuron activations for flexibly pooling (extracting) features, in contrast to previous models that restrict the output to the convex hull of local neurons. In the proposed method, local neuron activations are aggregated into the mean and standard deviation of a Gaussian distribution, and on the basis of these statistics we construct a probabilistic model suited to pooling, in accordance with prior knowledge about local pooling in CNNs. Through the probabilistic model, which is equipped with trainable parameters, the proposed method naturally integrates two schemes: adaptively training the pooling form based on input feature maps, and performing the pooling stochastically throughout end-to-end learning. Experimental results on image classification demonstrate that the proposed method favorably improves the performance of various CNNs in comparison with other pooling methods.
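
A minimal PyTorch-style sketch of the idea, with mean/std aggregation and a trainable, stochastic offset; the exact parameterization below is an assumption for illustration, not the paper's precise probabilistic model:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GaussianPool2d(nn.Module):
        def __init__(self, kernel_size=2, stride=2):
            super().__init__()
            self.k, self.s = kernel_size, stride
            self.z = nn.Parameter(torch.zeros(1))  # trainable offset in std units

        def forward(self, x):
            mu = F.avg_pool2d(x, self.k, self.s)            # local mean
            ex2 = F.avg_pool2d(x * x, self.k, self.s)       # local second moment
            sigma = (ex2 - mu * mu).clamp_min(1e-6).sqrt()  # local std deviation
            # Stochastic during training, deterministic at inference.
            z = (self.z + 0.1 * torch.randn_like(mu)) if self.training else self.z
            return mu + z * sigma  # can leave the convex hull of local activations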


xLSTM: Extended Long Short-Term Memory

Neural Information Processing Systems

In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and contributed to numerous deep learning success stories; in particular, they constituted the first Large Language Models (LLMs). However, the advent of Transformer technology, with parallelizable self-attention at its core, marked the dawn of a new era, outpacing LSTMs at scale. We now raise a simple question: how far do we get in language modeling when scaling LSTMs to billions of parameters, leveraging the latest techniques from modern LLMs, but mitigating known limitations of LSTMs? Firstly, we introduce exponential gating with appropriate normalization and stabilization techniques. Secondly, we modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing; (ii) mLSTM, which is fully parallelizable, with a matrix memory and a covariance update rule. Integrating these LSTM extensions into residual block backbones yields xLSTM blocks, which are then residually stacked into xLSTM architectures. Exponential gating and the modified memory structures enable xLSTM models to perform favorably compared with state-of-the-art Transformers and State Space Models, in both performance and scaling.
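
A minimal NumPy sketch of one mLSTM-style recurrent step with a matrix memory and covariance update, following the abstract's description (gate values are taken as given scalars; the paper's exponential gating and stabilization details are omitted):

    import numpy as np

    def mlstm_step(C, n, k, v, q, f_gate, i_gate):
        # C: (d, d) matrix memory; n: (d,) normalizer state;
        # k, v, q: key, value, query vectors; f_gate, i_gate: scalar gates.
        C = f_gate * C + i_gate * np.outer(v, k)   # covariance update rule
        n = f_gate * n + i_gate * k                # normalizer tracks keys
        h = C @ q / max(abs(n @ q), 1.0)           # normalized read-out
        return C, n, h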


Are Disentangled Representations Helpful for Abstract Visual Reasoning?

Neural Information Processing Systems

Although it is often argued that disentangled representations are useful in learning to solve many real-world downstream tasks, there is little empirical evidence that supports this claim. In this paper, we conduct a large-scale study that investigates whether disentangled representations are more suitable for abstract reasoning tasks. Using two new tasks similar to Raven's Progressive Matrices, we evaluate the usefulness of the representations learned by 360 state-of-the-art unsupervised disentanglement models. Based on these representations, we train 3600 abstract reasoning models and observe that disentangled representations do in fact lead to better downstream performance. In particular, they enable quicker learning using fewer samples.