RSN: Randomized Subspace Newton
We develop a randomized Newton method capable of solving learning problems with very high-dimensional feature spaces, a common setting in applications such as medical imaging, genomics and seismology. Our method leverages randomized sketching in a new way, by finding the Newton direction constrained to the space spanned by a random sketch. We develop a simple global linear convergence theory that holds for practically all sketching techniques, which gives practitioners the freedom to design custom sketching approaches suitable for particular applications. We perform numerical experiments that demonstrate the efficiency of our method compared to accelerated gradient descent and the full Newton method. Our method can be seen as a refinement and randomized extension of the results of Karimireddy, Stich, and Jaggi [18].
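To make the update concrete, here is a minimal sketch of one subspace Newton step in the spirit of the abstract: the Newton system is solved only within the range of a random sketch matrix S, so each iteration needs just sketch_dim Hessian-vector products and a small linear solve. The Gaussian sketch, the fixed step size, and the function names are illustrative assumptions, not the paper's exact algorithm.

import numpy as np

def rsn_step(x, grad, hess, sketch_dim, step_size=1.0, rng=None):
    """One illustrative randomized subspace Newton step.

    Minimizes the local quadratic model over the subspace range(S):
        min_l  grad^T (S l) + 0.5 * (S l)^T hess (S l),
    then moves x along S @ l.
    """
    rng = np.random.default_rng(rng)
    d = x.shape[0]
    S = rng.standard_normal((d, sketch_dim))      # random sketch (Gaussian here; any sketch works)
    HS = hess @ S                                 # only sketch_dim Hessian-vector products
    reduced_hess = S.T @ HS                       # small (sketch_dim, sketch_dim) system
    reduced_grad = S.T @ grad
    lam = np.linalg.lstsq(reduced_hess, -reduced_grad, rcond=None)[0]
    return x + step_size * (S @ lam)

For a quadratic f(x) = 0.5 x^T A x - b^T x, for example, one would pass grad = A @ x - b and hess = A; in a real solver the dense Hessian would typically be replaced by a Hessian-vector-product routine.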
Author Feedback
Dear reviewers, thank you for taking the time to review our paper. All issues raised are easy to address. We will incorporate all of your suggestions. First, they are simply different algorithms. We achieve this by entirely bypassing the theory of one-shot sketches, showing it is not at all necessary. We will include this discussion in the paper.
Insights on representational similarity in neural networks with canonical correlation
Ari Morcos, Maithra Raghu, Samy Bengio
Comparing different neural network representations and determining how representations evolve over time remain challenging open questions in our understanding of the function of neural networks. Comparing representations in neural networks is fundamentally difficult, as the structure of representations varies greatly, even across groups of networks trained on identical tasks and over the course of training. Here, we develop projection-weighted CCA (Canonical Correlation Analysis) as a tool for understanding neural networks, building on SVCCA, a recently proposed method [22]. We first improve the core method, showing how to differentiate between signal and noise, and then apply this technique to compare across a group of CNNs, demonstrating that networks which generalize converge to more similar representations than networks which memorize, that wider networks converge to more similar solutions than narrow networks, and that trained networks with identical topology but different learning rates converge to distinct clusters with diverse representations. We also investigate the representational dynamics of RNNs, across both training and sequential timesteps, finding that RNNs converge in a bottom-up pattern over the course of training and that the hidden state is highly variable over the course of a sequence, even when accounting for linear transforms. Together, these results provide new insights into the function of CNNs and RNNs, and demonstrate the utility of using CCA to understand representations.
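A compact sketch of the projection-weighted CCA score described above, for two activation matrices recorded over the same examples: canonical correlations come from an SVD of the product of orthonormal bases, and each correlation is then weighted by how much of the first representation it accounts for. This is an illustrative re-implementation under the usual CCA-via-QR-and-SVD formulation; the authors' released code differs in details such as SVD preprocessing of the activations and numerical safeguards, and the function name here is an assumption.

import numpy as np

def pwcca_similarity(X, Y, eps=1e-10):
    """Projection-weighted mean canonical correlation between activations.

    X: (n_examples, n_neurons_x), Y: (n_examples, n_neurons_y).
    """
    # Center each neuron's activations.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Orthonormal bases of the two activation spans; the singular values of
    # Qx^T Qy are the canonical correlations.
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    U, rho, _ = np.linalg.svd(Qx.T @ Qy)
    # Canonical variates of X (vectors over examples).
    H = Qx @ U
    # Weight each correlation by how much of X it accounts for (signal vs. noise).
    alpha = np.abs(H.T @ X).sum(axis=1)
    alpha = alpha / (alpha.sum() + eps)
    k = min(rho.shape[0], alpha.shape[0])
    return float((alpha[:k] * rho[:k]).sum())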
Unifying Homophily and Heterophily for Spectral Graph Neural Networks via Triple Filter Ensembles
Polynomial-based learnable spectral graph neural networks (GNNs) use polynomials to approximate graph convolutions and have achieved impressive performance on graphs. Nevertheless, three progressive problems remain to be solved. First, some models use polynomials with better approximation properties for their filters, yet perform worse on real-world graphs. Second, carefully crafted graph learning methods, sophisticated polynomial approximations, and refined coefficient constraints can lead to overfitting, which diminishes the generalization of the models. Third, how can one design a model that retains the ability of polynomial-based spectral GNNs to approximate filters while achieving higher generalization and performance? In this paper, we propose a spectral GNN with a triple filter ensemble (TFE-GNN), which adaptively extracts homophily and heterophily from graphs with different levels of homophily while utilizing the initial features.
A Projection-free Algorithm for Constrained Stochastic Multi-level Composition Optimization
We propose a projection-free conditional gradient-type algorithm for smooth stochastic multi-level composition optimization, where the objective function is a nested composition of T functions and the constraint set is a closed convex set. Our algorithm assumes access to noisy evaluations of the functions and their gradients, through a stochastic first-order oracle satisfying certain standard unbiasedness and second-moment assumptions.
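As a rough illustration of the projection-free idea, the sketch below runs a single-level stochastic conditional-gradient (Frank-Wolfe) loop: each iteration calls a linear minimization oracle over the constraint set instead of computing a projection. The nested gradient estimators for the T-level composition, the step-size schedule, and the l2-ball oracle shown here are simplifications and assumptions, not the paper's full algorithm.

import numpy as np

def stochastic_conditional_gradient(grad_oracle, lmo, x0, n_iters=100):
    """Projection-free loop: grad_oracle(x) returns a noisy gradient estimate,
    lmo(g) returns argmin_{v in C} <g, v> over the closed convex set C."""
    x = np.array(x0, dtype=float)
    for t in range(n_iters):
        g = grad_oracle(x)
        v = lmo(g)                       # extreme point of the constraint set
        gamma = 2.0 / (t + 2.0)          # classic conditional-gradient step size
        x = (1.0 - gamma) * x + gamma * v
    return x

def l2_ball_lmo(g, radius=1.0):
    """Linear minimization oracle for an l2 ball of the given radius."""
    norm = np.linalg.norm(g)
    return -radius * g / norm if norm > 0 else np.zeros_like(g)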
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
Per-example gradient norms are a vital ingredient for estimating the gradient noise scale (GNS) with minimal variance. By examining the tensor contractions required to compute them, we propose a method with minimal FLOPs in regimes of three or more tensor dimensions that computes the norms simultaneously with the parameter gradients. Using this method, we are able to observe the GNS of different layers at higher accuracy than previously possible. We find that the total GNS of contemporary transformer models is predicted well by the GNS of only the normalization layers. Focusing on the normalization layers, we therefore develop a custom kernel that computes the per-example gradient norms while performing the LayerNorm backward pass with zero throughput overhead. Tracking GNS on only those layers, we are able to guide a practical batch-size schedule that reduces training time by 18% on a Chinchilla-optimal language model.
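To illustrate the kind of contraction involved, the sketch below computes squared per-example gradient norms for a linear layer's weight from the layer input and output gradient, without materializing per-example gradients, and then forms the standard small-batch/large-batch GNS estimate. The paper's fused kernel targets the LayerNorm parameters specifically, so treat the function names, shapes, and the choice of layer here as illustrative assumptions rather than the authors' implementation.

import torch

def per_example_weight_grad_sq_norms(x, grad_out):
    """Squared per-example gradient norms for a linear layer's weight.

    x: (batch, seq, d_in) layer input; grad_out: (batch, seq, d_out) gradient
    w.r.t. the layer output. The per-example weight gradient is
    grad_out[b].T @ x[b], whose squared Frobenius norm contracts as below.
    """
    gg = torch.einsum("bsd,btd->bst", grad_out, grad_out)   # (batch, seq, seq)
    xx = torch.einsum("bsd,btd->bst", x, x)                 # (batch, seq, seq)
    return (gg * xx).sum(dim=(1, 2))                        # (batch,)

def gradient_noise_scale(per_example_sq_norms, batch_grad_sq_norm):
    """GNS = tr(Sigma) / ||G||^2, estimated from gradient norms at batch
    sizes 1 and B. batch_grad_sq_norm is the squared norm of the mean
    gradient over the same B examples (the usual unbiased estimator)."""
    B = per_example_sq_norms.shape[0]
    small = per_example_sq_norms.mean()       # E ||g_i||^2 at batch size 1
    big = batch_grad_sq_norm                  # ||g_B||^2 at batch size B
    g_sq = (B * big - small) / (B - 1)        # unbiased estimate of ||G||^2
    trace_sigma = B * (small - big) / (B - 1) # unbiased estimate of tr(Sigma)
    return trace_sigma / g_sq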