AITopics

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning rule influences recurrent network representations but not attractor structure in decision-making tasks

Neural Information Processing SystemsJan-18-2025, 21:03:46 GMT

Recurrent neural networks (RNNs) are popular tools for studying computational dynamics in neurobiological circuits. However, due to the dizzying array of design choices, it is unclear if computational dynamics unearthed from RNNs provide reliable neurobiological inferences. Understanding the effects of design choices on RNN computation is valuable in two ways. First, invariant properties that persist in RNNs across a wide range of design choices are more likely to be candidate neurobiological mechanisms. Second, understanding what design choices lead to similar dynamical solutions reduces the burden of imposing that all design choices be totally faithful replications of biology.

decision-making task, learning rule influence, rule influence recurrent network representation, (6 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Neurology (0.91)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.62)

Add feedback

Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently

Neural Information Processing SystemsJan-18-2025, 21:03:42 GMT

Driven by the empirical success and wide use of deep neural networks, understanding the generalization performance of overparameterized models has become an increasingly popular question. To this end, there has been substantial effort to characterize the implicit bias of the optimization algorithms used, such as gradient descent (GD), and the structural properties of their preferred solutions. This paper answers an open question in this literature: For the classification setting, what solution does mirror descent (MD) converge to? Specifically, motivated by its efficient implementation, we consider the family of mirror descent algorithms with potential function chosen as the p -th power of the \ell_p -norm, which is an important generalization of GD. We call this algorithm p - \textsf{GD} . For this family, we characterize the solutions it obtains and show that it converges in direction to a generalized maximum-margin solution with respect to the \ell_p -norm for linearly separable classification.

generalization performance, implemented efficiently, mirror descent maximize generalized margin, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Actively Testing Your Model While It Learns: Realizing Label-Efficient Learning in Practice

Neural Information Processing SystemsJan-18-2025, 21:03:39 GMT

In active learning (AL), we focus on reducing the data annotation cost from the model training perspective. However, "testing'', which often refers to the model evaluation process of using empirical risk to estimate the intractable true generalization risk, also requires data annotations. The annotation cost for "testing'' (model evaluation) is under-explored. Even in works that study active model evaluation or active testing (AT), the learning and testing ends are disconnected. In this paper, we propose a novel active testing while learning (ATL) framework that integrates active learning with active testing.

annotation cost, model evaluation, realizing label-efficient learning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.93)

Add feedback

Optimal Algorithms for Decentralized Stochastic Variational Inequalities

Neural Information Processing SystemsJan-18-2025, 21:03:36 GMT

Variational inequalities are a formalism that includes games, minimization, saddle point, and equilibrium problems as special cases. Methods for variational inequalities are therefore universal approaches for many applied tasks, including machine learning problems. This work concentrates on the decentralized setting, which is increasingly important but not well understood. In particular, we consider decentralized stochastic (sum-type) variational inequalities over fixed and time-varying networks. We present lower complexity bounds for both communication and local iterations and construct optimal algorithms that match these lower bounds.

decentralized stochastic variational inequality, optimal algorithm, stochastic case

Neural Information Processing Systems

Industry: Education > Focused Education > Special Education (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.88)

Add feedback

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning

Neural Information Processing SystemsJan-18-2025, 21:03:28 GMT

Gradient-based Meta-RL (GMRL) refers to methods that maintain two-level optimisation procedures wherein the outer-loop meta-learner guides the inner-loop gradient-based reinforcement learner to achieve fast adaptations. In this paper, we develop a unified framework that describes variations of GMRL algorithms and points out that existing stochastic meta-gradient estimators adopted by GMRL are actually \textbf{biased}. We study tabular MDPs empirically and offer quantitative evidence that testifies our theoretical findings on existing stochastic meta-gradient estimators. Furthermore, we conduct experiments on Iterated Prisoner's Dilemma and Atari games to show how other methods such as off-policy learning and low-bias estimator can help fix the gradient bias for GMRL algorithms in general.

gradient bias, meta-reinforcement learning, stochastic meta-gradient estimator, (3 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games > Computer Games (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Add feedback

Practical and Asymptotically Exact Conditional Sampling in Diffusion Models

Neural Information Processing SystemsJan-18-2025, 21:03:25 GMT

Diffusion models have been successful on a range of conditional generation tasks including molecular design and text-to-image generation. However, these achievements have primarily depended on task-specific conditional training or error-prone heuristic approximations. Ideally, a conditional generation method should provide exact samples for a broad range of conditional distributions without requiring task-specific training. To this end, we introduce the Twisted Diffusion Sampler, or TDS. TDS is a sequential Monte Carlo (SMC) algorithm that targets the conditional distributions of diffusion models through simulating a set of weighted particles.

asymptotically exact conditional sampling, diffusion model, practical and asymptotically, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Certifying Some Distributional Fairness with Subpopulation Decomposition

Neural Information Processing SystemsJan-18-2025, 21:03:21 GMT

Extensive efforts have been made to understand and improve the fairness of machine learning models based on observational metrics, especially in high-stakes domains such as medical insurance, education, and hiring decisions. However, there is a lack of certified fairness considering the end-to-end performance of an ML model. In this paper, we first formulate the certified fairness of an ML model trained on a given data distribution as an optimization problem based on the model performance loss bound on a fairness constrained distribution, which is within bounded distributional distance with the training distribution. We then propose a general fairness certification framework and instantiate it for both sensitive shifting and general shifting scenarios. In particular, we propose to solve the optimization problem by decomposing the original data distribution into analytical subpopulations and proving the convexity of the subproblems to solve them.

distributional fairness, fairness, subpopulation decomposition, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space

Neural Information Processing SystemsJan-18-2025, 21:03:14 GMT

Humans are remarkably flexible in understanding viewpoint changes due to visual cortex supporting the perception of 3D structure. In contrast, most of the computer vision models that learn visual representation from a pool of 2D images often fail to generalize over novel camera viewpoints. Recently, the vision architectures have shifted towards convolution-free architectures, visual Transformers, which operate on tokens derived from image patches. However, these Transformers do not perform explicit operations to learn viewpoint-agnostic representation for visual understanding. To this end, we propose a 3D Token Representation Layer (3DTRL) that estimates the 3D positional information of the visual tokens and leverages it for learning viewpoint-agnostic representations.

learning viewpoint-agnostic visual representation, transformer, viewpoint-agnostic representation, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Add feedback

Three Towers: Flexible Contrastive Learning with Pretrained Image Models

Neural Information Processing SystemsJan-18-2025, 21:03:11 GMT

We introduce Three Towers (3T), a flexible method to improve the contrastive learning of vision-language models by incorporating pretrained image classifiers. While contrastive models are usually trained from scratch, LiT (Zhai et al., 2022) has recently shown performance gains from using pretrained classifier embeddings. However, LiT directly replaces the image tower with the frozen embeddings, excluding any potential benefits from training the image tower contrastively. With 3T, we propose a more flexible strategy that allows the image tower to benefit from both pretrained embeddings and contrastive training. To achieve this, we introduce a third tower that contains the frozen pretrained embeddings, and we encourage alignment between this third tower and the main image-text towers. Empirically, 3T consistently improves over LiT and the CLIP-style from-scratch baseline for retrieval tasks.

flexible contrastive learning, from-scratch baseline, pretrained image model, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.80)

Add feedback

A Comprehensively Tight Analysis of Gradient Descent for PCA

Neural Information Processing SystemsJan-18-2025, 21:03:04 GMT

We study the Riemannian gradient method for PCA on which a crucial fact is that despite the simplicity of the considered setting, i.e., deterministic version of Krasulina's method, the convergence rate has not been well-understood yet. In this work, we provide a general tight analysis for the gap-dependent rate at O(\frac{1}{\Delta}\log\frac{1}{\epsilon}) that holds for any real symmetric matrix. More importantly, when the gap \Delta is significantly smaller than the target accuracy \epsilon on the objective sub-optimality of the final solution, the rate of this type is actually not tight any more, which calls for a worst-case rate. We further give the first worst-case analysis that achieves a rate of convergence at O(\frac{1}{\epsilon}\log\frac{1}{\epsilon}) . Particularly, our gap-dependent analysis suggests a new promising learning rate for stochastic variance reduced PCA algorithms.

comprehensively tight analysis, frac, gradient descent, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Add feedback