AITopics

Plotting

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Patching open-vocabulary models by interpolating weights Gabriel Ilharco 1 Mitchell Wortsman 1 Shuran Song

Neural Information Processing SystemsMar-27-2025, 14:36:46 GMT

Open-vocabulary models like CLIP achieve high accuracy across many image classification tasks. However, there are still settings where their zero-shot performance is far from optimal. We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate. Towards this goal, we introduce PAINT, a patching method that uses interpolations between the weights of a model before fine-tuning and the weights after fine-tuning on a task to be patched. On nine tasks where zeroshot CLIP performs poorly, PAINT increases accuracy by 15 to 60 percentage points while preserving accuracy on ImageNet within one percentage point of the zero-shot model. PAINT also allows a single model to be patched on multiple tasks and improves with model scale. Furthermore, we identify cases of broad transfer, where patching on one task increases accuracy on other tasks even when the tasks have disjoint classes. Finally, we investigate applications beyond common benchmarks such as counting or reducing the impact of typographic attacks on CLIP. Our findings demonstrate that it is possible to expand the set of tasks on which open-vocabulary models achieve high accuracy without re-training them from scratch.

accuracy, machine learning, natural language, (14 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Government (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Appendix for Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation

Neural Information Processing SystemsMar-27-2025, 14:36:38 GMT

A.1 Assumptions We provide grounds for assumptions in this section. Assumption 1 is a fine-grained version of the standard Lipschitz smoothness, i.e., We follow [1] to make this assumption. Assumption 2 combines the assumption of the unbiased gradient and bounded variance, which is also standard in literature. This is because 1. z All the reported results are the top1 accuracy (%). The memory footprint of the training at a batch size of 128 (compute in CIFAR100) is reported in the second column.

artificial intelligence, back razor, machine learning, (13 more...)

Neural Information Processing Systems

Country: North America > United States > Texas (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.41)

Add feedback

Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation

Neural Information Processing SystemsMar-27-2025, 14:36:35 GMT

Transfer learning from the model trained on large datasets to customized downstream tasks has been widely used as the pre-trained model can greatly boost the generalizability. However, the increasing sizes of pre-trained models also lead to a prohibitively large memory footprints for downstream transferring, making them unaffordable for personal devices. Previous work recognizes the bottleneck of the footprint to be the activation, and hence proposes various solutions such as injecting specific lite modules. In this work, we present a novel memory-efficient transfer framework called Back Razor, that can be plug-and-play applied to any pre-trained network without changing its architecture. The key idea of Back Razor is asymmetric sparsifying: pruning the activation stored for back-propagation, while keeping the forward activation dense. It is based on the observation that the stored activation, that dominates the memory footprint, is only needed for backpropagation. Such asymmetric pruning avoids affecting the precision of forward computation, thus making more aggressive pruning possible. Furthermore, we conduct the theoretical analysis for the convergence rate of Back Razor, showing that under mild conditions, our method retains the similar convergence rate as vanilla SGD. Extensive transfer learning experiments on both Convolutional Neural Networks and Vision Transformers with classification, dense prediction, and language modeling tasks show that Back Razor could yield up to 97% sparsity, saving 9.2x memory usage, without losing accuracy.

artificial intelligence, back razor, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Texas (0.28)

Genre: Research Report (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Treatment of Statistical Estimation Problems in Randomized Smoothing for Adversarial Robustness

Neural Information Processing SystemsMar-27-2025, 14:36:28 GMT

Randomized smoothing is a popular certified defense against adversarial attacks.

artificial intelligence, confidence interval, machine learning, (18 more...)

Neural Information Processing Systems

Country: Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology > Security & Privacy (0.34)
Government (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Ex Uno Pluria: Insights on Ensembling in Low Precision Number Systems

Neural Information Processing SystemsMar-27-2025, 14:34:06 GMT

While ensembling deep neural networks has shown promise in improving generalization performance, scaling current ensemble methods for large models remains challenging. Given that recent progress in deep learning is largely driven by the scale, exemplified by the widespread adoption of large-scale neural network architectures, scalability emerges an increasingly critical issue for machine learning algorithms in the era of large-scale models. In this work, we first showcase the potential of low precision ensembling, where ensemble members are derived from a single model within low precision number systems in a training-free manner. Our empirical analysis demonstrates the effectiveness of our proposed low precision ensembling method compared to existing ensemble approaches.

artificial intelligence, bayesian inference, machine learning, (15 more...)

Neural Information Processing Systems

Country: Asia > South Korea (0.14)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

Neural Information Processing SystemsMar-27-2025, 14:33:45 GMT

With the continued advancement of Large Language Models (LLMs) Agents in reasoning, planning, and decision-making, benchmarks have become crucial in evaluating these skills. However, there is a notable gap in benchmarks for real-time strategic decision-making. StarCraft II (SC2), with its complex and dynamic nature, serves as an ideal setting for such evaluations. To this end, we have developed TextStarCraft II, a specialized environment for assessing LLMs in real-time strategic scenarios within SC2. Addressing the limitations of traditional Chain of Thought (CoT) methods, we introduce the Chain of Summarization (CoS) method, enhancing LLMs' capabilities in rapid and effective decision-making. Our key experiments included: 1. LLM Evaluation: Tested 10 LLMs in TextStarCraft II, most of them defeating LV5 build-in AI, showcasing effective strategy skills.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Europe > Sweden (0.14)
Asia > South Korea (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.67)

Industry:

Leisure & Entertainment > Sports (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

TD Convergence: An Optimization Perspective

Neural Information Processing SystemsMar-27-2025, 14:33:34 GMT

We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm where the function to be minimized changes per iteration. By carefully investigating the divergence displayed by TD on a classical counter example, we identify two forces that determine the convergent or divergent behavior of the algorithm. We next formalize our discovery in the linear TD setting with quadratic loss and prove that convergence of TD hinges on the interplay between these two forces. We extend this optimization perspective to prove convergence of TD in a much broader setting than just linear approximation and squared loss. Our results provide a theoretical explanation for the successful application of TD in reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Discrete Flow Matching

Neural Information Processing SystemsMar-27-2025, 14:33:29 GMT

Despite Flow Matching and diffusion models having emerged as powerful generative paradigms for continuous variables such as images and videos, their application to high-dimensional discrete data, such as language, is still limited. In this work, we present Discrete Flow Matching, a novel discrete flow paradigm designed specifically for generating discrete data. Discrete Flow Matching offers several key contributions: (i) it works with a general family of probability paths interpolating between source and target distributions; (ii) it allows for a generic formula for sampling from these probability paths using learned posteriors such as the probability denoiser (x-prediction) and noise-prediction (ϵ-prediction); (iii) practically, focusing on specific probability paths defined with different schedulers improves generative perplexity compared to previous discrete diffusion and flow models; and (iv) by scaling Discrete Flow Matching models up to 1.7B parameters, we reach 6.7% Pass@1 and 13.4% Pass@10 on HumanEval and 6.7% Pass@1 and 20.6% Pass@10 on 1-shot MBPP coding benchmarks. Our approach is capable of generating high-quality discrete data in a non-autoregressive fashion, significantly closing the gap between autoregressive models and discrete flow models.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > North Korea (0.93)
North America > United States > Ohio (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Education (0.92)
Government > Regional Government > Asia Government > North Korea Government (0.46)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset

Neural Information Processing SystemsMar-27-2025, 14:33:21 GMT

Emerging ethical approaches have attempted to filter pretraining material, but such approaches have been ad hoc and failed to take context into account. We offer an approach to filtering grounded in law, which has directly addressed the tradeoffs in filtering material. First, we gather and make available the Pile of Law, a 256GB (and growing) dataset of open-source English-language legal and administrative data, covering court opinions, contracts, administrative rules, and legislative records. Pretraining on the Pile of Law may help with legal tasks that have the promise to improve access to justice. Second, we distill the legal norms that governments have developed to constrain the inclusion of toxic or private content into actionable lessons for researchers and discuss how our dataset reflects these norms. Third, we show how the Pile of Law offers researchers the opportunity to learn such filtering rules directly from the data, providing an exciting new research direction in model-based processing. Warning: this paper contains quotations that may be offensive or upsetting.

data mining, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (1.00)
Asia (0.92)
North America > Canada (0.68)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.67)

Industry:

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(6 more...)

Add feedback

Filters

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Patching open-vocabulary models by interpolating weights Gabriel Ilharco 1 Mitchell Wortsman 1 Shuran Song

Appendix for Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation

Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation

Treatment of Statistical Estimation Problems in Randomized Smoothing for Adversarial Robustness

Ex Uno Pluria: Insights on Ensembling in Low Precision Number Systems

bc3a68a20e5c8ba5cbefc1ecf74bfaaa-Paper-Datasets_and_Benchmarks.pdf

Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

TD Convergence: An Optimization Perspective

Discrete Flow Matching

Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset