AITopics | merge

Collaborating Authors

merge

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Accelerating ERM for data-driven algorithm design using output-sensitive techniques

Neural Information Processing SystemsFeb-16-2026, 07:49:24 GMT

Data-driven algorithm design is a promising, learning-based approach for beyond worst-case analysis of algorithms with tunable parameters. An important open problem is the design of computationally efficient data-driven algorithms for combinatorial algorithm families with multiple parameters.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France (0.04)

Genre: Research Report > Experimental Study (0.92)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.87)

Add feedback

Efficient Centroid-Linkage Clustering

Neural Information Processing SystemsFeb-14-2026, 01:05:57 GMT

We also evaluate our algorithm empirically. By leveraging a state-of-the-art nearest-neighbor search library, we obtain a fast and accurate Centroid-Linkage HAC algorithm.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Maryland > Prince George's County > College Park (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > Massachusetts (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

Add feedback

4f2accafe6fa355624f3ee42207cc7b8-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 21:29:32 GMT

A.1 DomainSpecificLanguage(DSL)Specifications Table 5 shows the domain-specific language (DSL) designed for E-MAPP in theOvercooked-v2 environment. Each convolutional layer has a kernel size of3except for the first one, which has a kernel sizeof5. The inventory statesinv is encoded by a three-layer MLP with hidden size 128 for all layers. The output goal featurefgoal is a640-dim feature vector.fgoal Name Value learningrate 3e-4 updatebatchsize 128 In cooperative settings, the goal input of the assistive agent is the leading agent's goal.

artificial intelligence, choppedtomato, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

10e6dfea9a673bef4a7b1cb9234891bc-Paper-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 01:57:33 GMT

The pretraining data of today's strongest language models is opaque; in particular,

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
(7 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Hierarchical topological clustering

Carpio, Ana, Duro, Gema

arXiv.org Machine LearningJan-6-2026

Topological methods have the potential of exploring data clouds without making assumptions on their the structure. Here we propose a hierarchical topological clustering algorithm that can be implemented with any distance choice. The persistence of outliers and clusters of arbitrary shape is inferred from the resulting hierarchy. We demonstrate the potential of the algorithm on selected datasets in which outliers play relevant roles, consisting of images, medical and economic data. These methods can provide meaningful clusters in situations in which other techniques fail to do so.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

2601.00892

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Subquadratic High-Dimensional Hierarchical Clustering

Neural Information Processing SystemsDec-26-2025, 01:08:01 GMT

We consider the widely-used average-linkage, single-linkage, and Ward's methods for computing hierarchical clusterings of high-dimensional Euclidean inputs. It is easy to show that there is no efficient implementation of these algorithms in high dimensional Euclidean space since it implicitly requires to solve the closest pair problem, a notoriously difficult problem. However, how fast can these algorithms be implemented if we allow approximation? More precisely: these algorithms successively merge the clusters that are at closest average (for average-linkage), minimum distance (for single-linkage), or inducing the least sum-of-square error (for Ward's). We ask whether one could obtain a significant running-time improvement if the algorithm can merge $\gamma$-approximate closest clusters (namely, clusters that are at distance (average, minimum, or sum-of-square error) at most $\gamma$ times the distance of the closest clusters). We show that one can indeed take advantage of the relaxation and compute the approximate hierarchical clustering tree using $\widetilde{O}(n)$ $\gamma$-approximate nearest neighbor queries.

algorithm, name change, subquadratic high-dimensional hierarchical clustering, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.82)

Add feedback

Grow and Merge: A Unified Framework for Continuous Categories Discovery

Neural Information Processing SystemsDec-25-2025, 00:18:45 GMT

Although a number of studies are devoted to novel category discovery, most of them assume a static setting where both labeled and unlabeled data are given at once for finding new categories. In this work, we focus on the application scenarios where unlabeled data are continuously fed into the category discovery system. We refer to it as the {\bf Continuous Category Discovery} ({\bf CCD}) problem, which is significantly more challenging than the static setting. A common challenge faced by novel category discovery is that different sets of features are needed for classification and category discovery: class discriminative features are preferred for classification, while rich and diverse features are more suitable for new category mining. This challenge becomes more severe for dynamic setting as the system is asked to deliver good performance for known classes over time, and at the same time continuously discover new classes from unlabeled data. To address this challenge, we develop a framework of {\bf Grow and Merge} ({\bf GM}) that works by alternating between a growing phase and a merge phase: in the growing phase, it increases the diversity of features through a continuous self-supervised learning for effective category mining, and in the merging phase, it merges the grown model with a static one to ensure satisfying performance for known classes. Our extensive studies verify that the proposed GM framework is significantly more effective than the state-of-the-art approaches for continuous category discovery.

category discovery, continuous category discovery, unified framework, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Hierarchical Clustering With Confidence

Wu, Di, Bien, Jacob, Panigrahi, Snigdha

arXiv.org Machine LearningDec-9-2025

Agglomerative hierarchical clustering is one of the most widely used approaches for exploring how observations in a dataset relate to each other. However, its greedy nature makes it highly sensitive to small perturbations in the data, often producing different clustering results and making it difficult to separate genuine structure from spurious patterns. In this paper, we show how randomizing hierarchical clustering can be useful not just for measuring stability but also for designing valid hypothesis testing procedures based on the clustering results. We propose a simple randomization scheme together with a method for constructing a valid p-value at each node of the hierarchical clustering dendrogram that quantifies evidence against performing the greedy merge. Our test controls the Type I error rate, works with any hierarchical linkage without case-specific derivations, and simulations show it is substantially more powerful than existing selective inference approaches. To demonstrate the practical utility of our p-values, we develop an adaptive $α$-spending procedure that estimates the number of clusters, with a probabilistic guarantee on overestimation. Experiments on simulated and real data show that this estimate yields powerful clustering and can be used, for example, to assess clustering stability across multiple runs of the randomized algorithm.

algorithm, merge, procedure, (16 more...)

arXiv.org Machine Learning

2512.06522

Country:

North America > United States > California (0.14)
North America > United States > Michigan (0.04)
Antarctica (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.58)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Teaching Old Tokenizers New Words: Efficient Tokenizer Adaptation for Pre-trained Models

Purason, Taido, Chizhov, Pavel, Yamshchikov, Ivan P., Fishel, Mark

arXiv.org Artificial IntelligenceDec-4-2025

Tokenizer adaptation plays an important role in transferring pre-trained language models to new domains or languages. In this work, we address two complementary aspects of this process: vocabulary extension and pruning. The common approach to extension trains a new tokenizer on domain-specific text and appends the tokens that do not overlap with the existing vocabulary, which often results in many tokens that are unreachable or never used. We propose continued BPE training, which adapts a pre-trained tokenizer by continuing the BPE merge learning process on new data. Experiments across multiple languages and model families show that this approach improves tokenization efficiency and leads to better utilization of added vocabulary. We also introduce leaf-based vocabulary pruning, which removes redundant tokens while preserving model quality. Together, these methods provide practical tools for controlled vocabulary modification, which we release as an open-source package.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2512.03989

Country: