A Hybrid Computational Intelligence Framework for scRNA-seq Imputation: Integrating scRecover and Random Forests
Anaissi, Ali; Liu, Deshao; Jia, Yuanzhe; Huang, Weidong; Alyassine, Widad; Akram, Junaid
Single-cell RNA sequencing (scRNA-seq) enables transcriptomic profiling at cellular resolution but suffers from pervasive dropout events that obscure biological signals. We present SCR-MF, a modular two-stage workflow that combines principled dropout detection using scRecover with robust non-parametric imputation via missForest. Across public and simulated datasets, SCR-MF achieves robust and interpretable performance comparable to or exceeding existing imputation methods in most cases, while preserving biological fidelity and transparency. Runtime analysis demonstrates that SCR-MF provides a competitive balance between accuracy and computational efficiency, making it suitable for mid-scale single-cell datasets.
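The two-stage idea (detect candidate dropouts, then impute them with an iterative random-forest model) can be sketched in Python. This is a minimal illustration, not the paper's pipeline: scRecover and missForest are R packages, so here the detection stage is replaced by naive zero-flagging on toy data, and scikit-learn's IterativeImputer with a RandomForestRegressor stands in for missForest.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Toy counts matrix: 50 cells x 10 genes (hypothetical data, not scRNA-seq).
counts = rng.poisson(1.2, size=(50, 10)).astype(float)

# Stage 1 (naive stand-in for scRecover): flag zeros as candidate dropouts.
dropout_mask = counts == 0
counts_masked = counts.copy()
counts_masked[dropout_mask] = np.nan

# Stage 2: missForest-style iterative random-forest imputation.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=20, random_state=0),
    max_iter=3,
    random_state=0,
)
imputed = imputer.fit_transform(counts_masked)
print(imputed.shape)  # (50, 10); observed entries are left unchanged
```

A real scRecover stage would distinguish true biological zeros from technical dropouts rather than flagging all zeros.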
Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training
Nepal, Aadim; Shrestha, Safal; Shrestha, Anubhav; Kim, Minwu; Naghiyev, Jalal; Shwartz-Ziv, Ravid; Ross, Keith
Large language models improve at math after instruction tuning, reinforcement learning, or knowledge distillation. We ask whether these gains come from major changes in the transformer layers or from smaller adjustments that preserve the original structure. Using layer-wise ablation on base and post-trained variants, we find that mathematical reasoning depends on a few critical layers, which remain important across all post-training methods. Removing these layers reduces math accuracy by as much as 80%, whereas factual-recall tasks show much smaller drops. This suggests that layers specialized for mathematical tasks form during pre-training and remain stable afterward. As measured by Normalized Mutual Information (NMI), we find that near these critical layers, tokens drift from their original syntactic clusters toward representations aligned with tokens that are less syntactically related but potentially more useful for downstream tasks.
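The NMI comparison described above can be illustrated with scikit-learn's `normalized_mutual_info_score`, which compares two labelings of the same items. The labels below are hypothetical stand-ins for token cluster assignments (syntactic clusters vs. clusters at a critical layer), not data from the paper.

```python
from sklearn.metrics import normalized_mutual_info_score

# Hypothetical cluster labels for the same eight tokens:
# syntactic clusters vs. representation clusters near a critical layer.
syntactic_labels = [0, 0, 1, 1, 2, 2, 0, 1]
layer_labels = [0, 1, 1, 2, 2, 2, 1, 1]

# NMI is 1.0 when the two clusterings match up to relabeling,
# and approaches 0 as they become independent.
nmi = normalized_mutual_info_score(syntactic_labels, layer_labels)
print(round(nmi, 3))
```

A drop in NMI between the syntactic clustering and a layer's clustering is what the abstract describes as tokens "drifting" away from their original syntactic groups.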
Unsupervised Document and Template Clustering using Multimodal Embeddings
Sampaio, Phillipe R.; Maxcici, Helene
We study unsupervised clustering of documents at both the category and template levels using frozen multimodal encoders and classical clustering algorithms. We systematize a model-agnostic pipeline that (i) projects heterogeneous last-layer states from text-layout-vision encoders into token-type-aware document vectors and (ii) performs clustering with centroid- or density-based methods, including an HDBSCAN + $k$-NN assignment to eliminate unlabeled points. We evaluate eight encoders (text-only, layout-aware, vision-only, and vision-language) with $k$-Means, DBSCAN, HDBSCAN + $k$-NN, and BIRCH on five corpora spanning clean synthetic invoices, their heavily degraded print-and-scan counterparts, scanned receipts, and real identity and certificate documents. The study reveals modality-specific failure modes and a robustness-accuracy trade-off, with vision features nearly solving template discovery on clean pages while text dominates under covariate shift, and fused encoders offering the best balance. We detail a reproducible, oracle-free tuning protocol and the curated evaluation settings to guide future work on unsupervised document organization.
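The "HDBSCAN + k-NN assignment to eliminate unlabeled points" step can be sketched as follows. This is an illustrative sketch on synthetic 2-D points rather than document embeddings, and it uses scikit-learn's DBSCAN as a stand-in for HDBSCAN; the key idea shown is the k-NN reassignment of density-based noise points so that every document receives a cluster label.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D points standing in for encoder-derived document vectors.
X, _ = make_blobs(n_samples=200, centers=[[0, 0], [5, 5], [0, 5]],
                  cluster_std=0.5, random_state=0)

# Density-based clustering; noise points get the label -1.
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

# k-NN assignment: give every noise point the majority label of its
# nearest clustered neighbors, so no point stays unlabeled.
noise = labels == -1
if noise.any():
    knn = KNeighborsClassifier(n_neighbors=3).fit(X[~noise], labels[~noise])
    labels[noise] = knn.predict(X[noise])

print(sorted(set(labels)))  # no -1 remains after reassignment
```

The same post-processing applies unchanged when HDBSCAN produces the initial labels.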
Reply to Reviewer #1
Q1: What other ways to generate fake sequences may be suitable for this problem?
A1: That is a good question. A GAN could be used to generate more difficult fake sequences to further improve the ability of the encoder.

Q2: Comparison with other state-of-the-art deep clustering methods which are not designed for time series.
A2: Following your suggestion, we compare our method with two state-of-the-art deep clustering methods, i.e., DEC (Xie et al., 2016) and IDEC (Guo et al., 2017).

Table 1: Comparisons on 36 time series datasets (the dataset numbering is consistent with Table 2 in the main text).
Dataset | DEC(RI) | IDEC(RI) | DTCR(RI) | DTCR(NMI) | DTCR(ACC)
1 | 0.5817 | 0.6210 | 0.6868(0.0026) |  |
No Free Lunch from Audio Pretraining in Bioacoustics: A Benchmark Study of Embeddings
Bioacoustics, the study of animal sounds, offers a non-invasive method to monitor ecosystems. Extracting embeddings from audio-pretrained deep learning (DL) models without fine-tuning has become a popular way to obtain bioacoustic features for downstream tasks. However, a recent benchmark study reveals that while fine-tuned audio-pretrained VGG and transformer models achieve state-of-the-art performance on some tasks, they fail on others. This study benchmarks 11 DL models on the same tasks by reducing the dimensionality of their learned embeddings and evaluating them through clustering. We found that audio-pretrained DL models (1) underperform even fine-tuned AlexNet when used without fine-tuning, (2) fail to separate background from labeled sounds both with and without fine-tuning, whereas ResNet succeeds, and (3) outperform other models when fewer background sounds are included during fine-tuning. This study underscores the necessity of fine-tuning audio-pretrained models and of checking the embeddings after fine-tuning. Our code is available at https://github.com/NeuroscienceAI/Audio_Embeddings
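The evaluation protocol described above (reduce embedding dimensionality, cluster, score against labels) can be sketched generically. This example uses scikit-learn's digits dataset as a stand-in for audio embeddings, with hypothetical choices of PCA and k-means; the paper's models, data, and exact metrics differ.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

# Stand-in "embeddings": 64-dim digit images with class labels
# (the study uses embeddings from audio-pretrained DL models).
X, y = load_digits(return_X_y=True)

# Reduce dimensionality, cluster, then score clusters against labels.
X_red = PCA(n_components=16, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X_red)
ari = adjusted_rand_score(y, labels)
print(round(ari, 3))
```

A low score under this protocol is exactly the failure mode the abstract reports for some embeddings: the clusters recovered from the reduced representation do not align with the labeled sound categories.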