AITopics

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsJun-14-2026, 09:41:27 GMT

Deep Taxonomic Networks for Unsupervised Hierarchical Prototype Discovery

Inspired by the human ability to learn and organize knowledge into hierarchical taxonomies with prototypes, this paper addresses key limitations in current deep hierarchical clustering methods. Existing methods often tie the structure to the number of classes and underutilize the rich prototype information available at intermediate hierarchical levels. We introduce deep taxonomic networks, a novel deep latent variable approach designed to bridge these gaps. Our method optimizes a large latent taxonomic hierarchy, specifically a complete binary tree structured mixture-of-Gaussian prior within a variational inference framework, to automatically discover taxonomic structures and associated prototype clusters directly from unlabeled data without assuming true label sizes. We analytically show that optimizing the ELBO of our method encourages the discovery of hierarchical relationships among prototypes. Empirically, our learned models demonstrate strong hierarchical clustering performance, outperforming baselines across diverse image classification datasets using our novel evaluation mechanism that leverages prototype clusters discovered at all hierarchical levels. Qualitative results further reveal that deep taxonomic networks discover rich and interpretable hierarchical taxonomies, capturing both coarse-grained semantic categories and fine-grained visual distinctions.

artificial intelligence, deep learning, machine learning, (18 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Transportation > Passenger (0.67)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Neural Information Processing SystemsFeb-11-2026, 02:53:53 GMT

f4e3ce3e7b581ff32e40968298ba013d-Supplemental.pdf

We also discuss other relevant hyper-parameters. Finally, we provide additional results from the data domaindifferentthanimages. BackgroundandSetting For the convenience of the reader, we restate our notation here. We define two kinds of purity ofA on Sn. We first develop some notation. LetVd be the volume of the unit d-dimensional ball.

artificial intelligence, machine learning, probability, (17 more...)

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

Neural Information Processing SystemsFeb-9-2026, 11:37:57 GMT

98dce83da57b0395e163467c9dae521b-Supplemental.pdf

caption, main paper, purity, (16 more...)

Country: North America > Canada (0.05)

Technology: Information Technology > Artificial Intelligence (1.00)

WIREDJan-8-2026, 10:00:00 GMT

Is Craigslist the Last Real Place on the Internet?

Is Craigslist the Last Real Place on the Internet? Millennials are still using Craigslist to find jobs, find love, and even to cast creative projects--eschewing other AIand algorithm-dominated online spaces. The writer and comedian Megan Koester got her first writing job, reviewing internet pornography, from a Craigslist ad she responded to more than 15 years ago. Several years after that, she used the listings website to find the rent-controlled apartment where she still lives today. When she wanted to buy property, she scrolled through Craigslist and found a parcel of land in the Mojave Desert.

craigslist, koester, real place, (15 more...)

WIRED

Country:

Oceania > Australia (0.05)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > Pennsylvania (0.04)
(5 more...)

Industry:

Media (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)

Nguyen, Manh, Pimentel-Alarcón, Daniel

Nonnegative Matrix Factorization through Cone Collapse

arXiv.org Artificial IntelligenceDec-10-2025

Nonnegative matrix factorization (NMF) is a widely used tool for learning parts-based, low-dimensional representations of nonnegative data, with applications in vision, text, and bioinformatics. In clustering applications, orthogonal NMF (ONMF) variants further impose (approximate) orthogonality on the representation matrix so that its rows behave like soft cluster indicators. Existing algorithms, however, are typically derived from optimization viewpoints and do not explicitly exploit the conic geometry induced by NMF: data points lie in a convex cone whose extreme rays encode fundamental directions or "topics". In this work we revisit NMF from this geometric perspective and propose Cone Collapse, an algorithm that starts from the full nonnegative orthant and iteratively shrinks it toward the minimal cone generated by the data. We prove that, under mild assumptions on the data, Cone Collapse terminates in finitely many steps and recovers the minimal generating cone of $\mathbf{X}^\top$ . Building on this basis, we then derive a cone-aware orthogonal NMF model (CC-NMF) by applying uni-orthogonal NMF to the recovered extreme rays. Across 16 benchmark gene-expression, text, and image datasets, CC-NMF consistently matches or outperforms strong NMF baselines-including multiplicative updates, ANLS, projective NMF, ONMF, and sparse NMF-in terms of clustering purity. These results demonstrate that explicitly recovering the data cone can yield both theoretically grounded and empirically strong NMF-based clustering methods.

artificial intelligence, cone, machine learning, (15 more...)

2512.07879

Country: North America > United States > Wisconsin (0.14)

Genre: Research Report (0.84)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Rair, Nisrine, Goupil, Alban, Vrabie, Valeriu, Chochoy, Emmanuel

When Annotators Disagree, Topology Explains: Mapper, a Topological Tool for Exploring Text Embedding Geometry and Ambiguity

arXiv.org Artificial IntelligenceNov-5-2025

Language models are often evaluated with scalar metrics like accuracy, but such measures fail to capture how models internally represent ambiguity, especially when human annotators disagree. We propose a topological perspective to analyze how fine-tuned models encode ambiguity and more generally instances. Applied to RoBERTa-Large on the MD-Offense dataset, Mapper, a tool from topological data analysis, reveals that fine-tuning restructures embedding space into modular, non-convex regions aligned with model predictions, even for highly ambiguous cases. Over $98\%$ of connected components exhibit $\geq 90\%$ prediction purity, yet alignment with ground-truth labels drops in ambiguous data, surfacing a hidden tension between structural confidence and label uncertainty. Unlike traditional tools such as PCA or UMAP, Mapper captures this geometry directly uncovering decision regions, boundary collapses, and overconfident clusters. Our findings position Mapper as a powerful diagnostic tool for understanding how models resolve ambiguity. Beyond visualization, it also enables topological metrics that may inform proactive modeling strategies in subjective NLP tasks.

artificial intelligence, machine learning, natural language, (16 more...)

2510.17548

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Mendonca, Margarida, Figueira, Alvaro

Modeling Political Discourse with Sentence-BERT and BERTopic

arXiv.org Artificial IntelligenceOct-28-2025

Social media has reshaped political discourse, offering politicians a platform for direct engagement while reinforcing polarization and ideological divides. This study introduces a novel topic evolution framework that integrates BERTopic-based topic modeling with Moral Foundations Theory (MFT) to analyze the longevity and moral dimensions of political topics in Twitter activity during the 117th U.S. Congress. We propose a methodology for tracking dynamic topic shifts over time and measuring their association with moral values and quantifying topic persistence. Our findings reveal that while overarching themes remain stable, granular topics tend to dissolve rapidly, limiting their long-term influence. Moreover, moral foundations play a critical role in topic longevity, with Care and Loyalty dominating durable topics, while partisan differences manifest in distinct moral framing strategies. This work contributes to the field of social network analysis and computational political discourse by offering a scalable, interpretable approach to understanding moral-driven topic evolution on social media.

longevity, machine learning, natural language, (19 more...)

2510.22904

Country:

North America > United States (1.00)
Asia (0.95)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Services (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Voting & Elections (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Popescu, Eduard, Groza, Adrian, Cernat, Andreea

Combining Deep Learning and Explainable AI for Toxicity Prediction of Chemical Compounds

arXiv.org Artificial IntelligenceOct-28-2025

The task here is to predict the toxicological activity of chemical compounds based on the Tox21 dataset, a benchmark in computational toxicology. After a domain-specific overview of chemical toxicity, we discuss current computational strategies, focusing on machine learning and deep learning. Several architectures are compared in terms of performance, robustness, and interpretability. This research introduces a novel image-based pipeline based on DenseNet121, which processes 2D graphical representations of chemical structures. Additionally, we employ Grad-CAM visualizations, an explainable AI technique, to interpret the model's predictions and highlight molecular regions contributing to toxicity classification. The proposed architecture achieves competitive results compared to traditional models, demonstrating the potential of deep convolutional networks in cheminformatics. Our findings emphasize the value of combining image-based representations with explainable AI methods to improve both predictive accuracy and model transparency in toxicology.

artificial intelligence, machine learning, prediction, (17 more...)

2510.22572

Country: North America > United States (0.68)

Genre: Research Report (0.84)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Materials > Chemicals (0.72)
Government > Regional Government > North America Government > United States Government (0.68)
Health & Medicine > Therapeutic Area > Toxicology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Karbab, ElMouatez Billah, Debbabi, Mourad

Applying Graph Analysis for Unsupervised Fast Malware Fingerprinting

arXiv.org Artificial IntelligenceOct-16-2025

Malware proliferation is increasing at a tremendous rate, with hundreds of thousands of new samples identified daily. Manual investigation of such a vast amount of malware is an unrealistic, time-consuming, and overwhelming task. To cope with this volume, there is a clear need to develop specialized techniques and efficient tools for preliminary filtering that can group malware based on semantic similarity. In this paper, we propose TrapNet, a novel, scalable, and unsupervised framework for malware fingerprinting and grouping. TrapNet employs graph community detection techniques for malware fingerprinting and family attribution based on static analysis, as follows: (1) TrapNet detects packed binaries and unpacks them using known generic packer tools. (2) From each malware sample, it generates a digest that captures the underlying semantics. Since the digest must be dense, efficient, and suitable for similarity checking, we designed FloatHash (FH), a novel numerical fuzzy hashing technique that produces a short real-valued vector summarizing the underlying assembly items and their order. FH is based on applying Principal Component Analysis (PCA) to ordered assembly items (e.g., opcodes, function calls) extracted from the malware's assembly code. (3) Representing malware with short numerical vectors enables high-performance, large-scale similarity computation, which allows TrapNet to build a malware similarity network. (4) Finally, TrapNet employs state-of-the-art community detection algorithms to identify dense communities, which represent groups of malware with similar semantics. Our extensive evaluation of TrapNet demonstrates its effectiveness in terms of the coverage and purity of the detected communities, while also highlighting its runtime efficiency, which outperforms other state-of-the-art solutions.

data mining, machine learning, malware sample, (17 more...)

2510.12811

Genre: Research Report > Promising Solution (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)