AITopics

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.86)
Information Technology > Artificial Intelligence > Vision (0.63)

Neural Information Processing Systems Feb-18-2026, 14:45:35 GMT

ecd92623ac899357312aaa8915853699-Paper-Conference.pdf

diffusion model, machine learning, natural language, (18 more...)

Country:

Asia > South Korea (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Wang, Xiang Xiang, Cottrell, Sean, Wei, Guo-Wei

Multiscale Grassmann Manifolds for Single-Cell Data Analysis

arXiv.org Artificial Intelligence Nov-18-2025

Single-cell data analysis seeks to characterize cellular heterogeneity based on high-dimensional gene expression profiles. Conventional approaches represent each cell as a vector in Euclidean space, which limits their ability to capture intrinsic correlations and multiscale geometric structures. We propose a multiscale framework based on Grassmann manifolds that integrates machine learning with subspace geometry for single-cell data analysis. By generating embeddings under multiple representation scales, the framework combines their features from different geometric views into a unified Grassmann manifold. A power-based scale sampling function is introduced to control the selection of scales and balance information across resolutions. Experiments on nine benchmark single-cell RNA-seq datasets demonstrate that the proposed approach effectively preserves meaningful structures and provides stable clustering performance, particularly for small to medium-sized datasets. These results suggest that Grassmann manifolds offer a coherent and informative foundation for analyzing single-cell data.

artificial intelligence, grassmann manifold, machine learning, (17 more...)

2511.11717

Country: North America > United States > Michigan > Ingham County (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Telecommunications (1.00)
Information Technology > Networks (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)

arXiv.org Artificial Intelligence Nov-18-2025

Multi-Metric Preference Alignment for Generative Speech Restoration

Zhang, Junan, Zhang, Xueyao, Yang, Jing, Wang, Yuancheng, Fan, Fan, Wu, Zhizheng

Recent generative models have significantly advanced speech restoration tasks, yet their training objectives often misalign with human perceptual preferences, resulting in suboptimal quality. While post-training alignment has proven effective in other generative domains like text and image generation, its application to generative speech restoration remains largely under-explored. This work investigates the challenges of applying preference-based post-training to this task, focusing on how to define a robust preference signal and curate high-quality data to avoid reward hacking. To address these challenges, we propose a multi-metric preference alignment strategy. We construct a new dataset, GenSR-Pref, comprising 80K preference pairs, where each chosen sample is unanimously favored by a complementary suite of metrics covering perceptual quality, signal fidelity, content consistency, and timbre preservation. This principled approach ensures a holistic preference signal. Applying Direct Preference Optimization (DPO) with our dataset, we observe consistent and significant performance gains across three diverse generative paradigms: autoregressive models (AR), masked generative models (MGM), and flow-matching models (FM) on various restoration benchmarks, in both objective and subjective evaluations. Ablation studies confirm the superiority of our multi-metric strategy over single-metric approaches in mitigating reward hacking. Furthermore, we demonstrate that our aligned models can serve as powerful "data annotators", generating high-quality pseudo-labels to serve as a supervision signal for traditional discriminative models in data-scarce scenarios like singing voice restoration. Demo Page: https://gensr-pref.github.io
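The unanimity rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the sample names, the metric keys, and the scores below are all hypothetical, and every metric is assumed to be oriented so that higher is better. A pair is kept only when one candidate beats the other on every metric.

```python
from itertools import permutations


def unanimous_pairs(candidates):
    """Return (chosen, rejected) name pairs where chosen wins on every metric.

    `candidates` maps a hypothetical sample name to a dict of metric scores,
    all assumed to be oriented so that higher is better.
    """
    pairs = []
    for a, b in permutations(candidates, 2):
        scores_a, scores_b = candidates[a], candidates[b]
        # Keep the pair only if `a` is strictly better on all metrics.
        if all(scores_a[m] > scores_b[m] for m in scores_a):
            pairs.append((a, b))
    return pairs


# Illustrative scores only; these are not values from the paper.
candidates = {
    "out_1": {"quality": 4.1, "fidelity": 0.92, "content": 0.97, "timbre": 0.88},
    "out_2": {"quality": 3.5, "fidelity": 0.90, "content": 0.95, "timbre": 0.80},
    "out_3": {"quality": 4.3, "fidelity": 0.85, "content": 0.96, "timbre": 0.90},
}
print(unanimous_pairs(candidates))
```

Here `out_3` has the best quality score but loses on fidelity, so it forms no pair. A single-metric rule would have ranked it first, which is the kind of disagreement the unanimity filter is meant to screen out.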

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2508.17229

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Deschenaux, Justin, Tran, Lan, Gulcehre, Caglar

Partition Generative Modeling: Masked Modeling Without Masks

arXiv.org Artificial Intelligence Oct-13-2025

Masked generative models (MGMs) are widely used to capture complex data and enable faster generation than autoregressive (AR) models through parallel decoding. However, MGMs typically operate on fixed-length inputs, which can be inefficient: early in sampling, most tokens are masked and carry no information, leading to wasted computation. In contrast, AR models process only tokens generated previously, making early iterations faster. In this work, we introduce the Partition Generative Model (PGM), a novel approach that combines the strengths of AR models and MGMs. Rather than masking, PGM partitions tokens into two groups and employs sparse attention to block information flow between them. Since there is no information flow between partitions, the model can process only the previously generated tokens during sampling, while retaining the ability to generate tokens in parallel and in any order. On OpenWebText, PGMs offer at least $5\times$ improvements in sampling latency and throughput, while producing samples with superior Generative Perplexity, compared to Masked Diffusion Language Models. On ImageNet, PGMs achieve a $7.5\times$ higher throughput than MaskGIT, with only a slight increase in FID (5.54 vs. 5.35). With twice as many sampling steps, the FID reduces to 4.56 while being $3.9\times$ faster than MaskGIT. Finally, PGMs integrate seamlessly with MGM distillation, providing further inference speedups.
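The idea of blocking attention between two token partitions can be sketched as a boolean attention mask. This is an illustrative assumption about what such a mask might look like, not the PGM architecture itself; the function names and the example partition labels are hypothetical.

```python
def partition_attention_mask(groups):
    """Return a boolean matrix where mask[i][j] is True if position i may attend to j.

    `groups` is a list of 0/1 partition labels, one per token (hypothetical input).
    Attention is allowed only between positions in the same partition.
    """
    return [[gi == gj for gj in groups] for gi in groups]


def visible_positions(mask, query):
    """List the positions that the token at index `query` is allowed to attend to."""
    return [j for j, allowed in enumerate(mask[query]) if allowed]


groups = [0, 1, 0, 1, 1]  # illustrative partition assignment
mask = partition_attention_mask(groups)
print(visible_positions(mask, 0))  # positions sharing token 0's partition
print(visible_positions(mask, 1))  # positions sharing token 1's partition
```

Because no entry of the mask links the two groups, computation over one group never depends on the other, which is the property the abstract relies on to process only the already generated tokens.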

large language model, machine learning, natural language, (19 more...)

2505.18883

Country: Europe (0.46)

Genre: Research Report (0.74)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems Oct-10-2025, 20:30:54 GMT

Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance

In this paper, we extend these guidance methods to a generalized guidance formulation for MGMs and propose a self-guidance sampling method, which leads to better generation quality.

arxiv preprint arxiv, diffusion model, guidance, (14 more...)

Country:

Asia > South Korea (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Been Kim, Julie A. Shah, Finale Doshi-Velez

Mind the Gap: A Generative Approach to Interpretable Feature Selection and Extraction

Neural Information Processing Systems Oct-2-2025, 09:53:45 GMT

We present the Mind the Gap Model (MGM), an approach for interpretable feature extraction and selection. By placing interpretability criteria directly into the model, we allow for the model to both optimize parameters related to interpretability and to directly report a global set of distinguishable dimensions to assist with further data exploration and hypothesis generation.

data mining, dimension, machine learning, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Neural Information Processing Systems May-27-2025, 20:35:05 GMT

Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance

image synthesis, masked generative model, self-guidance, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (0.99)
Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.65)

Manzoor, Muhammad Arslan, Zeng, Ruihong, Azizov, Dilshod, Nakov, Preslav, Liang, Shangsong

MGM: Global Understanding of Audience Overlap Graphs for Predicting the Factuality and the Bias of News Media

arXiv.org Machine Learning Dec-12-2024

In the current era of rapidly growing digital data, evaluating the political bias and factuality of news outlets has become more important for seeking reliable information online. In this work, we study the classification problem of profiling news media through the lens of political bias and factuality. Traditional profiling methods, such as Pre-trained Language Models (PLMs) and Graph Neural Networks (GNNs), have shown promising results, but they face notable challenges. PLMs focus solely on textual features, causing them to overlook the complex relationships between entities, while GNNs often struggle with media graphs containing disconnected components and insufficient labels. To address these limitations, we propose MediaGraphMind (MGM), an effective solution within a variational Expectation-Maximization (EM) framework. Instead of relying on limited neighboring nodes, MGM leverages features, structural patterns, and label information from globally similar nodes. Such a framework not only enables GNNs to capture long-range dependencies for learning expressive node representations but also enhances PLMs by integrating structural information, thereby improving the performance of both models. Extensive experiments demonstrate the effectiveness of the proposed framework, which achieves new state-of-the-art results. Further, we share our repository, which contains the dataset, code, and documentation.

machine learning, natural language, node, (20 more...)

2412.10467

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Greece > Central Macedonia > Thessaloniki (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Huynh, Chuong, Oh, Seoung Wug, Shrivastava, Abhinav, Lee, Joon-Young

MaGGIe: Masked Guided Gradual Human Instance Matting

arXiv.org Artificial Intelligence Apr-24-2024

Human matting is a foundational task in image and video processing, where human foreground pixels are extracted from the input. Prior works either improve the accuracy by additional guidance or improve the temporal consistency of a single instance across frames. We propose a new framework MaGGIe, Masked Guided Gradual Human Instance Matting, which predicts alpha mattes progressively for each human instance while maintaining the computational cost, precision, and consistency. Our method leverages modern architectures, including transformer attention and sparse convolution, to output all instance mattes simultaneously without exploding memory and latency. While keeping inference costs constant in the multiple-instance scenario, our framework achieves robust and versatile performance on our proposed synthesized benchmarks. Alongside higher-quality image and video matting benchmarks, we introduce a novel multi-instance synthesis approach built from publicly available sources to improve the generalization of models in real-world scenarios.

consistency, dataset, video, (15 more...)

2404.16035

Country: North America > United States > Maryland > Prince George's County > College Park (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)