AITopics | latent

Collaborating Authors

latent

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Aligning What Matters: Masked Latent Adaptation for Text-to-Audio-Video Generation

Neural Information Processing SystemsJun-23-2026, 04:10:30 GMT

Text-to-Audio-Video (T2AV) generation aims to produce temporally and semantically aligned visual and auditory content from natural language descriptions. While recent progress in text-to-audio and text-to-video models has improved generation quality within each modality, jointly modeling them remains challenging due to incomplete and asymmetric correspondence: audio often reflects only a subset of the visual scene, and vice versa. Naively enforcing full alignment introduces semantic noise and temporal mismatches. To address this, we propose a novel framework that performs selective cross-modal alignment through a learnable masking mechanism, enabling the model to isolate and align only the shared latent components relevant to both modalities. This mechanism is integrated into an adaptation module that interfaces with pretrained encoders and decoders from latent video and audio diffusion models, preserving their generative capacity with reduced training overhead. Theoretically, we show that our masked objective provably recovers the minimal set of shared latent variables across modalities. Empirically, our method achieves stateof-the-art performance on standard T2AV benchmarks, demonstrating significant improvements in audiovisual synchronization and semantic consistency.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.87)

Industry:

Media (0.67)
Information Technology (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Measuring and Guiding Monosemanticity

Neural Information Processing SystemsJun-23-2026, 02:51:43 GMT

There is growing interest in leveraging mechanistic interpretability and controllability to better understand and influence the internal dynamics of large language models (LLMs). However, current methods face fundamental challenges in reliably localizing and manipulating feature representations. Sparse Autoencoders (SAEs) have recently emerged as a promising direction for feature extraction at scale, yet they, too, are limited by incomplete feature isolation and unreliable monosemanticity. To systematically quantify these limitations, we introduce Feature Monosemanticity Score (FMS), a novel metric to quantify feature monosemanticity in latent representation. Building on these insights, we propose Guided Sparse Autoencoders (G-SAE), a method that conditions latent representations on labeled concepts during training. We demonstrate that reliable localization and disentanglement of target concepts within the latent space improve interpretability, detection of behavior, and control. Specifically, our evaluations on toxicity detection, writing style identification, and privacy attribute recognition show that G-SAE not only enhances monosemanticity but also enables more effective and fine-grained steering with less quality degradation. Our findings provide actionable guidelines for measuring and advancing mechanistic interpretability and control of LLMs.1

data mining, large language model, machine learning, (22 more...)

Neural Information Processing Systems

Country: Europe (0.45)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Extracting task-relevant preserved dynamics from contrastive aligned neural recordings

Neural Information Processing SystemsJun-23-2026, 01:32:55 GMT

Recent work indicates that low-dimensional dynamics of neural and behavioral data are often preserved across days and subjects. However, extracting these preserved dynamics remains challenging: high-dimensional neural population activity and the recorded neuron populations vary across recording sessions. While existing modeling tools can improve alignment between neural and behavioral data, they often operate on a per-subject basis or discretize behavior into categories, disrupting its natural continuity and failing to capture the underlying dynamics. We introduce Contrastive Aligned Neural DYnamics (CANDY), an end-to-end framework that aligns neural and behavioral data using rank-based contrastive learning, adapted for continuous behavioral variables, to project neural activity from different sessions onto a shared low-dimensional embedding space. CANDY fits a shared linear dynamical system to the aligned embeddings, enabling an interpretable model of the conserved temporal structure in the latent space.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(3 more...)

Add feedback

Towards foundational LiDAR world models with efficient latent flow matching

Neural Information Processing SystemsJun-23-2026, 00:16:25 GMT

LiDAR-based world models offer more structured and geometry-aware representations than their image-based counterparts. However, existing LiDAR world models are narrowly trained; each model excels only in the domain for which it was built. This raises a critical question: can we develop LiDAR world models that exhibit strong transferability across multiple domains? To answer this, we conduct the first systematic domain transfer study across three demanding scenarios: (i) outdoor to indoor generalization, (ii) sparse-to dense-beam adaptation, and (iii) non-semantic to semantic transfer. Given different amounts of fine-tuning data, our experiments show that a single pretrained model can achieve up to 11% absolute improvement (83% relative) over training from scratch and outperforms training from scratch in 30/36 of our comparisons. This transferability significantly reduces the reliance on manually annotated data for semantic occupancy forecasting: our method exceeds previous baselines with only 5% of the labeled training data of prior work. We also observed inefficiencies of current generative-model-based LiDAR world models, mainly through their under-compression of LiDAR data and inefficient training objectives. To address these issues, we propose a latent conditional flow matching (CFM)-based framework that achieves state-of-the-art reconstruction accuracy using only half the training data and a compression ratio 6 times higher than that of prior methods. Our model also achieves SOTA performance on semantic occupancy forecasting while being 1.98x-23x more computationally efficient (a 1.1x-3.9x

artificial intelligence, forecasting, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > Canada (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

d7b3cef7c31b94a4a533db83d01a8882-Paper-Conference.pdf

Neural Information Processing SystemsJun-22-2026, 22:12:57 GMT

Latent action models (LAMs) aim to learn action-relevant changes from unlabeled videos by compressing changes between frames as latents. However, differences between video frames can be caused by controllable changes as well as exogenous noise, leading to an important concern - do latents capture the changes caused by actions or irrelevant noise?

information, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Add feedback

Sparse Image Synthesis via Joint Latent and RoIFlow

Neural Information Processing SystemsJun-22-2026, 22:03:51 GMT

However, most generative models rely on dense grid-based pixels or latents, neglecting this inherent sparsity. In this paper, we explore modeling visual generation paradigm via sparse non-grid latent representations. Specifically, we design a sparse autoencoder that represents an image as a small number of latents with their positional properties (i.e., regions of interest, RoIs) with high reconstruction quality. We then explore training flow-matching transformers jointly on non-grid latents and RoI values. To the best knowledge, we are the first to address spatial sparsity using RoIs in generative process. Experimental results show that our sparse flow-based transformers have competitive performance compared with dense grid-based counterparts with significantly reduced lower compute, and reaches a competitive 2.76 FID with just 64 latents on class-conditional ImageNet 256 256 generation.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.89)

Add feedback

d607a260b3bc0b9a704a1a04dd64040a-Paper-Conference.pdf

Neural Information Processing SystemsJun-22-2026, 21:57:46 GMT

We introduce PlayerOne, the first egocentric realistic world simulator, facilitating immersive and unrestricted exploration within vividly dynamic environments. Gi the ven corresponding an egocentric w scene orld and image generate from the egocentric user, PlayerOne videos can that accurately are strictly construct aligned with the real-scene human motion of the user captured by an exocentric camera. PlayerOne is trained in a coarse-to-fine pipeline that first performs pretraining on ing, lar follo ge-scale wed by egocentric finetuning text-video on synchronous pairs for coarse-le motion-video vel egocentric data extracted understandfrom egocentric-exocentric video datasets with our automatic construction pipeline.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.93)
Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Human Computer Interaction (0.93)
(2 more...)

Add feedback

Active Seriation: Efficient Ordering Recovery with Statistical Guarantees

Neural Information Processing SystemsJun-22-2026, 21:47:17 GMT

Active seriation aims at recovering an unknown ordering of nitems by adaptively querying pairwise similarities. The observations are noisy measurements of entries of an underlying n npermuted Robinson matrix, whose permutation encodes the latent ordering. The framework allows the algorithm to start with partial information on the latent ordering, including seriation from scratch as a special case. We propose an active seriation algorithm that provably recovers the latent ordering with high probability. Under a uniform separation condition on the similarity matrix, optimal performance guarantees are established, both in terms of the probability of error and the number of observations required for successful recovery.

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

Dense SAELatents Are Features, Not Bugs

Neural Information Processing SystemsJun-22-2026, 19:27:19 GMT

Sparse autoencoders (SAEs) are designed to extract interpretable features from language models by enforcing a sparsity constraint. Ideally, training an SAE would yield latents that are both sparse and semantically meaningful. However, many SAE latents activate frequently (i.e., are dense), raising concerns that they may be undesirable artifacts of the training procedure. In this work, we systematically investigate the geometry, function, and origin of dense latents and show that they are not only persistent but often reflect meaningful model representations. We first demonstrate that dense latents tend to form antipodal pairs that reconstruct specific directions in the residual stream, and that ablating their subspace suppresses the emergence of new dense features in retrained SAEs--suggesting that high density features are an intrinsic property of the residual space. We then introduce a taxonomy of dense latents, identifying classes tied to position tracking, context binding, entropy regulation, letter-specific output signals, part-of-speech, and principal component reconstruction. Finally, we analyze how these features evolve across layers, revealing a shift from structural features in early layers, to semantic features in mid layers, and finally to output-oriented signals in the last layers of the model. Our findings indicate that dense latents serve functional roles in language model computation and should not be dismissed as training noise.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Asia (1.00)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.93)
Government > Regional Government > North America Government > United States Government (0.92)
Media (0.68)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Disentangled Representation Learning via Modular Compositional Bias

Neural Information Processing SystemsJun-22-2026, 16:44:23 GMT

Recent disentangled representation learning (DRL) methods heavily rely on factorspecific strategies--either learning objectives for attributes or model architectures for objects--to embed inductive biases. Such divergent approaches result in significant overhead when novel factors of variation do not align with prior assumptions, such as statistical independence or spatial exclusivity, or when multiple factors coexist, as practitioners must redesign architectures or objectives. To address this, we propose a compositional bias, a modular inductive bias decoupled from both objectives and architectures. Our key insight is that different factors obey distinct "recombination rules" in the data distribution: global attributes are mutually exclusive, e.g., a face has one nose, while objects share a common support (any subset of objects can co-exist). We therefore randomly remix latents according to factor-specific rules, i.e., a mixing strategy, and force the encoder to discover whichever factor structure the mixing strategy reflects through two complementary objectives: (i) a prior loss that ensures every remix decodes into a realistic image, and (ii) the compositional consistency loss introduced by Wiedemer et al. [50], which aligns each composite image with its corresponding composite latent. Under this general framework, simply adjusting the mixing strategy enables disentanglement of attributes, objects, and even both, without modifying the objectives or architectures. Extensive experiments demonstrate that our method shows competitive performance in both attribute and object disentanglement, and uniquely achieves joint disentanglement of global style and objects.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre: