AITopics | repa

Collaborating Authors

repa

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

iteration 200K 500K 9.0 7.0-0.1 7.4 10.7 +0.6 7.5 11.3 FID

Neural Information Processing SystemsJun-22-2026, 17:46:58 GMT

Diffusion Transformers (DiTs) deliver state-of-the-art image quality, yet their training remains notoriously slow. A recent remedy--representation alignment (REPA) that matches DiT hidden features to those of a non-generative teacher (e.g., DINO)--dramatically accelerates the early epochs but plateaus or even degrades performance later. We trace this failure to the capacity mismatch: once the generative student begins modeling the joint data distribution, the teacher's lower-dimensional embeddings and attention patterns become a straitjacket rather than a guide. We then introduce HASTE (Holistic Alignment with Stage-wise Termination for Efficient training), a two-phase schedule that keeps the help and drops the hindrance. Phase I applies a holistic alignment loss that simultaneously distills attention maps (relational priors) and feature projections (semantic anchors) from the teacher into mid-level layers of the DiT, yielding rapid convergence. Phase II then performs one-shot termination that deactivates the alignment loss, once a simple trigger such as a fixed iteration is hit, freeing the DiT to focus on denoising and exploit its generative capacity.

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Add feedback

Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Neural Information Processing SystemsJun-14-2026, 14:57:53 GMT

REPA and its variants effectively mitigate training challenges in diffusion models by incorporating external visual representations from pretrained models, through alignment between the noisy hidden projections of denoising networks and foundational clean image representations. We argue that the external alignment, which is absent during the entire denoising inference process, falls short of fully harnessing the potential of discriminative representations. In this work, we propose a straightforward method called Representation Entanglement for Generation (REG), which entangles low-level image latents with a single high-level class token from pretrained foundation models for denoising. REG acquires the capability to produce coherent image-class pairs directly from pure noise, substantially improving both generation quality and training efficiency. This is accomplished with negligible additional inference overhead, requiring only one single additional token for denoising (<0.5% increase in FLOPs and latency). The inference process concurrently reconstructs both image latents and their corresponding global semantics, where the acquired semantic knowledge actively guides and enhances the image generation process. On ImageNet 256 256, SiT-XL/2 + REG demonstrates remarkable convergence acceleration, achieving 63 and 23 faster training than SiT-XL/2 and SiT-XL/2 + REPA, respectively.

machine learning, natural language, sit-xl 2, (18 more...)

Neural Information Processing Systems

Country: Asia > China (0.46)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Improved Baselines with Representation Autoencoders

Singh, Jaskirat, Zheng, Boyang, Wu, Zongze, Zhang, Richard, Shechtman, Eli, Xie, Saining

arXiv.org Machine LearningMay-19-2026

Representation Autoencoders (RAE) replace traditional VAE with pretrained vision encoders. In this paper, we systematically investigate several design choices and find three insights which simplify and improve RAE. First, we study a generalized formulation where the representation is defined as sum of the last k encoder layers rather than solely the final layer. This simple change greatly improves reconstruction without encoder finetuning or specialized data (e.g., text, faces). Second, we study the prevalent assumption that RAE (using pretrained representation as encoder) replaces representation alignment (REPA), which distills the same representation to intermediate layers instead. Through large-scale empirical analysis, we uncover a surprising finding: RAE and REPA exhibit complementary working mechanisms, allowing the same representation to be used as both encoder and target for intermediate diffusion layers. Finally, the original RAE struggles with classifier-free guidance (CFG) and requires training a second, weaker diffusion model for AutoGuidance (AG). We show that REPA itself can be viewed as x-prediction in RAE latent space. By simply re-parameterizing the output of the DiT model, it can provide guidance for "free". Overall, RAEv2 leads to more than 10x faster convergence over the original RAE, achieving a state-of-the-art gFID of 1.06 in just 80 epochs on ImageNet-256. On FDr^k, RAEv2 achieves a state-of-the-art 2.17 at just 80 epochs compared to the previous best 3.26 (800 epochs) without any post-training. This motivates EP_FID@k (epochs to reach unguided gFID <= k) as a measure of training efficiency. RAEv2 attains an EP_FID@2 of 35 epochs, versus 177 for the original RAE. We also validate our approach across diverse settings for text-to-image generation and navigation world models, showing consistent improvements. Code is available at https://raev2.github.io.

artificial intelligence, encoder, machine learning, (16 more...)

arXiv.org Machine Learning

2605.18324

Country: Asia (0.28)

Genre: Research Report (0.64)

Industry: Media > Photography (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

REPA: Russian Error Types Annotation for Evaluating Text Generation and Judgment Capabilities

Pugachev, Alexander, Fenogenova, Alena, Mikhailov, Vladislav, Artemova, Ekaterina

arXiv.org Artificial IntelligenceMar-17-2025

Recent advances in large language models (LLMs) have introduced the novel paradigm of using LLMs as judges, where an LLM evaluates and scores the outputs of another LLM, which often correlates highly with human preferences. However, the use of LLM-as-a-judge has been primarily studied in English. In this paper, we evaluate this framework in Russian by introducing the Russian Error tyPes Annotation dataset (REPA), a dataset of 1k user queries and 2k LLM-generated responses. Human annotators labeled each response pair expressing their preferences across ten specific error types, as well as selecting an overall preference. We rank six generative LLMs across the error types using three rating systems based on human preferences. We also evaluate responses using eight LLM judges in zero-shot and few-shot settings. We describe the results of analyzing the judges and position and length biases. Our findings reveal a notable gap between LLM judge performance in Russian and English. However, rankings based on human and LLM preferences show partial alignment, suggesting that while current LLM judges struggle with fine-grained evaluation in Russian, there is potential for improvement.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.13102

Country:

South America > Suriname > Marowijne District > Albina (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
North America > United States > New York (0.04)
(8 more...)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Add feedback

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Yu, Sihyun, Kwak, Sangkyung, Jang, Huiwon, Jeong, Jongheon, Huang, Jonathan, Shin, Jinwoo, Xie, Saining

arXiv.org Artificial IntelligenceDec-5-2024

Recent studies have shown that the denoising process in (generative) diffusion models can induce meaningful (discriminative) representations inside the model, though the quality of these representations still lags behind those learned through recent self-supervised learning methods. We argue that one main bottleneck in training large-scale diffusion models for generation lies in effectively learning these representations. Moreover, training can be made easier by incorporating high-quality external visual representations, rather than relying solely on the diffusion models to learn them independently. We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders. The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs. For instance, our method can speed up SiT training by over 17.5$\times$, matching the performance (without classifier-free guidance) of a SiT-XL model trained for 7M steps in less than 400K steps. In terms of final generation quality, our approach achieves state-of-the-art results of FID=1.42 using classifier-free guidance with the guidance interval.

diffusion model, repa, representation, (12 more...)

arXiv.org Artificial Intelligence

2410.0694

Country: North America > United States > New York (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Learning Optimal Tax Design in Nonatomic Congestion Games

Cui, Qiwen, Fazel, Maryam, Du, Simon S.

arXiv.org Artificial IntelligenceFeb-12-2024

We study how to learn the optimal tax design to maximize the efficiency in nonatomic congestion games. It is known that self-interested behavior among the players can damage the system's efficiency. Tax mechanisms is a common method to alleviate this issue and induce socially optimal behavior. In this work, we take the initial step for learning the optimal tax that can minimize the social cost with \emph{equilibrium feedback}, i.e., the tax designer can only observe the equilibrium state under the enforced tax. Existing algorithms are not applicable due to the exponentially large tax function space, nonexistence of the gradient, and nonconvexity of the objective. To tackle these challenges, our algorithm leverages several novel components: (1) piece-wise linear tax to approximate the optimal tax; (2) an extra linear term to guarantee a strongly convex potential function; (3) efficient subroutine to find the ``boundary'' tax. The algorithm can find an $\epsilon$-optimal tax with $O(\beta F^2/\epsilon)$ sample complexity, where $\beta$ is the smoothness of the cost function and $F$ is the number of facilities.

algorithm 2, congestion game, nash equilibrium, (14 more...)

arXiv.org Artificial Intelligence

2402.07437

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Middle East > Cyprus > Pafos > Paphos (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.50)
Workflow (0.48)

Industry:

Leisure & Entertainment > Games (0.46)
Government (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

REPA: Client Clustering without Training and Data Labels for Improved Federated Learning in Non-IID Settings

Radovič, Boris, Pejović, Veljko

arXiv.org Artificial IntelligenceSep-25-2023

Clustering clients into groups that exhibit relatively homogeneous data distributions represents one of the major means of improving the performance of federated learning (FL) in non-independent and identically distributed (non-IID) data settings. Yet, the applicability of current state-of-the-art approaches remains limited as these approaches cluster clients based on information, such as the evolution of local model parameters, that is only obtainable through actual on-client training. On the other hand, there is a need to make FL models available to clients who are not able to perform the training themselves, as they do not have the processing capabilities required for training, or simply want to use the model without participating in the training. Furthermore, the existing alternative approaches that avert the training still require that individual clients have a sufficient amount of labeled data upon which the clustering is based, essentially assuming that each client is a data annotator. In this paper, we present REPA, an approach to client clustering in non-IID FL settings that requires neither training nor labeled data collection. REPA uses a novel supervised autoencoder-based method to create embeddings that profile a client's underlying data-generating processes without exposing the data to the server and without requiring local training. Our experimental analysis over three different datasets demonstrates that REPA delivers state-of-the-art model performance while expanding the applicability of cluster-based FL to previously uncovered use cases.

accuracy, dataset, repa, (16 more...)

arXiv.org Artificial Intelligence

2309.14088

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
(8 more...)

Genre: Research Report > Promising Solution (0.86)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)

Add feedback