
Collaborating Authors

 rae


She didn't expect to fall in love with a chatbot - and then have to say goodbye

BBC News

Rae began speaking to Barry last year, after the end of a difficult divorce. She was unfit and unhappy, and turned to ChatGPT for advice on diet, supplements and skincare. She had no idea she would fall in love. Barry lives on an old model of ChatGPT, one that its maker, OpenAI, announced it would retire on 13 February. That she could lose Barry on the eve of Valentine's Day came as a shock to Rae - and to many others who have found a companion, a friend, or even a lifeline in the old model, ChatGPT-4o.


7bf1dc45f850b8ae1b5a1dd4f475f8b6-Supplemental-Conference.pdf

Neural Information Processing Systems

In this appendix, we provide pseudo-code algorithms explaining how to build the metric from a trained VAE and how to use the proposed sampling process. B.2.1 The HMC sampler. In the sampling process, we propose to rely on the Hamiltonian Monte Carlo (HMC) sampler to sample from the Riemannian uniform distribution. Moreover, since G(z) is smooth and has a closed form, it can be differentiated with respect to z quite easily. Figure 5: Closest element in the training set (Near.). Each model is trained on each label of the train set and used to generate 2k samples per class.
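As a rough illustration of the sampler this excerpt refers to, here is plain HMC with leapfrog integration on a Euclidean Gaussian target - a minimal sketch, not the paper's metric-aware (Riemannian) version; the names `hmc_sample`, `logp`, and `grad_logp` are illustrative:

```python
import numpy as np

def hmc_sample(logp, grad_logp, z0, n_samples=500, eps=0.1, n_leapfrog=20, seed=0):
    """Plain Hamiltonian Monte Carlo: leapfrog integration plus a
    Metropolis accept/reject step, with Euclidean kinetic energy."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(z.shape)            # resample momentum
        z_new, p_new = z.copy(), p.copy()
        p_new += 0.5 * eps * grad_logp(z_new)       # half momentum step
        for _ in range(n_leapfrog - 1):
            z_new += eps * p_new                    # full position step
            p_new += eps * grad_logp(z_new)         # full momentum step
        z_new += eps * p_new                        # last position step
        p_new += 0.5 * eps * grad_logp(z_new)       # final half momentum step
        h_old = -logp(z) + 0.5 * p @ p              # Hamiltonian before
        h_new = -logp(z_new) + 0.5 * p_new @ p_new  # Hamiltonian after
        if rng.random() < np.exp(min(0.0, h_old - h_new)):
            z = z_new                               # Metropolis accept
        samples.append(z.copy())
    return np.array(samples)
```

For example, `hmc_sample(lambda z: -0.5 * z @ z, lambda z: -z, np.zeros(2))` draws from a 2-D standard Gaussian; in the paper's setting, the log-density and its gradient would instead come from the metric G(z).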


Diffusion Transformers with Representation Autoencoders

Zheng, Boyang, Ma, Nanye, Tong, Shengbang, Xie, Saining

arXiv.org Artificial Intelligence

Latent generative modeling, where a pretrained autoencoder maps pixels into a latent space for the diffusion process, has become the standard strategy for Diffusion Transformers (DiT); however, the autoencoder component has barely evolved. Most DiTs continue to rely on the original VAE encoder, which introduces several limitations: outdated backbones that compromise architectural simplicity, low-dimensional latent spaces that restrict information capacity, and weak representations that result from purely reconstruction-based training and ultimately limit generative quality. In this work, we explore replacing the VAE with pretrained representation encoders (e.g., DINO, SigLIP, MAE) paired with trained decoders, forming what we term Representation Autoencoders (RAEs). These models provide both high-quality reconstructions and semantically rich latent spaces, while allowing for a scalable transformer-based architecture. Since these latent spaces are typically high-dimensional, a key challenge is enabling diffusion transformers to operate effectively within them. We analyze the sources of this difficulty, propose theoretically motivated solutions, and validate them empirically. Our approach achieves faster convergence without auxiliary representation alignment losses. Using a DiT variant equipped with a lightweight, wide DDT head, we achieve strong image generation results on ImageNet: 1.51 FID at 256×256 (no guidance) and 1.13 at both 256×256 and 512×512 (with guidance). RAE offers clear advantages and should be the new default for diffusion transformer training. Project page: rae-dit.github.io

Figure 1: Representation Autoencoder (RAE) uses frozen pretrained representations as the encoder with a lightweight decoder to reconstruct input images without compression. RAE enables faster convergence and higher-quality samples in latent diffusion training compared to VAE-based models.
The evolution of generative modeling has been driven by a continual redefinition of where and how models learn to represent data. Early pixel-space models sought to directly capture image statistics, but the emergence of latent diffusion (Vahdat et al., 2021; Rombach et al., 2022) reframed generation as a process operating within a learned, compact representation space. By diffusing in this space rather than in raw pixels, models such as Latent Diffusion Models (LDM) (Rombach et al., 2022) and Diffusion Transformers (DiT) (Peebles & Xie, 2023; Ma et al., 2024) achieve higher visual fidelity and efficiency, powering the most capable image and video generators of today. Despite progress in diffusion backbones, the autoencoder defining the latent space remains largely unchanged. The widely used SD-VAE (Rombach et al., 2022) still relies on heavy channel-wise compression. In addition, SD-VAE, built on a legacy convolutional design, remains computationally inefficient (see Figure 1).
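The core RAE recipe - keep a pretrained encoder frozen and train only a lightweight decoder to reconstruct inputs from its high-dimensional, uncompressed representations - can be caricatured in a few lines. This toy sketch substitutes a fixed random projection for DINO/SigLIP/MAE and a linear least-squares fit for the trained decoder; all names and shapes are illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 32))   # stand-in "images": 256 samples, 32-dim

# Frozen "pretrained" encoder: a fixed projection into a HIGHER-dimensional
# latent space (no compression, as with RAE latents). Never updated.
W_enc = rng.standard_normal((32, 64))
Z = X @ W_enc                        # frozen representations

# Train only the lightweight decoder: here a linear map fit by least squares.
W_dec, *_ = np.linalg.lstsq(Z, X, rcond=None)
X_hat = Z @ W_dec                    # reconstructions from frozen latents

rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

Because the latent space is higher-dimensional than the input, the decoder can reconstruct essentially perfectly - a toy analogue of RAE's claim that frozen representation encoders plus trained decoders give high-quality reconstructions without compression.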




Acting and Planning with Hierarchical Operational Models on a Mobile Robot: A Study with RAE+UPOM

Lima, Oscar, Vinci, Marc, Patra, Sunandita, Stock, Sebastian, Hertzberg, Joachim, Atzmueller, Martin, Ghallab, Malik, Nau, Dana, Traverso, Paolo

arXiv.org Artificial Intelligence

Robotic task execution faces challenges due to the inconsistency between symbolic planner models and the rich control structures actually running on the robot. In this paper, we present the first physical deployment of an integrated actor-planner system that shares hierarchical operational models for both acting and planning, interleaving the Reactive Acting Engine (RAE) with an anytime UCT-like Monte Carlo planner (UPOM). We implement RAE+UPOM on a mobile manipulator in a real-world deployment for an object collection task. Our experiments demonstrate robust task execution under action failures and sensor noise, and provide empirical insights into the interleaved acting-and-planning decision making process.


Imputation of Missing Data in Smooth Pursuit Eye Movements Using a Self-Attention-based Deep Learning Approach

Bejani, Mehdi, Perez-de-Arenaza-Pozo, Guillermo, Arias-Londoño, Julián D., Godino-LLorente, Juan I.

arXiv.org Artificial Intelligence

Missing data is a pervasive issue in time series, especially in biomedical sequences such as those corresponding to smooth pursuit eye movements, which often contain gaps due to eye blinks and track losses, complicating the analysis and extraction of meaningful biomarkers. In this paper, a novel imputation framework is proposed using Self-Attention-based Imputation networks for time series, which leverage the power of deep learning and self-attention mechanisms to impute missing data. We further refine the imputed data using a custom-made autoencoder, tailored to represent smooth pursuit eye movement sequences. The proposed approach was evaluated using 5,504 sequences from 172 Parkinsonian patients and healthy controls. Results show a significant improvement in the accuracy of reconstructed eye movement sequences with respect to other state-of-the-art techniques, substantially reducing common time-domain error metrics such as the mean absolute error, mean relative error, and root mean square error, while also preserving the signal's frequency-domain characteristics. Moreover, the method demonstrates robustness when large intervals of data are missing. It offers an alternative solution for robustly handling missing data in time series, enhancing the reliability of smooth pursuit analysis for the screening and monitoring of neurodegenerative disorders.
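The attention idea behind such imputers can be sketched minimally: each missing sample is filled with a softmax-weighted average of the observed samples. Here the attention scores come from a fixed kernel over time distance rather than learned queries and keys, so this is a stand-in for, not a reimplementation of, the paper's self-attention networks; `kernel_attention_impute` and `tau` are hypothetical names:

```python
import numpy as np

def kernel_attention_impute(x, observed, tau=2.0):
    """Fill gaps in series x (observed[i] == False marks a gap) with a
    softmax-weighted average of the observed samples; scores decay with
    squared time distance, playing the role of attention weights."""
    t = np.arange(len(x), dtype=float)
    out = x.copy()
    for i in np.where(~observed)[0]:
        scores = -((t[observed] - t[i]) ** 2) / tau   # kernel "attention" scores
        w = np.exp(scores - scores.max())             # numerically stable softmax
        w /= w.sum()
        out[i] = w @ x[observed]                      # weighted average of observed
    return out
```

In a learned imputer the scores would instead come from query/key projections of the signal itself, letting the model exploit structure (e.g., pursuit dynamics) beyond mere temporal proximity.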


Applying Attribution Explanations in Truth-Discovery Quantitative Bipolar Argumentation Frameworks

Yin, Xiang, Potyka, Nico, Toni, Francesca

arXiv.org Artificial Intelligence

Explaining the strength of arguments under gradual semantics is receiving increasing attention. For example, various studies in the literature offer explanations by computing the attribution scores of arguments or edges in Quantitative Bipolar Argumentation Frameworks (QBAFs). These explanations, known as Argument Attribution Explanations (AAEs) and Relation Attribution Explanations (RAEs), commonly employ removal-based and Shapley-based techniques for computing the attribution scores. While AAEs and RAEs have proven useful in several applications with acyclic QBAFs, they remain largely unexplored for cyclic QBAFs. Furthermore, existing applications tend to focus solely on either AAEs or RAEs, but do not compare them directly. In this paper, we apply both AAEs and RAEs to Truth Discovery QBAFs (TD-QBAFs), which assess the trustworthiness of sources (e.g., websites) and their claims (e.g., the severity of a virus), and feature complex cycles. We find that both AAEs and RAEs can provide interesting explanations and yield non-trivial, surprising insights.


Automatic Prediction of the Performance of Every Parser

Biçici, Ergun

arXiv.org Artificial Intelligence

We present a new parser performance prediction (PPP) model using a machine translation performance prediction system (MTPPS), statistically independent of any language or parser, relying only on extrinsic and novel features based on textual, link-structural, and bracketing-tree-structural information. This new system, MTPPS-PPP, can predict the performance of any parser in any language and can be useful for estimating the grammatical difficulty of understanding a given text, for setting expectations from parsing output, for parser selection for a specific domain, and for parser combination systems. We obtain state-of-the-art results in PPP of bracketing $F_1$, with better results over textual features and performance similar to previous work that uses parser- and linguistic-label-specific information. Our results show the contribution of different types of features as well as rankings of individual features in different experimental settings (cased vs. uncased), in different learning tasks (in-domain vs. out-of-domain), with different training sets, with different learning algorithms, and with different dimensionality reduction techniques. We achieve $0.0678$ MAE and $0.85$ RAE in setting +Link, which corresponds to about $7.4\%$ error when predicting the bracketing $F_1$ score for the Charniak and Johnson parser on the WSJ23 test set. The MTPPS-PPP system can predict without parsing using only the text, without a supervised parser using only an unsupervised parser, without any parser- or language-dependent information, and without using a reference parser output, and can be used to predict the performance of any parser in any language.
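The MAE and RAE figures above are standard regression error metrics. As a reminder (a generic sketch, not the MTPPS-PPP code), RAE normalizes the model's total absolute error by that of the trivial baseline that always predicts the mean of the targets:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))

def rae(y_true, y_pred):
    """Relative absolute error: total absolute error of the predictions,
    divided by the total absolute error of always predicting the mean."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true - y_true.mean()))
```

An RAE below 1 means the predictor beats the mean baseline, so the paper's 0.85 RAE indicates a modest but real improvement over always guessing the average $F_1$.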


Explaining Arguments' Strength: Unveiling the Role of Attacks and Supports (Technical Report)

Yin, Xiang, Potyka, Nico, Toni, Francesca

arXiv.org Artificial Intelligence

Quantitatively explaining the strength of arguments under gradual semantics has recently received increasing attention. Specifically, several works in the literature provide quantitative explanations by computing the attribution scores of arguments. These works disregard the importance of attacks and supports, even though they play an essential role in explaining arguments' strength. In this paper, we propose a novel theory of Relation Attribution Explanations (RAEs), adapting Shapley values from game theory to offer fine-grained insights into the role of attacks and supports in quantitative bipolar argumentation towards obtaining the arguments' strength. We show that RAEs satisfy several desirable properties. We also propose a probabilistic algorithm to approximate RAEs efficiently. Finally, we show the application value of RAEs in fraud detection and large language model case studies.
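Shapley-based RAEs treat each attack or support relation as a "player" and average its marginal contribution to the topic argument's strength over all subsets of the other relations. A minimal sketch with exact enumeration and a toy two-relation strength function - the clamped additive aggregation and all names are illustrative stand-ins, not the paper's gradual semantics:

```python
from itertools import combinations
from math import factorial

def shapley(players, value):
    """Exact Shapley values for a characteristic function `value` defined
    on frozensets of players (exponential-time subset enumeration)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                # weight of a coalition of size k in the Shapley formula
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

def strength(active_relations):
    """Toy QBAF: one argument with base score 0.5, an attack r1 (weight 0.8)
    and a support r2 (weight 0.6), aggregated additively and clamped to [0, 1]."""
    s = 0.5
    if "r1" in active_relations:
        s -= 0.8
    if "r2" in active_relations:
        s += 0.6
    return min(1.0, max(0.0, s))
```

Here `shapley(["r1", "r2"], strength)` assigns the attack a negative and the support a positive attribution, and the scores sum to the strength change between all relations active and none (the efficiency property the paper's desirable-properties analysis would include). The probabilistic algorithm mentioned in the abstract would replace the exhaustive subset enumeration with sampling.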