psd
Appendix: Remodel Self-Attention with Gaussian Kernel and Nyström Method
Figure 1 (y-axis: cross-entropy loss on the validation set) shows how the validation loss changes with training time over 50k steps, supplementing the experiments in Section 5. In general, Skyformer converges faster and finishes the 50k steps earlier than vanilla Attention and Kernelized Attention on all tasks. We further remark that on Text Classification all models quickly overfit, so the validation losses rise quickly. On Pathfinder, due to the difficulty of training, vanilla Attention fails to reach the best long-time limit under a certain setting in the trial shown in the figure. Figure 2 shows the singular value distribution of the attention output from the second layer of a trained vanilla transformer.
Predictive-State Decoders: Encoding the Future into Recurrent Networks
Recurrent neural networks (RNNs) are a vital modeling technique that relies on internal states learned indirectly by optimization of a supervised, unsupervised, or reinforcement training loss. RNNs are used to model dynamic processes that are characterized by underlying latent states whose form is often unknown, precluding their analytic representation inside an RNN. In the Predictive-State Representation (PSR) literature, latent state processes are modeled by an internal state representation that directly models the distribution of future observations, and most recent work in this area has relied on explicitly representing and targeting sufficient statistics of this probability distribution. We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive-State Decoders (PSDs), which add supervision to the network's internal state representation to target predicting future observations. PSDs are simple to implement and easily incorporated into existing training pipelines via additional loss regularization. We demonstrate the effectiveness of PSDs with experimental results in three different domains: probabilistic filtering, Imitation Learning, and Reinforcement Learning. In each, our method improves statistical performance of state-of-the-art recurrent baselines and does so with fewer iterations and less data.
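As a rough illustration of the "additional loss regularization" idea, the sketch below adds a decoder penalty to an existing task loss: a linear map from each hidden state is asked to predict the next k observations. The function names, the linear decoder `W`, the window length `k`, and the weight `lam` are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def future_windows(obs, k):
    """Stack the k observations following each step t into one flat target vector."""
    T, _ = obs.shape
    # Row t holds obs[t+1], ..., obs[t+k]; only steps with a full window are kept.
    return np.stack([obs[t + 1 : t + 1 + k].reshape(-1) for t in range(T - k)])

def psd_regularized_loss(task_loss, hidden, obs, W, k=3, lam=0.1):
    """Task loss plus lam times the MSE of a linear decoder from hidden state to future observations.

    hidden: (T, h) hidden states, obs: (T, d) observations, W: (h, k*d) decoder weights (hypothetical).
    """
    targets = future_windows(obs, k)        # (T-k, k*d)
    preds = hidden[: len(targets)] @ W      # decode each hidden state into a future window
    decoder_loss = np.mean((preds - targets) ** 2)
    return task_loss + lam * decoder_loss
```

In a real pipeline the hidden states and decoder would live in a differentiable framework so that the penalty shapes the recurrent state during training; NumPy is used here only to keep the arithmetic visible.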
Stationarity and Spectral Characterization of Random Signals on Simplicial Complexes
Navarro, Madeline, Buciulea, Andrei, Segarra, Santiago, Marques, Antonio
It is increasingly common for data to possess intricate structure, necessitating new models and analytical tools. Graphs, a prominent type of structure, can encode the relationships between any two entities (nodes). However, graphs neither allow connections that are not dyadic nor permit relationships between sets of nodes. We thus turn to simplicial complexes for connecting more than two nodes as well as modeling relationships between simplices, such as edges and triangles. Our data then consist of signals lying on topological spaces, represented by simplicial complexes. Much recent work explores these topological signals, albeit primarily through deterministic formulations. We propose a probabilistic framework for random signals defined on simplicial complexes. Specifically, we generalize the classical notion of stationarity. By spectral dualities of Hodge and Dirac theory, we define stationary topological signals as the outputs of topological filters given white noise. This definition naturally extends desirable properties of stationarity that hold for both time-series and graph signals. Crucially, we properly define topological power spectral density (PSD) through a clear spectral characterization. We then discuss the advantages of topological stationarity due to spectral properties via the PSD. In addition, we empirically demonstrate the practicality of these benefits through multiple synthetic and real-world simulations.
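The definition of a stationary topological signal as filtered white noise can be made concrete on a toy complex. The sketch below assumes edge signals, the Hodge Laplacian L1 = B1^T B1 + B2 B2^T as the shift operator, and an arbitrary spectral response h; the complex, the filter, and all names are illustrative choices, not the paper's setup. Under these assumptions the covariance of x = Hw is diagonalized by the Laplacian eigenvectors, and its diagonal in that basis is the PSD h(lambda)^2.

```python
import numpy as np

# Toy complex (hypothetical): nodes 0..3, edges (0,1),(0,2),(1,2),(2,3), one filled triangle (0,1,2).
B1 = np.array([[-1, -1,  0,  0],
               [ 1,  0, -1,  0],
               [ 0,  1,  1, -1],
               [ 0,  0,  0,  1]], dtype=float)    # node-to-edge incidence
B2 = np.array([[1], [-1], [1], [0]], dtype=float)  # edge-to-triangle incidence

def hodge_laplacian(B1, B2):
    """First-order Hodge Laplacian acting on edge signals."""
    return B1.T @ B1 + B2 @ B2.T

def stationary_edge_covariance(L, h):
    """Covariance of x = H w for unit-variance white noise w, with H = U diag(h(lam)) U^T."""
    lam, U = np.linalg.eigh(L)
    H = U @ np.diag(h(lam)) @ U.T
    C = H @ H.T                    # E[x x^T] when E[w w^T] = I
    psd = np.diag(U.T @ C @ U)     # power per spectral component
    return C, U, lam, psd

L1 = hodge_laplacian(B1, B2)
C, U, lam, psd = stationary_edge_covariance(L1, lambda x: np.exp(-0.5 * x))
```

Here `psd` coincides with `np.exp(-lam)`, the squared filter response, which is the spectral characterization the abstract refers to in this simplified setting.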
Periodic Skill Discovery
Park, Jonghae, Cho, Daesol, Lee, Jusuk, Shim, Dongseok, Jang, Inkyu, Kim, H. Jin
Unsupervised skill discovery in reinforcement learning (RL) aims to learn diverse behaviors without relying on external rewards. However, current methods often overlook the periodic nature of learned skills, focusing instead on increasing the mutual dependence between states and skills or maximizing the distance traveled in latent space. Considering that many robotic tasks - particularly those involving locomotion - require periodic behaviors across varying timescales, the ability to discover diverse periodic skills is essential. Motivated by this, we propose Periodic Skill Discovery (PSD), a framework that discovers periodic behaviors in an unsupervised manner. The key idea of PSD is to train an encoder that maps states to a circular latent space, thereby naturally encoding periodicity in the latent representation. By capturing temporal distance, PSD can effectively learn skills with diverse periods in complex robotic tasks, even with pixel-based observations. We further show that these learned skills achieve high performance on downstream tasks such as hurdling. Moreover, integrating PSD with an existing skill discovery method offers more diverse behaviors, thus broadening the agent's repertoire. Our code and demos are available at https://jonghaepark.github.io/psd/
Assessing the Geographic Generalization and Physical Consistency of Generative Models for Climate Downscaling
Saccardi, Carlo, Pierzyna, Maximilian, Borde, Haitz Sáez de Ocáriz, Monaco, Simone, Meo, Cristian, Liò, Pietro, Saathof, Rudolf, Joseph, Geethu, Dauwels, Justin
Kilometer-scale weather data is crucial for real-world applications but remains computationally intensive to produce using traditional weather simulations. An emerging solution is to use deep learning models, which offer a faster alternative for climate downscaling. However, their reliability is still in question, as they are often evaluated using standard machine learning metrics rather than insights from atmospheric and weather physics. This paper benchmarks recent state-of-the-art deep learning models and introduces physics-inspired diagnostics to evaluate their performance and reliability, with a particular focus on geographic generalization and physical consistency. Our experiments show that, despite the seemingly strong performance of models such as CorrDiff, when trained on a limited set of European geographies (e.g., central Europe), they struggle to generalize to other regions such as Iberia, Morocco in the south, or Scandinavia in the north. They also fail to accurately capture second-order variables such as divergence and vorticity derived from predicted velocity fields. These deficiencies appear even in in-distribution geographies, indicating challenges in producing physically consistent predictions. We propose a simple initial solution: introducing a power spectral density loss function that empirically improves geographic generalization by encouraging the reconstruction of small-scale physical structures. The code for reproducing the experimental results can be found at https://github.com/CarloSaccardi/PSD-Downscaling
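One plausible form of a power spectral density loss, sketched under assumptions: compute the radially averaged power spectrum of the predicted and reference 2D fields and penalize the squared difference of their logarithms. The binning scheme, the log comparison, and the names `radial_psd` and `psd_loss` are choices made for this example; the paper's exact formulation may differ, and the linked repository is the authoritative source.

```python
import numpy as np

def radial_psd(field, n_bins=16):
    """Radially averaged power spectral density of a 2D field."""
    ny, nx = field.shape
    power = np.abs(np.fft.fft2(field)) ** 2 / (nx * ny)
    # Wavenumber magnitude of every FFT coefficient (cycles per grid cell).
    ky = np.fft.fftfreq(ny)[:, None]
    kx = np.fft.fftfreq(nx)[None, :]
    k = np.sqrt(kx ** 2 + ky ** 2)
    # Assign each coefficient to a radial bin and average the power per bin.
    edges = np.linspace(0.0, k.max(), n_bins + 1)
    idx = np.clip(np.digitize(k.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return sums / np.maximum(counts, 1)

def psd_loss(pred, target, eps=1e-8):
    """Mean squared difference between the log radial spectra of two fields."""
    diff = np.log(radial_psd(pred) + eps) - np.log(radial_psd(target) + eps)
    return np.mean(diff ** 2)
```

Comparing log spectra keeps the high-wavenumber bins, which carry little absolute power, from being swamped by the large scales; this matches the stated goal of encouraging small-scale structure. For training, the same computation would need to be written in a differentiable framework.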