 jaeger


Echo State Transformer: Attention Over Finite Memories

arXiv.org Artificial Intelligence

While Large Language Models and their underlying Transformer architecture are remarkably efficient, they do not reflect how our brain processes and learns a diversity of cognitive tasks such as language and working memory. Furthermore, sequential data processing with Transformers encounters a fundamental barrier: quadratic complexity growth with sequence length. Motivated by these limitations, our ambition is to create more efficient models that are less reliant on intensive computations. We introduce Echo State Transformers (EST), a hybrid architecture that elegantly resolves this challenge while demonstrating exceptional performance in classification and detection tasks. EST integrates the Transformer attention mechanisms with principles from Reservoir Computing to create a fixed-size window distributed memory system. Drawing inspiration from Echo State Networks, the most prominent instance of the Reservoir Computing paradigm, our approach leverages reservoirs (random recurrent networks) as a lightweight and efficient memory. Our architecture integrates a new module called "Working Memory" based on several reservoirs working in parallel. These reservoirs work as independent working memory units with distinct internal dynamics. A novelty here is that the classical reservoir hyperparameters, controlling the dynamics, are now trained. Thus, the EST dynamically adapts the reservoir memory/non-linearity trade-off. Thanks to these working memory units, EST achieves constant computational complexity at each processing step, effectively breaking the quadratic scaling problem of standard Transformers. We evaluate ESTs on a recent challenging time-series benchmark: the Time Series Library, which comprises 69 tasks across five categories. Results show that ESTs rank first overall in two of five categories, outperforming strong state-of-the-art baselines on classification and anomaly detection tasks, while remaining competitive on short-term forecasting.
These results position ESTs as a compelling alternative for time-series classification and anomaly detection, and a practical complement to transformer-style models in applications that prioritize robust representations and sensitive event detection.
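
To illustrate why a fixed-size reservoir state gives constant cost per processing step, here is a minimal sketch of a leaky Echo State Network update. It is not the EST architecture: the function names, sizes, and values are illustrative assumptions, the leak rate is a fixed constant (the abstract says EST trains such hyperparameters), and no attention is included.

```python
import math
import random


def make_reservoir(n_units, n_inputs, scale=0.9, seed=0):
    """Build fixed random weights (hypothetical helper).

    Each row of the recurrent matrix is rescaled so that its absolute
    values sum to `scale`. With scale < 1 this bounds the spectral
    radius below 1, a simple sufficient condition for fading memory.
    """
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_units)]
    w = []
    for _ in range(n_units):
        row = [rng.uniform(-1, 1) for _ in range(n_units)]
        norm = sum(abs(v) for v in row)
        w.append([scale * v / norm for v in row])
    return w_in, w


def step(state, u, w_in, w, leak):
    """One leaky update: x <- (1 - leak) * x + leak * tanh(W_in u + W x)."""
    new_state = []
    for i in range(len(state)):
        pre = sum(w_in[i][j] * u[j] for j in range(len(u)))
        pre += sum(w[i][k] * state[k] for k in range(len(state)))
        new_state.append((1 - leak) * state[i] + leak * math.tanh(pre))
    return new_state


def run(inputs, n_units=8, leak=0.3):
    """Feed a sequence through the reservoir and return the final state."""
    w_in, w = make_reservoir(n_units, len(inputs[0]))
    state = [0.0] * n_units
    for u in inputs:
        # Work per step depends only on n_units, not on how many
        # inputs have already been processed.
        state = step(state, u, w_in, w, leak)
    return state
```

Calling `run` on a sequence of 5 inputs or of 500 inputs returns a state of the same length, which is the property the abstract relies on: the memory does not grow with the sequence. A smaller `leak` makes the state change more slowly (longer memory), while a larger one lets the nonlinearity dominate.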


Uniform Information Density and Syntactic Reduction: Revisiting $\textit{that}$-Mentioning in English Complement Clauses

arXiv.org Artificial Intelligence

Speakers often have multiple ways to express the same meaning. The Uniform Information Density (UID) hypothesis suggests that speakers exploit this variability to maintain a consistent rate of information transmission during language production. Building on prior work linking UID to syntactic reduction, we revisit the finding that the optional complementizer $\textit{that}$ in English complement clauses is more likely to be omitted when the clause has low information density (i.e., more predictable). We advance this line of research by analyzing a large-scale, contemporary conversational corpus and using machine learning and neural language models to refine estimates of information density. Our results replicated the established relationship between information density and $\textit{that}$-mentioning. However, we found that previous measures of information density based on matrix verbs' subcategorization probability capture substantial idiosyncratic lexical variation. By contrast, estimates derived from contextual word embeddings account for additional variance in patterns of complementizer usage.
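
As a rough illustration of the quantity behind the UID hypothesis, the sketch below computes per-word surprisal and uses its variance as one simple uniformity measure. The probabilities are invented for the example and the function names are assumptions; the paper's own estimates come from corpus data and neural language models, which are not reproduced here.

```python
import math


def surprisal(p):
    """Information content of a word with probability p, in bits."""
    return -math.log2(p)


def uid_variance(probs):
    """Variance of surprisal across an utterance (lower = more uniform)."""
    s = [surprisal(p) for p in probs]
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)


# Hypothetical in-context probabilities, for illustration only.
# Without "that": matrix verb, then an unexpected clause onset.
without_that = [0.25, 1 / 64]
# With "that": the complementizer absorbs part of the information,
# so the clause onset is less surprising.
with_that = [0.25, 0.125, 0.125]
```

Under these made-up numbers, `uid_variance(with_that)` is smaller than `uid_variance(without_that)`, matching the intuition that mentioning "that" smooths the information profile when the clause onset would otherwise be unpredictable.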


RCUKF: Data-Driven Modeling Meets Bayesian Estimation

arXiv.org Machine Learning

Accurate modeling is crucial in many engineering and scientific applications, yet obtaining a reliable process model for complex systems is often challenging. To address this challenge, we propose a novel framework, reservoir computing with unscented Kalman filtering (RCUKF), which integrates data-driven modeling via reservoir computing (RC) with Bayesian estimation through the unscented Kalman filter (UKF). The RC component learns the nonlinear system dynamics directly from data, serving as a surrogate process model in the UKF prediction step to generate state estimates in high-dimensional or chaotic regimes where nominal mathematical models may fail. Meanwhile, the UKF measurement update integrates real-time sensor data to correct potential drift in the data-driven model. We demonstrate the effectiveness of RCUKF on well-known benchmark problems and a real-time vehicle trajectory estimation task in a high-fidelity simulation environment.


JAEGER: Dual-Level Humanoid Whole-Body Controller

arXiv.org Artificial Intelligence

Due to hardware constraints and the inherent complexity of the robotic action space, achieving effective whole-body control (WBC) for adult-sized humanoid robots, such as the Unitree H1-2, remains a significant challenge. Recent studies on WBC have demonstrated promising advancements, enabling humanoid robots to perform versatile motions by learning from extensive human data [1, 2, 3, 4, 5]. Based on different task settings, WBC methodologies can be broadly categorized into three types: root velocity tracking [6], kinematic position tracking [1, 3], and local joint angle tracking [6, 2, 4]. Root velocity tracking emphasizes coarse-grained control, where the robot tracks a given velocity without relying on a specific reference pose. In contrast, kinematic position and local joint angle tracking focus on accurately reproducing a given trajectory of reference poses, which can be regarded as fine-grained control for humanoids.


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

A nice advantage of predictive representations of stochastic processes is that they can be expressed in terms of families of linear operators --- the "observable operators" of Jaeger (oddly, not cited in this paper; also, see Upper, and the appendix to Shalizi and Crutchfield). This paper proposes (following some earlier work) to exploit this fact, by using the instrumental variables technique from econometrics to simplify the estimation of such models. Doing so results in an estimation procedure very similar to that of Langford et al. from 2009 (reference [16] in the paper), but with some advantages in terms of avoiding iterative re-estimation. However, there seems to be an important issue which isn't (that I saw) addressed here. The instrumental variable needs to be correlated with the input variable to the regression, but independent of the noise in the regression.
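
The instrument condition in the last sentence can be made concrete with a small simulation. This is a generic single-instrument sketch on synthetic data, not the estimation procedure of the paper under review; the data-generating process and all names are assumptions chosen for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def simulate(n=5000, beta=2.0, seed=0):
    """Synthetic data where the regressor x is correlated with the noise e.

    z is the instrument: it drives x but is independent of e.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    e = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + ei for zi, ei in zip(z, e)]          # x depends on the noise
    y = [beta * xi + ei for xi, ei in zip(x, e)]   # true slope is beta
    return z, x, y


def ols_slope(x, y):
    """Ordinary least squares slope; biased here because cov(x, e) != 0."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Instrumental-variable slope: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)
```

In this setup the ordinary least-squares slope converges to about 2.5 instead of the true value 2, because the regressor shares noise with the output, while the instrumental-variable slope stays close to 2. If `z` were also correlated with `e`, the second estimator would be biased as well, which is the concern the review raises.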


That's Optional: A Contemporary Exploration of "that" Omission in English Subordinate Clauses

arXiv.org Artificial Intelligence

... effectiveness of their utterances when faced with multiple options for structuring a message. The UID hypothesis (Frank and Jaeger, 2008; Collins, 2014; Hahn et al., 2020) suggests that speakers tend to spread information evenly throughout an utterance, avoiding large fluctuations in the per-unit ... First, we extend the investigation to a much larger corpus of informal written English collected from social media. Second, we use contemporary large language models (LLMs) to estimate the operationalizations ... of information uniformity in syntactic reduction, suggesting the robustness of our findings.


Volvo and Aurora introduce their first self-driving truck

Engadget

Volvo and Aurora have unveiled their first production autonomous truck, three years after the companies initially announced that they were teaming up. They've just shown off the Volvo VNL Autonomous truck, which was designed by autonomous trucking and robotaxi company Aurora but will be manufactured by Volvo, at ACT Expo in Las Vegas. It's powered by Aurora Driver, a level 4 autonomous driving system that uses high-resolution cameras, imaging radars, a LiDAR sensor that can detect objects up to 400 meters away and even more sensors. Aurora's technology has driven billions of virtual miles for training, as well as 1.5 million commercial miles on actual public roads. For safety purposes, the truck has "redundant steering, braking, communication, computation, power management, energy storage and vehicle motion management systems." According to TechCrunch, the vehicle will still have a human driver behind the wheel to take over whenever needed when it starts ferrying cargo across North America over the next few months.


Reservoir Computing Benchmarks: a review, a taxonomy, some best practices

arXiv.org Artificial Intelligence

Reservoir Computing is an Unconventional Computation model to perform computation on various substrates, such as RNNs or physical materials. The method takes a "black-box" approach, training only the outputs of the system it is built on. As such, evaluating the computational capacity of these systems can be challenging. We review and critique the evaluation methods used in the field of Reservoir Computing. We introduce a categorisation of benchmark tasks. We review multiple examples of benchmarks from the literature as applied to reservoir computing, and note their strengths and shortcomings. We suggest ways in which benchmarks and their uses may be improved to the benefit of the reservoir computing community.


Jaeger: A Concatenation-Based Multi-Transformer VQA Model

arXiv.org Artificial Intelligence

Document-based Visual Question Answering poses a challenging task between linguistic sense disambiguation and fine-grained multimodal retrieval. Although there has been encouraging progress in document-based question answering due to the utilization of large language and open-world prior models [1], several challenges persist, including prolonged response times, extended inference durations, and imprecision in matching. In order to overcome these challenges, we propose Jaeger, a concatenation-based multi-transformer VQA model. To derive question features, we leverage the exceptional capabilities of RoBERTa large [2] and GPT2-xl [3] as feature extractors. Subsequently, we subject the outputs from both models to a concatenation process. This operation allows the model to consider information from diverse sources concurrently, strengthening its representational capability. By leveraging pre-trained models for feature extraction, our approach has the potential to amplify the performance of these models through concatenation. After concatenation, we apply dimensionality reduction to the output features, reducing the model's computational cost and inference time. Empirical results demonstrate that our proposed model achieves competitive performance on Task C of the PDF-VQA Dataset.


A Cross-Linguistic Pressure for Uniform Information Density in Word Order

arXiv.org Artificial Intelligence

While natural languages differ widely in both canonical word order and word order flexibility, their word orders still follow shared cross-linguistic statistical patterns, often attributed to functional pressures. In the effort to identify these pressures, prior work has compared real and counterfactual word orders. Yet one functional pressure has been overlooked in such investigations: the uniform information density (UID) hypothesis, which holds that information should be spread evenly throughout an utterance. Here, we ask whether a pressure for UID may have influenced word order patterns cross-linguistically. To this end, we use computational models to test whether real orders lead to greater information uniformity than counterfactual orders. In our empirical study of 10 typologically diverse languages, we find that: (i) among SVO languages, real word orders consistently have greater uniformity than reverse word orders, and (ii) only linguistically implausible counterfactual orders consistently exceed the uniformity of real orders. These findings are compatible with a pressure for information uniformity in the development and usage of natural languages.