 internal state


EVAAA: A Virtual Environment Platform for Essential Variables in Autonomous and Adaptive Agents

Neural Information Processing Systems

Appendix A describes the Unity-based interface implemented in EVAAA, including environment setup, prefab structures, and object instantiation. Appendix B provides a comprehensive introduction to Essential Variables (EVs), including their design, dynamics, and role in internal state regulation. Appendix C explains the implementation of the reward system and its connection to the balance of internal states. Appendix E outlines the modular configuration used to generate EVAAA environments, along with instructions for environment customization. Appendix F presents the structure and progression of naturalistic training environments. Appendix G describes the design of unseen experimental testbeds for evaluation. Appendix I provides analyses of agent behavior across training and test environments, including emergent behavioral patterns. All code and data are publicly available at: https://github.com/cocoanlab/evaaa

A.1 Prefabs

Environmental elements such as terrain, resources, obstacles, and predators are implemented as reusable and configurable Unity prefabs. Prefabs are grouped into three categories, each providing components for constructing and customizing interactive scenes: Agents (main agent and predators), Environment (terrain and containers), and Materials (varied textures and colors for visual distinction). This modular system enables rapid prototyping, task generation, condition randomization, and reproducible scene setup. Prefabs can be customized through the Unity Editor or programmatically at runtime, and reused across scenes without manual rebuilding.



Unsupervised Learning for Physical Interaction through Video Prediction

Neural Information Processing Systems

A core challenge for an agent learning to interact with the world is to predict how its actions affect objects in its environment. Many existing methods for learning the dynamics of physical interactions require labeled object information. However, to scale real-world interaction learning to a variety of scenes and objects, acquiring labeled data becomes increasingly impractical. To learn about physical object motion without labels, we develop an action-conditioned video prediction model that explicitly models pixel motion, by predicting a distribution over pixel motion from previous frames. Because our model explicitly predicts motion, it is partially invariant to object appearance, enabling it to generalize to previously unseen objects. To explore video prediction for real-world interactive agents, we also introduce a dataset of 59,000 robot interactions involving pushing motions, including a test set with novel objects. In this dataset, accurate prediction of videos conditioned on the robot's future actions amounts to learning a "visual imagination" of different futures based on different courses of action. Our experiments show that our proposed method produces more accurate video predictions both quantitatively and qualitatively, when compared to prior methods.



Appendices A Proof for Theorem 1 Before proceeding, let us define an additional term S = null

Neural Information Processing Systems

Note that trivially, we have S KM. [...] 's are generated under policy [...] We now construct a high-probability confidence set. This lemma is based on Lemma 17 of Jaksch et al. [2010], which is based on the following [...] From Theorem 2.1 in Weissman et al. [2003], for any $\epsilon > 0$, we have $P\{\|p(\cdot) - \hat{p}(\cdot)\|$ [...] On the other hand, based on Hoeffding's inequality, if we choose [...] "constant-shift" property of the DP operator, and (c) follows from [...]


A Quantifiable Information-Processing Hierarchy Provides a Necessary Condition for Detecting Agency

arXiv.org Machine Learning

As intelligent systems are developed across diverse substrates - from machine learning models and neuromorphic hardware to in vitro neural cultures - understanding what gives a system agency has become increasingly important. Existing definitions, however, tend to rely on top-down descriptions that are difficult to quantify. We propose a bottom-up framework grounded in a system's information-processing order: the extent to which its transformation of input evolves over time. We identify three orders of information processing. Class I systems are reactive and memoryless, mapping inputs directly to outputs. Class II systems incorporate internal states that provide memory but follow fixed transformation rules. Class III systems are adaptive; their transformation rules themselves change as a function of prior activity. While not sufficient on their own, these dynamics represent necessary informational conditions for genuine agency. This hierarchy offers a measurable, substrate-independent way to identify the informational precursors of agency. We illustrate the framework with neurophysiological and computational examples, including thermostats and receptor-like memristors, and discuss its implications for the ethical and functional evaluation of systems that may exhibit agency.
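The three orders of information processing described in this abstract can be illustrated with a small sketch. The concrete systems below (a threshold switch, a thermostat with hysteresis, and a memristor-like gain that grows with use) and all names and parameter values are illustrative assumptions, not taken from the paper; the abstract does not say which class its own thermostat and memristor examples fall into.

```python
from dataclasses import dataclass


def class_one(temperature: float, setpoint: float = 20.0) -> bool:
    """Class I (assumed example): reactive and memoryless.

    The output depends only on the current input.
    """
    return temperature < setpoint


@dataclass
class ClassTwo:
    """Class II (assumed example): internal state, fixed transformation rule.

    A hysteresis thermostat remembers whether it is heating, but the
    thresholds that govern state changes never change.
    """
    low: float = 19.0
    high: float = 21.0
    heating: bool = False

    def step(self, temperature: float) -> bool:
        if temperature < self.low:
            self.heating = True
        elif temperature > self.high:
            self.heating = False
        return self.heating


@dataclass
class ClassThree:
    """Class III (assumed example): the rule itself adapts to prior activity.

    The gain applied to the input grows with past input magnitude,
    loosely in the spirit of a memristive element.
    """
    gain: float = 1.0
    rate: float = 0.1

    def step(self, signal: float) -> float:
        output = self.gain * signal
        self.gain += self.rate * abs(signal)  # the input-output map changes
        return output
```

In this sketch the same input always yields the same output for `class_one`, can yield different outputs for `ClassTwo` depending on its stored state, and yields a drifting output for `ClassThree` because its gain is rewritten by its own history.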


Globally Convergent Policy Search for Output Estimation

Neural Information Processing Systems

We introduce the first direct policy search algorithm which provably converges to the globally optimal dynamic filter for the classical problem of predicting the outputs of a linear dynamical system, given noisy, partial observations. Despite the ubiquity of partial observability in practice, theoretical guarantees for direct policy search algorithms, one of the backbones of modern reinforcement learning, have proven difficult to achieve. This is primarily due to the degeneracies which arise when optimizing over filters that maintain an internal state. In this paper, we provide a new perspective on this challenging problem based on the notion of informativity, which intuitively requires that all components of a filter's internal state are representative of the true state of the underlying dynamical system. We show that informativity overcomes the aforementioned degeneracy. Specifically, we propose a regularizer which explicitly enforces informativity, and establish that gradient descent on this regularized objective - combined with a "reconditioning step" - converges to the globally optimal cost at an $O(1/T)$ rate.


Differentially Private Learning Needs Hidden State (Or Much Faster Convergence)

Neural Information Processing Systems

Prior work on differential privacy analysis of randomized SGD algorithms relies on composition theorems, where the implicit (unrealistic) assumption is that the internal state of the iterative algorithm is revealed to the adversary. As a result, the Rényi DP bounds derived by such composition-based analyses grow linearly with the number of training epochs. When the internal state of the algorithm is hidden, we prove a converging privacy bound for noisy stochastic gradient descent (on strongly convex smooth loss functions). We show how to take advantage of privacy amplification by sub-sampling and randomized post-processing, and prove the dynamics of the privacy bound for "sample without replacement" stochastic mini-batch gradient descent schemes. We prove that, in these settings, our privacy bound converges exponentially fast and is substantially smaller than the composition bounds, notably after a few training epochs. Thus, unless the DP algorithm converges fast, our privacy analysis shows that hidden state analysis can significantly amplify differential privacy.
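To make the hidden-state setting concrete, here is a minimal sketch of noisy mini-batch gradient descent in which only the final iterate is returned and intermediate iterates are never exposed. The loss (a one-dimensional quadratic, which is strongly convex and smooth), the projection onto [0, 1], the function name `noisy_sgd`, and every parameter value are assumptions chosen for illustration; the noise scale is not calibrated to any privacy budget, and the sketch makes no claim about the privacy guarantee it achieves.

```python
import random


def noisy_sgd(data, steps=100, batch_size=2, lr=0.1, noise_std=0.5, seed=0):
    """Noisy projected mini-batch gradient descent on 0.5 * (theta - x)^2.

    Assumes every data point lies in [0, 1]. Each step draws a mini-batch
    without replacement, adds Gaussian noise to the averaged gradient, and
    projects the iterate back onto [0, 1].
    """
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)  # sampling without replacement
        grad = sum(theta - x for x in batch) / batch_size
        noisy_grad = grad + rng.gauss(0.0, noise_std)
        theta -= lr * noisy_grad
        theta = min(1.0, max(0.0, theta))  # projection keeps gradients bounded
    # Hidden state: intermediate values of theta are never released.
    return theta
```

A composition-style analysis would instead treat every intermediate `theta` as published; returning only the last iterate is what distinguishes the hidden-state setting the abstract refers to.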


Defining the Scope of Learning Analytics: An Axiomatic Approach for Analytic Practice and Measurable Learning Phenomena

arXiv.org Machine Learning

Learning Analytics (LA) has rapidly expanded through practical and technological innovation, yet its foundational identity has remained theoretically under-specified. This paper addresses this gap by proposing the first axiomatic theory that formally defines the essential structure, scope, and limitations of LA. Derived from the psychological definition of learning and the methodological requirements of LA, the framework consists of five axioms specifying discrete observation, experience construction, state transition, and inference. From these axioms, we derive a set of theorems and propositions that clarify the epistemological stance of LA, including the inherent unobservability of learner states, the irreducibility of temporal order, constraints on reachable states, and the impossibility of deterministically predicting future learning. We further define LA structure and LA practice as formal objects, demonstrating the sufficiency and necessity of the axioms and showing that diverse LA approaches -- such as Bayesian Knowledge Tracing and dashboards -- can be uniformly explained within this framework. The theory provides guiding principles for designing analytic methods and interpreting learning data while avoiding naive behaviorism and category errors by establishing an explicit theoretical inference layer between observations and states. This work positions LA as a rigorous science of state transition systems based on observability, establishing the theoretical foundation necessary for the field's maturation as a scholarly discipline.


Classifying German Language Proficiency Levels Using Large Language Models

arXiv.org Artificial Intelligence

Assessing language proficiency is essential for education, as it enables instruction tailored to learners' needs. This paper investigates the use of Large Language Models (LLMs) for automatically classifying German texts according to the Common European Framework of Reference for Languages (CEFR) into different proficiency levels. To support robust training and evaluation, we construct a diverse dataset by combining multiple existing CEFR-annotated corpora with synthetic data. We then evaluate prompt-engineering strategies, fine-tuning of a LLaMA-3-8B-Instruct model, and a probing-based approach that utilizes the internal neural state of the LLM for classification. Our results show a consistent performance improvement over prior methods, highlighting the potential of LLMs for reliable and scalable CEFR classification.