Self-Attention as a Parametric Endofunctor: A Categorical Framework for Transformer Architectures

arXiv.org Artificial Intelligence

Self-attention mechanisms have revolutionised deep learning architectures, yet their core mathematical structures remain incompletely understood. In this work, we develop a category-theoretic framework focusing on the linear components of self-attention. Specifically, we show that the query, key, and value maps naturally define a parametric 1-morphism in the 2-category $\mathbf{Para(Vect)}$. On the underlying 1-category $\mathbf{Vect}$, these maps induce an endofunctor whose iterated composition precisely models multi-layer attention. We further prove that stacking multiple self-attention layers corresponds to constructing the free monad on this endofunctor. For positional encodings, we demonstrate that strictly additive embeddings correspond to monoid actions in an affine sense, while standard sinusoidal encodings, though not additive, retain a universal property among injective (faithful) position-preserving maps. We also establish that the linear portions of self-attention exhibit natural equivariance to permutations of input tokens, and show how the "circuits" identified in mechanistic interpretability can be interpreted as compositions of parametric 1-morphisms. This categorical perspective unifies geometric, algebraic, and interpretability-based approaches to transformer analysis, making explicit the underlying structures of attention. We restrict to linear maps throughout, deferring the treatment of nonlinearities such as softmax and layer normalisation, which require more advanced categorical constructions. Our results build on and extend recent work on category-theoretic foundations for deep learning, offering deeper insights into the algebraic structure of attention mechanisms.
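
The permutation-equivariance claim for the linear components can be checked numerically. The following is a minimal sketch, not the paper's construction: the weight matrices `W_q`, `W_k`, `W_v`, the dimensions, and the softmax-free `softmax_free_attention` helper are all assumptions introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 5, 8, 4

# Hypothetical parameters of the query, key, and value maps (random, illustrative only).
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))

def qkv(X):
    """Apply the linear query, key, and value maps to every token (row) of X."""
    return X @ W_q, X @ W_k, X @ W_v

def softmax_free_attention(X):
    """Attention with the softmax omitted: (Q K^T) V with unnormalised scores."""
    Q, K, V = qkv(X)
    return (Q @ K.T) @ V

X = rng.normal(size=(n_tokens, d_model))

# Permutation matrix P that reorders the tokens (rows) of X.
P = np.eye(n_tokens)[rng.permutation(n_tokens)]

# Equivariance means f(P X) == P f(X): permuting the tokens first and then
# applying the map gives the same result as applying the map and then permuting.
Q, K, V = qkv(X)
Qp, Kp, Vp = qkv(P @ X)
qkv_equivariant = (
    np.allclose(Qp, P @ Q) and np.allclose(Kp, P @ K) and np.allclose(Vp, P @ V)
)
attn_equivariant = np.allclose(
    softmax_free_attention(P @ X), P @ softmax_free_attention(X)
)
```

The second check holds because $P^\top P = I$, so the permutation cancels inside the score matrix; no positional encoding is applied here, which is what makes the token order irrelevant.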


A Multi-Bennett 8R Mechanism Obtained From Factorization of Bivariate Motion Polynomials

arXiv.org Artificial Intelligence

Overconstrained linkages are a long-standing but still highly active topic of research in mechanism science. For several decades, researchers focused on overconstrained mechanisms consisting of a single loop of $n \leq 6$ revolute joints (R), prismatic joints (P), or, sometimes, helical joints (H). New linkages of that type are continuously being discovered, often by craftily combining known linkages [2, 3, 28], sometimes via novel concepts for their construction. One of these concepts is the factorization of motion polynomials [8]. It gave rise to the construction of the only class of overconstrained 6R linkages for which the relations between the Denavit-Hartenberg parameters are still unknown. In [6, 9, 17-20], motion polynomial factorization was exploited for the synthesis of linkages. In spite of some attempts, a complete classification of overconstrained single-loop linkages is currently out of reach. It is thus natural that research efforts shifted towards the investigation of single-loop linkages consisting of $n \geq 7$ links with, generically, $n - 6 \geq 1$ degrees of freedom.


AI Neurotechnology for Aging Societies -- Task-load and Dementia EEG Digital Biomarker Development Using Information Geometry Machine Learning Methods

arXiv.org Artificial Intelligence

Dementia, and especially Alzheimer's disease (AD), is the most common cause of cognitive decline in elderly people. The spread of these conditions in aging societies is causing a significant medical and economic burden in many countries around the world. According to a recent World Health Organization (WHO) report, an estimated 47 million people worldwide currently live with a dementia-spectrum neurocognitive disorder. This number is expected to triple by 2050, which calls for the application of AI-based technologies to support early screening for preventive interventions, as well as subsequent monitoring and maintenance of mental wellbeing with so-called digital-pharma or "beyond a pill" therapeutic approaches. This paper discusses our attempt at, and preliminary results of, using brainwave (EEG) techniques to develop digital biomarkers for detecting and monitoring dementia progression. We present an information geometry-based classification approach for automatically discriminating EEG-derived event-related responses (ERPs) to low versus high task-load auditory or tactile stimulus recognition, whose amplitude and latency variabilities are similar to those seen in dementia. The discussed approach is a step towards developing AI, and especially machine learning (ML), approaches for subsequent application to mild cognitive impairment (MCI) and AD diagnostics.


OpenIAS Hybrid Generative-Discriminative Deep Models

@machinelearnbot

Deep discriminative classifiers perform remarkably well on problems with a lot of labeled data. So-called deep generative models tend to excel when labeled training data is scarce. Can we do a hybrid, combining the best of both worlds? In this post I outline a hybrid generative-discriminative deep model loosely based on the importance weighted autoencoder (Burda et al., 2015). Don't miss the pretty pictures.