PaPaformer: Language Model from Pre-trained Parallel Paths

Tapaninaho, Joonas, Oussala, Mourad

arXiv.org Artificial Intelligence

The training of modern large language models requires an increasing amount of computation power and time. Even smaller variants, such as small language models (SLMs), take several days to train in the best-case scenarios, often requiring multiple GPUs. This paper explores methods to train and evaluate decoder-only transformer-based language models in hours instead of days or weeks. We introduce PaPaformer, a decoder-only transformer architecture variant whose lower-dimensional parallel paths are combined into a larger model. The paper shows that these lower-dimensional paths can be trained individually on different types of training data and then combined into one larger model. This method offers the option to reduce the total number of model parameters and the training time while increasing performance. Moreover, the parallel path structure opens interesting possibilities for customizing paths to accommodate specific task requirements.
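The combination step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the path here is a single projection plus nonlinearity standing in for a small transformer block, and all names (`make_path`, `d_path`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_path(d_in, d_path):
    # One low-dimensional "path": a down-projection followed by a
    # nonlinearity, standing in for a small independently trained block.
    W = rng.standard_normal((d_in, d_path)) / np.sqrt(d_in)
    return lambda x: np.tanh(x @ W)

# Several paths trained separately, then combined: here the combination
# is simple concatenation of their outputs into a larger hidden state.
d_model, d_path, n_paths = 64, 16, 4
paths = [make_path(d_model, d_path) for _ in range(n_paths)]

x = rng.standard_normal((2, d_model))                    # batch of embeddings
hidden = np.concatenate([p(x) for p in paths], axis=-1)  # combined state
```

Because each path only sees a `d_path`-dimensional space, each can be trained (and specialized) independently before assembly, which is the property the abstract highlights.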


Identifying Sparsely Active Circuits Through Local Loss Landscape Decomposition

Chrisman, Brianna, Bushnaq, Lucius, Sharkey, Lee

arXiv.org Artificial Intelligence

Much of mechanistic interpretability has focused on understanding the activation spaces of large neural networks. However, activation-space approaches reveal little about the underlying circuitry used to compute features. To better understand the circuits employed by models, we introduce a new decomposition method called Local Loss Landscape Decomposition (L3D). L3D identifies a set of low-rank subnetworks: directions in parameter space, a subset of which can reconstruct the gradient of the loss between any sample's output and a reference output vector. We design a series of progressively more challenging toy models with well-defined subnetworks and show that L3D can nearly perfectly recover the associated subnetworks. Additionally, we investigate the extent to which perturbing the model in the direction of a given subnetwork affects only the relevant subset of samples. Finally, we apply L3D to a real-world transformer model and a convolutional neural network, demonstrating its potential to identify interpretable and relevant circuits in parameter space.
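The core idea, reconstructing a sample's loss gradient from a sparse subset of parameter-space directions, can be illustrated in a few lines. This is a toy sketch under assumed names (`directions`, `true_coeffs`), not the L3D algorithm itself, which must also discover the directions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a model with p parameters and k candidate
# "subnetwork" directions in parameter space.
p, k = 20, 4
directions = rng.standard_normal((k, p))       # rows span a subspace

# Suppose a sample's loss gradient lies in the span of a subset of
# the directions, as L3D posits for sparsely active circuits.
true_coeffs = np.array([1.5, 0.0, -2.0, 0.0])  # only two directions active
grad = true_coeffs @ directions

# Reconstruct the gradient by least squares over all k directions.
coeffs, *_ = np.linalg.lstsq(directions.T, grad, rcond=None)
recon = coeffs @ directions
```

When the gradient truly lies in the span of a few directions, the reconstruction is exact; L3D's contribution is finding directions for which this holds across samples.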


Teaching Shortest Path Algorithms With a Robot and Overlaid Projections

Jolakoski, Pavel, Deja, Jordan Aiko, Pucihar, Klen Čopič, Kljun, Matjaž

arXiv.org Artificial Intelligence

Robots have the potential to enhance the teaching of advanced computer science topics, making abstract concepts more tangible and interactive. In this paper, we present Timmy, a GoPiGo robot augmented with projections to demonstrate shortest path algorithms in an interactive learning environment. We integrated a JavaScript-based application that is projected around the robot, allowing users to construct graphs and visualise three different shortest path algorithms with colour-coded edges and vertices. Animated graph exploration and traversal are augmented by robot movements. To evaluate Timmy, we conducted two user studies: an initial study (N = 10) to explore the feasibility of this type of teaching, in which participants only observed both the robot-synced and the on-screen-only visualisations, and a pilot study (N = 6) in which participants actively interacted with the system, constructed graphs, and selected desired algorithms. In both studies we investigated preferences towards the system rather than teaching outcomes. Initial findings suggest that robots offer an engaging tool for teaching advanced algorithmic concepts, but highlight the need for further methodological refinements and larger-scale studies to fully evaluate their effectiveness.
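The abstract does not name the three algorithms visualised, but Dijkstra's algorithm is the canonical example of the kind of shortest path traversal such a system animates. A compact reference version, assuming a graph given as an adjacency list with non-negative weights:

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source in a graph given as
    {vertex: [(neighbour, weight), ...]} with non-negative weights."""
    dist = {source: 0}
    pq = [(0, source)]                    # priority queue of (distance, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                      # skip stale queue entries
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# A small example graph like one a learner might construct in the app.
g = {
    "A": [("B", 2), ("C", 5)],
    "B": [("C", 1), ("D", 4)],
    "C": [("D", 1)],
    "D": [],
}
```

The order in which `dist` entries are finalised is exactly the order a robot like Timmy would visit vertices when syncing its movement to the algorithm's exploration.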


Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations

Zhao, Yize, Behnia, Tina, Vakilian, Vala, Thrampoulidis, Christos

arXiv.org Artificial Intelligence

Next-token prediction (NTP) over large text corpora has become the go-to paradigm to train large language models. Yet, it remains unclear how NTP influences the mapping of linguistic patterns to geometric properties of the resulting model representations. We frame training of large language models as soft-label classification over sparse probabilistic label vectors, coupled with an analytical approximation that allows unrestricted generation of context embeddings. This approach links NTP training to rank-constrained, nuclear-norm regularized optimization in the logit domain, offering a framework for analyzing the geometry of word and context embeddings. In large embedding spaces, we find that NTP implicitly favors learning logits with a sparse plus low-rank structure. While the sparse component captures the co-occurrence frequency of context-word pairs, the orthogonal low-rank component, which becomes dominant as training progresses, depends solely on the sparsity pattern of the co-occurrence matrix. Consequently, when projected onto an appropriate subspace, representations of contexts that are followed by the same set of next-tokens collapse, a phenomenon we term subspace-collapse. We validate our findings on synthetic and small-scale real language datasets. Finally, we outline potential research directions aimed at deepening the understanding of NTP's influence on the learning of linguistic patterns and regularities.
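The framing of NTP as soft-label classification over sparse probabilistic label vectors can be made concrete with a toy vocabulary. The numbers and names below are illustrative, not from the paper; the objective is the standard soft-label cross-entropy the analysis starts from.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy vocabulary of V tokens and two contexts; each context's "label"
# is its empirical next-token distribution, sparse over the vocabulary.
V = 5
labels = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.0],   # context 1: two possible next tokens
    [0.0, 0.0, 1.0, 0.0, 0.0],   # context 2: deterministic next token
])

logits = np.zeros((2, V))         # stand-in for W @ context_embedding

def ntp_loss(logits, labels):
    # Soft-label cross-entropy, summed only over each label's sparse support.
    logp = np.log(softmax(logits))
    mask = labels > 0
    return -(labels * np.where(mask, logp, 0.0)).sum(axis=-1).mean()
```

With uniform logits the loss equals log V for both contexts; the paper's analysis concerns how minimizing this objective shapes the logits into a sparse component (co-occurrence frequencies) plus a low-rank component (support pattern).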


Assessing Language Models' Worldview for Fiction Generation

Khatun, Aisha, Brown, Daniel G.

arXiv.org Artificial Intelligence

The use of Large Language Models (LLMs) has become ubiquitous, with abundant applications in computational creativity. One such application is fictional story generation. Fiction is a narrative that occurs in a story world slightly different from ours. With LLMs becoming writing partners, we question how suitable they are for generating fiction. This study investigates the ability of LLMs to maintain a state of the world essential to generating fiction. Through a series of questions to nine LLMs, we find that only two models exhibit a consistent worldview, while the rest are self-conflicting. Subsequent analysis of stories generated by four models revealed a strikingly uniform narrative pattern. This uniformity across models further suggests a lack of `state' necessary for fiction. We highlight the limitations of current LLMs in fiction writing and advocate for future research to test and create story worlds for LLMs to reside in. All code, the dataset, and the generated responses can be found at https://github.com/tanny411/llm-reliability-and-consistency-evaluation.


Lost in the Woods -- an AI Rescue Story!

#artificialintelligence

Timmy's first experience with a project-based learning activity was exciting. He loved the idea that he was able to help the homeless people in his neighborhood. As part of the experience, he was able to meet with and talk to business owners, local area experts, and the city mayor. His continued efforts resulted in renewed community support for the homeless problem and the building of two new homeless shelters in his neighborhood. When his science teacher indicated that his class was about to embark on a new project-based learning activity, he was excited once again.