Scaling Competence, Shrinking Reasoning: Cognitive Signatures in Language Model Learning

Singh, Mukul, Singha, Ananya, Radhakrishna, Arjun, Gulwani, Sumit

arXiv.org Artificial Intelligence

We analyze reasoning in language models during task-specific fine-tuning and draw a parallel between reasoning tokens--the intermediate steps generated while solving a problem--and human working memory. Drawing from cognitive science, we align training dynamics with the Four Stages of Competence: models initially produce incorrect outputs without reasoning, then begin reasoning (but still fail), eventually reason effectively, and finally solve tasks without explicit reasoning. We find that reasoning token length expands as performance improves, peaks at the stage of conscious competence, then declines as the model internalizes the task. Notably, after training, models retain performance even when reasoning is removed--suggesting it scaffolded learning but is no longer needed. This progression offers actionable insights: reasoning token dynamics can serve as a signal for diagnosing training stage, identifying convergence, and guiding early stopping. We propose metrics to track this trajectory and argue that reasoning behavior is valuable for understanding and optimizing reasoning model training.
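
As an illustration only (not code from the paper), the sketch below shows how mean reasoning-token length and accuracy could be logged per checkpoint to locate the peak the authors associate with conscious competence; `generate` is a hypothetical helper that returns the reasoning tokens and correctness for one prompt.

```python
import numpy as np

def reasoning_length_curve(checkpoints, eval_prompts, generate):
    """For each checkpoint, record mean reasoning-token count and accuracy.

    `generate(ckpt, prompt)` is a hypothetical helper returning
    (reasoning_tokens, answer_is_correct) for one prompt.
    """
    lengths, accs = [], []
    for ckpt in checkpoints:
        results = [generate(ckpt, p) for p in eval_prompts]
        lengths.append(np.mean([len(r[0]) for r in results]))
        accs.append(np.mean([r[1] for r in results]))
    return np.array(lengths), np.array(accs)

def peak_checkpoint(lengths):
    # The checkpoint where reasoning length peaks would mark the
    # "conscious competence" stage described in the abstract.
    return int(np.argmax(lengths))
```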


Artificial Intelligence Software Structured to Simulate Human Working Memory, Mental Imagery, and Mental Continuity

Reser, Jared Edward

arXiv.org Artificial Intelligence

This article presents an artificial intelligence (AI) architecture intended to simulate the iterative updating of the human working memory system. It features several interconnected neural networks designed to emulate the specialized modules of the cerebral cortex. These are structured hierarchically and integrated into a global workspace. They are capable of temporarily maintaining high-level representational patterns akin to the psychological items maintained in working memory. This maintenance is made possible by persistent neural activity in the form of two modalities: sustained neural firing (resulting in a focus of attention) and synaptic potentiation (resulting in a short-term store). Representations held in persistent activity are recursively replaced, resulting in incremental changes to the content of the working memory system. As this content gradually evolves, successive processing states overlap and are continuous with one another. The present article explores how this architecture can lead to iterative shifts in the distribution of coactive representations, ultimately leading to mental continuity between processing states, and thus to human-like thought and cognition. Like the human brain, this AI working memory store will be linked to multiple imagery (topographic map) generation systems corresponding to various sensory modalities. As working memory is iteratively updated, the maps created in response will construct sequences of related mental imagery. Thus, neural networks emulating the prefrontal cortex and its reciprocal interactions with early sensory and motor cortex capture the imagery guidance functions of the human brain. This sensory and motor imagery creation, coupled with an iteratively updated working memory store, may provide an AI system with the cognitive assets needed to achieve synthetic consciousness or artificial sentience.
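
As a loose, hypothetical illustration of the iterative-updating idea (not the article's architecture), the toy class below keeps a small focus of attention and a larger short-term store, replacing items incrementally so that successive states overlap.

```python
from collections import deque

class WorkingMemoryStore:
    """Toy sketch (not the article's implementation) of iterative updating:
    a small focus of attention (sustained firing) backed by a larger
    short-term store (synaptic potentiation)."""

    def __init__(self, focus_size=4, store_size=12):
        self.focus = deque(maxlen=focus_size)       # sustained activity
        self.short_term = deque(maxlen=store_size)  # potentiated traces

    def update(self, new_item):
        # New content displaces only the oldest focus item rather than wiping
        # the whole state, so successive states overlap (mental continuity).
        if len(self.focus) == self.focus.maxlen:
            self.short_term.append(self.focus.popleft())
        self.focus.append(new_item)

    def coactive(self):
        # Current contents of the working memory system.
        return list(self.focus) + list(self.short_term)
```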


Hall Effect Thruster Forecasting using a Topological Approach for Data Assimilation

Chumley, Max M., Khasawneh, Firas A.

arXiv.org Artificial Intelligence

Hall Effect Thrusters (HETs) are electric thrusters that eject heavy ionized gas particles from the spacecraft to generate thrust. Although traditionally used for station keeping, they have recently been used for interplanetary space missions due to their high delta-V potential and their operational longevity in contrast to other thrusters, e.g., chemical ones. However, the operation of HETs involves complex processes such as gas ionization, strong magnetic fields, and complicated solar panel power supply interactions. Their operation is therefore extremely difficult to model, necessitating Data Assimilation (DA) approaches for estimating and predicting their operational states. Because the HET operating environment is often noisy with non-Gaussian noise sources, the set of applicable DA tools is significantly limited. We describe a topological approach for data assimilation that bypasses these limitations because it does not depend on the noise model, and we use it to forecast spatiotemporal plume field states of HETs. Our approach is a generalization of the Topological Approach for Data Assimilation (TADA) method that allows different forecast functions to be included. We show how TADA can be combined with a Long Short-Term Memory network for accurate forecasting. We then apply our approach to high-fidelity Hall Effect Thruster (HET) simulation data from the Air Force Research Laboratory (AFRL) rocket propulsion division, where we demonstrate the forecast resiliency of TADA on noise-contaminated, high-dimensional data.
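
The following is a generic, heavily simplified sketch of data assimilation with a pluggable forecast function, not the TADA method itself: each step advances the state with `forecast` and nudges it toward the latest noisy observation.

```python
import numpy as np

def assimilate(x0, observations, forecast, gain=0.5):
    """Generic nudging-style assimilation loop (illustrative, not TADA):
    `forecast(x)` advances the state one step; each new observation pulls
    the forecast back toward the measured value with strength `gain`."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x]
    for y in observations:
        x_pred = forecast(x)                          # pluggable forecast function
        x = x_pred + gain * (np.asarray(y) - x_pred)  # analysis update
        trajectory.append(x)
    return np.stack(trajectory)
```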


Creating Scalable AGI: the Open General Intelligence Framework

Dollinger, Daniel A., Singleton, Michael

arXiv.org Artificial Intelligence

Recent advancements in Artificial Intelligence (AI), particularly with Large Language Models (LLMs), have led to significant progress in narrow tasks such as image classification, language translation, coding, and writing. However, these models face limitations in reliability and scalability due to their siloed architectures, which are designed to handle only one data modality (data type) at a time. This single modal approach hinders their ability to integrate the complex set of data points required for real-world challenges and problem-solving tasks like medical diagnosis, quality assurance, equipment troubleshooting, and financial decision-making. Addressing these real-world challenges requires a more capable Artificial General Intelligence (AGI) system. Our primary contribution is the development of the Open General Intelligence (OGI) framework, a novel systems architecture that serves as a macro design reference for AGI. The OGI framework adopts a modular approach to the design of intelligent systems, based on the premise that cognition must occur across multiple specialized modules that can seamlessly operate as a single system. OGI integrates these modules using a dynamic processing system and a fabric interconnect, enabling real-time adaptability, multi-modal integration, and scalable processing. The OGI framework consists of three key components: (1) Overall Macro Design Guidance that directs operational design and processing, (2) a Dynamic Processing System that controls routing, primary goals, instructions, and weighting, and (3) Framework Areas, a set of specialized modules that operate cohesively to form a unified cognitive system. By incorporating known principles from human cognition into AI systems, the OGI framework aims to overcome the challenges observed in today's intelligent systems, paving the way for more holistic and context-aware problem-solving capabilities.
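
Purely as a hypothetical illustration of the modular-routing idea (not the OGI reference design), the sketch below registers specialized modules with weights and dispatches an input across them through a shared "fabric" of intermediate results.

```python
from typing import Callable, Dict

class DynamicProcessingSystem:
    """Illustrative sketch (not the OGI specification): a router that weights
    and dispatches a multi-modal input to specialized modules sharing a
    common 'fabric' (here just a dict of intermediate results)."""

    def __init__(self):
        self.modules: Dict[str, Callable[[dict, dict], dict]] = {}
        self.weights: Dict[str, float] = {}

    def register(self, name, fn, weight=1.0):
        # Each module reads the raw observation plus the shared fabric state.
        self.modules[name] = fn
        self.weights[name] = weight

    def process(self, observation: dict) -> dict:
        fabric = {"input": observation}
        # Route to modules in order of their current weighting.
        for name in sorted(self.modules, key=self.weights.get, reverse=True):
            fabric[name] = self.modules[name](observation, fabric)
        return fabric

# Usage: register e.g. a vision module and a language module, then call
# system.process({"image": ..., "text": ...}) to obtain the fused fabric state.
```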


Exploring Unseen Environments with Robots using Large Language and Vision Models through a Procedurally Generated 3D Scene Representation

S, Arjun P, Melnik, Andrew, Nandi, Gora Chand

arXiv.org Artificial Intelligence

Recent advancements in Generative Artificial Intelligence, particularly Large Language Models (LLMs) and Large Vision Language Models (LVLMs), have enabled the prospect of leveraging cognitive planners within robotic systems. This work focuses on solving the object goal navigation problem by mimicking human cognition to attend to, perceive, and store task-specific information and to generate plans from it. We introduce a comprehensive framework capable of exploring an unfamiliar environment in search of an object by leveraging the capabilities of Large Language Models (LLMs) and Large Vision Language Models (LVLMs) in understanding the underlying semantics of our world. A challenge in using LLMs to generate high-level sub-goals is efficiently representing the environment around the robot. We propose to use a modular 3D scene representation, with semantically rich descriptions of the objects, to provide the LLM with task-relevant information. However, providing the LLM with a mass of contextual information (a rich 3D scene semantic representation) can lead to redundant and inefficient plans. We therefore propose an LLM-based pruner that leverages in-context learning to prune out information irrelevant to the goal.
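
A minimal sketch of the pruning idea, assuming a hypothetical `call_llm` text-completion helper (the authors' actual prompt and interface are not given here): list the scene objects, ask which are relevant to the goal, and keep only those.

```python
def prune_scene(objects, goal, call_llm):
    """Sketch of the idea (not the authors' code): ask an LLM which scene
    objects are relevant to the navigation goal and drop the rest.

    `objects` is a list of semantic descriptions from the 3D scene
    representation; `call_llm(prompt)` is a hypothetical text-completion helper.
    """
    listing = "\n".join(f"{i}: {desc}" for i, desc in enumerate(objects))
    prompt = (
        f"Goal: find a {goal}.\n"
        f"Scene objects:\n{listing}\n"
        "Return the indices of objects relevant to reaching the goal, "
        "comma-separated."
    )
    reply = call_llm(prompt)
    keep = {int(tok) for tok in reply.split(",") if tok.strip().isdigit()}
    return [objects[i] for i in sorted(keep) if i < len(objects)]
```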


Extending Memory for Language Modelling

Nugaliyadde, Anupiya

arXiv.org Artificial Intelligence

Breakthroughs in deep learning and memory networks have led to major advances in natural language understanding. Language is sequential, and information carried through the sequence can be captured by memory networks. Learning the sequence is one of the key aspects of learning the language. However, memory networks are not capable of holding infinitely long sequences in their memories and are limited by constraints such as the vanishing or exploding gradient problem. Therefore, natural language understanding models suffer when presented with long sequential text. We introduce the Long Term Memory network (LTM) to learn from infinitely long sequences. LTM gives priority to current inputs, allowing them to have a high impact. Language modeling is an important factor in natural language understanding and requires long-term memory, so LTM was tested on language modeling using the Penn Treebank, Google Billion Word, and WikiText-2 datasets. We compare LTM with other language models that require long-term memory.


Using Machine Learning to Classify Tweets

#artificialintelligence

I recently had the opportunity to take on a project with Inspirit AI where I worked with a team to use machine learning to classify whether tweets were positive, negative, or neutral as they related to different stocks. To do that, we explored three different machine learning models for classifying text: bag-of-words, long short-term memory (LSTM), and bidirectional encoder representations from transformers (BERT). Here I describe our experience in solving this problem and highlight what we learned from using the different methods. The bag-of-words model works by grouping words into frequency counts based on how often each word is used. These frequency counts can then be used as features in a machine learning model.
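
A minimal bag-of-words sketch in Python (illustrative, not the project's code): each tweet becomes a vector of word-frequency counts over the shared vocabulary.

```python
from collections import Counter

def bag_of_words(tweets):
    """Build frequency-count features: one column per vocabulary word."""
    vocab = sorted({w for t in tweets for w in t.lower().split()})
    features = []
    for t in tweets:
        counts = Counter(t.lower().split())
        features.append([counts.get(w, 0) for w in vocab])
    return vocab, features

vocab, X = bag_of_words(["Stock up big", "stock down", "big big gains"])
# Each row of X is a word-frequency vector that can feed any classifier.
```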


Papers to Read on using Long Short-Term Memory (LSTM) architecture in forecasting

#artificialintelligence

Abstract: The spread of COVID-19 has coincided with the rise of Graph Neural Networks (GNNs), leading to several studies proposing their use to better forecast the evolution of the pandemic. Many such models also include Long Short-Term Memory (LSTM) networks, a common tool for time series forecasting. In this work, we further investigate the integration of these two methods by implementing GNNs within the gates of an LSTM and exploiting spatial information. In addition, we introduce a skip connection which proves critical to jointly capturing the spatial and temporal patterns in the data. We validate our daily COVID-19 new-case forecasting model on data from 37 European nations covering the last 472 days and show superior performance compared to state-of-the-art graph time series models based on mean absolute scaled error (MASE).
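
As a rough illustration of placing graph aggregation inside LSTM gates (not the paper's exact model, and without the skip connection), the numpy sketch below multiplies the gate pre-activations by a normalized adjacency matrix so that each node's gates see its neighbors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def graph_lstm_step(X, H, C, A_hat, Wx, Wh, b):
    """One step of an LSTM whose gate inputs are first aggregated over the
    graph (A_hat: normalized adjacency, nodes x nodes). Illustrative only.

    X: node features (nodes x d_in), H/C: states (nodes x d_hidden),
    Wx: (d_in x 4*d_hidden), Wh: (d_hidden x 4*d_hidden), b: (4*d_hidden,).
    """
    Z = A_hat @ (X @ Wx) + A_hat @ (H @ Wh) + b   # spatial aggregation inside the gates
    i, f, o, g = np.split(Z, 4, axis=1)
    C_new = sigmoid(f) * C + sigmoid(i) * np.tanh(g)
    H_new = sigmoid(o) * np.tanh(C_new)
    return H_new, C_new
```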


Gated Recurrent Unit

#artificialintelligence

A GRU, or Gated Recurrent Unit, is an advancement of the standard recurrent neural network (RNN). It was introduced by Kyunghyun Cho et al. in 2014. GRUs are very similar to Long Short-Term Memory (LSTM) networks.
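
A minimal numpy sketch of one GRU update (one common formulation, biases omitted for brevity): two gates, update and reset, and no separate cell state as in the LSTM.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update: combine the previous hidden state h with input x."""
    z = sigmoid(x @ Wz + h @ Uz)               # update gate
    r = sigmoid(x @ Wr + h @ Ur)               # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)   # candidate state
    return (1 - z) * h + z * h_tilde           # interpolate old and candidate state
```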


LSTM Vs GRU in Recurrent Neural Network: A Comparative Study

#artificialintelligence

A recurrent neural network (RNN) is a type of ANN used to perform predictive operations on sequential or time-series data. These deep learning layers are commonly used for ordinal or temporal problems such as natural language processing, neural machine translation, and automated image captioning. Today's voice assistants, such as Google Assistant, Alexa, and Siri, incorporate these layers to provide a hassle-free experience for users. The main difference between an RNN and a CNN is that an RNN incorporates memory, so information from prior inputs can influence the current input and output. The training methods for both networks are otherwise the same.
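
For contrast with feed-forward layers, here is a minimal numpy sketch of a plain RNN forward pass (illustrative only): the hidden state h is the memory that carries information from earlier inputs to later steps.

```python
import numpy as np

def rnn_forward(xs, Wx, Wh, b):
    """Run a plain RNN over a sequence (xs: T x d_in).

    The hidden state h is updated at every step, so each output depends on
    all inputs seen so far, not just the current one."""
    h = np.zeros(Wh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(x @ Wx + h @ Wh + b)
        states.append(h)
    return np.stack(states)
```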