AITopics | pfsa

Collaborating Authors

pfsa

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Information Locality as an Inductive Bias for Neural Language Models

Someya, Taiga, Svete, Anej, DuSell, Brian, O'Donnell, Timothy J., Giulianelli, Mario, Cotterell, Ryan

arXiv.org Artificial IntelligenceJun-6-2025

Inductive biases are inherent in every machine learning system, shaping how models generalize from finite data. In the case of neural language models (LMs), debates persist as to whether these biases align with or diverge from human processing constraints. To address this issue, we propose a quantitative framework that allows for controlled investigations into the nature of these biases. Within our framework, we introduce $m$-local entropy$\unicode{x2013}$an information-theoretic measure derived from average lossy-context surprisal$\unicode{x2013}$that captures the local uncertainty of a language by quantifying how effectively the $m-1$ preceding symbols disambiguate the next symbol. In experiments on both perturbed natural language corpora and languages defined by probabilistic finite-state automata (PFSAs), we show that languages with higher $m$-local entropy are more difficult for Transformer and LSTM LMs to learn. These results suggest that neural LMs, much like humans, are highly sensitive to the local statistical structure of a language.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.05136

Country:

North America > United States (1.00)
Europe (0.93)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning

Nowak, Franz, Svete, Anej, Butoi, Alexandra, Cotterell, Ryan

arXiv.org Artificial IntelligenceJun-20-2024

The performance of modern language models (LMs) has been improved by chain-of-thought (CoT) reasoning, i.e., the process of generating intermediate results that guide the model towards a final answer. A possible explanation for this improvement is that CoT reasoning extends an LM's computational power, as RNNs and transformers with additional scratch space are known to be Turing complete. Comparing LMs to Turing machines, however, introduces a category error - Turing machines decide language membership, whereas LMs define distributions over strings. To bridge this gap, we formalize CoT reasoning in a probabilistic setting. We present several results on the representational capacity of recurrent and transformer LMs with CoT reasoning, showing that they can represent the same family of distributions over strings as probabilistic Turing machines.

cot reasoning, transformer, transition, (14 more...)

arXiv.org Artificial Intelligence

2406.14197

Country:

North America > Mexico > Mexico City > Mexico City (0.04)
Asia > Singapore (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(6 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Lower Bounds on the Expressivity of Recurrent Neural Language Models

Svete, Anej, Nowak, Franz, Sahabdeen, Anisha Mohamed, Cotterell, Ryan

arXiv.org Artificial IntelligenceJun-18-2024

The recent successes and spread of large neural language models (LMs) call for a thorough understanding of their computational ability. Describing their computational abilities through LMs' \emph{representational capacity} is a lively area of research. However, investigation into the representational capacity of neural LMs has predominantly focused on their ability to \emph{recognize} formal languages. For example, recurrent neural networks (RNNs) with Heaviside activations are tightly linked to regular languages, i.e., languages defined by finite-state automata (FSAs). Such results, however, fall short of describing the capabilities of RNN \emph{language models} (LMs), which are definitionally \emph{distributions} over strings. We take a fresh look at the representational capacity of RNN LMs by connecting them to \emph{probabilistic} FSAs and demonstrate that RNN LMs with linearly bounded precision can express arbitrary regular LMs.

language model, lms, pfsa, (14 more...)

arXiv.org Artificial Intelligence

2405.19222

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Singapore (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
(10 more...)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Data Smashing 2.0: Sequence Likelihood (SL) Divergence For Fast Time Series Comparison

Huang, Yi, Chattopadhyay, Ishanu

arXiv.org Machine LearningSep-26-2019

Abstract--Recognizing subtle historical patterns is central to modeling and forecasting problems in time series analysis. Here we introduce and develop a new approach to quantify deviations in the underlying hidden generators of observed data streams, resulting in a new efficiently computable universal metric for time series. The proposed metric is universal in the sense that we can compare and contrast data streams regardless of where and how they are generated, and without any feature engineering step. The approach proposed in this paper is conceptually distinct from our previous work on data smashing [4], and vastly improves discrimination performance and computing speed. The core idea here is the generalization of the notion of KL divergence often used to compare probability distributions to a notion of divergence in time series. We call this the sequence likelihood (SL) divergence, which may be used to measure deviations within a well-defined class of discrete-valued stochastic processes. We devise efficient estimators of SL divergence from finite sample paths, and subsequently formulate a universal metric useful for computing distance between time series produced by hidden stochastic generators. We illustrate the superior performance of the new smash2.0 Pattern disambiguation in two distinct applications involving electroencephalogram data and gait recognition is also illustrated. We are hopeful that the smash2.0 Effi ciently learning stochastic processes is a key challenge in analyzing time-dependency in domains where randomness cannot be ignored. For such learning to occur, we need todefine a distance metric to compare and contrast time series. However these distance metrics mentioned all have either or both of the following limitations: first, dimensionality reduction and feature selection heavily relies on domain knowledge and inevitably incur trade-o ff between precision and computability. Secondly, when dealing with data from nontrivial stochastic process dynamics, state of the art techniques might fail to correctly estimate the similarity or lack thereof between exemplars.

causal state, pfsa, sequence, (16 more...)

arXiv.org Machine Learning

1909.12243

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Add feedback

Global Autoregressive Models for Data-Efficient Sequence Learning

Parshakova, Tetiana, Andreoli, Jean-Marc, Dymetman, Marc

arXiv.org Artificial IntelligenceSep-19-2019

Standard autoregressive seq2seq models are easily trained by max-likelihood, but tend to show poor results under small-data conditions. We introduce a class of seq2seq models, GAMs (Global Autoregressive Models), which combine an autoregressive component with a log-linear component, allowing the use of global \textit{a priori} features to compensate for lack of data. We train these models in two steps. In the first step, we obtain an \emph{unnormalized} GAM that maximizes the likelihood of the data, but is improper for fast inference or evaluation. In the second step, we use this GAM to train (by distillation) a second autoregressive model that approximates the \emph{normalized} distribution associated with the GAM, and can be used for fast inference and evaluation. Our experiments focus on language modelling under synthetic conditions and show a strong perplexity reduction of using the second autoregressive model over the standard one.

experiment, motif, sequence, (15 more...)

arXiv.org Artificial Intelligence

1909.07063

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe (0.04)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Symbolic Analysis-based Reduced Order Markov Modeling of Time Series Data

Jha, Devesh K, Virani, Nurali, Reimann, Jan, Srivastav, Abhishek, Ray, Asok

arXiv.org Machine LearningSep-26-2017

This paper presents a technique for reduced-order Markov modeling for compact representation of time-series data. In this work, symbolic dynamics-based tools have been used to infer an approximate generative Markov model. The time-series data are first symbolized by partitioning the continuous measurement space of the signal and then, the discrete sequential data are modeled using symbolic dynamics. In the proposed approach, the size of temporal memory of the symbol sequence is estimated from spectral properties of the resulting stochastic matrix corresponding to a first-order Markov model of the symbol sequence. Then, hierarchical clustering is used to represent the states of the corresponding full-state Markov model to construct a reduced-order or size Markov model with a non-deterministic algebraic structure. Subsequently, the parameters of the reduced-order Markov model are identified from the original model by making use of a Bayesian inference rule. The final model is selected using information-theoretic criteria. The proposed concept is elucidated and validated on two different data sets as examples. The first example analyzes a set of pressure data from a swirl-stabilized combustor, where controlled protocols are used to induce flame instabilities. Variations in the complexity of the derived Markov model represent how the system operating condition changes from a stable to an unstable combustion regime. In the second example, the data set is taken from NASA's data repository for prognostics of bearings on rotating shafts. We show that, even with a very small state-space, the reduced-order models are able to achieve comparable performance and that the proposed approach provides flexibility in the selection of a final model for representation and learning.

artificial intelligence, machine learning, markov model, (16 more...)

arXiv.org Machine Learning

1709.09274

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (0.48)
Government > Space Agency (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Computing Entropy Rate Of Symbol Sources & A Distribution-free Limit Theorem

Chattopadhyay, Ishanu, Lipson, Hod

arXiv.org Machine LearningMar-21-2014

Entropy rate of sequential data-streams naturally quantifies the complexity of the generative process. Thus entropy rate fluctuations could be used as a tool to recognize dynamical perturbations in signal sources, and could potentially be carried out without explicit background noise characterization. However, state of the art algorithms to estimate the entropy rate have markedly slow convergence; making such entropic approaches non-viable in practice. We present here a fundamentally new approach to estimate entropy rates, which is demonstrated to converge significantly faster in terms of input data lengths, and is shown to be effective in diverse applications ranging from the estimation of the entropy rate of English texts to the estimation of complexity of chaotic dynamical systems. Additionally, the convergence rate of entropy estimates do not follow from any standard limit theorem, and reported algorithms fail to provide any confidence bounds on the computed values. Exploiting a connection to the theory of probabilistic automata, we establish a convergence rate of $O(\log \vert s \vert/\sqrt[3]{\vert s \vert})$ as a function of the input length $\vert s \vert$, which then yields explicit uncertainty estimates, as well as required data lengths to satisfy pre-specified confidence bounds.

algorithm, automata, entropy rate, (14 more...)

arXiv.org Machine Learning

1401.0711

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Orange County > Orlando (0.04)
Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Add feedback

Data Smashing

Chattopadhyay, Ishanu, Lipson, Hod

arXiv.org Artificial IntelligenceJan-3-2014

Investigation of the underlying physics or biology from empirical data requires a quantifiable notion of similarity - when do two observed data sets indicate nearly identical generating processes, and when they do not. The discriminating characteristics to look for in data is often determined by heuristics designed by experts, $e.g.$, distinct shapes of "folded" lightcurves may be used as "features" to classify variable stars, while determination of pathological brain states might require a Fourier analysis of brainwave activity. Finding good features is non-trivial. Here, we propose a universal solution to this problem: we delineate a principle for quantifying similarity between sources of arbitrary data streams, without a priori knowledge, features or training. We uncover an algebraic structure on a space of symbolic models for quantized data, and show that such stochastic generators may be added and uniquely inverted; and that a model and its inverse always sum to the generator of flat white noise. Therefore, every data stream has an anti-stream: data generated by the inverse model. Similarity between two streams, then, is the degree to which one, when summed to the other's anti-stream, mutually annihilates all statistical structure to noise. We call this data smashing. We present diverse applications, including disambiguation of brainwaves pertaining to epileptic seizures, detection of anomalous cardiac rhythms, and classification of astronomical objects from raw photometry. In our examples, the data smashing principle, without access to any domain knowledge, meets or exceeds the performance of specialized algorithms tuned by domain experts.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1401.0742

Country:

Europe (0.28)
North America > United States (0.27)

Genre: Research Report (0.81)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.47)
Health & Medicine > Therapeutic Area > Genetic Disease (0.47)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Efficient Induction of Finite State Automata

Collins, Matthew S., Oliver, Jonathan

arXiv.org Artificial IntelligenceFeb-6-2013

This paper introduces a new algorithm for the induction of complex finite state automata from samples of behaviour. The algorithm is based on information theoretic principles. The algorithm reduces the search space by many orders of magnitude over what was previously thought possible. We compare the algorithm with some existing induction techniques for finite state automata and show that the algorithm is much superior in both run time and quality of inductions.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1302.153

Country:

Oceania > Australia (0.15)
Oceania > New Zealand (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback