AITopics | Krause, Ben

Collaborating Authors

Krause, Ben

AutoGRAMS: Autonomous Graphical Agent Modeling Software

Krause, Ben, Chen, Lucia, Kahembwe, Emmanuel

arXiv.org Artificial Intelligence, Jul-13-2024

We introduce the AutoGRAMS framework for programming multi-step interactions with language models. AutoGRAMS represents AI agents as a graph, where each node can execute either a language modeling instruction or traditional code. Likewise, transitions in the graph can be governed by either language modeling decisions or traditional branch logic. AutoGRAMS supports using variables as memory and allows nodes to call other AutoGRAMS graphs as functions. We show how AutoGRAMS can be used to design highly sophisticated agents, including self-referential agents that can modify their own graph. AutoGRAMS's graph-centric approach aids interpretability, controllability, and safety during the design, development, and deployment of AI agents. We provide our framework as open source at https://github.com/autograms/autograms.
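The graph idea described above can be illustrated with a small, self-contained Python sketch. It is not the AutoGRAMS API (the linked repository has its own interface): every name below (`Graph`, `Node`, `lm_instruction`, `code`, `call_graph`, `goto`, `branch`, `lm_choice`, `fake_lm`) is hypothetical, and the language model is replaced by a stub so the example runs offline. Nodes carry either a language-model instruction or ordinary code, transitions are either a model decision or plain branch logic, a shared dictionary plays the role of variables as memory, and one node calls another graph as a function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Memory = Dict[str, object]
# Hypothetical LM interface: prompt in, text out. A real agent would call a model here.
LanguageModel = Callable[[str], str]


@dataclass
class Node:
    name: str
    action: Callable[["Graph", Memory], None]                # what the node does
    transition: Callable[["Graph", Memory], Optional[str]]  # next node name, or None to stop


class Graph:
    def __init__(self, nodes: List[Node], start: str, lm: LanguageModel):
        self.nodes = {n.name: n for n in nodes}
        self.start = start
        self.lm = lm

    def run(self, memory: Optional[Memory] = None, max_steps: int = 50) -> Memory:
        memory = dict(memory or {})  # copy: variables act as this run's memory
        trace: List[str] = []
        current: Optional[str] = self.start
        while current is not None and len(trace) < max_steps:
            node = self.nodes[current]
            trace.append(current)
            node.action(self, memory)
            current = node.transition(self, memory)
        memory["trace"] = trace  # explicit record of the path taken through the graph
        return memory


# --- Node actions: a language-model instruction, ordinary code, or a call to another graph ---
def lm_instruction(template: str, out_var: str):
    def act(graph: Graph, memory: Memory) -> None:
        memory[out_var] = graph.lm(template.format(**memory))
    return act


def code(fn: Callable[[Memory], None]):
    return lambda graph, memory: fn(memory)


def call_graph(sub: Graph, out_var: str, result_var: str):
    def act(graph: Graph, memory: Memory) -> None:
        memory[out_var] = sub.run(memory)[result_var]  # run() gives the sub-graph a copy
    return act


# --- Transitions: a language-model decision or traditional branch logic ---
def goto(name: Optional[str]):
    return lambda graph, memory: name


def branch(cond: Callable[[Memory], bool], if_true: Optional[str], if_false: Optional[str]):
    return lambda graph, memory: if_true if cond(memory) else if_false


def lm_choice(question: str, options: Dict[str, Optional[str]]):
    def choose(graph: Graph, memory: Memory) -> Optional[str]:
        prompt = question.format(**memory) + " Options: " + ", ".join(options)
        answer = graph.lm(prompt).strip().lower()
        # Fall back to the first option if the answer matches none of them.
        return options.get(answer, next(iter(options.values())))
    return choose


# --- Usage with a stub model (no real LM is called) ---
def fake_lm(prompt: str) -> str:
    if "Options:" in prompt:
        return "yes" if "refund" in prompt.lower() else "no"
    return "LM reply to: " + prompt


summarizer = Graph(
    [Node("summarize", lm_instruction("Summarize: {user_input}", "summary"), goto(None))],
    start="summarize", lm=fake_lm)

agent = Graph([
    Node("reply", lm_instruction("Respond to: {user_input}", "reply"),
         lm_choice("Should this message go to billing? '{user_input}'", {"yes": "refund", "no": None})),
    Node("refund", call_graph(summarizer, "ticket", "summary"),
         branch(lambda m: "urgent" in str(m["user_input"]), "escalate", None)),
    Node("escalate", code(lambda m: m.update(escalated=True)), goto(None)),
], start="reply", lm=fake_lm)

result = agent.run({"user_input": "urgent: I want a refund"})
print(result["trace"])  # ['reply', 'refund', 'escalate']
```

Because the edges are plain data and each run records the path it took (`trace`), the control flow can be inspected before and after a run; this is one plausible reading of the interpretability and controllability claim, not a description of how AutoGRAMS implements it.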

large language model, machine learning, node

arXiv:2407.10049

Country:

Europe (0.45)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Workflow (1.00)

Industry: Health & Medicine (0.74)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

XGen-7B Technical Report

Nijkamp, Erik, Xie, Tian, Hayashi, Hiroaki, Pang, Bo, Xia, Congying, Xing, Chen, Vig, Jesse, Yavuz, Semih, Laban, Philippe, Krause, Ben, Purushwalkam, Senthil, Niu, Tong, Kryściński, Wojciech, Murakhovs'ka, Lidiya, Choubey, Prafulla Kumar, Fabbri, Alex, Liu, Ye, Meng, Rui, Tu, Lifu, Bhat, Meghana, Wu, Chien-Sheng, Savarese, Silvio, Zhou, Yingbo, Joty, Shafiq, Xiong, Caiming

arXiv.org Artificial Intelligence, Sep-6-2023

Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, a key requirement for many tasks that involve inference over an input context. To address this, we have trained XGen, a series of 7B-parameter models, on sequence lengths of up to 8K for up to 1.5T tokens. We have also fine-tuned the XGen models on public-domain instructional data, creating their instruction-tuned counterparts (XGen-Inst). We open-source our models for both research advancements and commercial applications. Our evaluation on standard benchmarks shows that XGen models achieve comparable or better results than state-of-the-art open-source LLMs. Our targeted evaluation on long-sequence modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence open-source LLMs.

artificial intelligence, large language model, xgen-7b technical report

arXiv:2309.03450

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Taylorized Training: Towards Better Approximation of Neural Network Training at Finite Width

Bai, Yu, Krause, Ben, Wang, Huan, Xiong, Caiming, Socher, Richard

arXiv.org Machine Learning, Feb-24-2020

We propose Taylorized training as an initiative towards better understanding neural network training at finite width. Taylorized training involves training the k-th order Taylor expansion of the neural network at initialization, and is a principled extension of linearized training, a recently proposed theory for understanding the success of deep learning. We experiment with Taylorized training on modern neural network architectures and show that Taylorized training (1) agrees with full neural network training increasingly well as k increases, and (2) can significantly close the performance gap between linearized and full training. Compared with linearized training, higher-order training works in more realistic settings such as standard parameterization and large (initial) learning rate. We complement our experiments with theoretical results showing that the approximation error of k-th order Taylorized models decays exponentially in k in wide neural networks.
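To make the definition concrete for a single scalar parameter θ: the k-th order Taylorized model replaces f(θ; x) with its expansion around the initialization θ0, f_k(θ; x) = Σ_{j=0..k} (θ − θ0)^j / j! · ∂^j f(θ0; x)/∂θ^j, and training runs gradient descent on f_k instead of f; k = 1 is linearized training. The Python sketch below is a deliberately tiny toy under our own assumptions (a one-parameter model f(θ; x) = sin(θx), squared loss, plain gradient descent, made-up data), not the paper's experimental setup. The function names and the `k=None` convention for full training are hypothetical.

```python
import math

# Toy one-parameter "network" f(theta; x) = sin(theta * x) (our assumption), chosen because
# its j-th derivative in theta has the closed form x**j * sin(theta * x + j * pi / 2).
def full_model(theta: float, x: float) -> float:
    return math.sin(theta * x)


def full_grad(theta: float, x: float) -> float:
    return x * math.cos(theta * x)


def taylor_model(theta: float, x: float, theta0: float, k: int) -> float:
    """k-th order Taylor expansion of f in theta around the initialization theta0."""
    d = theta - theta0
    return sum(d ** j / math.factorial(j) * x ** j * math.sin(theta0 * x + j * math.pi / 2)
               for j in range(k + 1))


def taylor_grad(theta: float, x: float, theta0: float, k: int) -> float:
    """Derivative of the k-th order Taylor model with respect to theta."""
    d = theta - theta0
    return sum(d ** (j - 1) / math.factorial(j - 1) * x ** j * math.sin(theta0 * x + j * math.pi / 2)
               for j in range(1, k + 1))


def train(xs, ys, theta0, lr=0.5, steps=200, k=None):
    """Gradient descent on (half) mean squared error.
    k=None trains the full model, k=1 is linearized training, k>1 is Taylorized training."""
    theta = theta0
    for _ in range(steps):
        grad = 0.0
        for x, y in zip(xs, ys):
            if k is None:
                pred, dpred = full_model(theta, x), full_grad(theta, x)
            else:
                pred, dpred = taylor_model(theta, x, theta0, k), taylor_grad(theta, x, theta0, k)
            grad += (pred - y) * dpred
        theta -= lr * grad / len(xs)
    return theta


xs = [0.2, 0.4, 0.6, 0.8, 1.0]
ys = [math.sin(0.8 * x) for x in xs]  # made-up targets generated with theta = 0.8
for k in (None, 1, 2, 4, 8):
    print(k, round(train(xs, ys, theta0=0.3, k=k), 4))
```

In this toy, the linearized model (k = 1) settles noticeably away from the fully trained parameter, while higher k tracks full training more closely, a one-dimensional echo of the abstract's finding (1); it says nothing about finite-width networks in general.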

deep learning, neural network, taylorized training

arXiv:2002.04010

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Dynamic Evaluation of Transformer Language Models

Krause, Ben, Kahembwe, Emmanuel, Murray, Iain, Renals, Steve

arXiv.org Machine Learning, Apr-17-2019

This research note combines two methods that have recently improved the state of the art in language modeling: Transformers and dynamic evaluation. Transformers use stacked layers of self-attention that allow them to capture long-range dependencies in sequential data. Dynamic evaluation fits models to the recent sequence history, allowing them to assign higher probabilities to recurring sequential patterns. By applying dynamic evaluation to Transformer-XL models, we improve the state of the art on enwik8 from 0.99 to 0.94 bits/char, text8 from 1.08 to 1.04 bits/char, and WikiText-103 from 18.3 to 16.4 perplexity. Language modeling is a commonly used machine learning benchmark with applications to speech recognition, machine translation, text generation, and unsupervised learning in natural language processing tasks.
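The mechanism behind dynamic evaluation can be sketched as: score each segment with the current parameters, then take a gradient step on that same segment before moving on, so patterns that recur later in the sequence become more probable. The toy below is our own illustration, not the paper's setup: it uses a hypothetical unigram softmax character model (`bits_per_char`, starting from uniform logits) instead of Transformer-XL, and a plain SGD step; the update rule the authors use may differ.

```python
import math
from collections import Counter


def softmax(logits):
    m = max(logits.values())
    exps = {c: math.exp(v - m) for c, v in logits.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def bits_per_char(text, vocab, segment_len=8, lr=0.0):
    """Score text segment by segment with a unigram softmax model over `vocab`
    (which must contain every character of `text`).
    lr=0.0 is static evaluation; lr>0 adds a dynamic-evaluation SGD step after each segment."""
    logits = {c: 0.0 for c in vocab}  # hypothetical "trained" model: uniform over vocab
    total_bits = 0.0
    for start in range(0, len(text), segment_len):
        segment = text[start:start + segment_len]
        probs = softmax(logits)
        # Score first, so no segment is predicted by parameters already fitted to it.
        total_bits -= sum(math.log2(probs[c]) for c in segment)
        if lr > 0.0:
            counts = Counter(segment)
            for c in vocab:
                # Gradient of mean cross-entropy w.r.t. logit c is p_c - count_c / n.
                logits[c] -= lr * (probs[c] - counts[c] / len(segment))
    return total_bits / len(text)


text = "ab" * 64  # a sequence whose pattern keeps recurring
vocab = "abcdefgh"
print(bits_per_char(text, vocab))          # 3.0 (static: uniform over 8 symbols)
print(bits_per_char(text, vocab, lr=1.0))  # lower, because the model adapts
```

Static evaluation of the repeating string stays at log2(8) = 3 bits/char, while the adapted model grows more confident in the two recurring symbols and spends fewer bits per character.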

deep learning, dynamic evaluation, neural network

arXiv:1904.08378

Country: Europe > United Kingdom > Scotland (0.14)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.56)

Talking to myself: self-dialogues as data for conversational agents

Fainberg, Joachim, Krause, Ben, Dobre, Mihai, Damonte, Marco, Kahembwe, Emmanuel, Duma, Daniel, Webber, Bonnie, Fancellu, Federico

arXiv.org Artificial Intelligence, Sep-19-2018

Conversational agents are gaining popularity with the increasing ubiquity of smart devices. However, training agents in a data-driven manner is challenging due to a lack of suitable corpora. This paper presents a novel method for efficiently gathering topical, unstructured conversational data: self-dialogues through crowdsourcing. Alongside this paper, we include a corpus of 3.6 million words across 23 topics. We argue for the utility of the corpus by comparing self-dialogues with standard two-party conversations and with data from other corpora.

corpus, crowdsourcing, social media

arXiv:1809.06641

Country: North America > United States (0.68)

Genre:

Personal > Interview (1.00)
Research Report (0.84)

Industry:

Media > Television (1.00)
Media > Film (1.00)
Leisure & Entertainment > Sports > Football (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

Multiplicative LSTM for sequence modelling

Krause, Ben, Lu, Liang, Murray, Iain, Renals, Steve

arXiv.org Machine Learning, Oct-12-2017

We introduce multiplicative LSTM (mLSTM), a recurrent neural network architecture for sequence modelling that combines the long short-term memory (LSTM) and multiplicative recurrent neural network architectures. mLSTM is characterised by its ability to have different recurrent transition functions for each possible input, which we argue makes it more expressive for autoregressive density estimation. We demonstrate empirically that mLSTM outperforms standard LSTM and its deep variants for a range of character-level language modelling tasks. In this version of the paper, we regularise mLSTM to achieve 1.27 bits/char on text8 and 1.24 bits/char on Hutter Prize. We also apply a purely byte-level mLSTM on the WikiText-2 dataset to achieve a character-level entropy of 1.26 bits/char, corresponding to a word-level perplexity of 88.8, which is comparable to word-level LSTMs regularised in similar ways on the same task.
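One way to see "different recurrent transition functions for each possible input" is in code. The sketch below is a minimal forward step in pure Python, assuming a common reading of mLSTM: an intermediate state m_t = (W_mx x_t) ⊙ (W_mh h_{t−1}) replaces h_{t−1} as the recurrent input to every LSTM gate. Biases are omitted, weights are random and untrained, the exact placement of nonlinearities in the paper may differ, and the class and helper names (`MLSTMCell`, `one_hot`, `matvec`) are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.5) -> Matrix:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def one_hot(index: int, size: int) -> Vector:
    return [1.0 if i == index else 0.0 for i in range(size)]


class MLSTMCell:
    """Untrained mLSTM cell with random weights and no biases (illustration only)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.W_mx = rand_matrix(hidden_size, input_size, rng)
        self.W_mh = rand_matrix(hidden_size, hidden_size, rng)
        # For the candidate ("h"), input, forget and output paths:
        # one matrix applied to x_t and one applied to m_t.
        self.W = {g: (rand_matrix(hidden_size, input_size, rng),
                      rand_matrix(hidden_size, hidden_size, rng))
                  for g in ("h", "i", "f", "o")}

    def step(self, x: Vector, h_prev: Vector, c_prev: Vector):
        # Multiplicative intermediate state: m_t = (W_mx x_t) * (W_mh h_{t-1}), elementwise.
        # With a one-hot x_t, W_mx x_t selects one column, so the effective recurrent
        # weights diag(W_mx x_t) W_mh differ for every input symbol.
        m = [a * b for a, b in zip(matvec(self.W_mx, x), matvec(self.W_mh, h_prev))]

        def pre(gate: str) -> Vector:
            Wx, Wm = self.W[gate]
            return [a + b for a, b in zip(matvec(Wx, x), matvec(Wm, m))]

        h_hat = pre("h")
        i = [sigmoid(z) for z in pre("i")]
        f = [sigmoid(z) for z in pre("f")]
        o = [sigmoid(z) for z in pre("o")]
        # Standard LSTM cell update, with m_t in place of h_{t-1} in every gate.
        c = [fj * cj + ij * math.tanh(hj) for fj, cj, ij, hj in zip(f, c_prev, i, h_hat)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c


cell = MLSTMCell(input_size=4, hidden_size=3)
h, c = [0.0] * 3, [0.0] * 3
for symbol in [0, 2, 1, 3]:
    h, c = cell.step(one_hot(symbol, 4), h, c)
print([round(v, 3) for v in h])
```

In a plain LSTM the recurrent matrices are shared across all inputs; here the elementwise product lets each input symbol rescale the recurrent contribution, which is the extra flexibility the abstract attributes to mLSTM.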

deep learning, mlstm, neural network

arXiv:1609.07959

Country:

North America > United States > Illinois (0.14)
Europe > United Kingdom > Scotland (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
