AITopics | Bornschein, Jorg

Collaborating Authors

Bornschein, Jorg

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Predicting from Strings: Language Model Embeddings for Bayesian Optimization

Nguyen, Tung, Zhang, Qiuyi, Yang, Bangding, Lee, Chansoo, Bornschein, Jorg, Miao, Yingjie, Perel, Sagi, Chen, Yutian, Song, Xingyou

arXiv.org Artificial IntelligenceOct-15-2024

Bayesian Optimization is ubiquitous in the field of experimental design and blackbox optimization for improving search efficiency, but has been traditionally restricted to regression models which are only applicable to fixed search spaces and tabular input features. We propose Embed-then-Regress, a paradigm for applying in-context regression over string inputs, through the use of string embedding capabilities of pretrained language models. By expressing all inputs as strings, we are able to perform general-purpose regression for Bayesian Optimization over various domains including synthetic, combinatorial, and hyperparameter optimization, obtaining comparable results to state-of-the-art Gaussian Process-based algorithms. Code can be found at https://github.com/google-research/optformer/tree/main/optformer/embed_then_regress.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.1019

Country:

Europe (0.68)
North America > United States > California (0.28)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Denoising Autoregressive Representation Learning

Li, Yazhe, Bornschein, Jorg, Chen, Ting

arXiv.org Artificial IntelligenceJun-4-2024

In this paper, we explore a new generative approach for learning visual representations. Our method, DARL, employs a decoder-only Transformer to predict image patches autoregressively. We find that training with Mean Squared Error (MSE) alone leads to strong representations. To enhance the image generation ability, we replace the MSE loss with the diffusion objective by using a denoising patch decoder. We show that the learned representation can be improved by using tailored noise schedules and longer training in larger models. Notably, the optimal schedule differs significantly from the typical ones used in standard image diffusion models. Overall, despite its simple architecture, DARL delivers performance remarkably close to state-of-the-art masked prediction models under the fine-tuning protocol. This marks an important step towards a unified model capable of both visual perception and generation, effectively combining the strengths of autoregressive and denoising diffusion models.

artificial intelligence, machine learning, representation, (15 more...)

arXiv.org Artificial Intelligence

2403.05196

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models

Rannen-Triki, Amal, Bornschein, Jorg, Pascanu, Razvan, Hutter, Marcus, György, Andras, Galashov, Alexandre, Teh, Yee Whye, Titsias, Michalis K.

arXiv.org Artificial IntelligenceMar-3-2024

We consider the problem of online fine tuning the parameters of a language model at test time, also known as dynamic evaluation. While it is generally known that this approach improves the overall predictive performance, especially when considering distributional shift between training and evaluation data, we here emphasize the perspective that online adaptation turns parameters into temporally changing states and provides a form of context-length extension with memory in weights, more in line with the concept of memory in neuroscience. We pay particular attention to the speed of adaptation (in terms of sample efficiency),sensitivity to the overall distributional drift, and the computational overhead for performing gradient computations and parameter updates. Our empirical study provides insights on when online adaptation is particularly interesting. We highlight that with online adaptation the conceptual distinction between in-context learning and fine tuning blurs: both are methods to condition the model on previously observed tokens.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2403.01518

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Transformers for Supervised Online Continual Learning

Bornschein, Jorg, Li, Yazhe, Rannen-Triki, Amal

arXiv.org Artificial IntelligenceMar-3-2024

Transformers have become the dominant architecture for sequence modeling tasks such as natural language processing or audio processing, and they are now even considered for tasks that are not naturally sequential such as image classification. Their ability to attend to and to process a set of tokens as context enables them to develop in-context few-shot learning abilities. However, their potential for online continual learning remains relatively unexplored. In online continual learning, a model must adapt to a non-stationary stream of data, minimizing the cumulative nextstep prediction loss. We focus on the supervised online continual learning setting, where we learn a predictor $x_t \rightarrow y_t$ for a sequence of examples $(x_t, y_t)$. Inspired by the in-context learning capabilities of transformers and their connection to meta-learning, we propose a method that leverages these strengths for online continual learning. Our approach explicitly conditions a transformer on recent observations, while at the same time online training it with stochastic gradient descent, following the procedure introduced with Transformer-XL. We incorporate replay to maintain the benefits of multi-epoch training while adhering to the sequential protocol. We hypothesize that this combination enables fast adaptation through in-context learning and sustained longterm improvement via parametric learning. Our method demonstrates significant improvements over previous state-of-the-art results on CLOC, a challenging large-scale real-world benchmark for image geo-localization.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2403.01554

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Research Report (1.00)
Instructional Material > Online (1.00)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.68)

Add feedback

Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies

Sujit, Shivakanth, Braga, Pedro H. M., Bornschein, Jorg, Kahou, Samira Ebrahimi

arXiv.org Artificial IntelligenceNov-21-2023

Reinforcement learning (RL) has shown great promise with algorithms learning in environments with large state and action spaces purely from scalar reward signals. A crucial challenge for current deep RL algorithms is that they require a tremendous amount of environment interactions for learning. This can be infeasible in situations where such interactions are expensive; such as in robotics. Offline RL algorithms try to address this issue by bootstrapping the learning process from existing logged data without needing to interact with the environment from the very beginning. While online RL algorithms are typically evaluated as a function of the number of environment interactions, there exists no single established protocol for evaluating offline RL methods.In this paper, we propose a sequential approach to evaluate offline RL algorithms as a function of the training set size and thus by their data efficiency. Sequential evaluation provides valuable insights into the data efficiency of the learning process and the robustness of algorithms to distribution changes in the dataset while also harmonizing the visualization of the offline and online learning phases. Our approach is generally applicable and easy to implement. We compare several existing offline RL algorithms using this approach and present insights from a variety of tasks and offline datasets.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2212.08131

Country:

Europe (0.46)
North America > Canada > Quebec (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Research Report (0.50)
Instructional Material > Online (0.40)

Industry: Education > Educational Setting > Online (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A study on the plasticity of neural networks

Berariu, Tudor, Czarnecki, Wojciech, De, Soham, Bornschein, Jorg, Smith, Samuel, Pascanu, Razvan, Clopath, Claudia

arXiv.org Artificial IntelligenceOct-14-2023

For example, PackNet (Mallya & Lazebnik, 2017) eventually One aim shared by multiple settings, such as continual gets to a point where all neurons are frozen and learning is learning or transfer learning, is to leverage not possible anymore. In the same fashion, accumulating previously acquired knowledge to converge faster constraints in EWC (Kirkpatrick et al., 2017) might lead on the current task. Usually this is done through to a strongly regularised objective that does not allow for fine-tuning, where an implicit assumption is that the new task's loss to be minimised. Alternatively, learning the network maintains its plasticity, meaning that might become less data efficient, referred to as negative the performance it can reach on any given task is forward transfer, an effect often noticed for regularisation not affected negatively by previously seen tasks.

artificial intelligence, machine learning, neural network, (17 more...)

arXiv.org Artificial Intelligence

2106.00042

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Kalman Filter for Online Classification of Non-Stationary Data

Titsias, Michalis K., Galashov, Alexandre, Rannen-Triki, Amal, Pascanu, Razvan, Teh, Yee Whye, Bornschein, Jorg

arXiv.org Artificial IntelligenceJun-14-2023

In Online Continual Learning (OCL) a learning system receives a stream of data and sequentially performs prediction and training steps. Important challenges in OCL are concerned with automatic adaptation to the particular non-stationary structure of the data, and with quantification of predictive uncertainty. Motivated by these challenges we introduce a probabilistic Bayesian online learning model by using a (possibly pretrained) neural representation and a state space model over the linear predictor weights. Non-stationarity over the linear predictor weights is modelled using a "parameter drift" transition density, parametrized by a coefficient that quantifies forgetting. Inference in the model is implemented with efficient Kalman filter recursions which track the posterior distribution over the linear weights, while online SGD updates over the transition dynamics coefficient allows to adapt to the non-stationarity seen in data. While the framework is developed assuming a linear Gaussian model, we also extend it to deal with classification problems and for fine-tuning the deep learning representation. In a set of experiments in multi-class classification using data sets such as CIFAR-100 and CLOC we demonstrate the predictive ability of the model and its flexibility to capture non-stationarity.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2306.08448

Country: North America > Canada > Ontario > Toronto (0.14)

Genre:

Research Report (0.82)
Instructional Material > Online (0.35)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research

Bornschein, Jorg, Galashov, Alexandre, Hemsley, Ross, Rannen-Triki, Amal, Chen, Yutian, Chaudhry, Arslan, He, Xu Owen, Douillard, Arthur, Caccia, Massimo, Feng, Qixuang, Shen, Jiajun, Rebuffi, Sylvestre-Alvise, Stacpoole, Kitty, Casas, Diego de las, Hawkins, Will, Lazaridou, Angeliki, Teh, Yee Whye, Rusu, Andrei A., Pascanu, Razvan, Ranzato, Marc'Aurelio

arXiv.org Artificial IntelligenceMay-16-2023

A shared goal of several machine learning communities like continual learning, meta-learning and transfer learning, is to design algorithms and models that efficiently and robustly adapt to unseen tasks. An even more ambitious goal is to build models that never stop adapting, and that become increasingly more efficient through time by suitably transferring the accrued knowledge. Beyond the study of the actual learning algorithm and model architecture, there are several hurdles towards our quest to build such models, such as the choice of learning protocol, metric of success and data needed to validate research hypotheses. In this work, we introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks, sorted chronologically and extracted from papers sampled uniformly from computer vision proceedings spanning the last three decades. The resulting stream reflects what the research community thought was meaningful at any point in time, and it serves as an ideal test bed to assess how well models can adapt to new tasks, and do so better and more efficiently as time goes by. Despite being limited to classification, the resulting stream has a rich diversity of tasks from OCR, to texture analysis, scene recognition, and so forth. The diversity is also reflected in the wide range of dataset sizes, spanning over four orders of magnitude. Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks, yet with a low entry barrier as it is limited to a single modality and well understood supervised learning problems. Moreover, we provide a reference implementation including strong baselines and an evaluation protocol to compare methods in terms of their trade-off between accuracy and compute.

artificial intelligence, inductive learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2211.11747

Country:

Europe (1.00)
North America > United States > California (0.45)
North America > United States > Texas (0.27)
North America > United States > Massachusetts (0.27)

Genre:

Research Report > New Finding (1.00)
Instructional Material (0.93)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area (0.94)
Information Technology (0.92)
Health & Medicine > Diagnostic Medicine > Imaging (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

DiscoGen: Learning to Discover Gene Regulatory Networks

Ke, Nan Rosemary, Dunn, Sara-Jane, Bornschein, Jorg, Chiappa, Silvia, Rey, Melanie, Lespiau, Jean-Baptiste, Cassirer, Albin, Wang, Jane, Weber, Theophane, Barrett, David, Botvinick, Matthew, Goyal, Anirudh, Mozer, Mike, Rezende, Danilo

arXiv.org Artificial IntelligenceApr-12-2023

Accurately inferring Gene Regulatory Networks (GRNs) is a critical and challenging task in biology. GRNs model the activatory and inhibitory interactions between genes and are inherently causal in nature. To accurately identify GRNs, perturbational data is required. However, most GRN discovery methods only operate on observational data. Recent advances in neural network-based causal discovery methods have significantly improved causal discovery, including handling interventional data, improvements in performance and scalability. However, applying state-of-the-art (SOTA) causal discovery methods in biology poses challenges, such as noisy data and a large number of samples. Thus, adapting the causal discovery methods is necessary to handle these challenges. In this paper, we introduce DiscoGen, a neural network-based GRN discovery method that can denoise gene expression measurements and handle interventional data. We demonstrate that our model outperforms SOTA neural network-based causal discovery methods.

artificial intelligence, discogen, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2304.05823

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Evaluating Representations with Readout Model Switching

Li, Yazhe, Bornschein, Jorg, Hutter, Marcus

arXiv.org Artificial IntelligenceFeb-19-2023

Although much of the success of Deep Learning builds on learning good representations, a rigorous method to evaluate their quality is lacking. In this paper, we treat the evaluation of representations as a model selection problem and propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric. Contrary to the established practice of limiting the capacity of the readout model, we design a hybrid discrete and continuous-valued model space for the readout models and employ a switching strategy to combine their predictions. The MDL score takes model complexity, as well as data efficiency into account. As a result, the most appropriate model for the specific task and representation will be chosen, making it a unified measure for comparison. The proposed metric can be efficiently computed with an online method and we present results for pre-trained vision encoders of various architectures (ResNet and ViT) and objective functions (supervised and self-supervised) on a range of downstream tasks. We compare our methods with accuracy-based approaches and show that the latter are inconsistent when multiple readout models are used. Finally, we discuss important properties revealed by our evaluations such as model scaling, preferred readout model, and data efficiency.

artificial intelligence, machine learning, representation, (18 more...)

arXiv.org Artificial Intelligence

2302.09579

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Add feedback