AITopics | Shysheya, Aliaksandra

Shysheya, Aliaksandra

JoLT: Joint Probabilistic Predictions on Tabular Data Using LLMs

Shysheya, Aliaksandra, Bronskill, John, Requeima, James, Siddiqui, Shoaib Ahmed, Gonzalez, Javier, Duvenaud, David, Turner, Richard E.

arXiv.org Machine Learning, Feb-17-2025

We introduce a simple method for probabilistic predictions on tabular data based on Large Language Models (LLMs) called JoLT (Joint LLM Process for Tabular data). JoLT uses the in-context learning capabilities of LLMs to define joint distributions over tabular data conditioned on user-specified side information about the problem, exploiting the vast repository of latent problem-relevant knowledge encoded in LLMs. JoLT defines joint distributions for multiple target variables with potentially heterogeneous data types without any data conversion, data preprocessing, special handling of missing data, or model training, making it accessible and efficient for practitioners. Our experiments show that JoLT outperforms competitive methods on low-shot single-target and multi-target tabular classification and regression tasks. Furthermore, we show that JoLT can automatically handle missing data and perform data imputation by leveraging textual side information. We argue that due to its simplicity and generality, JoLT is an effective approach for a wide variety of real prediction problems.
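
As a rough illustration of the in-context setup described above, the sketch below serializes a few table rows into a text prompt and leaves the first target of a query row open for an LLM to continue. The prompt layout, the `serialize_row` and `build_prompt` helpers, and the example columns are assumptions made for illustration; they are not taken from the paper, and no LLM is called.

```python
from typing import Mapping, Optional, Sequence

Row = Mapping[str, Optional[object]]


def serialize_row(row: Row, columns: Sequence[str]) -> str:
    """Render one row as 'col: value' pairs; missing cells are simply skipped."""
    parts = [f"{col}: {row[col]}" for col in columns if row.get(col) is not None]
    return "; ".join(parts)


def build_prompt(side_info: str, examples: Sequence[Row], query: Row,
                 features: Sequence[str], targets: Sequence[str]) -> str:
    """Hypothetical prompt: side information, in-context rows, then the query."""
    lines = [side_info]
    for row in examples:
        lines.append(serialize_row(row, list(features) + list(targets)))
    # The query row ends with the first target name so the model continues it.
    query_text = serialize_row(query, features)
    lines.append(f"{query_text}; {targets[0]}:")
    return "\n".join(lines)


examples = [
    {"age": 34, "sport": "rowing", "medal": "gold", "score": 9.1},
    {"age": None, "sport": "judo", "medal": "bronze", "score": 6.4},  # missing age
]
query = {"age": 27, "sport": "rowing"}
prompt = build_prompt("Each line describes one athlete.", examples, query,
                      features=["age", "sport"], targets=["medal", "score"])
print(prompt)
```

In this sketch a missing cell is handled by omission rather than by imputing a placeholder, which mirrors the abstract's claim that no special handling of missing data is needed; how the actual method formats rows may differ.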

large language model, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

2502.11877

Country:

Europe (1.00)
North America > United States (0.46)
North America > Canada (0.28)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Olympic Games (1.00)
Health & Medicine > Therapeutic Area (0.93)
Media > Film (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

On conditional diffusion models for PDE simulations

Shysheya, Aliaksandra, Diaconu, Cristiana, Bergamin, Federico, Perdikaris, Paris, Hernández-Lobato, José Miguel, Turner, Richard E., Mathieu, Emile

arXiv.org Artificial Intelligence, Oct-21-2024

Modelling partial differential equations (PDEs) is of crucial importance in science and engineering, and it includes tasks ranging from forecasting to inverse problems, such as data assimilation. However, most previous numerical and machine learning approaches that target forecasting cannot be applied out-of-the-box for data assimilation. Recently, diffusion models have emerged as a powerful tool for conditional generation, being able to flexibly incorporate observations without retraining. In this work, we perform a comparative study of score-based diffusion models for forecasting and assimilation of sparse observations. In particular, we focus on diffusion models that are either trained in a conditional manner, or conditioned after unconditional training. We address the shortcomings of existing models by proposing 1) an autoregressive sampling approach that significantly improves performance in forecasting, 2) a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths, and 3) a hybrid model which employs flexible pre-training conditioning on initial conditions and flexible post-training conditioning to handle data assimilation. We empirically show that these modifications are crucial for successfully tackling the combination of forecasting and data assimilation, a task commonly encountered in real-world scenarios.
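
The autoregressive sampling idea can be sketched as a rollout loop that repeatedly generates a short block of future states conditioned on the most recent history. Everything below is a simplified assumption for illustration: states are single floats rather than PDE fields, and `toy_sampler` is a deterministic placeholder standing in for a conditional diffusion sampler, not the paper's model.

```python
from typing import Callable, List, Sequence

State = float  # stand-in for a PDE state field at one time step


def autoregressive_rollout(
    initial: Sequence[State],
    sample_next: Callable[[Sequence[State], int], List[State]],
    history: int,
    block: int,
    total_steps: int,
) -> List[State]:
    """Extend a trajectory block by block, conditioning on the last `history` states."""
    trajectory = list(initial)
    while len(trajectory) < total_steps:
        context = trajectory[-history:]
        n_new = min(block, total_steps - len(trajectory))
        # sample_next is expected to return exactly n_new states.
        trajectory.extend(sample_next(context, n_new))
    return trajectory


def toy_sampler(context: Sequence[State], n_new: int) -> List[State]:
    """Placeholder for a conditional sampler: halves the last state each step."""
    out, last = [], context[-1]
    for _ in range(n_new):
        last = 0.5 * last
        out.append(last)
    return out


traj = autoregressive_rollout([8.0, 4.0], toy_sampler, history=2, block=3, total_steps=6)
print(traj)  # [8.0, 4.0, 2.0, 1.0, 0.5, 0.25]
```

The `history` and `block` arguments correspond loosely to the history lengths the abstract mentions; the trade-off between them is a design choice this sketch does not attempt to evaluate.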

artificial intelligence, machine learning, trajectory, (19 more...)

arXiv.org Artificial Intelligence

2410.16415

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (0.45)

Industry:

Energy (0.67)
Health & Medicine (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Denoising Diffusion Probabilistic Models in Six Simple Steps

Turner, Richard E., Diaconu, Cristiana-Diana, Markou, Stratis, Shysheya, Aliaksandra, Foong, Andrew Y. K., Mlodozeniec, Bruno

arXiv.org Artificial Intelligence, Feb-10-2024

Denoising Diffusion Probabilistic Models (DDPMs) [Ho et al., 2020] are a very popular class of deep generative models that have been successfully applied to a diverse range of problems including image and video generation, protein and material synthesis, weather forecasting, and neural surrogates of partial differential equations. Despite their ubiquity, it is hard to find an introduction to DDPMs which is simple, comprehensive, clean and clear. The compact explanations necessary in research papers are not able to elucidate all of the different design steps taken to formulate the DDPM, and the rationale of the steps that are presented is often omitted to save space. Moreover, the expositions are typically presented from the variational lower bound perspective, which is unnecessary and arguably harmful as it obfuscates why the method is working and suggests generalisations that do not perform well in practice. On the other hand, perspectives that take the continuous-time limit are beautiful and general, but they have a high barrier to entry as they require background knowledge of stochastic differential equations and probability flow. In this note, we distill the formulation of the DDPM into six simple steps, each of which comes with a clear rationale. We assume that the reader is familiar with fundamental topics in machine learning including basic probabilistic modelling, Gaussian distributions, maximum likelihood estimation, and deep learning.
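
For readers who want something concrete, the sketch below shows the standard DDPM forward (noising) process in closed form, x_t ~ N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I), as formulated by Ho et al. (2020). It is not necessarily the note's own notation or step breakdown; the linear schedule endpoints and the helper names are illustrative assumptions.

```python
import math
import random
from typing import List


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances beta_1..beta_T (endpoint values are illustrative)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def noise_sample(x0: List[float], alpha_bar_t: float, rng: random.Random) -> List[float]:
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    scale, sigma = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x_t = noise_sample([1.0, -2.0, 0.5], alpha_bars[-1], rng)
print(alpha_bars[0], alpha_bars[-1])
```

Because alpha_bar_t shrinks towards zero as t grows, the final x_t is close to a standard Gaussian draw regardless of x_0, which is what lets sampling start from pure noise.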

artificial intelligence, machine learning, variance, (20 more...)

arXiv.org Artificial Intelligence

2402.04384

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Transformer Neural Autoregressive Flows

Patacchiola, Massimiliano, Shysheya, Aliaksandra, Hofmann, Katja, Turner, Richard E.

arXiv.org Artificial Intelligence, Jan-3-2024

Density estimation, a central problem in machine learning, can be performed using Normalizing Flows (NFs). NFs comprise a sequence of invertible transformations that turn a complex target distribution into a simple one by exploiting the change of variables theorem. Neural Autoregressive Flows (NAFs) and Block Neural Autoregressive Flows (B-NAFs) are arguably the most performant members of the NF family. However, they suffer from scalability issues and training instability due to the constraints imposed on the network structure. In this paper, we propose a novel solution to these challenges by exploiting transformers to define a new class of neural flows called Transformer Neural Autoregressive Flows (T-NAFs). T-NAFs treat each dimension of a random variable as a separate input token, using attention masking to enforce an autoregressive constraint. We take an amortization-inspired approach where the transformer outputs the parameters of an invertible transformation. The experimental results demonstrate that T-NAFs consistently match or outperform NAFs and B-NAFs across multiple datasets from the UCI benchmark. Remarkably, T-NAFs achieve these results using an order of magnitude fewer parameters than previous approaches, without composing multiple flows.
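
The two ingredients named above, an autoregressive attention mask and the change of variables formula, can be sketched in a few lines. This is a deliberately simplified assumption: the per-dimension transformation here is affine, whereas the paper uses a more expressive invertible transformation, and `toy_conditioner` is a hypothetical stand-in for the transformer that outputs the transformation parameters.

```python
import math
from typing import Callable, List, Sequence, Tuple

Conditioner = Callable[[Sequence[float]], Tuple[float, float]]


def strict_causal_mask(dim: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j, i.e. j < i."""
    return [[j < i for j in range(dim)] for i in range(dim)]


def flow_log_density(x: Sequence[float], conditioner: Conditioner) -> float:
    """log p(x) = sum_i [log N(z_i; 0, 1) + log |dz_i/dx_i|] for an affine flow."""
    total = 0.0
    for i, xi in enumerate(x):
        shift, log_scale = conditioner(x[:i])  # sees earlier dimensions only
        z = (xi - shift) * math.exp(-log_scale)
        # Standard normal log-density of z plus the log-Jacobian term (-log_scale).
        total += -0.5 * (z * z + math.log(2.0 * math.pi)) - log_scale
    return total


def toy_conditioner(prefix: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical stand-in for the transformer: shift by the prefix mean."""
    shift = sum(prefix) / len(prefix) if prefix else 0.0
    return shift, 0.0


mask = strict_causal_mask(3)
log_p = flow_log_density([0.0, 0.0, 0.0], toy_conditioner)
print(mask)
print(log_p)
```

Because the parameters for dimension i depend only on dimensions before it, the Jacobian is triangular and its log-determinant reduces to the sum of per-dimension terms used above.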

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2401.01855

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

On the Efficacy of Differentially Private Few-shot Image Classification

Tobaben, Marlon, Shysheya, Aliaksandra, Bronskill, John, Paverd, Andrew, Tople, Shruti, Zanella-Beguelin, Santiago, Turner, Richard E., Honkela, Antti

arXiv.org Machine Learning, Dec-19-2023

There has been significant recent progress in training differentially private (DP) models which achieve accuracy that approaches the best non-private models. These DP models are typically pretrained on large public datasets and then fine-tuned on private downstream datasets that are relatively large and similar in distribution to the pretraining data. However, in many applications including personalization and federated learning, it is crucial to perform well (i) in the few-shot setting, as obtaining large amounts of labeled data may be problematic; and (ii) on datasets from a wide variety of domains for use in various specialist settings. To understand under which conditions few-shot DP can be effective, we perform an exhaustive set of experiments that reveals how the accuracy and vulnerability to attack of few-shot DP image classification models are affected as the number of shots per class, privacy level, model architecture, downstream dataset, and subset of learnable parameters in the model vary. We show that to achieve DP accuracy on par with non-private models, the shots per class must be increased as the privacy level increases. We also show that learning parameter-efficient FiLM adapters under DP is competitive with learning just the final classifier layer or learning all of the network parameters. Finally, we evaluate DP federated learning systems and establish state-of-the-art performance on the challenging FLAIR benchmark.

accuracy, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

2302.0119

Country:

Europe (1.00)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.83)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

Diffusion-Augmented Neural Processes

Bonito, Lorenzo, Requeima, James, Shysheya, Aliaksandra, Turner, Richard E.

arXiv.org Artificial Intelligence, Nov-16-2023

Over the last few years, Neural Processes have become a useful modelling tool in many application areas, such as healthcare and climate sciences, in which data are scarce and prediction uncertainty estimates are indispensable. However, the current state of the art in the field (AR CNPs; Bruinsma et al., 2023) presents a few issues that prevent its widespread deployment. This work proposes an alternative, diffusion-based approach to NPs which, through conditioning on noised datasets, addresses many of these limitations, whilst also exceeding SOTA performance.

artificial intelligence, machine learning, neural process, (15 more...)

arXiv.org Artificial Intelligence

2311.09848

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Arabian Gulf (0.24)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.15)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

FiT: Parameter Efficient Few-shot Transfer Learning for Personalized and Federated Image Classification

Shysheya, Aliaksandra, Bronskill, John, Patacchiola, Massimiliano, Nowozin, Sebastian, Turner, Richard E.

arXiv.org Machine Learning, Feb-2-2023

Modern deep learning systems are increasingly deployed in situations such as personalization and federated learning where it is necessary to support i) learning on small amounts of data, and ii) communication-efficient distributed training protocols. In this work, we develop FiLM Transfer (FiT) which fulfills these requirements in the image classification setting by combining ideas from transfer learning (fixed pretrained backbones and fine-tuned FiLM adapter layers) and meta-learning (automatically configured Naive Bayes classifiers and episodic training) to yield parameter-efficient models with superior classification accuracy at low-shot. The resulting parameter efficiency is key for enabling few-shot learning, inexpensive model updates for personalization, and communication-efficient federated learning. We experiment with FiT on a wide range of downstream datasets and show that it achieves better classification accuracy than the leading Big Transfer (BiT) algorithm at low-shot and achieves state-of-the-art accuracy on the challenging VTAB-1k benchmark, with fewer than 1% of the updateable parameters. Finally, we demonstrate the parameter efficiency and superior accuracy of FiT in distributed low-shot applications including model personalization and federated learning where model update size is an important performance metric.
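
A FiLM layer applies a learned per-channel scale and shift to a feature map, which is why adapting only these layers touches so few parameters. The sketch below shows that operation on nested lists; the `film` and `count_film_parameters` helpers and the channel counts are illustrative assumptions, and the sketch omits the frozen backbone and the classifier head that the abstract describes.

```python
from typing import List

FeatureMap = List[List[float]]  # [channel][spatial position]


def film(features: FeatureMap, gamma: List[float], beta: List[float]) -> FeatureMap:
    """Feature-wise linear modulation: scale and shift each channel."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need one (gamma, beta) pair per channel")
    return [[g * v + b for v in channel]
            for channel, g, b in zip(features, gamma, beta)]


def count_film_parameters(channels_per_layer: List[int]) -> int:
    """Each modulated layer adds 2 parameters (gamma, beta) per channel."""
    return sum(2 * c for c in channels_per_layer)


features = [[1.0, 2.0], [3.0, 4.0]]
out = film(features, gamma=[2.0, 0.5], beta=[0.0, 1.0])
print(out)                                    # [[2.0, 4.0], [2.5, 3.0]]
print(count_film_parameters([64, 128, 256]))  # 896
```

With gamma set to one and beta to zero the layer is the identity, so a frozen backbone starts out unchanged and only the small modulation vectors need to be updated or communicated.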

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2206.08671

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Film (0.68)
Health & Medicine (0.67)
Leisure & Entertainment (0.46)
Energy (0.46)

Textured Neural Avatars

Shysheya, Aliaksandra, Zakharov, Egor, Aliev, Kara-Ali, Bashirov, Renat, Burkov, Egor, Iskakov, Karim, Ivakhnenko, Aleksei, Malkov, Yury, Pasechnik, Igor, Ulyanov, Dmitry, Vakhitov, Alexander, Lempitsky, Victor

arXiv.org Artificial Intelligence, May-21-2019

We present a system for learning full-body neural avatars, i.e. deep networks that produce full-body renderings of a person for varying body pose and camera position. Our system takes the middle path between the classical graphics pipeline and the recent deep learning approaches that generate images of humans using image-to-image translation. In particular, our system estimates an explicit two-dimensional texture map of the model surface. At the same time, it abstains from explicit shape modeling in 3D. Instead, at test time, the system uses a fully-convolutional network to directly map the configuration of body feature points w.r.t. the camera to the 2D texture coordinates of individual pixels in the image frame. We show that such a system is capable of learning to generate realistic renderings while being trained on videos annotated with 3D poses and foreground masks. We also demonstrate that maintaining an explicit texture representation helps our system to achieve better generalization compared to systems that use direct image-to-image translation.
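
The rendering step described above, mapping each image pixel to 2D texture coordinates and reading the colour from an explicit texture, can be illustrated with plain bilinear sampling. This is only a sketch under stated assumptions: the texture is single-channel, the per-pixel coordinates in `uv_map` are given by hand rather than predicted by a network, and the function names are hypothetical.

```python
from typing import List, Tuple

Texture = List[List[float]]          # texture[row][column], one channel for brevity
UVMap = List[List[Tuple[float, float]]]  # per-pixel (u, v) in [0, 1]


def sample_bilinear(texture: Texture, u: float, v: float) -> float:
    """Bilinearly interpolate `texture` at continuous coordinates u, v in [0, 1]."""
    height, width = len(texture), len(texture[0])
    x = min(max(u, 0.0), 1.0) * (width - 1)
    y = min(max(v, 0.0), 1.0) * (height - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * texture[y0][x0] + fx * texture[y0][x1]
    bottom = (1 - fx) * texture[y1][x0] + fx * texture[y1][x1]
    return (1 - fy) * top + fy * bottom


def render(texture: Texture, uv_map: UVMap) -> List[List[float]]:
    """Produce an image by looking up every pixel's texture coordinates."""
    return [[sample_bilinear(texture, u, v) for (u, v) in row] for row in uv_map]


texture = [[0.0, 1.0], [2.0, 3.0]]
uv_map = [[(0.0, 0.0), (1.0, 0.0)], [(0.5, 0.5), (1.0, 1.0)]]
image = render(texture, uv_map)
print(image)  # [[0.0, 1.0], [1.5, 3.0]]
```

Keeping the texture as an explicit array separates appearance from pose: the same texture can be re-sampled under a new coordinate map, which is one intuition for the generalization benefit the abstract reports.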

avatar, deep learning, neural network, (20 more...)

arXiv.org Artificial Intelligence

1905.08776

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
