AITopics | Shiarlis, Kyriacos

Collaborating Authors

Shiarlis, Kyriacos

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Gandalf the Red: Adaptive Security for LLMs

Pfister, Niklas, Volhejn, Václav, Knott, Manuel, Arias, Santiago, Bazińska, Julia, Bichurin, Mykhailo, Commike, Alan, Darling, Janet, Dienes, Peter, Fiedler, Matthew, Haber, David, Kraft, Matthias, Lancini, Marco, Mathys, Max, Pascual-Ortiz, Damián, Podolak, Jakub, Romero-López, Adrià, Shiarlis, Kyriacos, Signer, Andreas, Terek, Zsolt, Theocharis, Athanasios, Timbrell, Daniel, Trautwein, Samuel, Watts, Samuel, Wu, Natalie, Rojas-Carulla, Mateo

arXiv.org Artificial IntelligenceJan-14-2025

Current evaluations of defenses against prompt attacks in large language model (LLM) applications often overlook two critical factors: the dynamic nature of adversarial behavior and the usability penalties imposed on legitimate users by restrictive defenses. We propose D-SEC (Dynamic Security Utility Threat Model), which explicitly separates attackers from legitimate users, models multi-step interactions, and rigorously expresses the security-utility in an optimizable form. We further address the shortcomings in existing evaluations by introducing Gandalf, a crowd-sourced, gamified red-teaming platform designed to generate realistic, adaptive attack datasets. Using Gandalf, we collect and release a dataset of 279k prompt attacks. Complemented by benign user data, our analysis reveals the interplay between security and utility, showing that defenses integrated in the LLM (e.g., system prompts) can degrade usability even without blocking requests. We demonstrate that restricted application domains, defense-in-depth, and adaptive defenses are effective strategies for building secure and useful LLM applications. Code is available at \href{https://github.com/lakeraai/dsec-gandalf}{\texttt{https://github.com/lakeraai/dsec-gandalf}}.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2501.07927

Country:

Europe (0.28)
Asia > Japan (0.14)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (0.93)
Media > Film (0.92)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Hierarchical Imitation Learning for Stochastic Environments

Igl, Maximilian, Shah, Punit, Mougin, Paul, Srinivasan, Sirish, Gupta, Tarun, White, Brandyn, Shiarlis, Kyriacos, Whiteson, Shimon

arXiv.org Artificial IntelligenceSep-25-2023

Many applications of imitation learning require the agent to generate the full distribution of behaviour observed in the training data. For example, to evaluate the safety of autonomous vehicles in simulation, accurate and diverse behaviour models of other road users are paramount. Existing methods that improve this distributional realism typically rely on hierarchical policies. These condition the policy on types such as goals or personas that give rise to multi-modal behaviour. However, such methods are often inappropriate for stochastic environments where the agent must also react to external factors: because agent types are inferred from the observed future trajectory during training, these environments require that the contributions of internal and external factors to the agent behaviour are disentangled and only internal factors, i.e., those under the agent's control, are encoded in the type. Encoding future information about external factors leads to inappropriate agent reactions during testing, when the future is unknown and types must be drawn independently from the actual future. We formalize this challenge as distribution shift in the conditional distribution of agent types under environmental stochasticity. We propose Robust Type Conditioning (RTC), which eliminates this shift with adversarial training under randomly sampled types. Experiments on two domains, including the large-scale Waymo Open Motion Dataset, show improved distributional realism while maintaining or improving task performance compared to state-of-the-art baselines.

artificial intelligence, information, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2309.14003

Genre: Research Report (0.40)

Industry: Transportation > Ground > Road (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

Zintgraf, Luisa, Shiarlis, Kyriacos, Igl, Maximilian, Schulze, Sebastian, Gal, Yarin, Hofmann, Katja, Whiteson, Shimon

arXiv.org Machine LearningOct-18-2019

V ARIBAD: A V ERY G OOD M ETHOD FOR B AYES-A DAPTIVE D EEP RL VIA M ETA-L EARNING Luisa Zintgraf University of Oxford Kyriacos Shiarlis Latent Logic Maximilian Igl University of Oxford Sebastian Schulze University of Oxford Y arin Gal OA TML Group, University of Oxford Katja Hofmann Microsoft Research Shimon Whiteson University of Oxford Latent Logic A BSTRACT Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but on the agent's uncertainty about the environment. Computing a Bayes-optimal policy is however intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment, and incorporate task uncertainty directly during action selection. In a grid-world domain, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We also evaluate variBAD on MuJoCo domains widely used in meta-RL and show that it achieves higher return during training than existing methods. 1 I NTRODUCTION Reinforcement learning (RL) is typically concerned with finding an optimal policy that maximises expected return for a given Markov decision process (MDP) with an unknown reward and transition function. If these were known, the optimal policy could in theory be computed without interacting with the environment. By contrast, learning in an unknown environment typically requires trading off exploration (learning about the environment) and exploitation (taking promising actions). Balancing this tradeoff is key to maximising expected return during learning . A Bayes-optimal policy, which does so optimally, conditions actions not only on the environment state but on the agent's own uncertainty about the current MDP . In principle, a Bayes-optimal policy can be computed using the framework of Bayes-adaptive Markov decision processes (BAMDPs) (Martin, 1967; Duff & Barto, 2002). The agent maintains a belief, i.e., a posterior distribution, over possible environments. Augmenting the state space of the underlying MDP with this posterior distribution yields a BAMDP, a special case of a belief MDP (Kaelbling et al., 1998).

deep learning, neural network, upstream oil & gas, (20 more...)

arXiv.org Machine Learning

1910.08348

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (1.00)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas > Upstream (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

Learning from Demonstration in the Wild

Behbahani, Feryal, Shiarlis, Kyriacos, Chen, Xi, Kurin, Vitaly, Kasewa, Sudhanshu, Stirbu, Ciprian, Gomes, João, Paul, Supratik, Oliehoek, Frans A., Messias, João, Whiteson, Shimon

arXiv.org Machine LearningNov-8-2018

Abstract-- Learning from demonstration (LfD) is useful in settings where hand-coding behaviour or a reward function is impractical. It has succeeded in a wide range of problems but typically relies on artificially generated demonstrations or specially deployed sensors and has not generally been able to leverage the copious demonstrations available in the wild: those that capture behaviour that was occurring anyway using sensors that were already deployed for another purpose, e.g., traffic camera footage capturing demonstrations of natural behaviour of vehicles, cyclists, and pedestrians. We propose video to behaviour (ViBe), a new approach to learning models of road user behaviour that requires as input only unlabelled raw video data of a traffic scene collected from a single, monocular, uncalibrated camera with ordinary resolution. Our approach calibrates the camera, detects relevant objects, tracks them through time, and uses the resulting trajectories to perform LfD, yielding models of naturalistic behaviour. We apply ViBe to raw videos of a traffic intersection and show that it can learn purely from videos, without additional expert knowledge. Learning from demonstration (LfD) is a machine learning technique that can learn complex behaviours from a dataset of expert trajectories, called demonstrations. LfD is particularly useful in settings where hand-coding behaviour, or engineering a suitable reward function, is too difficult or labour intensive. While LfD has succeeded in a wide range of problems [1], [2], [3], nearly all methods rely on either artificially generated demonstrations (e.g., from laboratory subjects) or those collected by specially deployed sensors (e.g., MOCAP). These restrictions greatly limit the practical applicability of LfD, which to date has largely not been able to leverage the copious demonstrations available in the wild: those that capture behaviour that was occurring anyway using sensors that were already deployed for other purposes. For example, consider the problem of training autonomous vehicles to navigate in the presence of human road users.

artificial intelligence, ground transportation, trajectory, (16 more...)

arXiv.org Machine Learning

1811.03516

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.82)

Industry: Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)

Add feedback

CAML: Fast Context Adaptation via Meta-Learning

Zintgraf, Luisa M, Shiarlis, Kyriacos, Kurin, Vitaly, Hofmann, Katja, Whiteson, Shimon

arXiv.org Machine LearningOct-12-2018

We propose CAML, a meta-learning method for fast adaptation that partitions the model parameters into two parts: context parameters that serve as additional input to the model and are adapted on individual tasks, and shared parameters that are meta-trained and shared across tasks. At test time, the context parameters are updated with one or several gradient steps on a task-specific loss that is backpropagated through the shared part of the network. Compared to approaches that adjust all parameters on a new task (e.g., MAML), our method can be scaled up to larger networks without overfitting on a single task, is easier to implement, and saves memory writes during training and network communication at test time for distributed machine learning systems. We show empirically that this approach outperforms MAML, is less sensitive to the task-specific learning rate, can capture meaningful task embeddings with the context parameters, and outperforms alternative partitionings of the parameter vectors. A key challenge in meta-learning is fast adaptation: learning on previously unseen tasks fast and with little data. In principle, this can be achieved by leveraging knowledge obtained in other, related tasks. However, the best way to do so remains an open question. A popular recent method for fast adaptation is model agnostic meta learning (MAML) (Finn et al., 2017a), which learns a model initialisation, such that at test time the model can be adapted to solve the new task in only a few gradient steps. MAML has an interleaved training procedure, comprised of inner loop and outer loop updates that operate on a batch of tasks at each iteration. In the inner loop, MAML learns task-specific parameters by performing one gradient step on a task-specific loss.

context parameter, deep learning, neural network, (17 more...)

arXiv.org Machine Learning

1810.03642

Country: Europe (0.14)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

TACO: Learning Task Decomposition via Temporal Alignment for Control

Shiarlis, Kyriacos, Wulfmeier, Markus, Salter, Sasha, Whiteson, Shimon, Posner, Ingmar

arXiv.org Machine LearningAug-10-2018

Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks. By reusing the corresponding sub-policies within and between tasks, they provide training data for each policy from different high-level tasks and compose them to perform novel ones. Existing approaches to modular LfD focus either on learning a single high-level task or depend on domain knowledge and temporal segmentation. In contrast, we propose a weakly supervised, domain-agnostic approach based on task sketches, which include only the sequence of sub-tasks performed in each demonstration. Our approach simultaneously aligns the sketches with the observed demonstrations and learns the required sub-policies. This improves generalisation in comparison to separate optimisation procedures. We evaluate the approach on multiple domains, including a simulated 3D robot arm control task using purely image-based observations. The results show that our approach performs commensurately with fully supervised approaches, while requiring significantly less annotation effort.

alignment, artificial intelligence, neural network, (20 more...)

arXiv.org Machine Learning

1803.0184

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback