Leoben
As the US and China lock horns, Malaysia hopes to harness an AI revolution
Kulim, Malaysia – When tech giant AT&S decided a few years ago that it needed to ramp up production to keep pace with the artificial intelligence (AI) boom, it did not look to its largest manufacturing facilities in China. The Austrian firm's plants in Chongqing and Shanghai – opened in 2022 and 2016, respectively – employ some 9,000 workers between them, churning out high-end components used in everything from consumer electronics to cars. But AT&S was at the same time coming to grips with the risks of concentrating production in one country. Like many tech firms grappling with the disruption of the COVID-19 pandemic and the trade war salvoes between the United States and China, AT&S decided it needed to diversify its supply chains. Malaysia quickly emerged at the top of the company's list of potential locations for its next plant.
"How hard is my MDP?" The distribution-norm to the rescue
Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor
In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel p. In many problems, a good approximation of p is not needed. For instance, if from one state-action pair (s, a), one can only transit to states with the same value, learning p(·|s, a) accurately is irrelevant (only its support matters). This paper aims at capturing such behavior by defining a novel hardness measure for Markov Decision Processes (MDPs) based on what we call the distribution-norm. The distribution-norm w.r.t. a measure ν is defined on zero ν-mean functions f by the standard variation of f with respect to ν. We first provide a concentration inequality for the dual of the distribution-norm.
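The same-value example in this abstract can be checked numerically. The sketch below is illustrative only: it assumes a finite state space, reads "standard variation" as the standard deviation of f under ν, and centers f inside the function (equivalent to restricting to zero ν-mean functions). The names `distribution_norm`, `p_sa` and the value tables are hypothetical, not taken from the paper.

```python
import math

def distribution_norm(f, nu):
    """Standard deviation of f under a discrete measure nu (dict: state -> probability).

    Centering f here is equivalent to applying the norm to the zero nu-mean
    function f - E_nu[f].
    """
    mean = sum(nu[s] * f[s] for s in nu)
    variance = sum(nu[s] * (f[s] - mean) ** 2 for s in nu)
    return math.sqrt(variance)

# Hypothetical next-state distribution p(.|s, a) over three states.
p_sa = {"s1": 0.2, "s2": 0.5, "s3": 0.3}

# Case 1: every reachable state has the same value.
flat_values = {"s1": 4.0, "s2": 4.0, "s3": 4.0}
# Case 2: reachable states differ in value.
spread_values = {"s1": 0.0, "s2": 4.0, "s3": 10.0}

flat_norm = distribution_norm(flat_values, p_sa)
spread_norm = distribution_norm(spread_values, p_sa)
print(flat_norm, spread_norm)
```

When all reachable states share one value, the norm is zero up to rounding, so the exact probabilities in p(·|s, a) do not affect the expected next-state value. In the second case the norm is strictly positive, and estimation error in p matters.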
Function Space Diversity for Uncertainty Prediction via Repulsive Last-Layer Ensembles
Steger, Sophie, Knoll, Christian, Klein, Bernhard, Fröning, Holger, Pernkopf, Franz
Bayesian inference in function space has gained attention due to its robustness against overparameterization in neural networks. However, approximating the infinite-dimensional function space introduces several challenges. In this work, we discuss function space inference via particle optimization and present practical modifications that improve uncertainty estimation and, most importantly, make it applicable for large and pretrained networks. First, we demonstrate that the input samples, where particle predictions are enforced to be diverse, are detrimental to the model performance. While diversity on training data itself can lead to underfitting, the use of label-destroying data augmentation, or unlabeled out-of-distribution data can improve prediction diversity and uncertainty estimates. Furthermore, we take advantage of the function space formulation, which imposes no restrictions on network parameterization other than sufficient flexibility. Instead of using full deep ensembles to represent particles, we propose a single multi-headed network that introduces a minimal increase in parameters and computation. This allows seamless integration to pretrained networks, where this repulsive last-layer ensemble can be used for uncertainty aware fine-tuning at minimal additional cost. We achieve competitive results in disentangling aleatoric and epistemic uncertainty for active learning, detecting out-of-domain data, and providing calibrated uncertainty estimates under distribution shifts with minimal compute and memory.
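The abstract does not say how aleatoric and epistemic uncertainty are separated. One common choice for ensembles, assumed here rather than taken from the paper, is the entropy-based split: total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the average entropy of each head, and epistemic uncertainty is the difference. The sketch treats each list in `head_probs` as the class probabilities from one head of a hypothetical multi-headed last layer.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def decompose_uncertainty(head_probs):
    """Split predictive uncertainty for one input.

    head_probs: list of per-head class-probability lists.
    Returns (total, aleatoric, epistemic), where epistemic = total - aleatoric.
    """
    n_heads = len(head_probs)
    n_classes = len(head_probs[0])
    mean_probs = [sum(h[c] for h in head_probs) / n_heads for c in range(n_classes)]
    total = entropy(mean_probs)
    aleatoric = sum(entropy(h) for h in head_probs) / n_heads
    return total, aleatoric, total - aleatoric

# Heads agree that the input is ambiguous: uncertainty is purely aleatoric.
agree = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
# Heads are individually confident but contradict each other: epistemic part is large.
disagree = [[0.98, 0.02], [0.02, 0.98], [0.5, 0.5]]

total_a, alea_a, epi_a = decompose_uncertainty(agree)
total_d, alea_d, epi_d = decompose_uncertainty(disagree)
print(epi_a, epi_d)
```

Under this split, pushing the heads to disagree on out-of-distribution inputs raises the epistemic term there without changing what each head predicts on the training data.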
EnvoDat: A Large-Scale Multisensory Dataset for Robotic Spatial Awareness and Semantic Reasoning in Heterogeneous Environments
Nwankwo, Linus, Ellensohn, Bjoern, Dave, Vedant, Hofer, Peter, Forstner, Jan, Villneuve, Marlene, Galler, Robert, Rueckert, Elmar
To ensure the efficiency of robot autonomy under diverse real-world conditions, a high-quality heterogeneous dataset is essential to benchmark the operating algorithms' performance and robustness. Current benchmarks predominantly focus on urban terrains, specifically for on-road autonomous driving, leaving multi-degraded, densely vegetated, dynamic and feature-sparse environments, such as underground tunnels, natural fields, and modern indoor spaces underrepresented. To fill this gap, we introduce EnvoDat, a large-scale, multi-modal dataset collected in diverse environments and conditions, including high illumination, fog, rain, and zero visibility at different times of the day. Overall, EnvoDat contains 26 sequences from 13 scenes, 10 sensing modalities, over 1.9TB of data, and over 89K fine-grained polygon-based annotations for more than 82 object and terrain classes. EnvoDat includes time-synchronized multimodal sensor data (e.g., RGB, LiDAR, depth).
ED-VAE: Entropy Decomposition of ELBO in Variational Autoencoders
Lygerakis, Fotios, Rueckert, Elmar
Traditional Variational Autoencoders (VAEs) are constrained by the limitations of the Evidence Lower Bound (ELBO) formulation, particularly when utilizing simplistic, non-analytic, or unknown prior distributions. These limitations inhibit the VAE's ability to generate high-quality samples and provide clear, interpretable latent representations. This work introduces the Entropy Decomposed Variational Autoencoder (ED-VAE), a novel re-formulation of the ELBO that explicitly includes entropy and cross-entropy components. This reformulation significantly enhances model flexibility, allowing for the integration of complex and non-standard priors. By providing more detailed control over the encoding and regularization of latent spaces, ED-VAE not only improves interpretability but also effectively captures the complex interactions between latent variables and observed data, thus leading to better generative performance.
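A reformulation in terms of entropy and cross-entropy can build on the standard identity KL(q || p) = H(q, p) - H(q), where H(q, p) is the cross-entropy and H(q) the entropy. The exact form of the ED-VAE objective is not given in this abstract, so the sketch below only checks that identity for a one-dimensional Gaussian posterior q = N(mu, sigma^2) against a standard normal prior, using the well-known closed forms. The function names and the values of `mu` and `sigma` are illustrative.

```python
import math

def gaussian_entropy(sigma):
    """Differential entropy H(q) of N(mu, sigma^2); it does not depend on mu."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

def cross_entropy_to_standard_normal(mu, sigma):
    """Cross-entropy H(q, p) = -E_q[log p(z)] for q = N(mu, sigma^2), p = N(0, 1)."""
    return 0.5 * math.log(2.0 * math.pi) + 0.5 * (sigma ** 2 + mu ** 2)

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL(q || p) used in the usual VAE regularizer."""
    return 0.5 * (sigma ** 2 + mu ** 2 - 1.0 - math.log(sigma ** 2))

mu, sigma = 0.7, 1.3
kl_closed_form = kl_to_standard_normal(mu, sigma)
kl_decomposed = cross_entropy_to_standard_normal(mu, sigma) - gaussian_entropy(sigma)
print(kl_closed_form, kl_decomposed)
```

Keeping the two terms separate is what lets the cross-entropy term be estimated from samples when the prior has no analytic KL, which is the case the abstract highlights.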
Multimodal Human-Autonomous Agents Interaction Using Pre-Trained Language and Visual Foundation Models
Nwankwo, Linus, Rueckert, Elmar
In this paper, we extended the method proposed in [17] to enable humans to interact naturally with autonomous agents through vocal and textual conversations. Our extended method exploits the inherent capabilities of pre-trained large language models (LLMs), multimodal visual language models (VLMs), and speech recognition (SR) models to decode the high-level natural language conversations and semantic understanding of the robot's task environment, and abstract them to the robot's actionable commands or queries. We performed a quantitative evaluation of our framework's natural vocal conversation understanding with participants from different racial backgrounds and English language accents. The participants interacted with the robot using both spoken and textual instructional commands. Based on the logged interaction data, our framework achieved 87.55% vocal commands decoding accuracy, 86.27% commands execution success, and an average latency of 0.89 seconds from receiving the participants' vocal chat commands to initiating the robot's actual physical action. The video demonstrations of this paper can be found at https://linusnep.github.io/MTCC-IRoNL/.
Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Besides the existence of an optimal policy which satisfies the Poisson equation, the only assumptions made are Hölder continuity of rewards and transition probabilities.
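The combination of state aggregation and optimism can be illustrated with a minimal sketch. It assumes a state space of [0, 1] split into equal intervals, rewards in [0, 1], and a generic Hoeffding-style bonus. The bonus form, constants and function names are assumptions made for illustration and are not the confidence intervals of the paper's algorithm.

```python
import math

def aggregate(state, n_intervals):
    """Map a state in [0, 1] to the index of the interval that contains it."""
    return min(int(state * n_intervals), n_intervals - 1)

def optimistic_rewards(samples, n_intervals, r_max=1.0):
    """Upper confidence bound on the mean reward of each aggregated state.

    samples: list of (state, reward) pairs observed so far.
    Unvisited intervals get the most optimistic value r_max.
    """
    sums = [0.0] * n_intervals
    counts = [0] * n_intervals
    for state, reward in samples:
        idx = aggregate(state, n_intervals)
        sums[idx] += reward
        counts[idx] += 1

    t = len(samples)
    bounds = []
    for total, n in zip(sums, counts):
        if n == 0:
            bounds.append(r_max)
            continue
        bonus = math.sqrt(math.log(t) / (2.0 * n))  # generic bonus, illustrative only
        bounds.append(min(r_max, total / n + bonus))
    return bounds

samples = [(0.10, 0.2), (0.15, 0.4), (0.90, 1.0)]
bounds = optimistic_rewards(samples, n_intervals=4)
print(bounds)
```

Hölder continuity is what makes the aggregation reasonable: states in the same interval have similar rewards and transition probabilities, so treating them as one state introduces a bias that shrinks with the interval width.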