Goto

Collaborating Authors

 Bayesian Learning


Optimal camera-robot pose estimation in linear time from points and lines

arXiv.org Artificial Intelligence

Camera pose estimation is a fundamental problem in robotics. This paper focuses on two issues of interest: First, point and line features have complementary advantages, and it is of great value to design a uniform algorithm that can fuse them effectively; Second, with the development of modern front-end techniques, a large number of features can exist in a single image, which presents a potential for highly accurate robot pose estimation. With these observations, we propose AOPnP(L), an optimal linear-time camera-robot pose estimation algorithm from points and lines. Specifically, we represent a line with two distinct points on it and unify the noise model for point and line measurements where noises are added to 2D points in the image. By utilizing Plucker coordinates for line parameterization, we formulate a maximum likelihood (ML) problem for combined point and line measurements. To optimally solve the ML problem, AOPnP(L) adopts a two-step estimation scheme. In the first step, a consistent estimate that can converge to the true pose is devised by virtue of bias elimination. In the second step, a single Gauss-Newton iteration is executed to refine the initial estimate. AOPnP(L) features theoretical optimality in the sense that its mean squared error converges to the Cramer-Rao lower bound. Moreover, it owns a linear time complexity. These properties make it well-suited for precision-demanding and real-time robot pose estimation. Extensive experiments are conducted to validate our theoretical developments and demonstrate the superiority of AOPnP(L) in both static localization and dynamic odometry systems.


Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines

arXiv.org Artificial Intelligence

Autoregressive language models are the currently dominant paradigm for text generation, but they have some fundamental limitations that cannot be remedied by scale-for example inherently sequential and unidirectional generation. While alternate classes of models have been explored, we have limited mathematical understanding of their fundamental power and limitations. In this paper we focus on Generative Masked Language Models (GMLMs), a non-autoregressive paradigm in which we train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model, These models empirically strike a promising speed-quality trade-off as each step can be typically parallelized by decoding the entire sequence in parallel. We develop a mathematical framework for analyzing and improving such models which sheds light on questions of sample complexity and inference speed and quality. Empirically, we adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality compared with autoregressive models. We run careful ablation experiments to give recommendations on key design choices, and make fine-grained observations on the common error modes in connection with our theory. Our mathematical analyses and empirical observations characterize both potentials and limitations of this approach, and can be applied to future works on improving understanding and performance of GMLMs. Our codes are released at https://github.com/google-research/google-research/tree/master/padir


Multimodal Input Aids a Bayesian Model of Phonetic Learning

arXiv.org Artificial Intelligence

One of the many tasks facing the typically-developing child language learner is learning to discriminate between the distinctive sounds that make up words in their native language. Here we investigate whether multimodal information--specifically adult speech coupled with video frames of speakers' faces--benefits a computational model of phonetic learning. We introduce a method for creating high-quality synthetic videos of speakers' faces for an existing audio corpus. Our learning model, when both trained and tested on audiovisual inputs, achieves up to a 8.1% relative improvement on a phoneme discrimination battery compared to a model trained and tested on audio-only input. It also outperforms the audio model by up to 3.9% when both are tested on audio-only data, suggesting that visual information facilitates the acquisition of acoustic distinctions. Visual information is especially beneficial in noisy audio environments, where an audiovisual model closes 67% of the loss in discrimination performance of the audio model in noise relative to a non-noisy environment. These results demonstrate that visual information benefits an ideal learner and illustrate some of the ways that children might be able to leverage visual cues when learning to discriminate speech sounds.


Building Machines that Learn and Think with People

arXiv.org Artificial Intelligence

What do we want from machine intelligence? We envision machines that are not just tools for thought, but partners in thought: reasonable, insightful, knowledgeable, reliable, and trustworthy systems that think with us. Current artificial intelligence (AI) systems satisfy some of these criteria, some of the time. In this Perspective, we show how the science of collaborative cognition can be put to work to engineer systems that really can be called ``thought partners,'' systems built to meet our expectations and complement our limitations. We lay out several modes of collaborative thought in which humans and AI thought partners can engage and propose desiderata for human-compatible thought partnerships. Drawing on motifs from computational cognitive science, we motivate an alternative scaling path for the design of thought partners and ecosystems around their use through a Bayesian lens, whereby the partners we construct actively build and reason over models of the human and world.


Explainable AI-based Intrusion Detection System for Industry 5.0: An Overview of the Literature, associated Challenges, the existing Solutions, and Potential Research Directions

arXiv.org Artificial Intelligence

Industry 5.0, which focuses on human and Artificial Intelligence (AI) collaboration for performing different tasks in manufacturing, involves a higher number of robots, Internet of Things (IoTs) devices and interconnections, Augmented/Virtual Reality (AR), and other smart devices. The huge involvement of these devices and interconnection in various critical areas, such as economy, health, education and defense systems, poses several types of potential security flaws. AI itself has been proven a very effective and powerful tool in different areas of cybersecurity, such as intrusion detection, malware detection, and phishing detection, among others. Just as in many application areas, cybersecurity professionals were reluctant to accept black-box ML solutions for cybersecurity applications. This reluctance pushed forward the adoption of eXplainable Artificial Intelligence (XAI) as a tool that helps explain how decisions are made in ML-based systems. In this survey, we present a comprehensive study of different XAI-based intrusion detection systems for industry 5.0, and we also examine the impact of explainability and interpretability on Cybersecurity practices through the lens of Adversarial XIDS (Adv-XIDS) approaches. Furthermore, we analyze the possible opportunities and challenges in XAI cybersecurity systems for industry 5.0 that elicit future research toward XAI-based solutions to be adopted by high-stakes industry 5.0 applications. We believe this rigorous analysis will establish a foundational framework for subsequent research endeavors within the specified domain.


Revisiting Neighborhood Aggregation in Graph Neural Networks for Node Classification using Statistical Signal Processing

arXiv.org Machine Learning

We delve into the issue of node classification within graphs, specifically reevaluating the concept of neighborhood aggregation, which is a fundamental component in graph neural networks (GNNs). Our analysis reveals conceptual flaws within certain benchmark GNN models when operating under the assumption of edge-independent node labels, a condition commonly observed in benchmark graphs employed for node classification. Approaching neighborhood aggregation from a statistical signal processing perspective, our investigation provides novel insights which may be used to design more efficient GNN models.


FFHFlow: A Flow-based Variational Approach for Multi-fingered Grasp Synthesis in Real Time

arXiv.org Artificial Intelligence

Synthesizing diverse and accurate grasps with multi-fingered hands is an important yet challenging task in robotics. Previous efforts focusing on generative modeling have fallen short of precisely capturing the multi-modal, high-dimensional grasp distribution. To address this, we propose exploiting a special kind of Deep Generative Model (DGM) based on Normalizing Flows (NFs), an expressive model for learning complex probability distributions. Specifically, we first observed an encouraging improvement in diversity by directly applying a single conditional NFs (cNFs), dubbed FFHFlow-cnf, to learn a grasp distribution conditioned on the incomplete point cloud. However, we also recognized limited performance gains due to restricted expressivity in the latent space. This motivated us to develop a novel flow-based d Deep Latent Variable Model (DLVM), namely FFHFlow-lvm, which facilitates more reasonable latent features, leading to both diverse and accurate grasp synthesis for unseen objects. Unlike Variational Autoencoders (VAEs), the proposed DLVM counteracts typical pitfalls such as mode collapse and mis-specified priors by leveraging two cNFs for the prior and likelihood distributions, which are usually restricted to being isotropic Gaussian. Comprehensive experiments in simulation and real-robot scenarios demonstrate that our method generates more accurate and diverse grasps than the VAE baselines. Additionally, a run-time comparison is conducted to reveal its high potential for real-time applications.


A multi-level multi-label text classification dataset of 19th century Ottoman and Russian literary and critical texts

arXiv.org Artificial Intelligence

This paper introduces a multi-level, multi-label text classification dataset comprising over 3000 documents. The dataset features literary and critical texts from 19th-century Ottoman Turkish and Russian. It is the first study to apply large language models (LLMs) to this dataset, sourced from prominent literary periodicals of the era. The texts have been meticulously organized and labeled. This was done according to a taxonomic framework that takes into account both their structural and semantic attributes. Articles are categorized and tagged with bibliometric metadata by human experts. We present baseline classification results using a classical bag-of-words (BoW) naive Bayes model and three modern LLMs: multilingual BERT, Falcon, and Llama-v2. We found that in certain cases, Bag of Words (BoW) outperforms Large Language Models (LLMs), emphasizing the need for additional research, especially in low-resource language settings. This dataset is expected to be a valuable resource for researchers in natural language processing and machine learning, especially for historical and low-resource languages. The dataset is publicly available^1.


Variational Potential Flow: A Novel Probabilistic Framework for Energy-Based Generative Modelling

arXiv.org Artificial Intelligence

Energy based models (EBMs) are appealing for their generality and simplicity in data likelihood modeling, but have conventionally been difficult to train due to the unstable and time-consuming implicit MCMC sampling during contrastive divergence training. In this paper, we present a novel energy-based generative framework, Variational Potential Flow (VAPO), that entirely dispenses with implicit MCMC sampling and does not rely on complementary latent models or cooperative training. The VAPO framework aims to learn a potential energy function whose gradient (flow) guides the prior samples, so that their density evolution closely follows an approximate data likelihood homotopy. An energy loss function is then formulated to minimize the Kullback-Leibler divergence between density evolution of the flow-driven prior and the data likelihood homotopy. Images can be generated after training the potential energy, by initializing the samples from Gaussian prior and solving the ODE governing the potential flow on a fixed time interval using generic ODE solvers. Experiment results show that the proposed VAPO framework is capable of generating realistic images on various image datasets. In particular, our proposed framework achieves competitive FID scores for unconditional image generation on the CIFAR-10 and CelebA datasets.


Practical multi-fidelity machine learning: fusion of deterministic and Bayesian models

arXiv.org Machine Learning

Multi-fidelity machine learning methods address the accuracy-efficiency trade-off by integrating scarce, resource-intensive high-fidelity data with abundant but less accurate low-fidelity data. We propose a practical multi-fidelity strategy for problems spanning low- and high-dimensional domains, integrating a non-probabilistic regression model for the low-fidelity with a Bayesian model for the high-fidelity. The models are trained in a staggered scheme, where the low-fidelity model is transfer-learned to the high-fidelity data and a Bayesian model is trained for the residual. This three-model strategy -- deterministic low-fidelity, transfer learning, and Bayesian residual -- leads to a prediction that includes uncertainty quantification both for noisy and noiseless multi-fidelity data. The strategy is general and unifies the topic, highlighting the expressivity trade-off between the transfer-learning and Bayesian models (a complex transfer-learning model leads to a simpler Bayesian model, and vice versa). We propose modeling choices for two scenarios, and argue in favor of using a linear transfer-learning model that fuses 1) kernel ridge regression for low-fidelity with Gaussian processes for high-fidelity; or 2) deep neural network for low-fidelity with a Bayesian neural network for high-fidelity. We demonstrate the effectiveness and efficiency of the proposed strategies and contrast them with the state-of-the-art based on various numerical examples. The simplicity of these formulations makes them practical for a broad scope of future engineering applications.