Goto

Collaborating Authors

 Learning Management


Online Learning and Planning in Cognitive Hierarchies

arXiv.org Artificial Intelligence

Complex robot behaviour typically requires the integration of multiple robotic and Artificial Intelligence (AI) techniques and components. Integrating such disparate components into a coherent system, while also ensuring global properties and behaviours, is a significant challenge for cognitive robotics. Using a formal framework to model the interactions between components can be an important step in dealing with this challenge. In this paper we extend an existing formal framework [Clark et al., 2016] to model complex integrated reasoning behaviours of robotic systems; from symbolic planning through to online learning of policies and transition systems. Furthermore the new framework allows for a more flexible modelling of the interactions between different reasoning components.


Efficient Online Learning with Offline Datasets for Infinite Horizon MDPs: A Bayesian Approach

arXiv.org Machine Learning

In this paper, we study the problem of efficient online reinforcement learning in the infinite horizon setting when there is an offline dataset to start with. We assume that the offline dataset is generated by an expert but with unknown level of competence, i.e., it is not perfect and not necessarily using the optimal policy. We show that if the learning agent models the behavioral policy (parameterized by a competence parameter) used by the expert, it can do substantially better in terms of minimizing cumulative regret, than if it doesn't do that. We establish an upper bound on regret of the exact informed PSRL algorithm that scales as $\tilde{O}(\sqrt{T})$. This requires a novel prior-dependent regret analysis of Bayesian online learning algorithms for the infinite horizon setting. We then propose an approximate Informed RLSVI algorithm that we can interpret as performing imitation learning with the offline dataset, and then performing online learning.


Multimodal Graph Learning for Generative Tasks

arXiv.org Artificial Intelligence

Multimodal learning combines multiple data modalities, broadening the types and complexity of data our models can utilize: for example, from plain text to image-caption pairs. Most multimodal learning algorithms focus on modeling simple one-to-one pairs of data from two modalities, such as image-caption pairs, or audio-text pairs. However, in most real-world settings, entities of different modalities interact with each other in more complex and multifaceted ways, going beyond one-to-one mappings. We propose to represent these complex relationships as graphs, allowing us to capture data with any number of modalities, and with complex relationships between modalities that can flexibly vary from one sample to another. Toward this goal, we propose Multimodal Graph Learning (MMGL), a general and systematic framework for capturing information from multiple multimodal neighbors with relational structures among them. In particular, we focus on MMGL for generative tasks, building upon pretrained Language Models (LMs), aiming to augment their text generation with multimodal neighbor contexts. We study three research questions raised by MMGL: (1) how can we infuse multiple neighbor information into the pretrained LMs, while avoiding scalability issues? (2) how can we infuse the graph structure information among multimodal neighbors into the LMs? and (3) how can we finetune the pretrained LMs to learn from the neighbor context in a parameter-efficient manner? We conduct extensive experiments to answer these three questions on MMGL and analyze the empirical results to pave the way for future MMGL research.


Improving Adaptive Online Learning Using Refined Discretization

arXiv.org Machine Learning

We study unconstrained Online Linear Optimization with Lipschitz losses. The goal is to simultaneously achieve ($i$) second order gradient adaptivity; and ($ii$) comparator norm adaptivity also known as "parameter freeness" in the literature. Existing regret bounds (Cutkosky and Orabona, 2018; Mhammedi and Koolen, 2020; Jacobsen and Cutkosky, 2022) have the suboptimal $O(\sqrt{V_T\log V_T})$ dependence on the gradient variance $V_T$, while the present work improves it to the optimal rate $O(\sqrt{V_T})$ using a novel continuous-time-inspired algorithm, without any impractical doubling trick. This result can be extended to the setting with unknown Lipschitz constant, eliminating the range ratio problem from prior works (Mhammedi and Koolen, 2020). Concretely, we first show that the aimed simultaneous adaptivity can be achieved fairly easily in a continuous time analogue of the problem, where the environment is modeled by an arbitrary continuous semimartingale. Then, our key innovation is a new discretization argument that preserves such adaptivity in the discrete time adversarial setting. This refines a non-gradient-adaptive discretization argument from (Harvey et al., 2023), both algorithmically and analytically, which could be of independent interest.


Online Learning with Adversaries: A Differential-Inclusion Analysis

arXiv.org Artificial Intelligence

We introduce an observation-matrix-based framework for fully asynchronous online Federated Learning (FL) with adversaries. In this work, we demonstrate its effectiveness in estimating the mean of a random vector. Our main result is that the proposed algorithm almost surely converges to the desired mean $\mu.$ This makes ours the first asynchronous FL method to have an a.s. convergence guarantee in the presence of adversaries. We derive this convergence using a novel differential-inclusion-based two-timescale analysis. Two other highlights of our proof include (a) the use of a novel Lyapunov function to show that $\mu$ is the unique global attractor for our algorithm's limiting dynamics, and (b) the use of martingale and stopping-time theory to show that our algorithm's iterates are almost surely bounded.


Decentralized Online Learning in Task Assignment Games for Mobile Crowdsensing

arXiv.org Artificial Intelligence

The problem of coordinated data collection is studied for a mobile crowdsensing (MCS) system. A mobile crowdsensing platform (MCSP) sequentially publishes sensing tasks to the available mobile units (MUs) that signal their willingness to participate in a task by sending sensing offers back to the MCSP. From the received offers, the MCSP decides the task assignment. A stable task assignment must address two challenges: the MCSP's and MUs' conflicting goals, and the uncertainty about the MUs' required efforts and preferences. To overcome these challenges a novel decentralized approach combining matching theory and online learning, called collision-avoidance multi-armed bandit with strategic free sensing (CA-MAB-SFS), is proposed. The task assignment problem is modeled as a matching game considering the MCSP's and MUs' individual goals while the MUs learn their efforts online. Our innovative "free-sensing" mechanism significantly improves the MU's learning process while reducing collisions during task allocation. The stable regret of CA-MAB-SFS, i.e., the loss of learning, is analytically shown to be bounded by a sublinear function, ensuring the convergence to a stable optimal solution. Simulation results show that CA-MAB-SFS increases the MUs' and the MCSP's satisfaction compared to state-of-the-art methods while reducing the average task completion time by at least 16%.


AI & Blockchain as sustainable teaching and learning tools to cope with the 4IR

arXiv.org Artificial Intelligence

The Fourth Industrial Revolution (4IR) is transforming the way we live and work, and education is no exception. To cope with the challenges of 4IR, there is a need for innovative and sustainable teaching and learning tools. AI and block chain technologies hold great promise in this regard, with potential benefits such as personalized learning, secure credentialing, and decentralized learning networks. This paper presents a review of existing research on AI and block chain in education, analyzing case studies and exploring the potential benefits and challenges of these technologies. The paper also suggests a unique model for integrating AI and block chain into sustainable teaching and learning practices. Future research directions are discussed, including the need for more empirical studies and the exploration of ethical and social implications. The key summary of this discussion is that, by enhancing accessibility, efficacy, and security in education, AI and blockchain have the potential to revolutionise the field. In order to ensure that students can benefit from these potentially game-changing technologies as technology develops, it will be crucial to find ways to harness its power while minimising hazards. Overall, this paper highlights the potential of AI and block chain as sustainable tools for teaching and learning in the 4IR era and their respective advantages, issues and future prospects have been discussed in this writing.


Efficient Methods for Non-stationary Online Learning

arXiv.org Machine Learning

Non-stationary online learning has drawn much attention in recent years. In particular, dynamic regret and adaptive regret are proposed as two principled performance measures for online convex optimization in non-stationary environments. To optimize them, a two-layer online ensemble is usually deployed due to the inherent uncertainty of the non-stationarity, in which a group of base-learners are maintained and a meta-algorithm is employed to track the best one on the fly. However, the two-layer structure raises the concern about the computational complexity -- those methods typically maintain $\mathcal{O}(\log T)$ base-learners simultaneously for a $T$-round online game and thus perform multiple projections onto the feasible domain per round, which becomes the computational bottleneck when the domain is complicated. In this paper, we present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$. Moreover, our obtained algorithms require only one gradient query and one function evaluation at each round. Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods. Empirical studies verify our theoretical findings.


Andrew Ng: How to be an innovator

MIT Technology Review

Having spent many years working on AI research and building AI products, I'm fortunate to have participated in a few innovations that made an impact, like using reinforcement learning to fly helicopter drones at Stanford, starting and leading Google Brain to drive large-scale deep learning, and creating online courses that led to the founding of Coursera. I'd like to share some thoughts about how to do it well, sidestep some of the pitfalls, and avoid building things that lead to serious harm along the way. As I have said before, I believe AI is the new electricity. Electricity revolutionized all industries and changed our way of life, and AI is doing the same. It's reaching into every industry and discipline, and it's yielding advances that help multitudes of people.


The online learning architecture with edge computing for high-level control for assisting patients

arXiv.org Artificial Intelligence

The prevalence of mobility impairments due to conditions such as spinal cord injuries, strokes, and degenerative diseases is on the rise globally. Lower-limb exoskeletons have been increasingly recognized as a viable solution for enhancing mobility and rehabilitation for individuals with such impairments. However, existing exoskeleton control systems often suffer from limitations such as latency, lack of adaptability, and computational inefficiency. To address these challenges, this paper introduces a novel online adversarial learning architecture integrated with edge computing for high-level lower-limb exoskeleton control. In the proposed architecture, sensor data from the user is processed in real-time through edge computing nodes, which then interact with an online adversarial learning model. This model adapts to the user's specific needs and controls the exoskeleton with minimal latency. Experimental evaluations demonstrate significant improvements in control accuracy and adaptability, as well as enhanced quality-of-service (QoS) metrics. These findings indicate that the integration of online adversarial learning with edge computing offers a robust and efficient approach for the next generation of lower-limb exoskeleton control systems.