Goto

Collaborating Authors

 Oceania


NSW suggests facial recognition could replace Opal cards in 'not too distant future'

The Guardian

Facial recognition could be used to replace swipe cards on public transport, the New South Wales government has suggested, but the opposition and digital rights groups say it would pose a risk to privacy. The transport minister, Andrew Constance, said on Tuesday he wanted commuters "in the not too distant future" to be able to board trains using only their faces, with no need for Opal cards, barriers or turnstiles. "I'm about to outline some concepts which may seem pretty crazy and far-fetched," he told the Sydney Institute on Tuesday. "But look at it this way โ€“ who would have thought in 1970 that you'd be able to use a handheld device to have a video conversation with someone on the other side of the world? "I want people to not think about their travel.


On the Optimality of Trees Generated by ID3

arXiv.org Machine Learning

Since its inception in the 1980s, ID3 has become one of the most successful and widely used algorithms for learning decision trees. However, its theoretical properties remain poorly understood. In this work, we analyze the heuristic of growing a decision tree with ID3 for a limited number of iterations $t$ and given that nodes are split as in the case of exact information gain and probability computations. In several settings, we provide theoretical and empirical evidence that the TopDown variant of ID3, introduced by Kearns and Mansour (1996), produces trees with optimal or near-optimal test error among all trees with $t$ internal nodes. We prove optimality in the case of learning conjunctions under product distributions and learning read-once DNFs with 2 terms under the uniform distribition. Using efficient dynamic programming algorithms, we empirically show that TopDown generates trees that are near-optimal ($\sim \%1$ difference from optimal test error) in a large number of settings for learning read-once DNFs under product distributions.


Self-Regulated Interactive Sequence-to-Sequence Learning

arXiv.org Machine Learning

Not all types of supervision signals are created equal: Different types of feedback have different costs and effects on learning. We show how self-regulation strategies that decide when to ask for which kind of feedback from a teacher (or from oneself) can be cast as a learning-to-learn problem leading to improved cost-aware sequence-to-sequence learning. In experiments on interactive neural machine translation, we find that the self-regulator discovers an $\epsilon$-greedy strategy for the optimal cost-quality trade-off by mixing different feedback types including corrections, error markups, and self-supervision. Furthermore, we demonstrate its robustness under domain shift and identify it as a promising alternative to active learning.


Brigitte, a Bridge-Based Grid Path-Finder

AAAI Conferences

We present BRIGITTE, a new path-finding algorithm for 8-connected grids. It is based on the notion bridge that we define here, i.e., a high-level description of paths between all pairs of points from two convex regions that allows fast distance query and fast generation of the prefix and suffix of these paths. BRIGITTE uses a pre-processing step to first partition the map into convex regions and then compute a sufficient set of bridges between every pair of regions. Path-finding is then performed by looking up the regions of the source and target cells and then iterating over the bridges of the pair of regions to determine which one yields the shortest path. BRIGITTE competes favourably compared to CH-SG-R and Copp, although this currently comes at a price of an extensive pre-processing.


A Theoretical Comparison of the Bounds of MM, NBS, and GBFHS

AAAI Conferences

Recent work in bidirectional front-to-end heuristic search has led to the development of novel algorithms that have advanced the state of the art after many years without major developments. In particular, three different algorithms with very different behavior have been proposed: MM, NBS and GBFHS. In this paper we perform a theoretical comparison of these algorithms, defining lower and upper bounds for each of them and analyzing why a given algorithm displays beneficial characteristics that the others lack. With this information, we propose a simple and intuitive near-optimal algorithm to be used as a baseline for comparison in bidirectional front-to-end heuristic search.


A Profit Guided Coordination Heuristic for Travelling Thief Problems

AAAI Conferences

The travelling thief problem (TTP) is a combination of two interdependent NP-hard components: travelling salesman problem (TSP) and knapsack problem (KP). Existing approaches for TTP typically solve the TSP and KP components in an interleaved fashion, where the solution to one component is held fixed while the other component is changed. This indicates poor coordination between solving the two components and may lead to poor quality TTP solutions. For solving the TSP component, the 2-OPT segment reversing heuristic is often used for modifying the tour. We propose an extended and modified form of the reversing heuristic in order to concurrently consider both the TSP and KP components. Items deemed as less profitable and picked in cities earlier in the reversed segment are replaced by items that tend to be equally or more profitable and not picked in the later cities. Comparative evaluations on a broad range of benchmark TTP instances indicate that the proposed approach outperforms existing state-of-the-art TTP solvers.


FiDi-RL: Incorporating Deep Reinforcement Learning with Finite-Difference Policy Search for Efficient Learning of Continuous Control

arXiv.org Artificial Intelligence

In recent years significant progress has been made in dealing with challenging problems using reinforcement learning.Despite its great success, reinforcement learning still faces challenge in continuous control tasks. Conventional methods always compute the derivatives of the optimal goal with a costly computation resources, and are inefficient, unstable and lack of robust-ness when dealing with such tasks. Alternatively, derivative-based methods treat the optimization process as a blackbox and show robustness and stability in learning continuous control tasks, but not data efficient in learning. The combination of both methods so as to get the best of the both has raised attention. However, most of the existing combination works adopt complex neural networks (NNs) as the policy for control. The double-edged sword of deep NNs can yield better performance, but also makes it difficult for parameter tuning and computation. To this end, in this paper we presents a novel method called FiDi-RL, which incorporates deep RL with Finite-Difference (FiDi) policy search.FiDi-RL combines Deep Deterministic Policy Gradients (DDPG)with Augment Random Search (ARS) and aims at improving the data efficiency of ARS. The empirical results show that FiDi-RL can improves the performance and stability of ARS, and provide competitive results against some existing deep reinforcement learning methods


Near-optimal Bayesian Solution For Unknown Discrete Markov Decision Process

arXiv.org Artificial Intelligence

We tackle the problem of acting in an unknown finite and discrete Markov Decision Process (MDP) for which the expected shortest path from any state to any other state is bounded by a finite number $D$. An MDP consists of $S$ states and $A$ possible actions per state. Upon choosing an action $a_t$ at state $s_t$, one receives a real value reward $r_t$, then one transits to a next state $s_{t+1}$. The reward $r_t$ is generated from a fixed reward distribution depending only on $(s_t, a_t)$ and similarly, the next state $s_{t+1}$ is generated from a fixed transition distribution depending only on $(s_t, a_t)$. The objective is to maximize the accumulated rewards after $T$ interactions. In this paper, we consider the case where the reward distributions, the transitions, $T$ and $D$ are all unknown. We derive the first polynomial time Bayesian algorithm, BUCRL{} that achieves up to logarithm factors, a regret (i.e the difference between the accumulated rewards of the optimal policy and our algorithm) of the optimal order $\tilde{\mathcal{O}}(\sqrt{DSAT})$. Importantly, our result holds with high probability for the worst-case (frequentist) regret and not the weaker notion of Bayesian regret. We perform experiments in a variety of environments that demonstrate the superiority of our algorithm over previous techniques. Our work also illustrates several results that will be of independent interest. In particular, we derive a sharper upper bound for the KL-divergence of Bernoulli random variables. We also derive sharper upper and lower bounds for Beta and Binomial quantiles. All the bound are very simple and only use elementary functions.


Latent ODEs for Irregularly-Sampled Time Series

arXiv.org Machine Learning

Time series with non-uniform intervals occur in many applications, and are difficult to model using standard recurrent neural networks (RNNs). We generalize RNNs to have continuous-time hidden dynamics defined by ordinary differential equations (ODEs), a model we call ODE-RNNs. Furthermore, we use ODE-RNNs to replace the recognition network of the recently-proposed Latent ODE model. Both ODE-RNNs and Latent ODEs can naturally handle arbitrary time gaps between observations, and can explicitly model the probability of observation times using Poisson processes. We show experimentally that these ODE-based models outperform their RNN-based counterparts on irregularly-sampled data.


Fine-Grained Continual Learning

arXiv.org Machine Learning

Robotic vision is a field where continual learning can play a significant role. An embodied agent operating in a complex environment subject to frequent and unpredictable changes is required to learn and adapt continuously. In the context of object recognition, for example, a robot should be able to learn (without forgetting) objects of never seen classes as well as improving its recognition capabilities as new instances of already known classes are discovered. Ideally, continual learning should be triggered by the availability of short videos of single objects and performed online on onboard hardware. In this paper, we introduce a novel fine-grained continual learning protocol based on the CORe50 benchmark and propose two continual learning techniques that can learn effectively even in the challenging case of nearly 400 small non-i.i.d.