Goto

Collaborating Authors

 Search


Automatic Gesture Recognition in Robot-assisted Surgery with Reinforcement Learning and Tree Search

arXiv.org Artificial Intelligence

Automatic surgical gesture recognition is fundamental for improving intelligence in robot-assisted surgery, such as conducting complicated tasks of surgery surveillance and skill evaluation. However, current methods treat each frame individually and produce the outcomes without effective consideration on future information. In this paper, we propose a framework based on reinforcement learning and tree search for joint surgical gesture segmentation and classification. An agent is trained to segment and classify the surgical video in a human-like manner whose direct decisions are re-considered by tree search appropriately. Our proposed tree search algorithm unites the outputs from two designed neural networks, i.e., policy and value network. With the integration of complementary information from distinct models, our framework is able to achieve the better performance than baseline methods using either of the neural networks. For an overall evaluation, our developed approach consistently outperforms the existing methods on the suturing task of JIGSAWS dataset in terms of accuracy, edit score and F1 score. Our study highlights the utilization of tree search to refine actions in reinforcement learning framework for surgical robotic applications.


Learn to Design the Heuristics for Vehicle Routing Problem

arXiv.org Artificial Intelligence

This paper presents an approach to learn the local-search heuristics that iteratively improves the solution of Vehicle Routing Problem (VRP). A local-search heuristics is composed of a destroy operator that destructs a candidate solution, and a following repair operator that rebuilds the destructed one into a new one. The proposed neural network, as trained through actor-critic framework, consists of an encoder in form of a modified version of Graph Attention Network where node embeddings and edge embeddings are integrated, and a GRU-based decoder rendering a pair of destroy and repair operators. Experiment results show that it outperforms both the traditional heuristics algorithms and the existing neural combinatorial optimization for VRP on medium-scale data set, and is able to tackle the large-scale data set (e.g., over 400 nodes) which is a considerable challenge in this area. Moreover, the need for expertise and handcrafted heuristics design is eliminated due to the fact that the proposed network learns to design the heuristics with a better performance. Our implementation is available online. 1 Keywords Vehicle Routing Problem · Combinatorial Optimization · Large Neighborhood Search · Neural Combinatorial Search · Reinforcement Learning · Graph Attention Network


(Individual) Fairness for $k$-Clustering

arXiv.org Machine Learning

We give a local search based algorithm for $k$-median ($k$-means) clustering from the perspective of individual fairness. More precisely, for a point $x$ in a point set $P$ of size $n$, let $r(x)$ be the minimum radius such that the ball of radius $r(x)$ centered at $x$ has at least $n/k$ points from $P$. Intuitively, if a set of $k$ random points are chosen from $P$ as centers, every point $x\in P$ expects to have a center within radius $r(x)$. An individually fair clustering provides such a guarantee for every point $x\in P$. This notion of fairness was introduced in [Jung et al., 2019] where they showed how to get an approximately feasible $k$-clustering with respect to this fairness condition. In this work, we show how to get an approximately optimal such fair $k$-clustering. The $k$-median ($k$-means) cost of our solution is within a constant factor of the cost of an optimal fair $k$-clustering, and our solution approximately satisfies the fairness condition (also within a constant factor). Further, we complement our theoretical bounds with empirical evaluation.


Near-minimax recursive density estimation on the binary hypercube

Neural Information Processing Systems

This paper describes a recursive estimation procedure for multivariate binary densities using orthogonal expansions. For $d$ covariates, there are $2 d$ basis coefficients to estimate, which renders conventional approaches computationally prohibitive when $d$ is large. However, for a wide class of densities that satisfy a certain sparsity condition, our estimator runs in probabilistic polynomial time and adapts to the unknown sparsity of the underlying density in two key ways: (1) it attains near-minimax mean-squared error, and (2) the computational complexity is lower for sparser densities. Our method also allows for flexible control of the trade-off between mean-squared error and computational complexity. Papers published at the Neural Information Processing Systems Conference.


Minimax Multi-Task Learning and a Generalized Loss-Compositional Paradigm for MTL

Neural Information Processing Systems

Since its inception, the modus operandi of multi-task learning (MTL) has been to minimize the task-wise mean of the empirical risks. We introduce a generalized loss-compositional paradigm for MTL that includes a spectrum of formulations as a subfamily. One endpoint of this spectrum is minimax MTL: a new MTL formulation that minimizes the maximum of the tasks' empirical risks. Via a certain relaxation of minimax MTL, we obtain a continuum of MTL formulations spanning minimax MTL and classical MTL. The full paradigm itself is loss-compositional, operating on the vector of empirical risks.


Variance Reduction in Monte-Carlo Tree Search

Neural Information Processing Systems

Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments. The stochastic nature of the Monte-Carlo simulations introduces errors in the value estimates, both in terms of bias and variance. Whilst reducing bias (typically through the addition of domain knowledge) has been studied in the MCTS literature, comparatively little effort has focused on reducing variance. This is somewhat surprising, since variance reduction techniques are a well-studied area in classical statistics. In this paper, we examine the application of some standard techniques for variance reduction in MCTS, including common random numbers, antithetic variates and control variates.


Minimax Statistical Learning with Wasserstein distances

Neural Information Processing Systems

As opposed to standard empirical risk minimization (ERM), distributionally robust optimization aims to minimize the worst-case risk over a larger ambiguity set containing the original empirical distribution of the training data. In this work, we describe a minimax framework for statistical learning with ambiguity sets given by balls in Wasserstein space. In particular, we prove generalization bounds that involve the covering number properties of the original ERM problem. As an illustrative example, we provide generalization guarantees for transport-based domain adaptation problems where the Wasserstein distance between the source and target domain distributions can be reliably estimated from unlabeled samples. Papers published at the Neural Information Processing Systems Conference.


Learning Chordal Markov Networks via Branch and Bound

Neural Information Processing Systems

We present a new algorithmic approach for the task of finding a chordal Markov network structure that maximizes a given scoring function. The algorithm is based on branch and bound and integrates dynamic programming for both domain pruning and for obtaining strong bounds for search-space pruning. Empirically, we show that the approach dominates in terms of running times a recent integer programming approach (and thereby also a recent constraint optimization approach) for the problem. Papers published at the Neural Information Processing Systems Conference.


Theoretical Analysis of Heuristic Search Methods for Online POMDPs

Neural Information Processing Systems

Planning in partially observable environments remains a challenging problem, despite significant recent advances in offline approximation techniques. A few online methods have also been proposed recently, and proven to be remarkably scalable, but without the theoretical guarantees of their offline counterparts. Thus it seems natural to try to unify offline and online techniques, preserving the theoretical properties of the former, and exploiting the scalability of the latter. In this paper, we provide theoretical guarantees on an anytime algorithm for POMDPs which aims to reduce the error made by approximate offline value iteration algorithms through the use of an efficient online searching procedure. The algorithm uses search heuristics based on an error analysis of lookahead search, to guide the online search towards reachable beliefs with the most potential to reduce error.


The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information

Neural Information Processing Systems

Epoch-Greedy has the following properties: No knowledge of a time horizon $T$ is necessary. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class. Here $S$ is the complexity term in a sample complexity bound for standard supervised learning. Papers published at the Neural Information Processing Systems Conference.