Oceania
R^3: Reinforced Ranker-Reader for Open-Domain Question Answering
Wang, Shuohang (Singapore Management University) | Yu, Mo (IBM Research AI) | Guo, Xiaoxiao (IBM Research AI) | Wang, Zhiguo (IBM Research AI) | Klinger, Tim (IBM Research AI) | Zhang, Wei (IBM Research AI) | Chang, Shiyu (IBM Research AI) | Tesauro, Gerry (IBM Research AI) | Zhou, Bowen (JD.COM) | Jiang, Jing (Singapore Management University)
In recent years researchers have achieved considerable success applying neural network methods to question answering (QA). These approaches have achieved state-of-the-art results in simplified closed-domain settings such as the SQuAD (Rajpurkar et al. 2016) dataset, which provides a pre-selected passage from which the answer to a given question may be extracted. More recently, researchers have begun to tackle open-domain QA, in which the model is given a question and access to a large corpus (e.g., Wikipedia) instead of a pre-selected passage (Chen et al. 2017a). This setting is more complex as it requires large-scale search for relevant passages by an information retrieval component, combined with a reading comprehension model that “reads” the passages to generate an answer to the question. Performance in this setting lags well behind closed-domain performance. In this paper, we present a novel open-domain QA system called Reinforced Ranker-Reader (R^3), based on two algorithmic innovations. First, we propose a new pipeline for open-domain QA with a Ranker component, which learns to rank retrieved passages in terms of likelihood of extracting the ground-truth answer to a given question. Second, we propose a novel method that jointly trains the Ranker along with an answer-extraction Reader model, based on reinforcement learning. We report extensive experimental results showing that our method significantly improves on the state of the art for multiple open-domain QA datasets.
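The joint training idea can be illustrated with a generic REINFORCE-style update: the ranker's scores define a softmax policy over retrieved passages, one passage is sampled, and a reward derived from the reader's answer weights the gradient of the log-probability. This is only a minimal sketch under stated assumptions; the abstract does not specify the reward or the update rule, and the names `softmax`, `sample_passage`, and `reinforce_score_gradient` are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array of ranker scores."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

def sample_passage(scores, rng):
    """Sample a passage index from the softmax policy defined by the scores."""
    probs = softmax(scores)
    return int(rng.choice(len(probs), p=probs))

def reinforce_score_gradient(scores, sampled_index, reward):
    """Single-sample REINFORCE estimate of d E[reward] / d scores.

    Uses d log pi(a) / d scores = onehot(a) - pi for a softmax policy.
    The reward is assumed to come from the reader (for example, an
    answer-overlap score); that choice is an assumption of this sketch.
    """
    probs = softmax(scores)
    grad_log_prob = -probs
    grad_log_prob[sampled_index] += 1.0
    return reward * grad_log_prob

rng = np.random.default_rng(0)
scores = np.array([0.5, -1.0, 2.0])   # hypothetical ranker scores for 3 passages
picked = sample_passage(scores, rng)
grad = reinforce_score_gradient(scores, picked, reward=1.0)
```

Averaged over sampled passages, this estimate equals the gradient of the expected reward, which is why a non-differentiable reward from the reader can still train the ranker.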
Multi-Facet Network Embedding: Beyond the General Solution of Detection and Representation
Yang, Liang (Hebei University of Technology) | Guo, Yuanfang (Institute of Information Engineering, Chinese Academy of Sciences) | Cao, Xiaochun (Institute of Information Engineering, Chinese Academy of Sciences)
In network analysis, community detection and network embedding are two important topics. Community detection tends to obtain the most noticeable partition, while network embedding aims at seeking node representations that contain as many diverse properties as possible. We observe that the current community detection and network embedding problems are being resolved by a general solution, i.e., "maximizing the consistency between similar nodes while maximizing the distance between the dissimilar nodes." This general solution only exploits the most noticeable structure (facet) of the network, which effectively satisfies the demands of community detection. Unfortunately, most of the specific embedding algorithms, which are developed from the general solution, cannot achieve the goal of network embedding by exploring only one facet of the network. To improve on the general solution and better model real networks, we propose a novel network embedding method, Multi-facet Network Embedding (MNE), to capture the multiple facets of the network. MNE learns multiple embeddings simultaneously, with the Hilbert-Schmidt Independence Criterion (HSIC) serving as a diversity constraint. To efficiently solve the optimization problem, we propose a Binary HSIC with linear complexity and solve the MNE objective function by adopting the Augmented Lagrange Multiplier (ALM) method. The overall complexity is linear in the scale of the network. Extensive results demonstrate that MNE is efficient and outperforms the state-of-the-art network embedding methods.
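To make the diversity constraint concrete, the following sketch computes the standard biased empirical HSIC between two embeddings of the same nodes, using linear kernels. It is not the Binary HSIC proposed in the abstract, whose definition is not given here; the function name `hsic_linear` and the choice of linear kernels are assumptions made for illustration.

```python
import numpy as np

def hsic_linear(X, Y):
    """Biased empirical HSIC with linear kernels.

    X: (n, p) and Y: (n, q) hold two embeddings of the same n nodes.
    HSIC = trace(K H L H) / (n - 1)^2, where K = X X^T, L = Y Y^T and
    H = I - (1/n) 1 1^T is the centering matrix. Smaller values mean the
    two embeddings are less (linearly) dependent, i.e. more diverse.
    Note: this dense form costs O(n^2) memory and is for illustration only.
    """
    n = X.shape[0]
    K = X @ X.T
    L = Y @ Y.T
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
facet_a = rng.normal(size=(200, 4))   # hypothetical embedding for one facet
facet_b = rng.normal(size=(200, 4))   # hypothetical embedding for another facet
dependence = hsic_linear(facet_a, facet_b)
```

Used as a penalty between pairs of embeddings, a term of this kind pushes each embedding to capture a different facet of the network.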
An End-to-End Deep Learning Architecture for Graph Classification
Zhang, Muhan (Washington University in St. Louis) | Cui, Zhicheng (Washington University in St. Louis) | Neumann, Marion (Washington University in St. Louis) | Chen, Yixin (Washington University in St. Louis)
Neural networks are typically designed to deal with data in tensor forms. In this paper, we propose a novel neural network architecture accepting graphs of arbitrary structure. Given a dataset containing graphs in the form of (G,y) where G is a graph and y is its class, we aim to develop neural networks that read the graphs directly and learn a classification function. There are two main challenges: 1) how to extract useful features characterizing the rich information encoded in a graph for classification purposes, and 2) how to sequentially read a graph in a meaningful and consistent order. To address the first challenge, we design a localized graph convolution model and show its connection with two graph kernels. To address the second challenge, we design a novel SortPooling layer which sorts graph vertices in a consistent order so that traditional neural networks can be trained on the graphs. Experiments on benchmark graph classification datasets demonstrate that the proposed architecture achieves highly competitive performance with state-of-the-art graph kernels and other graph neural network methods. Moreover, the architecture allows end-to-end gradient-based training with original graphs, without the need to first transform graphs into vectors.
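The role of a sorting-based pooling layer can be sketched as follows: vertex feature rows are put in a canonical order and then truncated or zero-padded to a fixed size, so that graphs with different vertex counts and arbitrary vertex numbering yield same-shaped inputs. The sort key (the last feature channel, descending) and the fixed size `k` are assumptions of this sketch; the abstract does not state them, and `sort_pool` is a hypothetical name.

```python
import numpy as np

def sort_pool(node_features, k):
    """Sort vertices by their last feature channel and return a (k, c) array.

    node_features: (n, c) array, one row per vertex (for example, the output
    of graph convolution layers). Rows are sorted in descending order of the
    last channel, then truncated to k rows or zero-padded up to k rows.
    Ties are left in input order, so the result is only guaranteed to be
    independent of vertex numbering when the sort keys are distinct.
    """
    order = np.argsort(-node_features[:, -1], kind="stable")
    sorted_feats = node_features[order]
    n, c = sorted_feats.shape
    if n >= k:
        return sorted_feats[:k]
    padding = np.zeros((k - n, c), dtype=sorted_feats.dtype)
    return np.vstack([sorted_feats, padding])

feats = np.array([[1.0, 0.2],
                  [2.0, 0.9],
                  [3.0, 0.5]])
pooled = sort_pool(feats, k=2)   # keeps the two vertices with the largest last channel
```

Because the output has a fixed shape and a consistent row order, it can be fed to ordinary 1-D convolution or dense layers.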
StackReader: An RNN-Free Reading Comprehension Model
Jiang, Yibo (Columbia University) | Zhao, Zhou (Zhejiang University)
Machine comprehension of text is the problem of answering a query based on a given context. Many existing systems use RNN-based units for contextual modeling, linked with some attention mechanism. In this paper, however, we propose StackReader, an end-to-end neural network model that solves this problem without recurrent neural network (RNN) units or their variants. This simple model is based solely on attention mechanisms and gated convolutional neural networks. Experiments on SQuAD show that the model achieves relatively high accuracy with a significant decrease in training time.
Decomposition-Based Solving Approaches for Stochastic Constraint Optimisation
Hemmi, David (Monash University)
Combinatorial optimisation problems often contain uncertainty that has to be taken into account to produce realistic solutions. A common way to describe the uncertainty is by means of scenarios, where each scenario describes different potential sets of problem parameters based on random distributions or historical data. While efficient algorithmic techniques exist for specific problem classes such as linear programs, there are very few approaches that can handle general Constraint Programming formulations subject to uncertainty. The goal of my PhD is to develop generic methods for solving stochastic combinatorial optimisation problems formulated in a Constraint Programming framework.
Kill Two Birds With One Stone: Weakly-Supervised Neural Network for Image Annotation and Tag Refinement
Zhang, Junjie (University of Technology Sydney) | Wu, Qi (University of Adelaide) | Zhang, Jian (University of Technology Sydney) | Shen, Chunhua (University of Adelaide) | Lu, Jianfeng (Nanjing University of Science and Technology)
The number of social images has exploded with the wide adoption of social networks, and people like to share their comments about them. These comments can be a description of the image, or of some objects, attributes, or scenes in it, and are normally used as the user-provided tags. However, it is well known that user-provided tags are incomplete and imprecise to some extent. Directly using them can damage the performance of related applications, such as image annotation and retrieval. In this paper, we propose to learn an image annotation model and refine the user-provided tags simultaneously in a weakly-supervised manner. A deep neural network is used for image feature learning and as the backbone annotation model, while visual consistency, semantic dependency, and user-error sparsity are introduced as constraints at the batch level to alleviate the tag noise. As a result, our model is flexible and stable enough to handle large-scale image sets. Experimental results on two benchmark datasets indicate that our proposed model achieves the best performance compared to the state-of-the-art methods.
Knowledge-Based Policies for Qualitative Decentralized POMDPs
Saffidine, Abdallah (University of New South Wales, Sydney) | Schwarzentruber, François (Univ. Rennes, CNRS, IRISA) | Zanuttini, Bruno (Normandie Univ)
Qualitative Decentralized Partially Observable Markov Decision Problems (QDec-POMDPs) constitute a very general class of decision problems. They involve multiple agents, decentralized execution, sequential decision, partial observability, and uncertainty. Typically, joint policies, which prescribe to each agent an action to take depending on its full history of (local) actions and observations, are huge, which makes it difficult to store them onboard, at execution time, and also hampers the computation of joint plans. We propose and investigate a new representation for joint policies in QDec-POMDPs, which we call Multi-Agent Knowledge-Based Programs (MAKBPs), and which uses epistemic logic for compactly representing conditions on histories. Contrary to standard representations, executing an MAKBP requires reasoning at execution time, but we show that MAKBPs can be exponentially more succinct than any reactive representation.
DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding
Shen, Tao (University of Technology Sydney) | Zhou, Tianyi (University of Washington) | Long, Guodong (University of Technology Sydney) | Jiang, Jing (University of Technology Sydney) | Pan, Shirui (University of Technology Sydney) | Zhang, Chengqi (University of Technology Sydney)
Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used on NLP tasks to capture the long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly less training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise). A light-weight neural net, "Directional Self-Attention Network (DiSAN)," is then proposed to learn sentence embedding, based solely on the proposed attention without any RNN/CNN structure. DiSAN is only composed of a directional self-attention with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite its simple form, DiSAN outperforms complicated RNN models on both prediction quality and time efficiency. It achieves the best test accuracy among all sentence encoding methods and improves the most recent best result by 1.02% on the Stanford Natural Language Inference (SNLI) dataset, and shows state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre natural language inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK), Customer Review, MPQA, TREC question-type classification and Subjectivity (SUBJ) datasets.
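The two properties named above, directional and multi-dimensional (feature-wise) attention, can be sketched in a few lines: each pair of tokens gets one score per feature rather than a single scalar, and a mask restricts which positions a token may attend to. This is a simplified illustration, not the DiSAN architecture itself; the additive tanh scoring, the mask convention (a token attends to itself and to earlier tokens in the forward case), and all names are assumptions.

```python
import numpy as np

def directional_multidim_attention(x, W1, W2, b, forward=True):
    """Feature-wise self-attention with a directional mask.

    x: (n, d) token embeddings; W1, W2: (d, d); b: (d,).
    scores[i, j, :] holds one score per feature for token i attending to j.
    forward=True lets token i attend to positions j <= i, forward=False to
    positions j >= i. Returns an (n, d) array of context vectors.
    """
    n, d = x.shape
    scores = np.tanh((x @ W1)[:, None, :] + (x @ W2)[None, :, :] + b)
    idx = np.arange(n)
    if forward:
        allowed = idx[None, :] <= idx[:, None]   # allowed[i, j] is True when j <= i
    else:
        allowed = idx[None, :] >= idx[:, None]   # allowed[i, j] is True when j >= i
    scores = np.where(allowed[:, :, None], scores, -np.inf)
    # Softmax over the attended positions j, separately for every feature.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("ijd,jd->id", weights, x)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 3))
W1, W2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
context = directional_multidim_attention(tokens, W1, W2, np.zeros(3))
```

The mask is what encodes temporal order here: without it, the output for a token would be the same regardless of where the other tokens sit in the sequence.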
Fourier Feature Approximations for Periodic Kernels in Time-Series Modelling
Tompkins, Anthony (The University of Sydney) | Ramos, Fabio (The University of Sydney)
Gaussian Processes (GPs) provide an extremely powerful mechanism to model a variety of problems but incur an O(N^3) complexity in the number of data samples. Common approximation methods rely on what are often termed inducing points but still typically incur an O(NM^2) complexity in the data and corresponding inducing points. Using Random Fourier Feature (RFF) maps, we overcome this by transforming the problem into a Bayesian Linear Regression formulation upon which we apply a Bayesian Variational treatment that also allows learning the corresponding kernel hyperparameters, likelihood and noise parameters. In this paper we introduce an alternative method using Fourier series to obtain spectral representations of common kernels, in particular for periodic warpings, which surprisingly have a convergent, non-random form using special functions, requiring fewer spectral features to approximate their corresponding kernel to high accuracy. Using this, we can fuse the Random Fourier Feature spectral representations of common kernels with their periodic counterparts to show how they can more effectively and expressively learn patterns in time-series for both interpolation and extrapolation. This method combines robustness, scalability and, equally importantly, interpretability through a symbolic declarative grammar that is both functionally and humanly intuitive, a property that is crucial for explainable decision making. Using probabilistic programming and Variational Inference we are able to efficiently optimise over these rich functional representations. We show significantly improved Gram matrix approximation errors, and also demonstrate the method in several time-series problems, comparing against other commonly used approaches such as recurrent neural networks.
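As background for the RFF maps mentioned above, the sketch below shows the standard Random Fourier Feature construction for the squared-exponential (RBF) kernel and measures the Gram matrix approximation error. It illustrates the random baseline only, not the deterministic Fourier series features for periodic kernels that the abstract introduces; the function names, lengthscale, and feature count are assumptions.

```python
import numpy as np

def rff_features(X, num_features, lengthscale, rng):
    """Map X (n, d) to random Fourier features Z (n, num_features).

    Frequencies are drawn from the spectral density of the RBF kernel
    k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)), a Gaussian with
    standard deviation 1 / lengthscale, so that Z @ Z.T approximates the
    Gram matrix in expectation.
    """
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def rbf_kernel(X, lengthscale):
    """Exact RBF Gram matrix, used here only to measure approximation error."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(20, 1))          # hypothetical 1-D time stamps
Z = rff_features(X, num_features=5000, lengthscale=1.0, rng=rng)
max_error = np.max(np.abs(Z @ Z.T - rbf_kernel(X, lengthscale=1.0)))
```

With M features, regression on Z is a linear model in M dimensions, which is what replaces the cubic cost in the number of samples; the error of the random construction shrinks roughly as 1/sqrt(M).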
Domain Generalization via Conditional Invariant Representations
Li, Ya (University of Science and Technology of China) | Gong, Mingming (Carnegie Mellon University; University of Pittsburgh) | Tian, Xinmei (University of Science and Technology of China) | Liu, Tongliang (University of Sydney) | Tao, Dacheng (University of Sydney)
Domain generalization aims to apply knowledge gained from multiple labeled source domains to unseen target domains. The main difficulty comes from the dataset bias: training data and test data have different distributions, and the training set contains heterogeneous samples from different distributions. Let X denote the features, and Y be the class labels. Existing domain generalization methods address the dataset bias problem by learning a domain-invariant representation h(X) that has the same marginal distribution P(h(X)) across multiple source domains. The functional relationship encoded in P(Y|X) is usually assumed to be stable across domains such that P(Y|h(X)) is also invariant. However, it is unclear whether this assumption holds in practical problems. In this paper, we consider the general situation where both P(X) and P(Y|X) can change across all domains. We propose to learn a feature representation which has domain-invariant class conditional distributions P(h(X)|Y). With the conditional invariant representation, the invariance of the joint distribution P(h(X),Y) can be guaranteed if the class prior P(Y) does not change across training and test domains. Extensive experiments on both synthetic and real data demonstrate the effectiveness of the proposed method.
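The guarantee on the joint distribution follows from the product rule. Writing P^s and P^t for the distributions in a source domain and a target domain (superscripts introduced here only for illustration), the two conditions stated above, invariant class-conditionals and an unchanged class prior, give:

```latex
\begin{align*}
P^{s}(h(X), Y) &= P^{s}(h(X) \mid Y)\, P^{s}(Y)
    && \text{(product rule)} \\
               &= P^{t}(h(X) \mid Y)\, P^{t}(Y)
    && \text{(invariant } P(h(X) \mid Y) \text{ and unchanged } P(Y)\text{)} \\
               &= P^{t}(h(X), Y)
    && \text{(product rule)}
\end{align*}
```

If the class prior does differ between domains, the second step fails even when the class-conditionals match, which is why the abstract states the prior condition explicitly.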