Goto

Collaborating Authors

 Africa


DiversiNews: Surfacing Diversity in Online News

AI Magazine

For most events of at least moderate significance, there are likely tens, often hundreds or thousands of online articles reporting on it, each from a slightly different perspective. If we want to understand an event in depth, from multiple perspectives, we need to aggregate multiple sources and understand the relations between them. However, current news aggregators do not offer this kind of functionality. As a step towards a solution, we propose DiversiNews, a real-time news aggregation and exploration platfom whose main feature is a novel set of controls that allow users to contrast reports of a selected event based on topical emphases, sentiment differences and/or publisher geolocation. News events are presented in the form of a ranked list of articles pertaining to the event and an automatically generated summary. Both the ranking and the summary are interactive and respond in real time to user’s change of controls. We validated the concept and the user interface through user tests with positive results.


On the Limitation of Spectral Methods: From the Gaussian Hidden Clique Problem to Rank-One Perturbations of Gaussian Tensors

Neural Information Processing Systems

We consider the following detection problem: given a realization of asymmetric matrix $X$ of dimension $n$, distinguish between the hypothesisthat all upper triangular variables are i.i.d. Gaussians variableswith mean 0 and variance $1$ and the hypothesis that there is aplanted principal submatrix $B$ of dimension $L$ for which all upper triangularvariables are i.i.d. Gaussians with mean $1$ and variance $1$, whereasall other upper triangular elements of $X$ not in $B$ are i.i.d.Gaussians variables with mean 0 and variance $1$. We refer to this asthe `Gaussian hidden clique problem'. When $L=( 1 + \epsilon) \sqrt{n}$ ($\epsilon > 0$), it is possible to solve thisdetection problem with probability $1 - o_n(1)$ by computing thespectrum of $X$ and considering the largest eigenvalue of $X$.We prove that when$L < (1-\epsilon)\sqrt{n}$ no algorithm that examines only theeigenvalues of $X$can detect the existence of a hiddenGaussian clique, with error probability vanishing as $n \to \infty$.The result above is an immediate consequence of a more general result on rank-oneperturbations of $k$-dimensional Gaussian tensors.In this context we establish a lower bound on the criticalsignal-to-noise ratio below which a rank-one signal cannot be detected.


Bayesian Policy Reuse

arXiv.org Artificial Intelligence

A long-lived autonomous agent should be able to respond online to novel instances of tasks from a familiar domain. Acting online requires 'fast' responses, in terms of rapid convergence, especially when the task instance has a short duration, such as in applications involving interactions with humans. These requirements can be problematic for many established methods for learning to act. In domains where the agent knows that the task instance is drawn from a family of related tasks, albeit without access to the label of any given instance, it can choose to act through a process of policy reuse from a library, rather than policy learning from scratch. In policy reuse, the agent has prior knowledge of the class of tasks in the form of a library of policies that were learnt from sample task instances during an offline training phase. We formalise the problem of policy reuse, and present an algorithm for efficiently responding to a novel task instance by reusing a policy from the library of existing policies, where the choice is based on observed 'signals' which correlate to policy performance. We achieve this by posing the problem as a Bayesian choice problem with a corresponding notion of an optimal response, but the computation of that response is in many cases intractable. Therefore, to reduce the computation cost of the posterior, we follow a Bayesian optimisation approach and define a set of policy selection functions, which balance exploration in the policy library against exploitation of previously tried policies, together with a model of expected performance of the policy library on their corresponding task instances. We validate our method in several simulated domains of interactive, short-duration episodic tasks, showing rapid convergence in unknown task variations.


Reinforcement Learning with Parameterized Actions

arXiv.org Artificial Intelligence

We introduce a model-free algorithm for learning in Markov decision processes with parameterized actions--discrete actions with continuous parameters. At each step the agent must select both which action to use and which parameters to use with that action. We introduce the Q-PAMDP algorithm for learning in these domains, show that it converges to a local optimum, and compare it to direct policy search in the goalscoring and Platform domains.


Lasso based feature selection for malaria risk exposure prediction

arXiv.org Machine Learning

In life sciences, the experts generally use empirical knowledge to recode variables, choose interactions and perform selection by classical approach. The aim of this work is to perform automatic learning algorithm for variables selection which can lead to know if experts can be help in they decision or simply replaced by the machine and improve they knowledge and results. The Lasso method can detect the optimal subset of variables for estimation and prediction under some conditions. In this paper, we propose a novel approach which uses automatically all variables available and all interactions. By a double cross-validation combine with Lasso, we select a best subset of variables and with GLM through a simple cross-validation perform predictions. The algorithm assures the stability and the the consistency of estimators.


Domain Scoping for Subject Matter Experts

AAAI Conferences

Exploring web and in particular social media data is an essential task to many of the subject matter experts in order to discover content around their subject of interest. It is important to provide them with a tool to define their scope of vocabulary, i.e what to search for, and suggest them commonly used terms besides the serendipitous terms allowing them to define their scope of explorations. This paper presents methods on constructing ``domain models" which are families of keywords and extractors to enable focus on social media documents relevant to a project using multiple channels of information extraction.


Collective Prediction of Individual Mobility Traces with Exponential Weights

arXiv.org Machine Learning

We present and test a sequential learning algorithm for the short-term prediction of human mobility. The experts are individual sequence prediction algorithms constructed from the mobility traces of 10 million roaming mobile phone users in a European country. Average prediction accuracy is significantly higher than that of individual sequence prediction algorithms, namely constant order Markov models derived from the user's own data, that have been shown to achieve high accuracy in previous studies of human mobility prediction. The algorithm uses only time stamped location data, and accuracy depends on the completeness of the expert ensemble, which should contain redundant records of typical mobility patterns. The proposed algorithm is applicable to the prediction of any sufficiently large dataset of sequences. The problem of algorithmic prediction of human mobility has received significant attention in the literature in recent years, for its potential applications and its inherent theoretical value. The problem is posed as asking for a prediction of the short-term future location of an individual, given his or hers previous locations and possibly other side information. One can distinguish the algorithms that have been studied in the mobility prediction literature in two classes. On one hand, we have algorithms that use only the single user's past locations, without any other information, to estimate the next location. This individual sequence prediction is closely related to lossless compression of sequential data [1-3]. The other category comprises methods that take advantage of data beyond the user's own past locations (e.g. the network of connections between devices as a proxy of the user's social interactions [4]).


Fast and Guaranteed Tensor Decomposition via Sketching

arXiv.org Machine Learning

Tensor CANDECOMP/PARAFAC (CP) decomposition has wide applications in statistical learning of latent variable models and in data mining. In this paper, we propose fast and randomized tensor CP decomposition algorithms based on sketching. We build on the idea of count sketches, but introduce many novel ideas which are unique to tensors. We develop novel methods for randomized computation of tensor contractions via FFTs, without explicitly forming the tensors. Such tensor contractions are encountered in decomposition methods such as tensor power iterations and alternating least squares. We also design novel colliding hashes for symmetric tensors to further save time in computing the sketches. We then combine these sketching ideas with existing whitening and tensor power iterative techniques to obtain the fastest algorithm on both sparse and dense tensors. The quality of approximation under our method does not depend on properties such as sparsity, uniformity of elements, etc. We apply the method for topic modeling and obtain competitive results.


VB calibration to improve the interface between phone recognizer and i-vector extractor

arXiv.org Machine Learning

The EM training algorithm of the classical i-vector extractor is often incorrectly described as a maximum-likelihood method. The i-vector model is however intractable: the likelihood itself and the hidden-variable posteriors needed for the EM algorithm cannot be computed in closed form. We show here that the classical i-vector extractor recipe is actually a mean-field variational Bayes (VB) recipe. This theoretical VB interpretation turns out to be of further use, because it also offers an interpretation of the newer phonetic i-vector extractor recipe, thereby unifying the two flavours of extractor. More importantly, the VB interpretation is also practically useful: it suggests ways of modifying existing i-vector extractors to make them more accurate. In particular, in existing methods, the approximate VB posterior for the GMM states is fixed, while only the parameters of the generative model are adapted. Here we explore the possibility of also mildly adjusting (calibrating) those posteriors, so that they better fit the generative model.


Building Subject-aligned Comparable Corpora and Mining it for Truly Parallel Sentence Pairs

arXiv.org Machine Learning

Parallel sentences are a relatively scarce but extremely useful resource for many applications including cross-lingual retrieval and statistical machine translation. This research explores our methodology for mining such data from previously obtained comparable corpora. The task is highly practical since non-parallel multilingual data exist in far greater quantities than parallel corpora, but parallel sentences are a much more useful resource. Here we propose a web crawling method for building subject-aligned comparable corpora from Wikipedia articles. We also introduce a method for extracting truly parallel sentences that are filtered out from noisy or just comparable sentence pairs. We describe our implementation of a specialized tool for this task as well as training and adaption of a machine translation system that supplies our filter with additional information about the similarity of comparable sentence pairs.