A Model for Quality of Schooling
Moussavi, Massoud (Causal Links, LLC) | McGinn, Noel (Causal Links, LLC)
A key challenge for policymakers in many developing countries is to decide which intervention or collection of interventions works best to improve learning outcomes in their schools. Our aim is to develop a causal model that explains student learning outcomes in terms of observable characteristics as well as conditions and processes difficult to observe directly. We start with a theoretical model based on the results of previous research, direct experience and experts’ knowledge in the field. This model is then refined through application of supervised learning methods to available data sets. Once calibrated with local data in a country, the model estimates the probability that a given intervention would affect learning outcomes.
An Approach for Mining Accumulated Crop Cultivation Problems and their Solutions
El-Beltagy, Samhaa R. (Cairo University) | Rafea, Ahmed (American University in Cairo) | Mabrouk, Said (The Central Lab for Agricultural Expert Systems) | Rafea, Mahmoud (The Central Lab for Agricultural Expert Systems)
This paper presents an approach for mining agricultural problems that have been accumulated in a textual database over a period of 5 years. The problems, which are accompanied by their solutions, offer a wealth of knowledge that can be used by decision makers, researchers, and farmers alike. However, this wealth of knowledge cannot be unlocked without a) representing these problems in a structured format, and b) applying algorithms that can summarize and analyze this information. To achieve the first goal, a multi-faceted object extraction methodology is presented; to achieve the second, association rules are employed. As a proof of concept, the tool was applied to a set of weed problems. The presented methodology can be modified to work with any help and support textual database where both problems and their solutions are present.
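As a rough illustration of the association-rule step described above, the sketch below computes support and confidence for a single rule over structured problem records. The records, attribute names, and the `rule_stats` helper are hypothetical and chosen only for illustration; the paper's actual extraction output and rule-mining procedure are not specified here.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    Each record is a set of "attribute=value" items (a hypothetical
    structured form of one problem and its solution).
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    # Records that contain every item of the antecedent.
    n_antecedent = sum(1 for r in records if antecedent <= r)
    # Records that contain the antecedent and the consequent together.
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Made-up weed problem records, for illustration only.
records = [
    frozenset({"crop=wheat", "weed=wild_oat", "solution=herbicide_A"}),
    frozenset({"crop=wheat", "weed=wild_oat", "solution=herbicide_A"}),
    frozenset({"crop=wheat", "weed=wild_oat", "solution=hand_weeding"}),
    frozenset({"crop=maize", "weed=purslane", "solution=hand_weeding"}),
]

support, confidence = rule_stats(
    records,
    antecedent={"crop=wheat", "weed=wild_oat"},
    consequent={"solution=herbicide_A"},
)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Support measures how often the full rule appears in the database, while confidence measures how often the solution follows when the problem description matches.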
A Step Towards Modeling and Destabilizing Human Trafficking Networks Using Machine Learning Methods
Amin, Shreya (Independent Researcher)
Human trafficking is a multi-dimensional problem for which we have incomplete data, limited knowledge of the exploiters, and no understanding of the dynamics of the process. Addressing it requires a larger, more complete database and an understanding of key actors and their interactions in a dynamic environment. Methods suited to this task exist in the areas of Data Mining, Machine Learning, Network Analysis, and Multi-agent systems. Using these methods, it is possible to create a model tailored to detecting and preventing human trafficking, and to address different components of the problem. The goal is to build an intelligent system to enable collaboration and analysis, to identify and profile victims, traffickers, buyers, and exploiters, to predict human trafficking patterns, and to disrupt and destabilize human trafficking networks. In this paper, I outline how some of these methods may help analyze and model the dynamic phenomenon of human trafficking. The purpose is to see whether, using intelligent systems and appropriate collaboration and analysis tools, optimized intervention strategies can be created to profile victims and traffickers and to disrupt and dissolve the human trafficking network in such a way that it is unable to recover.
Detecting Bots Based on Keylogging Activities
Al-Hammadi, Yousof, Aickelin, Uwe
A bot is a piece of software that is usually installed on an infected machine without the user's knowledge. A bot is controlled remotely by the attacker under a Command and Control structure. Recent statistics show that bots represent one of the fastest growing threats to our networks, performing malicious activities such as email spamming or keylogging. However, few bot detection techniques have been developed to date. In this paper, we investigate a behavioural algorithm to detect a single bot that uses keylogging activity. Our approach uses function call analysis to detect a bot with a keylogging component. Function call frequencies within a specified time window are correlated to enhance the detection scheme. We perform a range of experiments with the spybot. Our results show a high correlation between some function calls executed by this bot, which indicates abnormal activity in our system.
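The idea of correlating function call frequencies across time windows can be sketched as follows. This is a minimal illustration, not the authors' detection scheme: the event trace, the function names `capture_keys`, `write_log`, and `draw_window`, and the one-second window are all assumptions made up for the example.

```python
from math import sqrt


def window_counts(events, window, n_windows, name):
    """Count calls to `name` in each consecutive time window."""
    counts = [0] * n_windows
    for timestamp, function in events:
        index = int(timestamp // window)
        if function == name and 0 <= index < n_windows:
            counts[index] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical trace of (timestamp in seconds, intercepted function name).
events = [
    (0.1, "capture_keys"), (0.2, "write_log"),
    (0.3, "capture_keys"), (0.4, "write_log"),
    (0.5, "capture_keys"), (0.6, "write_log"),
    (1.5, "draw_window"),
    (2.1, "capture_keys"), (2.2, "write_log"),
    (2.5, "capture_keys"), (2.6, "write_log"),
    (3.3, "capture_keys"), (3.4, "write_log"),
]

key_counts = window_counts(events, window=1.0, n_windows=4, name="capture_keys")
log_counts = window_counts(events, window=1.0, n_windows=4, name="write_log")
correlation = pearson(key_counts, log_counts)
print(key_counts, log_counts, correlation)
```

In this toy trace the two calls rise and fall together in every window, so their correlation is high; a detector along these lines would flag such pairs against a threshold that has to be chosen empirically.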
Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process
Doshi-Velez, Finale, Mohamed, Shakir, Ghahramani, Zoubin, Knowles, David A.
Nonparametric Bayesian models provide a framework for flexible probabilistic modelling of complex datasets. Unfortunately, Bayesian inference methods often require high-dimensional averages and can be slow to compute, especially with the potentially unbounded representations associated with nonparametric models. We address the challenge of scaling nonparametric Bayesian inference to the increasingly large datasets found in real-world applications, focusing on the case of parallelising inference in the Indian Buffet Process (IBP). Our approach divides a large data set between multiple processors. The processors use message passing to compute likelihoods in an asynchronous, distributed fashion and to propagate statistics about the global Bayesian posterior. This novel MCMC sampler is the first parallel inference scheme for IBP-based models, scaling to datasets orders of magnitude larger than had previously been possible.
On the asymptotic equivalence between differential Hebbian and temporal difference learning using a local third factor
Kolodziejski, Christoph, Porr, Bernd, Tamosiunaite, Minija, Wörgötter, Florentin
In this theoretical contribution we provide mathematical proof that two of the most important classes of network learning - correlation-based differential Hebbian learning and reward-based temporal difference learning - are asymptotically equivalent when timing the learning with a local modulatory signal. This opens the opportunity to consistently reformulate most of the abstract reinforcement learning framework from a correlation based perspective that is more closely related to the biophysics of neurons.
Relative Margin Machines
Jebara, Tony, Shivaswamy, Pannagadatta K.
In classification problems, Support Vector Machines maximize the margin of separation between two classes. While the paradigm has been successful, the solution obtained by SVMs is dominated by the directions with large data spread and biased to separate the classes by cutting along large spread directions. This article proposes a novel formulation to overcome such sensitivity and maximizes the margin relative to the spread of the data. The proposed formulation can be efficiently solved and experiments on digit datasets show drastic performance improvements over SVMs.
Modeling Short-term Noise Dependence of Spike Counts in Macaque Prefrontal Cortex
Onken, Arno, Grünewälder, Steffen, Munk, Matthias, Obermayer, Klaus
Correlations between spike counts are often used to analyze neural coding. The noise is typically assumed to be Gaussian. Yet, this assumption is often inappropriate, especially for low spike counts. In this study, we present copulas as an alternative approach. With copulas it is possible to use arbitrary marginal distributions such as Poisson or negative binomial that are better suited for modeling noise distributions of spike counts. Furthermore, copulas offer a wide range of dependence structures and can be used to analyze higher order interactions. We develop a framework to analyze spike count data by means of copulas. Methods for parameter inference based on maximum likelihood estimates and for computation of Shannon entropy are provided. We apply the method to our data recorded from macaque prefrontal cortex. The data analysis leads to three significant findings: (1) copula-based distributions provide better fits than discretized multivariate normal distributions; (2) negative binomial margins fit the data better than Poisson margins; and (3) a dependence model that includes only pairwise interactions overestimates the information entropy by at least 19% compared to the model with higher order interactions.
Streaming k-means approximation
Ailon, Nir, Jaiswal, Ragesh, Monteleoni, Claire
We provide a clustering algorithm that approximately optimizes the k-means objective in the one-pass streaming setting. We make no assumptions about the data, and our algorithm is very lightweight in terms of memory and computation. This setting is applicable to unsupervised learning on massive data sets or resource-constrained devices. The two main ingredients of our theoretical work are: a derivation of an extremely simple pseudo-approximation batch algorithm for k-means, in which the algorithm is allowed to output more than k centers (based on the recent k-means++), and a streaming clustering algorithm in which batch clustering algorithms are performed on small inputs (fitting in memory) and combined in a hierarchical manner. Empirical evaluations on real and simulated data reveal the practical utility of our method.
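The pseudo-approximation idea, a batch routine that may return more than k centers, can be illustrated with k-means++ style seeding (each new center is sampled with probability proportional to its squared distance from the centers chosen so far). The sketch below is an assumption-laden illustration rather than the authors' algorithm: the function names, the toy points, and the choice of 4 centers for k = 2 are all made up, and the hierarchical streaming combination is not shown.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d2_seeding(points, m, rng):
    """Pick up to m centers by k-means++ style D^2 sampling (m may exceed k)."""
    centers = [rng.choice(points)]
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < m:
        total = sum(d2)
        if total == 0:
            break  # every point already coincides with a chosen center
        threshold = rng.random() * total
        acc = 0.0
        # Fall back to the farthest point if rounding prevents a hit below.
        chosen = max(range(len(points)), key=d2.__getitem__)
        for i, weight in enumerate(d2):
            acc += weight
            if acc > threshold:
                chosen = i
                break
        centers.append(points[chosen])
        # Keep each point's squared distance to its nearest center.
        d2 = [min(d, sq_dist(p, points[chosen])) for d, p in zip(d2, points)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
rng = random.Random(0)
centers = d2_seeding(points, m=4, rng=rng)  # more than k = 2 centers
print(len(centers), kmeans_cost(points, centers))
```

Allowing extra centers can only lower the k-means cost relative to any prefix of the chosen centers, which is the intuition behind trading an exact center count for a simpler batch routine.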