Education
Definition and properties to assess multi-agent environments as social intelligence tests
Insa-Cabrera, Javier, Hernández-Orallo, José
Social intelligence in natural and artificial systems is usually measured by the evaluation of associated traits or tasks that are deemed to represent some facets of social behaviour. The amalgamation of these traits is then used to configure the intuitive notion of social intelligence. Instead, in this paper we start from a parametrised definition of social intelligence as the expected performance in a set of environments with several agents, and we assess and derive tests from it. This definition makes several dependencies explicit: (1) the definition depends on the choice (and weight) of environments and agents, (2) the definition may include both competitive and cooperative behaviours depending on how agents and rewards are arranged into teams, (3) the definition mostly depends on the abilities of other agents, and (4) the actual difference between social intelligence and general intelligence (or other abilities) depends on these choices. As a result, we address the problem of converting this definition into a more precise one where some fundamental properties ensuring social behaviour (such as action and reward dependency and anticipation on competitive/cooperative behaviours) are met as well as some other more instrumental properties (such as secernment, boundedness, symmetry, validity, reliability, efficiency), which are convenient to convert the definition into a practical test. From the definition and the formalised properties, we take a look at several representative multi-agent environments, tests and games to see whether they meet these properties.
LARSEN-ELM: Selective Ensemble of Extreme Learning Machines using LARS for Blended Data
Han, Bo, He, Bo, Nian, Rui, Ma, Mengmeng, Zhang, Shujing, Li, Minghui, Lendasse, Amaury
Extreme learning machine (ELM) as a neural network algorithm has shown its good performance, such as fast speed, simple structure etc, but also, weak robustness is an unavoidable defect in original ELM for blended data. We present a new machine learning framework called LARSEN-ELM for overcoming this problem. In our paper, we would like to show two key steps in LARSEN-ELM. In the first step, preprocessing, we select the input variables highly related to the output using least angle regression (LARS). In the second step, training, we employ Genetic Algorithm (GA) based selective ensemble and original ELM. In the experiments, we apply a sum of two sines and four datasets from UCI repository to verify the robustness of our approach. The experimental results show that compared with original ELM and other methods such as OP-ELM, GASEN-ELM and LSBoost, LARSEN-ELM significantly improve robustness performance while keeping a relatively high speed.
Bayesian image segmentations by Potts prior and loopy belief propagation
Tanaka, Kazuyuki, Kataoka, Shun, Yasuda, Muneki, Waizumi, Yuji, Hsu, Chiou-Ting
This paper presents a Bayesian image segmentation model based on Potts prior and loopy belief propagation. The proposed Bayesian model involves several terms, including the pairwise interactions of Potts models, and the average vectors and covariant matrices of Gauss distributions in color image modeling. These terms are often referred to as hyperparameters in statistical machine learning theory. In order to determine these hyperparameters, we propose a new scheme for hyperparameter estimation based on conditional maximization of entropy in the Potts prior. The algorithm is given based on loopy belief propagation. In addition, we compare our conditional maximum entropy framework with the conventional maximum likelihood framework, and also clarify how the first order phase transitions in LBP's for Potts models influence our hyperparameter estimation procedures.
$OntoMath^{PRO}$ Ontology: A Linked Data Hub for Mathematics
Nevzorova, Olga, Zhiltsov, Nikita, Kirillovich, Alexander, Lipachev, Evgeny
In this paper, we present an ontology of mathematical knowledge concepts that covers a wide range of the fields of mathematics and introduces a balanced representation between comprehensive and sensible models. We demonstrate the applications of this representation in information extraction, semantic search, and education. We argue that the ontology can be a core of future integration of math-aware data sets in the Web of Data and, therefore, provide mappings onto relevant datasets, such as DBpedia and ScienceWISE.
Normalized Online Learning
Ross, Stephane, Mineiro, Paul, Langford, John
We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust.
Algorithms for Approximate Minimization of the Difference Between Submodular Functions, with Applications
Iyer, Rishabh, Bilmes, Jeff A.
We extend the work of Narasimhan and Bilmes [30] for minimizing set functions representable as a dierence between submodular functions. Similar to [30], our new algorithms are guaranteed to monotonically reduce the objective function at every step. We empirically and theoretically show that the per-iteration cost of our algorithms is much less than [30], and our algorithms can be used to efficiently minimize a dierence between submodular functions under various combinatorial constraints, a problem not previously addressed. We provide computational bounds and a hardness result on the multiplicative inapproximability of minimizing the dierence between submodular functions. We show, however, that it is possible to give worst-case additive bounds by providing a polynomial time computable lower-bound on the minima. Finally we show how a number of machine learning problems can be modeled as minimizing the dierence between submodular functions. We experimentally show the validity of our algorithms by testing them on the problem of feature selection with submodular cost features.
Targeting Optimal Active Learning via Example Quality
Evans, Lewis P. G., Adams, Niall M., Anagnostopoulos, Christoforos
In many classification problems unlabelled data is abundant and a subset can be chosen for labelling. This defines the context of active learning (AL), where methods systematically select that subset, to improve a classifier by retraining. Given a classification problem, and a classifier trained on a small number of labelled examples, consider the selection of a single further example. This example will be labelled by the oracle and then used to retrain the classifier. This example selection raises a central question: given a fully specified stochastic description of the classification problem, which example is the optimal selection? If optimality is defined in terms of loss, this definition directly produces expected loss reduction (ELR), a central quantity whose maximum yields the optimal example selection. This work presents a new theoretical approach to AL, example quality, which defines optimal AL behaviour in terms of ELR. Once optimal AL behaviour is defined mathematically, reasoning about this abstraction provides insights into AL. In a theoretical context the optimal selection is compared to existing AL methods, showing that heuristics can make sub-optimal selections. Algorithms are constructed to estimate example quality directly. A large-scale experimental study shows these algorithms to be competitive with standard AL methods.
Dynamic Feature Scaling for Online Learning of Binary Classifiers
Scaling feature values is an important step in numerous machine learning tasks. Different features can have different value ranges and some form of a feature scaling is often required in order to learn an accurate classifier. However, feature scaling is conducted as a preprocessing task prior to learning. This is problematic in an online setting because of two reasons. First, it might not be possible to accurately determine the value range of a feature at the initial stages of learning when we have observed only a few number of training instances. Second, the distribution of data can change over the time, which render obsolete any feature scaling that we perform in a pre-processing step. We propose a simple but an effective method to dynamically scale features at train time, thereby quickly adapting to any changes in the data stream. We compare the proposed dynamic feature scaling method against more complex methods for estimating scaling parameters using several benchmark datasets for binary classification. Our proposed feature scaling method consistently outperforms more complex methods on all of the benchmark datasets and improves classification accuracy of a state-of-the-art online binary classifier algorithm.
kLog: A Language for Logical and Relational Learning with Kernels
Frasconi, Paolo, Costa, Fabrizio, De Raedt, Luc, De Grave, Kurt
We introduce kLog, a novel approach to statistical relational learning. Unlike standard approaches, kLog does not represent a probability distribution directly. It is rather a language to perform kernel-based learning on expressive logical and relational representations. kLog allows users to specify learning problems declaratively. It builds on simple but powerful concepts: learning from interpretations, entity/relationship data modeling, logic programming, and deductive databases. Access by the kernel to the rich representation is mediated by a technique we call graphicalization: the relational representation is first transformed into a graph --- in particular, a grounded entity/relationship diagram. Subsequently, a choice of graph kernel defines the feature space. kLog supports mixed numerical and symbolic data, as well as background knowledge in the form of Prolog or Datalog programs as in inductive logic programming systems. The kLog framework can be applied to tackle the same range of tasks that has made statistical relational learning so popular, including classification, regression, multitask learning, and collective classification. We also report about empirical comparisons, showing that kLog can be either more accurate, or much faster at the same level of accuracy, than Tilde and Alchemy. kLog is GPLv3 licensed and is available at http://klog.dinfo.unifi.it along with tutorials.
Spectral Bandits for Smooth Graph Functions with Applications in Recommender Systems
Kocák, Tomáš (Inria Lille) | Valko, Michal (Inria Lille) | Munos, Rémi ( Inria Lille and Microsoft Research New England ) | Kveton, Branislav (Technicolor California) | Agrawal, Shipra (Microsoft Research India)
Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each recommended item is a node and its expected rating is similar to its neighbors. The goal is to recommend items that have high expected ratings. We aim for the algorithms where the cumulative regret would not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly in this dimension. Our experiments on real-world content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens nodes evaluations.