Feature Hashing for Large Scale Multitask Learning

arXiv.org Artificial Intelligence

Empirical evidence suggests that hashing is an effective strategy for dimensionality reduction and practical nonparametric estimation. In this paper we provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability. We demonstrate the feasibility of this approach with experimental results for a new use case -- multitask learning with hundreds of thousands of tasks.
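A minimal sketch of the hashing trick described here, in Python. The bucket count `dim` and the task-prefixing scheme are illustrative assumptions; the paper's multitask use case similarly personalizes features by combining a global and a per-task representation in one shared hashed space.

```python
import hashlib
from collections import defaultdict

def hash_features(tokens, dim=2**20, task=None):
    """Signed feature hashing (the "hashing trick"), minimal sketch.

    Each token is mapped to one of `dim` buckets by a hash function; one
    extra hash bit supplies a +/-1 sign so collisions cancel in
    expectation.  Prefixing tokens with a (hypothetical) task id gives
    each task its own features while all tasks share one weight vector.
    """
    x = defaultdict(float)                      # sparse feature vector
    for tok in tokens:
        key = f"{task}:{tok}" if task is not None else tok
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        sign = 1.0 if h & 1 else -1.0
        x[(h >> 1) % dim] += sign
    return dict(x)

# global features plus personalized features for a hypothetical task "u42"
print(hash_features(["cheap", "pills"], dim=2**10))
print(hash_features(["cheap", "pills"], dim=2**10, task="u42"))
```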


Security Analysis of Online Centroid Anomaly Detection

arXiv.org Machine Learning

Security issues are crucial in a number of machine learning applications, especially in scenarios dealing with human activity rather than natural phenomena (e.g., information ranking, spam detection, malware detection, etc.). In such cases, learning algorithms can be expected to face manipulated data aimed at hampering decision making. Although some previous work has addressed the handling of malicious data in the context of supervised learning, very little is known about the behavior of anomaly detection methods in such scenarios. In this contribution we analyze the performance of a particular method -- online centroid anomaly detection -- in the presence of adversarial noise. Our analysis addresses the following security-related issues: formalization of the learning and attack processes, derivation of an optimal attack, and analysis of its efficiency and constraints. We derive bounds on the effectiveness of a poisoning attack against centroid anomaly detection under different conditions: bounded and unbounded percentage of traffic, and bounded false positive rate. Our bounds show that whereas a poisoning attack can be effectively staged in the unconstrained case, it can be made arbitrarily difficult (a strict upper bound on the attacker's gain) if external constraints are properly used. Our experimental evaluation, carried out on real HTTP and exploit traces, confirms the tightness of our theoretical bounds and the practicality of our protection mechanisms.
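A minimal sketch of the detector under attack, assuming an illustrative radius threshold and learning rate (not the paper's notation). The update rule in the `else` branch is exactly what a poisoning adversary exploits: points placed just inside the radius drag the centroid step by step toward an attack target, and the paper bounds how fast this can happen.

```python
import numpy as np

def online_centroid_detector(points, radius, eta=0.05):
    """Online centroid anomaly detection, minimal sketch.

    A point is flagged anomalous if it lies farther than `radius` from
    the current centroid; accepted points pull the centroid toward them.
    """
    points = iter(points)
    center = np.asarray(next(points), dtype=float)   # initialize on first point
    for p in points:
        p = np.asarray(p, dtype=float)
        if np.linalg.norm(p - center) > radius:
            yield p, True                            # anomaly: centroid unchanged
        else:
            center = (1.0 - eta) * center + eta * p  # accept: update centroid
            yield p, False
```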


Ptarithmetic

arXiv.org Artificial Intelligence

The present article introduces ptarithmetic (short for "polynomial time arithmetic") -- a formal number theory similar to the well-known Peano arithmetic, but based on the recently introduced computability logic (see http://www.cis.upenn.edu/~giorgi/cl.html) instead of classical logic. The formulas of ptarithmetic represent interactive computational problems rather than just true/false statements, and their "truth" is understood as the existence of a polynomial time solution. The system of ptarithmetic elaborated in this article is shown to be sound and complete. It is sound in the sense that every theorem T of the system represents an interactive number-theoretic computational problem with a polynomial time solution and, furthermore, such a solution can be effectively extracted from a proof of T. And it is complete in the sense that every interactive number-theoretic problem with a polynomial time solution is represented by some theorem T of the system. The paper is self-contained and can be read without any previous familiarity with computability logic.


A New Understanding of Prediction Markets Via No-Regret Learning

arXiv.org Artificial Intelligence

We explore the striking mathematical connections that exist between market scoring rules, cost function based prediction markets, and no-regret learning. We show that any cost function based prediction market can be interpreted as an algorithm for the commonly studied problem of learning from expert advice by equating trades made in the market with losses observed by the learning algorithm. If the loss of the market organizer is bounded, this bound can be used to derive an O(sqrt(T)) regret bound for the corresponding learning algorithm. We then show that the class of markets with convex cost functions exactly corresponds to the class of Follow the Regularized Leader learning algorithms, with the choice of a cost function in the market corresponding to the choice of a regularizer in the learning problem. Finally, we show an equivalence between market scoring rules and prediction markets with convex cost functions. This equivalence, which may be of independent interest, implies that market scoring rules can also be interpreted naturally as Follow the Regularized Leader algorithms. These connections provide new insight into how commonly studied markets, such as the Logarithmic Market Scoring Rule, can aggregate opinions into accurate estimates of the likelihood of future events.
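A small sketch of the Logarithmic Market Scoring Rule mentioned above (the liquidity parameter `b` is illustrative). Its cost function is C(q) = b log sum_i exp(q_i / b), and the instantaneous prices are the gradient of C, a softmax over outstanding shares; these prices have the same form as the weights of Hedge / Follow the Regularized Leader with an entropy regularizer, which is the correspondence the paper formalizes.

```python
import math

def lmsr_cost(q, b=100.0):
    """LMSR cost function C(q) = b * log(sum_i exp(q_i / b)).
    The market maker's worst-case loss is b * log(len(q))."""
    m = max(qi / b for qi in q)                       # stabilize exponentials
    return b * (m + math.log(sum(math.exp(qi / b - m) for qi in q)))

def lmsr_prices(q, b=100.0):
    """Instantaneous prices: gradient of the cost, a softmax over q."""
    m = max(qi / b for qi in q)
    w = [math.exp(qi / b - m) for qi in q]
    z = sum(w)
    return [wi / z for wi in w]

q = [10.0, 0.0, 5.0]            # outstanding shares per outcome
print(lmsr_cost(q), lmsr_prices(q))
```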


Comment on "Fastest learning in small-world neural networks"

arXiv.org Machine Learning

This comment reexamines the work of Simard et al. [D. Simard, L. Nadeau, H. Kroger, Phys. Lett. A 336 (2005) 8-15]. We find that Simard et al. mistakenly calculated the local connectivity lengths D_local of the networks. The corrected values of D_local are presented, and the supervised learning performance of feedforward neural networks (FNNs) with different numbers of rewirings is re-investigated in this comment. The comment refutes Simard et al.'s work with two conclusions: 1) rewiring connections of FNNs cannot generate networks with small-world connectivity; 2) for different training sets, there exists no number of rewirings that produces networks with lower learning errors than networks with other numbers of rewirings.
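For background, a quick sketch of how rewiring probability affects connectivity in the standard Watts-Strogatz small-world construction. This is a generic ring-lattice demo with illustrative parameters, not the layered FNN topology or the D_local measure analyzed in the comment; it only shows the small-world signature (short paths with high clustering) that the comment argues rewired FNNs fail to exhibit.

```python
import networkx as nx

n, k = 100, 4                    # illustrative lattice size and degree
for p in (0.0, 0.01, 0.1, 1.0):  # rewiring probabilities
    G = nx.connected_watts_strogatz_graph(n, k, p, seed=0)
    L = nx.average_shortest_path_length(G)
    C = nx.average_clustering(G)
    print(f"p={p:5.2f}  avg path length={L:6.2f}  clustering={C:.3f}")
```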


Syntactic Topic Models

arXiv.org Artificial Intelligence

The syntactic topic model (STM) is a Bayesian nonparametric model of language that discovers latent distributions of words (topics) that are both semantically and syntactically coherent. The STM models dependency parsed corpora where sentences are grouped into documents. It assumes that each word is drawn from a latent topic chosen by combining document-level features and the local syntactic context. Each document has a distribution over latent topics, as in topic models, which provides the semantic consistency. Each element in the dependency parse tree also has a distribution over the topics of its children, as in latent-state syntax models, which provides the syntactic consistency. These distributions are convolved so that the topic of each word is likely under both its document and syntactic context. We derive a fast posterior inference algorithm based on variational methods. We report qualitative and quantitative studies on both synthetic data and hand-parsed documents. We show that the STM is a more predictive model of language than current models based only on syntax or only on topics.
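A sketch of the STM's core combination step under an assumed reading: the abstract says the document and syntactic distributions are "convolved" so a word's topic is likely under both, which a normalized elementwise product captures; the variable names are ours, not the paper's notation.

```python
import numpy as np

def word_topic_posterior(theta_doc, pi_parent):
    """Combine the two constraints on a word's topic (sketch).

    theta_doc  - the document's distribution over K topics
    pi_parent  - the parent parse-tree node's distribution over the
                 topics of its children
    The word's topic must be plausible under both, so the distributions
    are multiplied elementwise and renormalized.
    """
    p = np.asarray(theta_doc, dtype=float) * np.asarray(pi_parent, dtype=float)
    return p / p.sum()

# toy example with K = 3 topics
print(word_topic_posterior([0.7, 0.2, 0.1], [0.1, 0.6, 0.3]))
```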


Feature Importance in Bayesian Assessment of Newborn Brain Maturity from EEG

arXiv.org Artificial Intelligence

The methodology of Bayesian Model Averaging (BMA) is applied to the assessment of newborn brain maturity from sleep EEG. In theory, this methodology provides the most accurate assessments of uncertainty in decisions. However, existing BMA techniques have been shown to produce biased assessments in the absence of prior information that enables the model parameter space to be explored in sufficient detail within a reasonable time. This lack of detail leads to disproportionate sampling from the posterior distribution. In the case of EEG-based assessment of brain maturity, BMA results can be biased because information about EEG feature importance is absent. In this paper we explore how posterior information about EEG features can be used to reduce the negative impact of disproportionate sampling on BMA performance. We use EEG data recorded from sleeping newborns to test the efficiency of the proposed BMA technique.
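A minimal sketch of the BMA prediction step, assuming a hypothetical `predict_proba` interface on sampled models. It makes the failure mode concrete: if the sampler explores the parameter space disproportionately, the weights below misrepresent the posterior and the averaged prediction is biased, which is the problem the paper attacks with feature-importance information.

```python
import numpy as np

def bma_predict(models, weights, x):
    """Bayesian Model Averaging over sampled models (sketch).

    models  - sampled models, each exposing predict_proba(x)
              (a hypothetical interface for this sketch)
    weights - approximate posterior probabilities of the models
    Returns the posterior-weighted average of the models' predictions.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()           # normalize the posterior weights
    probs = np.array([m.predict_proba(x) for m in models])
    return np.average(probs, axis=0, weights=weights)
```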


A Quasi-Newton Approach to Nonsmooth Convex Optimization Problems in Machine Learning

arXiv.org Machine Learning

We extend the well-known BFGS quasi-Newton method and its memory-limited variant LBFGS to the optimization of nonsmooth convex objectives. This is done in a rigorous fashion by generalizing three components of BFGS to subdifferentials: the local quadratic model, the identification of a descent direction, and the Wolfe line search conditions. We prove that under some technical conditions, the resulting subBFGS algorithm is globally convergent in objective function value. We apply its memory-limited variant (subLBFGS) to L_2-regularized risk minimization with the binary hinge loss. To extend our algorithm to the multiclass and multilabel settings, we develop a new, efficient, exact line search algorithm. We prove its worst-case time complexity bounds, and show that our line search can also be used to extend a recently developed bundle method to the multiclass and multilabel settings. We also apply the direction-finding component of our algorithm to L_1-regularized risk minimization with logistic loss. In all these contexts our methods perform comparably to or better than specialized state-of-the-art solvers on a number of publicly available datasets. An open source implementation of our algorithms is freely available.
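To make the nonsmoothness concrete, here is one element of the subdifferential of the L_2-regularized binary hinge loss in Python (variable names are ours). subBFGS does more than this: it searches over the subdifferential for a genuine descent direction rather than taking an arbitrary subgradient, but this sketch shows the set it searches over.

```python
import numpy as np

def hinge_subgradient(w, X, y, lam):
    """One subgradient of  lam/2 ||w||^2 + (1/n) sum_i max(0, 1 - y_i w.x_i).

    At margin exactly 1 the loss is nonsmooth; this sketch takes the zero
    choice there (any multiple in [-1, 0] of y_i x_i is also valid), which
    is exactly the ambiguity subBFGS's direction-finding step resolves.
    """
    margins = y * (X @ w)
    active = margins < 1                          # points with nonzero hinge loss
    return lam * w - (X[active].T @ y[active]) / len(y)
```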



Query Learning with Exponential Query Costs

arXiv.org Machine Learning

In query learning, the goal is to identify an unknown object while minimizing the number of "yes" or "no" questions (queries) posed about that object. A well-studied algorithm for query learning is generalized binary search (GBS). We show that GBS is a greedy algorithm for optimizing the expected number of queries needed to identify the unknown object. We also generalize GBS in two ways. First, we consider the case where the cost of querying grows exponentially in the number of queries and the goal is to minimize the expected exponential cost. Second, we consider the case where the objects are partitioned into groups and the objective is to identify only the group to which the object belongs. We derive algorithms to address these problems in a common, information-theoretic framework. In particular, we present an exact formula for the objective function in each case involving Shannon or Renyi entropy, and develop a greedy algorithm for minimizing it. Our algorithms are demonstrated on two applications of query learning: active learning and emergency response.
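A minimal sketch of GBS with a uniform prior (the matrix encoding and function names are ours). The greedy step asks the query whose yes/no answers split the remaining version space most evenly, i.e., whose +/-1 responses sum closest to zero over the surviving objects.

```python
import numpy as np

def generalized_binary_search(hypotheses, queries, oracle):
    """Generalized binary search, minimal sketch.

    hypotheses - boolean matrix H[i, q]: object i's answer to query q
    queries    - iterable of available query (column) indices
    oracle     - callable returning the true object's answer to a query
    """
    alive = np.ones(hypotheses.shape[0], dtype=bool)
    queries = list(queries)
    while alive.sum() > 1 and queries:
        responses = 2 * hypotheses[alive].astype(int) - 1   # {0,1} -> {-1,+1}
        q = min(queries, key=lambda j: abs(responses[:, j].sum()))
        queries.remove(q)
        alive &= hypotheses[:, q] == oracle(q)  # keep consistent objects
    return np.flatnonzero(alive)

# toy example: 4 objects, 3 binary queries, hidden object index 2
H = np.array([[0, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [1, 1, 1]], dtype=bool)
print(generalized_binary_search(H, range(3), lambda q: H[2, q]))  # -> [2]
```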