 Computational Learning Theory


Binary Matrix Factorization via Dictionary Learning

arXiv.org Machine Learning

Matrix factorization is a key tool in data analysis; its applications include recommender systems, correlation analysis, and signal processing, among others. Binary matrices are a particular case which has received significant attention for over thirty years, especially within the field of data mining. Dictionary learning refers to a family of methods for learning overcomplete bases (also called frames) in order to efficiently encode samples of a given type; this area, now also about twenty years old, was mostly developed within the signal processing field. In this work we propose two binary matrix factorization methods based on an adaptation of the dictionary learning paradigm to binary matrices. The proposed algorithms focus on speed and scalability; they work with binary factors combined with bit-wise operations and a few auxiliary integer ones. Furthermore, the methods are readily applicable to online binary matrix factorization. Another important issue in matrix factorization is the choice of rank for the factors; we address this model selection problem with an efficient method based on the Minimum Description Length principle. Our preliminary results show that the proposed methods are effective at producing interpretable factorizations of data of various types.
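As a rough illustration of what "binary factors combined with bit-wise operations" can look like, the sketch below stores each binary row as a Python integer and measures reconstruction error as a Hamming distance. It assumes a modulo-2 (XOR) combination of dictionary atoms, which the abstract does not specify; the names `xor_combine` and `hamming_error` and the toy data are hypothetical and are not taken from the paper.

```python
def xor_combine(dictionary, code):
    """XOR together the dictionary atoms selected by the set bits of `code`.

    Assumption: atoms combine modulo 2; this is an illustrative choice.
    """
    out = 0
    for k, atom in enumerate(dictionary):
        if (code >> k) & 1:  # bit k of the code selects atom k
            out ^= atom
    return out


def hamming_error(samples, dictionary, codes):
    """Total number of bits where the reconstruction differs from the samples."""
    return sum(
        bin(x ^ xor_combine(dictionary, a)).count("1")
        for x, a in zip(samples, codes)
    )


# Toy data: two 4-bit atoms and three samples, each encoded by a 2-bit code.
dictionary = [0b1100, 0b0110]
codes = [0b01, 0b10, 0b11]
samples = [0b1100, 0b0111, 0b1010]

print(hamming_error(samples, dictionary, codes))
```

Only XOR, shifts, and a bit count are needed here, which is the kind of cost profile the abstract alludes to; a full factorization method would also have to search over the dictionary and the codes.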


Optimal Bounds on the VC-dimension

arXiv.org Machine Learning

The VC-dimension of a set system is a way to capture its complexity and has been a key parameter studied extensively in the machine learning and geometry communities. In this paper, we resolve two longstanding open problems on bounding the VC-dimension of two fundamental set systems: $k$-fold unions/intersections of half-spaces, and the simplices set system. Among other implications, it settles an open question in machine learning that was first studied in the 1989 foundational paper of Blumer, Ehrenfeucht, Haussler and Warmuth as well as by Eisenstat and Angluin and Johnson.
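For readers new to the term, the VC-dimension of a set system is the size of the largest point set that the system shatters, meaning that every subset of those points can be cut out by some set in the system. The brute-force sketch below computes it for a small finite set system; it only illustrates the definition and has nothing to do with the proof techniques of the paper. The function names and the toy interval system are hypothetical.

```python
from itertools import combinations


def is_shattered(points, sets):
    """True if every subset of `points` equals S & points for some S in `sets`."""
    points = frozenset(points)
    traces = {frozenset(s) & points for s in sets}
    return len(traces) == 2 ** len(points)


def vc_dimension(ground, sets):
    """Largest size of a shattered subset of `ground` (exponential time)."""
    sets = [frozenset(s) for s in sets]
    best = 0
    for d in range(1, len(ground) + 1):
        if any(is_shattered(c, sets) for c in combinations(ground, d)):
            best = d
        else:
            break  # subsets of shattered sets are shattered, so we can stop
    return best


# Toy set system: all discrete intervals {i, ..., j-1} on the ground set {0, ..., 4}.
ground = list(range(5))
intervals = [set(range(i, j)) for i in range(6) for j in range(i, 6)]

print(vc_dimension(ground, intervals))
```

Intervals shatter any two points but never three, because no interval contains the two outer points while skipping the middle one.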


Probably approximately correct learning of Horn envelopes from queries

arXiv.org Artificial Intelligence

We propose an algorithm for learning the Horn envelope of an arbitrary domain using an expert, or an oracle, capable of answering certain types of queries about this domain. Attribute exploration from formal concept analysis is a procedure that solves this problem, but the number of queries it may ask is exponential in the size of the resulting Horn formula in the worst case. We recall a well-known polynomial-time algorithm for learning Horn formulas with membership and equivalence queries and modify it to obtain a polynomial-time probably approximately correct algorithm for learning the Horn envelope of an arbitrary domain.

Keywords: PAC learning, attribute exploration, FCA, formal concept

2010 MSC: 68T27, 06B99

1. Introduction

The learnability of concepts from oracle queries has received significant attention in learning theory. The most common types of oracles investigated in the literature are membership and equivalence oracles, and for these types of oracles various results have been obtained showing learnability in polynomial time. One of the most prominent examples is the fact that Horn formulas can be learnt in polynomial time with access to membership and equivalence oracles [1]. In the realm of formal concept analysis [2], a different learning method has been established almost simultaneously with the standard query learning setting. The theory of formal concept analysis emerged as a subfield of mathematical order theory, more precisely of lattice theory, and it studies lattices as hierarchies of concepts. Since its emergence in the early 1980s, it has evolved into a rich theory with a wide range of applications. An important technique of formal concept analysis is the attribute exploration algorithm. A Horn envelope of a theory is a Horn formula whose set of models includes all the models of the theory and is as specific as possible [3].
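The definition of a Horn envelope can be made concrete through a standard fact: a set of models is the model set of some Horn formula exactly when it is closed under intersection. The models of the Horn envelope can therefore be obtained by closing the given models under pairwise intersection. The sketch below does this by explicit enumeration, which may be exponential and is not the query-based algorithm of the paper. Models are represented as sets of true attributes, the function name is hypothetical, and the sketch assumes closure under non-empty intersections only (some conventions also add the full attribute set).

```python
def intersection_closure(models):
    """Close a family of models (sets of true attributes) under intersection."""
    closed = {frozenset(m) for m in models}
    frontier = list(closed)
    while frontier:
        m = frontier.pop()
        # Intersect m with everything found so far; queue anything new.
        for other in list(closed):
            new = m & other
            if new not in closed:
                closed.add(new)
                frontier.append(new)
    return closed


# Toy theory with two models over the attributes a, b, c.
models = [{"a", "b"}, {"b", "c"}]
envelope_models = intersection_closure(models)

print(sorted(sorted(m) for m in envelope_models))
```

Here the closure adds the single model `{"b"}`, so the envelope admits one more model than the original theory and no Horn formula can be more specific.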


Learning Theory and Algorithms for Revenue Management in Sponsored Search

arXiv.org Machine Learning

Online advertising is the main source of revenue for Internet businesses. Advertisers are typically ranked according to a score that takes into account their bids and estimated click-through rates (eCTR). The likelihood that a user clicks on an ad is often modeled by optimizing for the click-through rates rather than for the performance of the auction in which the click-through rates will be used. This paper attempts to eliminate this disconnection by proposing loss functions for click modeling that are based on final auction performance. We address two feasible metrics (AUC^R and SAUC) to evaluate the online RPM (revenue per mille) directly rather than the CTR. We then design an explicit ranking function that incorporates the calibration factor and the price-squashed factor to maximize the revenue. Given the power of deep networks, we also explore an implicit optimal ranking function with a deep model. Lastly, various experiments with two real-world datasets are presented. In particular, our proposed methods perform better than the state-of-the-art methods with regard to the revenue of the platform.


Adaptation to Easy Data in Prediction with Limited Advice

arXiv.org Machine Learning

We derive an online learning algorithm with improved regret guarantees for "easy" loss sequences. We consider two types of "easiness": (a) stochastic loss sequences and (b) adversarial loss sequences with small effective range of the losses. While a number of algorithms have been proposed for exploiting small effective range in the full information setting, Gerchinovitz and Lattimore [2016] have shown the impossibility of regret scaling with the effective range of the losses in the bandit setting. We show that just one additional observation per round is sufficient to bypass the impossibility result. The proposed Second Order Difference Adjustments (SODA) algorithm requires no prior knowledge of the effective range of the losses, $\varepsilon$, and achieves an $O(\varepsilon \sqrt{KT \ln K}) + \tilde{O}(\varepsilon K \sqrt[4]{T})$ expected regret guarantee, where $T$ is the time horizon and $K$ is the number of actions. The scaling with the effective loss range is achieved under significantly weaker assumptions than those made by Cesa-Bianchi and Shamir [2018] in an earlier attempt to bypass the impossibility result. We also provide a regret lower bound of $\Omega(\varepsilon\sqrt{T K})$, which almost matches the upper bound. In addition, we show that in the stochastic setting SODA achieves an $O\left(\sum_{a:\Delta_a>0} \frac{K\varepsilon^2}{\Delta_a}\right)$ pseudo-regret bound that holds simultaneously with the adversarial regret guarantee. In other words, SODA is safe against an unrestricted oblivious adversary and provides improved regret guarantees for at least two different types of "easiness" simultaneously.
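One way to read the claim that the lower bound almost matches the upper bound is to compare the terms directly. The following is a back-of-the-envelope comparison that ignores the constants and the polylogarithmic factors hidden in the $O$ and $\tilde{O}$ notation:

$$\frac{\varepsilon\sqrt{KT\ln K}}{\varepsilon\sqrt{TK}} = \sqrt{\ln K}, \qquad \varepsilon K\sqrt[4]{T} \le \varepsilon\sqrt{KT} \iff \sqrt{K} \le \sqrt[4]{T} \iff T \ge K^2.$$

Under this rough reading, the leading term of the upper bound exceeds the lower bound by a factor of $\sqrt{\ln K}$, and the second term is of lower order once $T \ge K^2$, up to logarithmic factors.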


LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration

arXiv.org Artificial Intelligence

We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution. The goal of the configurator is to find a configuration that runs fast on average on most instances, and to do so with the least amount of total work. It can run a chosen solver on a random instance until the solver finishes or a timeout is reached. We propose LeapsAndBounds, an algorithm that tests configurations on randomly selected problem instances for increasingly long times. We prove that the capped expected runtime of the configuration returned by LeapsAndBounds is close to the optimal expected runtime, while our algorithm's running time is near-optimal. Our results show that LeapsAndBounds is more efficient than the recent algorithm of Kleinberg et al. (2017), which, to our knowledge, is the only other algorithm configuration method with non-trivial theoretical guarantees. Experimental results on configuring a public SAT solver on a new benchmark dataset also stand witness to the superiority of our method.


The Mathematics of Machine Learning - AI Trends

#artificialintelligence

In the last few months, I have had several people contact me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I've observed that some actually lack the necessary mathematical intuition and framework to get useful results. This is the main reason I decided to write this blog post. Recently, there has been an upsurge in the availability of many easy-to-use machine and deep learning packages such as scikit-learn, Weka, TensorFlow, etc. Machine Learning theory is a field that intersects statistical, probabilistic, computer science and algorithmic aspects arising from learning iteratively from data and finding hidden insights which can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary for a good grasp of the inner workings of the algorithms and for getting good results.


How To Become A Machine Learning Expert In One Simple Step -- Swan Intelligence

#artificialintelligence

The web is full of good explanations of machine learning algorithms. And every second applicant for a data science position has finished the Coursera course on machine learning. Theory will not help you choose good values for the 16 parameters a standard implementation of a random forest takes. The default values are good to get started, but which parameters should you modify depending on your data? Choosing the right features, algorithms and parameters is an art.


Machine Learning: Bridging the Gaps in IT Data Silos

#artificialintelligence

In today's complex business world – where many organizations operate in silos, data is plentiful and it's challenging to get a big-picture view of the entire IT landscape – how can enterprises better manage, analyze and interpret tremendous amounts of data? The next big thing in ITOA – machine learning – is providing a viable solution. Machine learning studies how to design algorithms that can learn by observing data, discovering new insights in data, developing systems that can automatically adapt and customize themselves, and designing systems where it's too complicated and costly to implement all possible circumstances, such as search engines and self-driving cars. There's been a significant increase in machine learning applications in ITOA due, in large part, to the ongoing growth of machine learning theory, algorithms, and computational resources on demand. Many organizations are finding that machine learning allows them to better analyze large amounts of data, gain valuable insights, reduce incident investigation time, determine which alerts are correlated and what causes event storms – and even prevent incidents from happening in the first place.


Chaining Mutual Information and Tightening Generalization Bounds

arXiv.org Machine Learning

Bounding the generalization error of learning algorithms has a long history, yet existing bounds still fall short of explaining various generalization successes, including those of deep learning. Two important difficulties are (i) exploiting the dependencies between the hypotheses, and (ii) exploiting the dependence between the algorithm's input and output. Progress on the first point was made with the chaining method, originating from the work of Kolmogorov and used in the VC-dimension bound. More recently, progress on the second point was made with the mutual information method by Russo and Zou '15. Yet, these two methods are currently disjoint. In this paper, we introduce a technique to combine the chaining and mutual information methods, to obtain a generalization bound that is both algorithm-dependent and that exploits the dependencies between the hypotheses. We provide an example in which our bound significantly outperforms both the chaining and the mutual information bounds. As a corollary, we tighten Dudley's inequality under the knowledge that a learning algorithm chooses its output from a small subset of hypotheses with high probability, an assumption motivated by the performance of SGD discussed in Zhang et al. '17.