 Computational Learning Theory


Activized Learning: Transforming Passive to Active with Improved Label Complexity

arXiv.org Machine Learning

We study the theoretical advantages of active learning over passive learning. Specifically, we prove that, in noise-free classifier learning for VC classes, any passive learning algorithm can be transformed into an active learning algorithm with asymptotically strictly superior label complexity for all nontrivial target functions and distributions. We further provide a general characterization of the magnitudes of these improvements in terms of a novel generalization of the disagreement coefficient. We also extend these results to active learning in the presence of label noise, and find that even under broad classes of noise distributions, we can typically guarantee strict improvements over the known results for passive learning.
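
The paper's activizer is a general construction; purely as an illustration of the disagreement-based idea its analysis builds on, here is a minimal sketch for one-dimensional threshold classifiers, where the region of disagreement of the version space is an interval and labels are requested only for points inside it. The function names and setup are hypothetical, not the paper's method.

    import random

    # Disagreement-based querying for 1-D thresholds h_t(x) = 1 iff x >= t.
    # Illustrative sketch only; not the activizer construction from the paper.
    def active_learn_threshold(xs, query_label):
        """xs: unlabeled reals; query_label(x) -> 0/1 label oracle."""
        lo, hi = float("-inf"), float("inf")   # the target threshold lies in (lo, hi]
        labels, queries = {}, 0
        for x in xs:
            if x <= lo:
                labels[x] = 0                  # every consistent threshold labels x as 0
            elif x >= hi:
                labels[x] = 1                  # every consistent threshold labels x as 1
            else:                              # x falls in the region of disagreement
                y = query_label(x)
                queries += 1
                labels[x] = y
                if y == 0:
                    lo = x                     # the threshold must exceed x
                else:
                    hi = x                     # the threshold is at most x
        return labels, queries

    # Example: noise-free labels from the threshold 0.5.
    points = [random.random() for _ in range(1000)]
    labels, queries = active_learn_threshold(points, lambda x: int(x >= 0.5))
    # 'queries' is typically far smaller than 1000: labels outside the shrinking
    # disagreement interval are inferred rather than requested.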


Manipulation of Nanson's and Baldwin's Rules

AAAI Conferences

Nanson's and Baldwin's voting rules select a winner by successively eliminating candidates with low Borda scores. We show that these rules have a number of desirable computational properties. In particular, with unweighted votes, it is NP-hard to manipulate either rule with one manipulator, whilst with weighted votes, it is NP-hard to manipulate either rule with a small number of candidates and a coalition of manipulators. As only a couple of other voting rules are known to be NP-hard to manipulate with a single manipulator, Nanson's and Baldwin's rules appear to be particularly resistant to manipulation from a theoretical perspective. We also propose a number of approximation methods for manipulating these two rules. Experiments demonstrate that both rules are often difficult to manipulate in practice. These results suggest that elimination style voting rules deserve further study.
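
The elimination procedure is short to state in code; below is a hedged sketch of Baldwin's rule (repeatedly recompute Borda scores over the surviving candidates and remove a lowest-scoring one, with an arbitrary tie-break here). Nanson's rule instead eliminates, in each round, every candidate whose Borda score is below the average. The function names and the toy profile are illustrative, not taken from the paper.

    # Sketch of Baldwin's rule: repeatedly eliminate a candidate with the lowest
    # Borda score, recomputing scores over the surviving candidates each round.
    # Assumes every vote ranks the same set of candidates.
    def borda_scores(profile, candidates):
        scores = {c: 0 for c in candidates}
        for ranking in profile:
            alive = [c for c in ranking if c in candidates]
            m = len(alive)
            for pos, c in enumerate(alive):
                scores[c] += m - 1 - pos          # top gets m-1 points, last gets 0
        return scores

    def baldwin_winner(profile):
        candidates = set(profile[0])
        while len(candidates) > 1:
            scores = borda_scores(profile, candidates)
            loser = min(candidates, key=lambda c: scores[c])   # arbitrary tie-break
            candidates.remove(loser)
        return candidates.pop()

    # Example: three votes over candidates a, b, c.
    profile = [("a", "b", "c"), ("b", "c", "a"), ("a", "c", "b")]
    print(baldwin_winner(profile))                # 'a' wins in this profile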


Lower Bounds for Width-Restricted Clause Learning on Formulas of Small Width

AAAI Conferences

Clause learning is a technique used by backtracking-based propositional satisfiability solvers, in which clauses obtained by analysis of conflicts are added to the formula during backtracking. It has been observed empirically that clause learning does not significantly improve the performance of a solver when it is restricted to learning only clauses of small width. This observation is supported by lower bound theorems. It is shown that lower bounds on the runtime of width-restricted clause learning follow from lower bounds on the width of resolution proofs. This yields the first lower bounds on width-restricted clause learning for formulas in 3-CNF.
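
The restriction being studied is simple to state explicitly; the toy helper below (a hypothetical name, not code from any actual solver) just makes the policy concrete: a conflict clause is retained only if its width, i.e. its number of literals, does not exceed a bound w.

    # Width-restricted clause learning as a retention policy (illustrative only).
    def learn_if_narrow(learned_clauses, conflict_clause, w):
        """conflict_clause: iterable of non-zero ints in DIMACS-style literal notation."""
        clause = tuple(sorted(set(conflict_clause)))
        if len(clause) <= w:
            learned_clauses.append(clause)
            return True
        return False                              # clause too wide: discarded

    learned = []
    learn_if_narrow(learned, [1, -3], w=3)        # kept (width 2)
    learn_if_narrow(learned, [2, -4, 5, -7], w=3) # discarded (width 4)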


Towards Understanding and Harnessing the Potential of Clause Learning

arXiv.org Artificial Intelligence

Efficient implementations of DPLL with the addition of clause learning are the fastest complete Boolean satisfiability solvers and can handle many significant real-world problems, such as verification, planning and design. Despite its importance, little is known of the ultimate strengths and limitations of the technique. This paper presents the first precise characterization of clause learning as a proof system (CL), and begins the task of understanding its power by relating it to the well-studied resolution proof system. In particular, we show that with a new learning scheme, CL can provide exponentially shorter proofs than many proper refinements of general resolution (RES) satisfying a natural property. These include regular and Davis-Putnam resolution, which are already known to be much stronger than ordinary DPLL. We also show that a slight variant of CL with unlimited restarts is as powerful as RES itself. Translating these analytical results to practice, however, presents a challenge because of the nondeterministic nature of clause learning algorithms. We propose a novel way of exploiting the underlying problem structure, in the form of a high level problem description such as a graph or PDDL specification, to guide clause learning algorithms toward faster solutions. We show that this leads to exponential speed-ups on grid and randomized pebbling problems, as well as substantial improvements on certain ordering formulas.


Finding a Path is Harder than Finding a Tree

arXiv.org Artificial Intelligence

I consider the problem of learning an optimal path graphical model from data and show that it is NP-hard for the maximum likelihood and minimum description length approaches, as well as for a Bayesian approach. This hardness result holds despite the fact that the problem is a restriction of the polynomially solvable problem of finding the optimal tree graphical model.
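
For contrast with the hardness result, the tree case is solved by the classical Chow-Liu procedure: estimate pairwise empirical mutual information and take a maximum-weight spanning tree. The sketch below is a hedged illustration of that baseline, assuming discrete data in a NumPy integer array of shape (samples, variables); it is not code from the paper.

    import itertools
    import numpy as np

    def mutual_information(x, y):
        """Empirical mutual information between two discrete columns."""
        mi = 0.0
        for a in np.unique(x):
            for b in np.unique(y):
                p_ab = np.mean((x == a) & (y == b))
                if p_ab > 0:
                    p_a, p_b = np.mean(x == a), np.mean(y == b)
                    mi += p_ab * np.log(p_ab / (p_a * p_b))
        return mi

    def chow_liu_tree(data):
        n_vars = data.shape[1]
        edges = sorted(
            ((mutual_information(data[:, i], data[:, j]), i, j)
             for i, j in itertools.combinations(range(n_vars), 2)),
            reverse=True)
        parent = list(range(n_vars))              # union-find for Kruskal's algorithm
        def find(u):
            while parent[u] != u:
                parent[u] = parent[parent[u]]
                u = parent[u]
            return u
        tree = []
        for _, i, j in edges:                     # add heaviest edges that keep acyclicity
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                tree.append((i, j))
        return tree                               # the n_vars - 1 edges of an optimal tree

    # Example usage with a toy binary dataset:
    # data = np.random.randint(0, 2, size=(500, 4)); print(chow_liu_tree(data))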


Solving Rubik's Cube Using SAT Solvers

arXiv.org Artificial Intelligence

Rubik's Cube is an easily understood puzzle, originally called the "magic cube". It is a well-known planning problem that has been studied for a long time, yet many simple properties remain unknown. This paper studies whether modern SAT solvers are applicable to this puzzle. To the best of our knowledge, we are the first to translate Rubik's Cube to a SAT problem. To reduce the number of variables and clauses needed for the encoding, we replace a naive approach of 6 Boolean variables per color on each facelet with a new approach using 3 or 2 Boolean variables. To solve Rubik's Cube quickly, we replace the direct encoding of 18 turns with a layer encoding of 18 subtype turns based on 6 turn types. To speed up solving further, we encode some properties of the two-phase algorithm as additional constraints and restrict some move sequences by adding constraint clauses. An efficient encoding alone is not enough to solve this puzzle. For this reason, we improve existing SAT solvers and develop a new SAT solver based on PrecoSAT, though it is suited only to Rubik's Cube. The new SAT solver replaces the lookahead solving strategy with an ALO (\emph{at-least-one}) solving strategy and decomposes the original problem into sub-problems, each of which is solved by PrecoSAT. The empirical results demonstrate that both our SAT translation and the new solving technique are efficient. Without the efficient SAT encoding and the new solving technique, Rubik's Cube still cannot be solved by any SAT solver. Using the improved SAT solver, we can always find a solution of length 20 in a reasonable time. Although our solver is slower than Kociemba's algorithm, which uses lookup tables, it does not require a huge lookup table.
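
The variable-count reduction is the standard contrast between a one-hot encoding and a binary (logarithmic) encoding of a 6-valued color; the sketch below only illustrates that idea for a single facelet and is not the paper's exact encoding.

    import itertools

    # One-hot: 6 Boolean variables per facelet color, with exactly-one constraints.
    def one_hot_clauses(vars6):
        """vars6: six DIMACS-style variable ids, one per color."""
        clauses = [list(vars6)]                                             # at least one color
        clauses += [[-a, -b] for a, b in itertools.combinations(vars6, 2)]  # at most one color
        return clauses

    # Binary: 3 Boolean variables encode the color index; the two unused
    # 3-bit codes (6 and 7) are forbidden by blocking clauses.
    def binary_clauses(vars3, n_colors=6):
        clauses = []
        for code in range(2 ** len(vars3)):
            if code >= n_colors:
                clause = []
                for bit, v in enumerate(vars3):
                    clause.append(-v if (code >> bit) & 1 else v)
                clauses.append(clause)
        return clauses

    print(len(one_hot_clauses([1, 2, 3, 4, 5, 6])))   # 16 clauses, 6 variables
    print(len(binary_clauses([1, 2, 3])))             # 2 clauses, 3 variables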


Notes on a New Philosophy of Empirical Science

arXiv.org Machine Learning

This book presents a methodology and philosophy of empirical science based on large scale lossless data compression. In this view a theory is scientific if it can be used to build a data compression program, and it is valuable if it can compress a standard benchmark database to a small size, taking into account the length of the compressor itself. This methodology therefore includes an Occam principle as well as a solution to the problem of demarcation. Because of the fundamental difficulty of lossless compression, this type of research must be empirical in nature: compression can only be achieved by discovering and characterizing empirical regularities in the data. Because of this, the philosophy provides a way to reformulate fields such as computer vision and computational linguistics as empirical sciences: the former by attempting to compress databases of natural images, the latter by attempting to compress large text databases. The book argues that the rigor and objectivity of the compression principle should set the stage for systematic progress in these fields. The argument is especially strong in the context of computer vision, which is plagued by chronic problems of evaluation. The book also considers the field of machine learning. Here the traditional approach requires that the models proposed to solve learning problems be extremely simple, in order to avoid overfitting. However, the world may contain intrinsically complex phenomena, which would require complex models to understand. The compression philosophy can justify complex models because of the large quantity of data being modeled (if the target database is 100 Gb, it is easy to justify a 10 Mb model). The complex models and abstractions learned on the basis of the raw data (images, language, etc) can then be reused to solve any specific learning problem, such as face recognition or machine translation.
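
The evaluation criterion is a two-part code: the length of the compressor program plus the length of the compressed benchmark. The bookkeeping can be sketched as follows, with zlib standing in for a theory-derived compressor and the file paths purely hypothetical.

    import os
    import zlib

    # Two-part score in the spirit of the book's criterion: compressor size plus
    # compressed benchmark size. zlib is only a stand-in compressor here.
    def two_part_score(compressor_path, benchmark_path, level=9):
        with open(benchmark_path, "rb") as f:
            data = f.read()
        compressed_len = len(zlib.compress(data, level))
        compressor_len = os.path.getsize(compressor_path)   # "length of the compressor itself"
        return compressor_len + compressed_len

    # A 10 MB model is easily amortized against a 100 GB benchmark:
    # 10 * 2**20 bytes is roughly 0.01% of 100 * 2**30 bytes.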


Online Learning: Stochastic and Constrained Adversaries

arXiv.org Machine Learning

Learning theory has largely focused on two main learning scenarios. The first is the classical statistical setting, where instances are drawn i.i.d. from a fixed distribution; the second is the online, completely adversarial scenario, where at every time step the adversary picks the worst instance to provide to the learner. It can be argued that in the real world neither of these assumptions is reasonable. It is therefore important to study problems with a range of assumptions on the data. Unfortunately, theoretical results in this area are scarce, possibly due to the absence of general tools for analysis. Focusing on the regret formulation, we define the minimax value of a game where the adversary is restricted in his moves. The framework captures stochastic and non-stochastic assumptions on the data. Building on the sequential symmetrization approach, we define a notion of distribution-dependent Rademacher complexity for the spectrum of problems ranging from i.i.d. to worst-case. The bounds let us immediately deduce variation-type bounds. We then consider the i.i.d. adversary and show the equivalence of online and batch learnability. In the supervised setting, we consider various hybrid assumptions on the way the x and y variables are chosen. Finally, we consider smoothed learning problems and show that half-spaces are online learnable in the smoothed model. In fact, exponentially small noise added to the adversary's decisions turns this problem, which has infinite Littlestone dimension, into a learnable one.
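
For reference, the unconstrained regret and minimax value take the following standard form; the constrained setting replaces the adversary's unrestricted choice of each instance with a choice from a history-dependent restricted set. The notation below is generic and not necessarily the paper's.

    % Regret of a learner playing f_1, ..., f_T against instances z_1, ..., z_T:
    \mathrm{Reg}_T \;=\; \sum_{t=1}^{T} \ell(f_t, z_t)
        \;-\; \inf_{f \in \mathcal{F}} \sum_{t=1}^{T} \ell(f, z_t)

    % Minimax value of the T-round game (the learner randomizes via q_t; in the
    % constrained setting each z_t ranges only over a set allowed by z_1, ..., z_{t-1}):
    \mathcal{V}_T(\mathcal{F}) \;=\;
        \inf_{q_1} \sup_{z_1} \mathop{\mathbb{E}}_{f_1 \sim q_1} \cdots
        \inf_{q_T} \sup_{z_T} \mathop{\mathbb{E}}_{f_T \sim q_T}
        \left[ \sum_{t=1}^{T} \ell(f_t, z_t)
             - \inf_{f \in \mathcal{F}} \sum_{t=1}^{T} \ell(f, z_t) \right]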


Reduced Ordered Binary Decision Diagram with Implied Literals: A New knowledge Compilation Approach

arXiv.org Artificial Intelligence

Knowledge compilation is an approach to tackling the computational intractability of general reasoning problems. According to this approach, knowledge bases are converted off-line into a target compilation language that is tractable for on-line querying. The reduced ordered binary decision diagram (ROBDD) is one of the most influential target languages. We generalize ROBDD by associating implied literals with each node; the new language is called reduced ordered binary decision diagram with implied literals (ROBDD-L). We then discuss a family of subsets of ROBDD-L called ROBDD-i, with precisely i implied literals (0 \leq i \leq \infty). In particular, ROBDD-0 is isomorphic to ROBDD, while ROBDD-\infty requires that each node be associated with as many implied literals as possible. We show that ROBDD-i is unique with respect to a given variable order, and that ROBDD-\infty is the most succinct subset of ROBDD-L and can meet most of the querying requirements involved in the knowledge compilation map. Finally, we propose an ROBDD-i compilation algorithm for any i and an ROBDD-\infty compilation algorithm. Based on them, we implement an ROBDD-L package called BDDjLu and draw some conclusions from preliminary experimental results: ROBDD-\infty is clearly smaller than ROBDD on all benchmarks; ROBDD-\infty is smaller than d-DNNF on the benchmarks whose compilation results are relatively small; and it appears better to transform ROBDDs-\infty into FBDDs and ROBDDs than to compile the benchmarks directly.
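
The generalization can be pictured as an ordinary ROBDD node carrying an extra set of implied literals. The dataclass below is a hypothetical illustration of the data structure, not the BDDjLu implementation; ROBDD-0 corresponds to every implied-literal set being empty, while ROBDD-\infty attaches as many implied literals as possible.

    from dataclasses import dataclass
    from typing import FrozenSet, Optional

    @dataclass(frozen=True)
    class RobddLNode:
        var: Optional[int]                       # decision variable index; None for a terminal
        value: Optional[bool] = None             # terminal value when var is None
        low: Optional["RobddLNode"] = None       # successor under var = False
        high: Optional["RobddLNode"] = None      # successor under var = True
        implied: FrozenSet[int] = frozenset()    # signed literals implied at this node

    TRUE = RobddLNode(var=None, value=True)
    FALSE = RobddLNode(var=None, value=False)

    # (x1 AND x3) OR (NOT x1 AND x2 AND x3) simplifies to x3 AND (x1 OR x2):
    # the literal x3 holds on every satisfying path, so it can be attached as an
    # implied literal at the root instead of appearing as decision nodes.
    root = RobddLNode(var=1,
                      low=RobddLNode(var=2, low=FALSE, high=TRUE),
                      high=TRUE,
                      implied=frozenset({3}))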