AITopics

1107.3059

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Switzerland (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.35)

Rubinstein, J. Hyam, Rubinstein, Benjamin I. P., Bartlett, Peter L.

Bounding Embeddings of VC Classes into Maximum Classes

arXiv.org Machine Learning, Jan-28-2014

One of the earliest conjectures in computational learning theory, the Sample Compression conjecture, asserts that concept classes (equivalently set systems) admit compression schemes of size linear in their VC dimension. To date this statement is known to be true for maximum classes, those that possess maximum cardinality for their VC dimension. The most promising approach to positively resolving the conjecture is by embedding general VC classes into maximum classes without super-linear increase to their VC dimensions, as such embeddings would extend the known compression schemes to all VC classes. We show that maximum classes can be characterised by a local-connectivity property of the graph obtained by viewing the class as a cubical complex. This geometric characterisation of maximum VC classes is applied to prove a negative embedding result which demonstrates VC-d classes that cannot be embedded in any maximum class of VC dimension lower than 2d. On the other hand, we show that every VC-d class C embeds in a VC-(d+D) maximum class where D is the deficiency of C, i.e., the difference between the cardinalities of a maximum VC-d class and of C. For VC-2 classes in binary n-cubes for 4 <= n <= 6, we give best possible results on embedding into maximum classes. For some special classes of Boolean functions, relationships with maximum classes are investigated. Finally we give a general recursive procedure for embedding VC-d classes into VC-(d+k) maximum classes for smallest k.
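
The notions of VC dimension and deficiency used in this abstract can be illustrated with a small brute-force sketch. It is not taken from the paper: the function names are hypothetical, concepts are assumed to be 0/1 tuples of length n, and the class is assumed non-empty. The size of a maximum VC-d class on n points is taken to be the standard Sauer bound, the sum of binomial coefficients C(n, i) for i = 0..d.

```python
from itertools import combinations
from math import comb


def shatters(concepts, subset):
    """Return True if the class realises all 2^|subset| labelings on subset."""
    patterns = {tuple(c[i] for i in subset) for c in concepts}
    return len(patterns) == 2 ** len(subset)


def vc_dimension(concepts, n):
    """Largest size of a shattered coordinate subset, by exhaustive search.

    Stopping at the first size with no shattered subset is safe because
    every subset of a shattered set is itself shattered.
    """
    d = 0
    for k in range(1, n + 1):
        if any(shatters(concepts, s) for s in combinations(range(n), k)):
            d = k
        else:
            break
    return d


def deficiency(concepts, n):
    """Size of a maximum VC-d class on n points minus the size of this class."""
    d = vc_dimension(concepts, n)
    max_size = sum(comb(n, i) for i in range(d + 1))
    return max_size - len(concepts)


# Hypothetical examples in the binary 3-cube.
at_most_one = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
two_points = [(0, 0, 0), (1, 1, 1)]
print(vc_dimension(at_most_one, 3), deficiency(at_most_one, 3))
print(vc_dimension(two_points, 3), deficiency(two_points, 3))
```

A class with deficiency 0 is maximum. The search is exponential in n, so the sketch is only meant for tiny cubes such as the n <= 6 cases the abstract mentions.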

artificial intelligence, machine learning, maximum class, (16 more...)

1401.7388

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
Oceania > Australia > Queensland (0.04)
North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Riondato, Matteo, Vandin, Fabio

Finding the True Frequent Itemsets

arXiv.org Machine Learning, Jan-22-2014

Frequent Itemsets (FIs) mining is a fundamental primitive in data mining. It requires identifying all itemsets appearing in at least a fraction $\theta$ of a transactional dataset $\mathcal{D}$. Often though, the ultimate goal of mining $\mathcal{D}$ is not an analysis of the dataset \emph{per se}, but the understanding of the underlying process that generated it. Specifically, in many applications $\mathcal{D}$ is a collection of samples obtained from an unknown probability distribution $\pi$ on transactions, and by extracting the FIs in $\mathcal{D}$ one attempts to infer itemsets that are frequently (i.e., with probability at least $\theta$) generated by $\pi$, which we call the True Frequent Itemsets (TFIs). Due to the inherently stochastic nature of the generative process, the set of FIs is only a rough approximation of the set of TFIs, as it often contains a huge number of \emph{false positives}, i.e., spurious itemsets that are not among the TFIs. In this work we design and analyze an algorithm to identify a threshold $\hat{\theta}$ such that the collection of itemsets with frequency at least $\hat{\theta}$ in $\mathcal{D}$ contains only TFIs with probability at least $1-\delta$, for some user-specified $\delta$. Our method uses results from statistical learning theory involving the (empirical) VC-dimension of the problem at hand. This allows us to identify almost all the TFIs without including any false positives. We also experimentally compare our method with the direct mining of $\mathcal{D}$ at frequency $\theta$ and with techniques based on widely-used standard bounds (i.e., the Chernoff bounds) of the binomial distribution, and show that our algorithm outperforms these methods and achieves even better results than what is guaranteed by the theoretical analysis.
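
As a rough illustration of the setting, the sketch below mines itemsets at a frequency threshold and computes a raised threshold $\hat{\theta}$. It is not the paper's VC-dimension-based method. The raised threshold is a generic baseline built from Hoeffding's inequality and a union bound over all candidate itemsets, assumed here for illustration, and the function names and toy data are hypothetical. Transactions are assumed to be Python sets.

```python
from itertools import combinations
from math import log, sqrt


def frequent_itemsets(transactions, theta):
    """Map each itemset with empirical frequency >= theta to its frequency.

    Level-wise brute force: once no itemset of size k is frequent, no larger
    itemset can be, because frequency never increases when items are added.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            freq = sum(1 for t in transactions if set(cand) <= t) / n
            if freq >= theta:
                result[cand] = freq
                found = True
        if not found:
            break
    return result


def hoeffding_threshold(theta, n, num_candidates, delta):
    """Baseline theta_hat = theta + eps, eps from Hoeffding plus a union bound.

    With eps = sqrt(ln(num_candidates / delta) / (2 n)), the probability that
    any itemset with true frequency below theta reaches theta_hat in a sample
    of n independent transactions is at most delta.
    """
    eps = sqrt(log(num_candidates / delta) / (2 * n))
    return theta + eps


# Hypothetical toy dataset over items a, b, c.
data = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
fis = frequent_itemsets(data, 0.5)
theta_hat = hoeffding_threshold(0.5, n=len(data), num_candidates=2 ** 3 - 1, delta=0.1)
print(sorted(fis), theta_hat)
```

On a dataset this small the baseline threshold exceeds 1 and therefore reports nothing, which shows how loose a union bound over every candidate itemset can be.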

artificial intelligence, itemset, machine learning, (16 more...)

1301.1218

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.89)

Paskov, Hristo S., West, Robert, Mitchell, John C., Hastie, Trevor

Compressive Feature Learning

This paper addresses the problem of unsupervised feature learning for text data. Our method is grounded in the principle of minimum description length and uses a dictionary-based compression scheme to extract a succinct feature set. Specifically, our method finds a set of word $k$-grams that minimizes the cost of reconstructing the text losslessly. We formulate document compression as a binary optimization task and show how to solve it approximately via a sequence of reweighted linear programs that are efficient to solve and parallelizable. As our method is unsupervised, features may be extracted once and subsequently used in a variety of tasks. We demonstrate the performance of these features over a range of scenarios including unsupervised exploratory analysis and supervised text categorization. Our compressed feature space is two orders of magnitude smaller than the full $k$-gram space and matches the text categorization accuracy achieved in the full feature space. This dimensionality reduction not only results in faster training times, but it can also help elucidate structure in unsupervised learning tasks and reduce the amount of training data necessary for supervised learning.

algorithm, artificial intelligence, machine learning, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Shalizi, Cosma, Kontorovich, Aryeh

Predictive PAC Learning and Process Decompositions

We informally call a stochastic process learnable if it admits a generalization error approaching zero in probability for any concept class with finite VC-dimension (IID processes are the simplest example). A mixture of learnable processes need not be learnable itself, and certainly its generalization error need not decay at the same rate. In this paper, we argue that it is natural in predictive PAC to condition not on the past observations but on the mixture component of the sample path. This definition not only matches what a realistic learner might demand, but also allows us to sidestep several otherwise grave problems in learning from dependent data. In particular, we give a novel PAC generalization bound for mixtures of learnable processes with a generalization error that is not worse than that of each mixture component. We also provide a characterization of mixtures of absolutely regular ($\beta$-mixing) processes, of independent interest.

artificial intelligence, machine learning, regular process, (17 more...)

Country: North America > United States (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Sabato, Sivan, Sarwate, Anand D., Srebro, Nati

Auditing: Active Learning with Outcome-Dependent Query Costs

We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance. We study binary classification in an extreme case, where the algorithm only pays for negative labels. Our motivation is applications such as fraud detection, in which investigating an honest transaction should be avoided if possible. We term the setting auditing, and consider the auditing complexity of an algorithm: the number of negative points it labels to learn a hypothesis with low relative error. We design auditing algorithms for thresholds on the line and axis-aligned rectangles, and show that with these algorithms, the auditing complexity can be significantly lower than the active label complexity. We discuss a general approach for auditing for a general hypothesis class, and describe several interesting directions for future work.

artificial intelligence, complexity, machine learning, (16 more...)

Country: North America > United States (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)

Daniely, Amit, Linial, Nati, Shalev-Shwartz, Shai

More data speeds up training time in learning halfspaces over sparse vectors

The increased availability of data in recent years led several authors to ask whether it is possible to use data as a {\em computational} resource. That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required to perform the learning task? We give the first positive answer to this question for a {\em natural supervised learning problem} --- we consider agnostic PAC learning of halfspaces over $3$-sparse vectors in $\{-1,1,0\}^n$. This class is inefficiently learnable using $O\left(n/\epsilon^2\right)$ examples. Our main contribution is a novel, non-cryptographic, methodology for establishing computational-statistical gaps, which allows us to show that, under a widely believed assumption that refuting random $\mathrm{3CNF}$ formulas is hard, efficiently learning this class using $O\left(n/\epsilon^2\right)$ examples is impossible. We further show that under stronger hardness assumptions, even $O\left(n^{1.499}/\epsilon^2\right)$ examples do not suffice. On the other hand, we show a new algorithm that learns this class efficiently using $\tilde{\Omega}\left(n^2/\epsilon^2\right)$ examples. This formally establishes the tradeoff between sample and computational complexity for a natural supervised learning problem.

algorithm, artificial intelligence, machine learning, (18 more...)

Country: Asia > Middle East > Israel (0.15)

Industry: Education > Focused Education > Special Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.67)

Shalizi, Cosma Rohilla, Kontorovich, Aryeh

Predictive PAC Learning and Process Decompositions

arXiv.org Machine Learning, Sep-19-2013

artificial intelligence, machine learning, regular process, (17 more...)

1309.4859

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > New York (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

arXiv.org Artificial Intelligence, Aug-23-2013

The Fractal Dimension of SAT Formulas

Ansótegui, C., Bonet, M. L., Giráldez-Cru, J., Levy, J.

Modern SAT solvers have made remarkable progress in solving industrial instances. Most of the techniques have been developed after an intensive experimental testing process. Recently, there have been some attempts to analyze the structure of these formulas in terms of complex networks, with the long-term aim of explaining the success of these SAT solving techniques, and possibly improving them. We study the fractal dimension of SAT formulas, and show that most industrial families of formulas are self-similar, with a small fractal dimension. We also show that this dimension is not affected by the addition of learnt clauses. We explore how the dimension of a formula, together with other graph properties, can be used to characterize SAT instances. Finally, we give empirical evidence that these graph properties can be used in state-of-the-art portfolios.

artificial intelligence, formula, machine learning, (17 more...)

doi: 10.1007/978-3-319-08587-6_8

1308.5046

Country: Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.67)

Meybodi, M. R. Mollakhalili, Meybodi, M. R.

Extended Distributed Learning Automata: A New Method for Solving Stochastic Graph Optimization Problems

arXiv.org Artificial Intelligence, Aug-13-2013

In this paper, a new structure of cooperative learning automata, called extended distributed learning automata (eDLA), is introduced. Based on the proposed structure, a new iterative randomized heuristic algorithm is proposed for finding an optimal sub-graph in a stochastic edge-weighted graph through sampling. It is shown that the proposed algorithm, based on the new networked structure, can solve optimization problems on stochastic graphs with fewer samples than standard sampling. Stochastic graphs are graphs in which the edge weights have an unknown probability distribution. The proposed algorithm uses an eDLA to find a policy that leads to an induced sub-graph satisfying some restrictions, such as minimum or maximum weight (length). At each stage of the proposed algorithm, the eDLA determines which edges are to be sampled. This eDLA-based sampling method may reduce unnecessary samples and hence the time that the algorithm requires to find the optimal sub-graph. It is shown that the proposed method converges to the optimal solution; furthermore, the probability of this convergence can be made arbitrarily close to 1 by using a sufficiently small learning rate. A new variance-aware threshold value is proposed that can significantly improve the convergence rate of the proposed eDLA-based algorithm. It is shown that the proposed algorithm is competitive in terms of the quality of the solution.
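
For readers unfamiliar with learning automata, the role of the learning rate can be sketched with a single automaton using the classical linear reward-inaction update. This is a generic textbook scheme assumed here for illustration. It is not the eDLA structure or the variance-aware threshold described in the abstract, and the function names are hypothetical.

```python
import random


def choose_action(probs, rng=random):
    """Sample an action index according to the current action probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def lri_update(probs, chosen, rewarded, rate):
    """Linear reward-inaction update for one automaton.

    On reward, probability mass moves toward the chosen action by a factor
    `rate`; on penalty, the probabilities are left unchanged. The updated
    vector still sums to 1.
    """
    if not rewarded:
        return list(probs)
    return [
        p + rate * (1.0 - p) if i == chosen else (1.0 - rate) * p
        for i, p in enumerate(probs)
    ]


# One rewarded step on a two-action automaton with learning rate 0.1.
probs = lri_update([0.5, 0.5], chosen=0, rewarded=True, rate=0.1)
print(probs)
```

A smaller `rate` makes each step more conservative, which is the usual trade-off between convergence speed and the probability of settling on the best action.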

algorithm, artificial intelligence, machine learning, (16 more...)

1308.2772

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Learning Automata (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)