Cohn, David
Provable Approximations for Constrained $\ell_p$ Regression
Jubran, Ibrahim, Cohn, David, Feldman, Dan
The $\ell_p$ linear regression problem is to minimize $f(x)=||Ax-b||_p$ over $x\in\mathbb{R}^d$, where $A\in\mathbb{R}^{n\times d}$, $b\in \mathbb{R}^n$, and $p>0$. To avoid overfitting and bound $||x||_2$, the constrained $\ell_p$ regression minimizes $f(x)$ over every unit vector $x\in\mathbb{R}^d$. This makes the problem non-convex even for the simplest case $d=p=2$. Instead, ridge regression is used to minimize the Lagrange form $f(x)+\lambda ||x||_2$ over $x\in\mathbb{R}^d$, which yields a convex problem at the price of calibrating the regularization parameter $\lambda>0$. We provide the first provable constant-factor approximation algorithm that solves the constrained $\ell_p$ regression directly, for every constant $p,d\geq 1$. Using core-sets, its running time is $O(n \log n)$, with extensions to streaming and distributed (big) data. In polynomial time, it can handle outliers and $p\in (0,1)$, and can minimize $f(x)$ over every $x$ and every permutation of the rows of $A$. Experimental results are also provided, including open-source code and a comparison to existing software.
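To make the two formulations concrete, here is a minimal numerical sketch assuming only numpy. The brute-force scan over the unit circle is for intuition at $d=2$ only and is not the paper's core-set algorithm; the ridge step uses the standard squared-penalty variant because it admits a closed form.

# Naive illustration of constrained l_p regression vs. ridge regression.
# The dense scan below is for intuition only; it is NOT the coreset-based
# O(n log n) algorithm from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 100, 2, 2
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Constrained l_p regression: minimize ||Ax - b||_p over unit vectors x.
# For d = 2 every unit vector is (cos t, sin t), so a dense scan suffices.
ts = np.linspace(0.0, 2.0 * np.pi, 10_000)
X = np.stack([np.cos(ts), np.sin(ts)])              # shape (2, 10000)
costs = np.linalg.norm(A @ X - b[:, None], ord=p, axis=0)
x_constrained = X[:, costs.argmin()]

# Ridge (Lagrange) relaxation with the usual squared penalty
# ||Ax - b||_2^2 + lam * ||x||_2^2: convex, closed-form, but lam needs tuning.
lam = 1.0
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)

print(x_constrained, np.linalg.norm(x_constrained))  # unit norm by construction
print(x_ridge, np.linalg.norm(x_ridge))              # norm depends on lam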
Recursive Attribute Factoring
Cohn, David, Verma, Deepak, Pfleger, Karl
Clustering, or factoring, of a document collection attempts to "explain" each observed document in terms of one or a small number of inferred prototypes. Prior work demonstrated that when links exist between documents in the corpus (as is the case with a collection of web pages or scientific papers), building a joint model of document contents and connections produces a better model than one built from contents or connections alone. Many problems arise when trying to apply these joint models to corpora at the scale of the World Wide Web, however; one of these is that the sheer overhead of representing a feature space on the order of billions of dimensions becomes impractical. We address this problem with a simple representational shift inspired by probabilistic relational models: instead of representing document linkage in terms of the identities of linking documents, we represent it by the explicit and inferred attributes of the linking documents. Several surprising results come with this shift: in addition to being computationally more tractable, the new model produces factors that more cleanly decompose the document collection. We discuss several variations on this model and show how some can be seen as exact generalizations of the PageRank algorithm.
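A rough sketch of the representational shift follows; the corpus, attribute vocabulary, link structure, and aggregation rule are all invented toy data, not the paper's exact model.

# Sketch: describe each document's inbound links by the ATTRIBUTES of its
# linkers rather than their identities. All data here is hypothetical.
import numpy as np

docs = ["d0", "d1", "d2", "d3"]
attrs = ["sports", "politics", "science"]          # small attribute vocabulary
content = np.array([[3, 0, 1],                     # content features per doc
                    [0, 4, 0],
                    [1, 1, 2],
                    [0, 0, 5]], dtype=float)
links = [(0, 2), (1, 2), (3, 2), (0, 3)]           # (source, target) pairs

# Identity representation: one dimension PER DOCUMENT (billions on the web).
identity_link_feats = np.zeros((len(docs), len(docs)))
for src, dst in links:
    identity_link_feats[dst, src] = 1.0

# Attribute representation: one dimension PER ATTRIBUTE, aggregating the
# linkers' attributes; dimensionality no longer grows with corpus size.
attr_link_feats = np.zeros((len(docs), len(attrs)))
for src, dst in links:
    attr_link_feats[dst] += content[src]

joint = np.hstack([content, attr_link_feats])      # joint contents+links model
print(joint.shape)                                 # (4, 6): fixed-width features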
Informed Projections
Cohn, David
Low-rank approximation techniques are widespread in pattern recognition research -- they include Latent Semantic Analysis (LSA), Probabilistic LSA, Principal Components Analysis (PCA), the Generative Aspect Model, and many forms of bibliometric analysis. All make use of a low-dimensional manifold onto which data are projected. Such techniques are generally "unsupervised," which allows them to model data in the absence of labels or categories. With many practical problems, however, some prior knowledge is available in the form of context. In this paper, I describe a principled approach to incorporating such information, and demonstrate its application to PCA-based approximations of several data sets.
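For reference, here is a minimal sketch of the plain (unsupervised) low-rank projection the paper builds on; the informed variant in the paper additionally incorporates context, which this sketch does not attempt.

# Plain PCA low-rank projection: project data onto a k-dimensional manifold.
import numpy as np

def pca_project(X, k):
    """Rank-k reconstruction of X via its top-k principal components."""
    Xc = X - X.mean(axis=0)                 # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                            # (d, k) basis of the subspace
    return Xc @ W @ W.T + X.mean(axis=0)    # project, then undo centering

X = np.random.default_rng(1).standard_normal((200, 10))
X_hat = pca_project(X, k=3)
print(np.linalg.norm(X - X_hat))            # residual of the approximation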
How to Dynamically Merge Markov Decision Processes
Singh, Satinder P., Cohn, David
We are frequently called upon to perform multiple tasks that compete for our attention and resources. Often we know the optimal solution to each task in isolation; in this paper, we describe how this knowledge can be exploited to efficiently find good solutions for doing the tasks in parallel. We formulate this problem as that of dynamically merging multiple Markov decision processes (MDPs) into a composite MDP, and present a new theoretically-sound dynamic programming algorithm for finding an optimal policy for the composite MDP. We analyze various aspects of our algorithm and illustrate its use on a simple merging problem. Every day, we are faced with the problem of doing multiple tasks in parallel, each of which competes for our attention and resources. If we are running a job shop, we must decide which machines to allocate to which jobs, and in what order, so that no jobs miss their deadlines. If we are a mail delivery robot, we must find the intended recipients of the mail while simultaneously avoiding fixed obstacles (such as walls) and mobile obstacles (such as people), and still manage to keep ourselves sufficiently charged up. Frequently we know how to perform each task in isolation; this paper considers how we can take the information we have about the individual tasks and combine it to efficiently find an optimal solution for doing the entire set of tasks in parallel. More importantly, we describe a theoretically-sound algorithm for doing this merging dynamically; new tasks (such as a new job arrival at a job shop) can be assimilated online into the solution being found for the ongoing set of simultaneous tasks.
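As a toy illustration of the composite-MDP formulation, one can build the product of two small MDPs and run standard value iteration on the result. This shows the formulation only, not the paper's contribution (a dynamic algorithm that avoids re-solving the composite from scratch); all MDP parameters below are invented.

# Product construction for merging two MDPs, then plain value iteration.
import itertools

def compose(M1, M2):
    """Composite states/actions are pairs; transition probabilities multiply
    (tasks evolve independently) and rewards add (one simple coupling rule)."""
    S = list(itertools.product(M1["S"], M2["S"]))
    A = list(itertools.product(M1["A"], M2["A"]))
    P = {s: {a: [((t1, t2), p1 * p2)
                 for t1, p1 in M1["P"][s[0]][a[0]]
                 for t2, p2 in M2["P"][s[1]][a[1]]]
             for a in A}
         for s in S}
    R = {s: {a: M1["R"][s[0]][a[0]] + M2["R"][s[1]][a[1]] for a in A} for s in S}
    return {"S": S, "A": A, "P": P, "R": R}

def value_iteration(M, gamma=0.9, tol=1e-8):
    V = {s: 0.0 for s in M["S"]}
    while True:
        V2 = {s: max(M["R"][s][a] + gamma * sum(p * V[t] for t, p in M["P"][s][a])
                     for a in M["A"])
              for s in M["S"]}
        if max(abs(V2[s] - V[s]) for s in M["S"]) < tol:
            return V2
        V = V2

# Two tiny hand-made task MDPs (all numbers invented for illustration).
M1 = {"S": [0, 1], "A": ["stay", "go"],
      "P": {0: {"stay": [(0, 1.0)], "go": [(1, 1.0)]},
            1: {"stay": [(1, 1.0)], "go": [(0, 1.0)]}},
      "R": {0: {"stay": 0.0, "go": 1.0}, 1: {"stay": 2.0, "go": 0.0}}}
M2 = {"S": ["x", "y"], "A": ["stay", "go"],
      "P": {"x": {"stay": [("x", 1.0)], "go": [("y", 1.0)]},
            "y": {"stay": [("y", 1.0)], "go": [("x", 1.0)]}},
      "R": {"x": {"stay": 0.5, "go": 0.0}, "y": {"stay": 0.0, "go": 1.5}}}

V = value_iteration(compose(M1, M2))
print(V[(0, "x")])  # value of starting both tasks in their initial states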
The 1995 Fall Symposia Series
Cohn, David, Lewis, David, Aha, David W., Burke, Robin, Srihari, Rohini K., Horswill, Ian, Buvac, Sasa, Siegel, Eric V., Fehling, Michael
The Association for the Advancement of Artificial Intelligence (AAAI) held its 1995 Fall Symposia Series on 10 to 12 November in Cambridge, Massachusetts. This article contains summaries of the eight symposia that were conducted: (1) Active Learning; (2) Adaptation of Knowledge for Reuse; (3) AI Applications in Knowledge Navigation and Retrieval; (4) Computational Models for Integrating Language and Vision; (5) Embodied Language and Action Symposium; (6) Formalizing Context; (7) Genetic Programming; and (8) Rational Agency: Concepts, Theories, Models, and Applications.
Can neural networks do better than the Vapnik-Chervonenkis bounds?
Cohn, David, Tesauro, Gerald
These experiments are designed to test whether average generalization performance can surpass the worst-case bounds obtained from formal learning theory using the Vapnik-Chervonenkis dimension (Blumer et al., 1989). We indeed find that, in some cases, the average generalization is significantly better than the VC bound: the approach to perfect performance is exponential in the number of examples $m$, rather than the $1/m$ result of the bound. In other cases, we do find the $1/m$ behavior of the VC bound, and in these cases, the numerical prefactor is closely related to the prefactor contained in the bound.
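For context, the worst-case bound referenced above has roughly the following form (a representative statement in the spirit of Blumer et al., 1989; exact constants vary across formulations): with probability at least $1-\delta$, a learner consistent with $m$ examples over a hypothesis class of VC dimension $d$ has generalization error
$$\epsilon(m) = O\!\left(\frac{d\,\ln(m/d) + \ln(1/\delta)}{m}\right),$$
so the guaranteed error decays only as $\tilde{O}(1/m)$; this is the $1/m$ behavior against which the experiments compare the observed, sometimes exponential, convergence.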