Caruana, Rich
Using Multiple Samples to Learn Mixture Models
Lee, Jason D., Gilad-Bachrach, Ran, Caruana, Rich
In the mixture models problem it is assumed that there are $K$ distributions $\theta_{1},\ldots,\theta_{K}$ and one gets to observe a sample from a mixture of these distributions with unknown coefficients. The goal is to associate instances with their generating distributions, or to identify the parameters of the hidden distributions. In this work we make the assumption that we have access to several samples drawn from the same $K$ underlying distributions, but with different mixing weights. As with topic modeling, having multiple samples is often a reasonable assumption. Instead of pooling the data into one sample, we prove that it is possible to use the differences between the samples to better recover the underlying structure. We present algorithms that recover the underlying structure under milder assumptions than the current state of the art when either the dimensionality or the separation is high. The methods, when applied to topic modeling, allow generalization to words not present in the training data.
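To make the core idea concrete, here is a minimal numpy sketch (a toy construction of my own, not the paper's algorithm) for two spherical Gaussian components observed through two samples with different mixing weights: the difference of the samples' empirical means is parallel to the difference of the component means, which is enough to separate the components by projection. All names and parameter values below are illustrative assumptions.

    # Toy sketch (illustrative, not the paper's algorithm): two samples from the
    # same two Gaussian components, mixed with different weights w_a and w_b.
    # The difference of the empirical means is parallel to mu1 - mu2, so a
    # projection onto it separates the components without knowing the weights.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 50                                  # dimensionality
    mu1, mu2 = np.zeros(d), np.ones(d)      # hidden component means (assumed)
    w_a, w_b = 0.8, 0.3                     # different mixing weights per sample

    def draw(n, w):
        z = rng.random(n) < w               # latent component indicator
        x = np.where(z[:, None], mu1, mu2) + rng.normal(size=(n, d))
        return x, z

    xa, za = draw(5000, w_a)
    xb, zb = draw(5000, w_b)

    # E[xa] - E[xb] = (w_a - w_b) * (mu1 - mu2): the mean difference between the
    # two samples reveals the separating direction.
    direction = xa.mean(0) - xb.mean(0)
    direction /= np.linalg.norm(direction)

    proj = xa @ direction                   # 1-D projection of sample A
    labels = proj > proj.mean()             # crude threshold in the projected space
    acc = max((labels == za).mean(), (labels != za).mean())
    print(f"component recovery accuracy on sample A: {acc:.2f}")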
(Not) Bounding the True Error
Langford, John, Caruana, Rich
We present a new approach to bounding the true error rate of a continuous valued classifier based upon PAC-Bayes bounds. The method first constructs a distribution over classifiers by determining how sensitive each parameter in the model is to noise. The true error rate of the stochastic classifier found with the sensitivity analysis can then be tightly bounded using a PAC-Bayes bound.
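A schematic Python sketch of this recipe follows. It is a simplification under my own assumptions about the model, prior, noise-search tolerance, and bound form, not the authors' exact procedure: train a linear classifier, grow a per-parameter noise scale until the stochastic training error degrades, then evaluate a PAC-Bayes-kl bound for the resulting Gaussian posterior.

    # Simplified sketch (my assumptions, not the paper's exact procedure).
    import numpy as np

    rng = np.random.default_rng(1)
    m, d = 2000, 10
    X = rng.normal(size=(m, d))
    y = np.sign(X @ rng.normal(size=d))            # linearly separable labels

    # crude training: a few averaged-perceptron-style updates
    w = np.zeros(d)
    for _ in range(20):
        miss = np.sign(X @ w) != y
        if not miss.any():
            break
        w += (y[miss, None] * X[miss]).sum(0) / m

    def stochastic_error(sigma, n_draws=50):
        """Training error averaged over Gaussian parameter noise with scale sigma."""
        errs = [np.mean(np.sign(X @ (w + rng.normal(scale=sigma, size=d))) != y)
                for _ in range(n_draws)]
        return float(np.mean(errs))

    base = stochastic_error(np.zeros(d))
    # sensitivity search: per parameter, the largest noise scale that keeps the
    # stochastic training error within a small tolerance of the noiseless error
    sigma = np.full(d, 1e-3)
    for j in range(d):
        while sigma[j] < 10.0:
            trial = sigma.copy()
            trial[j] *= 2
            if stochastic_error(trial) > base + 0.01:
                break
            sigma = trial

    # PAC-Bayes with posterior Q = N(w, diag(sigma^2)) and prior P = N(0, tau^2 I)
    tau = 1.0
    kl_qp = 0.5 * np.sum(sigma**2 / tau**2 + w**2 / tau**2 - 1 + 2 * np.log(tau / sigma))

    def kl_inverse(q_hat, eps, tol=1e-6):
        """Largest p in [q_hat, 1] with kl(q_hat || p) <= eps (binary search)."""
        def kl(q, p):
            q, p = np.clip(q, 1e-12, 1 - 1e-12), np.clip(p, 1e-12, 1 - 1e-12)
            return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
        lo, hi = q_hat, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if kl(q_hat, mid) <= eps else (lo, mid)
        return lo

    delta = 0.05
    q_hat = stochastic_error(sigma, n_draws=200)   # Monte Carlo estimate, illustrative only
    bound = kl_inverse(q_hat, (kl_qp + np.log(2 * np.sqrt(m) / delta)) / m)
    print(f"stochastic training error ~{q_hat:.3f}, PAC-Bayes true-error bound ~{bound:.3f}")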
Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping
Caruana, Rich, Lawrence, Steve, Giles, C. Lee
The conventional wisdom is that backprop nets with excess hidden units generalize poorly. We show that nets with excess capacity generalize well when trained with backprop and early stopping. Experiments suggest two reasons for this: 1) Overfitting can vary significantly in different regions of the model. Excess capacity allows better fit to regions of high non-linearity, and backprop often avoids overfitting the regions of low non-linearity.
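The claim is easy to probe empirically. The sketch below (an illustrative setup of my own, not the paper's experiments) fits a modest and a heavily over-parameterized net to the same noisy 1-D regression task, both trained with backprop (scikit-learn's Adam-based MLPRegressor) and early stopping on a held-out validation split; the target function, noise level, and hidden sizes are assumptions.

    # Illustrative check (not the paper's experiments): modest vs. excess capacity,
    # both trained with backprop and early stopping on a held-out validation split.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(2)
    x = rng.uniform(-3, 3, size=(400, 1))
    y = np.sin(2 * x[:, 0]) + 0.3 * rng.normal(size=400)    # noisy 1-D regression task
    x_test = np.linspace(-3, 3, 500)[:, None]
    y_test = np.sin(2 * x_test[:, 0])

    for hidden in (8, 512):                                  # small net vs. excess capacity
        net = MLPRegressor(hidden_layer_sizes=(hidden,),
                           solver="adam", learning_rate_init=1e-3,
                           early_stopping=True,              # hold out 10% for validation
                           validation_fraction=0.1,
                           n_iter_no_change=20, max_iter=5000,
                           random_state=0)
        net.fit(x, y)
        mse = np.mean((net.predict(x_test) - y_test) ** 2)
        print(f"hidden units: {hidden:4d}   test MSE: {mse:.3f}")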
Promoting Poor Features to Supervisors: Some Inputs Work Better as Outputs
Caruana, Rich, Sa, Virginia R. de
In supervised learning there is usually a clear distinction between inputs and outputs - inputs are what you will measure, outputs are what you will predict from those measurements. This paper shows that the distinction between inputs and outputs is not this simple. Some features are more useful as extra outputs than as inputs. By using a feature as an output we get more than just the case values but can learn a mapping from the other inputs to that feature. For many features this mapping may be more useful than the feature value itself. We present two regression problems and one classification problem where performance improves if features that could have been used as inputs are used as extra outputs instead.
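A toy demonstration of the idea (my own construction, not the paper's tasks): a noisy measurement of the latent quantity behind the target tends to help more as an extra regression output than as an extra input. The data-generating process, noise levels, and network size below are assumptions chosen to make the effect plausible.

    # Toy comparison (my construction, not the paper's tasks): a noisy feature f
    # used either as an extra input or as an extra output (multitask target).
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(3)
    n, d = 1200, 5
    x = rng.normal(size=(n, d))
    latent = np.tanh(x @ rng.normal(size=d))        # hidden quantity behind both signals
    y = latent + 0.1 * rng.normal(size=n)           # main target
    f = latent + 0.5 * rng.normal(size=n)           # noisy extra feature

    tr, te = slice(0, 800), slice(800, n)

    def make_net():
        return MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000, random_state=0)

    # (a) the extra feature as an additional input
    as_input = make_net().fit(np.hstack([x[tr], f[tr, None]]), y[tr])
    mse_in = np.mean((as_input.predict(np.hstack([x[te], f[te, None]])) - y[te]) ** 2)

    # (b) the extra feature as an additional output; only the y column is evaluated
    as_output = make_net().fit(x[tr], np.column_stack([y[tr], f[tr]]))
    mse_out = np.mean((as_output.predict(x[te])[:, 0] - y[te]) ** 2)

    print(f"feature as extra input:  test MSE {mse_in:.4f}")
    print(f"feature as extra output: test MSE {mse_out:.4f}")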
Using the Future to "Sort Out" the Present: Rankprop and Multitask Learning for Medical Risk Evaluation
Caruana, Rich, Baluja, Shumeet, Mitchell, Tom
This paper presents two methods that can improve generalization on a broad class of problems. This class includes identifying low-risk pneumonia patients. The first method, rankprop, tries to learn simple models that support ranking future cases while simultaneously learning to rank the training set. The second, multitask learning (MTL), uses lab tests available only during training as additional target values to bias learning towards a more predictive hidden layer. Experiments using a database of pneumonia patients indicate that together these methods outperform standard backpropagation by 10-50%. Rankprop and MTL are applicable to a large class of problems in which the goal is to learn a relative ranking over the instance space, and where the training data includes features that will not be available at run time. Such problems include identifying higher-risk medical patients as early as possible, identifying lower-risk financial investments, and visual analysis of scenes that become easier to analyze as they are approached in the future.

Acknowledgements: We thank Greg Cooper, Michael Fine, and other members of the Pitt/CMU Cost-Effective Health Care group for help with the Medis Database. This work was supported by ARPA grant F33615-93-1-1330, NSF grant BES-9315428, Agency for Health Care Policy and Research grant HS06468, and an NSF Graduate Student Fellowship (Baluja).
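For the ranking half of the paper, here is a minimal pairwise-ranking sketch in the same spirit (not the rankprop algorithm itself): a linear scorer is trained with a logistic loss on random pairs so that higher-risk cases receive higher scores, then evaluated by pairwise concordance with the true risk. The synthetic risk model is an assumption.

    # Minimal pairwise-ranking sketch (not rankprop itself): a linear scorer trained
    # with a logistic loss on random pairs so higher-risk cases get higher scores.
    # The synthetic risk model below is an assumption.
    import numpy as np

    rng = np.random.default_rng(4)
    n, d = 1000, 8
    x = rng.normal(size=(n, d))
    risk = x @ rng.normal(size=d) + 0.5 * rng.normal(size=n)   # latent relative risk

    w = np.zeros(d)
    lr = 0.05
    for _ in range(2000):
        i, j = rng.integers(n, size=2)
        if risk[i] == risk[j]:
            continue
        hi, lo = (i, j) if risk[i] > risk[j] else (j, i)
        margin = (x[hi] - x[lo]) @ w
        grad = -(x[hi] - x[lo]) / (1.0 + np.exp(margin))       # gradient of log(1 + e^{-margin})
        w -= lr * grad

    # fraction of random pairs ordered consistently with the true risk (concordance)
    scores = x @ w
    ii, jj = rng.integers(n, size=(2, 2000))
    valid = risk[ii] != risk[jj]
    agree = (scores[ii] > scores[jj]) == (risk[ii] > risk[jj])
    print(f"pairwise concordance: {agree[valid].mean():.2f}")

The multitask-learning half, where lab results available only at training time act as extra target values, follows the same pattern as the extra-outputs sketch shown earlier, except that the auxiliary targets are never presented as inputs at prediction time.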