AITopics | hme

A key design constraint when implementing Monte Carlo and variational inference algorithms is that it must be possible to cheaply and exactly evaluate the marginal densities of proposal distributions and variational families. This takes many interesting proposals off the table, such as those based on involved simulations or stochastic optimization. This paper broadens the design space, by presenting a framework for applying Monte Carlo and variational inference algorithms when proposal densities cannot be exactly evaluated. Our framework, recursive auxiliary-variable inference (RAVI), instead approximates the necessary densities using meta-inference: an additional layer of Monte Carlo or variational inference, that targets the proposal, rather than the model. RAVI generalizes and unifies several existing methods for inference with expressive approximating families, which we show correspond to specific choices of meta-inference algorithm, and provides new theory for analyzing their bias and variance. We illustrate RAVI's design framework and theorems by using them to analyze and improve upon Salimans et al.'s Markov Chain Variational Inference, and to design a novel sampler for Dirichlet process mixtures, achieving state-of-the-art results on a standard benchmark dataset from astronomy and on a challenging datacleaning task with Medicare hospital data.

artificial intelligence, inference strategy, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2203.02836

Country:

Oceania > Australia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Health Care Providers & Services (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Help Me Explore: Minimal Social Interventions for Graph-Based Autotelic Agents

Akakzia, Ahmed, Serris, Olivier, Sigaud, Olivier, Colas, Cédric

arXiv.org Artificial IntelligenceFeb-10-2022

In the quest for autonomous agents learning open-ended repertoires of skills, most works take a Piagetian perspective: learning trajectories are the results of interactions between developmental agents and their physical environment. The Vygotskian perspective, on the other hand, emphasizes the centrality of the socio-cultural environment: higher cognitive functions emerge from transmissions of socio-cultural processes internalized by the agent. This paper argues that both perspectives could be coupled within the learning of autotelic agents to foster their skill acquisition. To this end, we make two contributions: 1) a novel social interaction protocol called Help Me Explore (HME), where autotelic agents can benefit from both individual and socially guided exploration. In social episodes, a social partner suggests goals at the frontier of the learning agent knowledge. In autotelic episodes, agents can either learn to master their own discovered goals or autonomously rehearse failed social goals; 2) GANGSTR, a graph-based autotelic agent for manipulation domains capable of decomposing goals into sequences of intermediate sub-goals. We show that when learning within HME, GANGSTR overcomes its individual learning limits by mastering the most complex configurations (e.g. stacks of 5 blocks) with only few social interventions.

agent, configuration, sp-0, (9 more...)

arXiv.org Artificial Intelligence

2202.05129

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.34)
Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Hierarchical Distributed Representations for Statistical Language Modeling

Blitzer, John, Pereira, Fernando, Weinberger, Kilian Q., Saul, Lawrence K.

Neural Information Processing SystemsDec-31-2005

Statistical language models estimate the probability of a word occurring in a given context. The most common language models rely on a discrete enumeration of predictive contexts (e.g., n-grams) and consequently fail to capture and exploit statistical regularities across these contexts. In this paper, we show how to learn hierarchical, distributed representations of word contexts that maximize the predictive value of a statistical language model. The representations are initialized by unsupervised algorithms for linear and nonlinear dimensionality reduction [14], then fed as input into a hierarchical mixture of experts, where each expert is a multinomial distribution over predicted words [12]. While the distributed representations in our model are inspired by the neural probabilistic language model of Bengio et al. [2, 3], our particular architecture enables us to work with significantly larger vocabularies and training corpora. For example, on a large-scale bigram modeling task involving a sixty thousand word vocabulary and a training corpus of three million sentences, we demonstrate consistent improvement over class-based bigram models [10, 13]. We also discuss extensions of our approach to longer multiword contexts.

dimensionality reduction, probability, representation, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Hierarchical Distributed Representations for Statistical Language Modeling

Blitzer, John, Pereira, Fernando, Weinberger, Kilian Q., Saul, Lawrence K.

Neural Information Processing SystemsDec-31-2005

Statistical language models estimate the probability of a word occurring in a given context. The most common language models rely on a discrete enumeration of predictive contexts (e.g., n-grams) and consequently fail to capture and exploit statistical regularities across these contexts. In this paper, we show how to learn hierarchical, distributed representations of word contexts that maximize the predictive value of a statistical language model. The representations are initialized by unsupervised algorithms for linear and nonlinear dimensionality reduction [14], then fed as input into a hierarchical mixture of experts, where each expert is a multinomial distribution over predicted words [12]. While the distributed representations in our model are inspired by the neural probabilistic language model of Bengio et al. [2, 3], our particular architecture enables us to work with significantly larger vocabularies and training corpora. For example, on a large-scale bigram modeling task involving a sixty thousand word vocabulary and a training corpus of three million sentences, we demonstrate consistent improvement over class-based bigram models [10, 13]. We also discuss extensions of our approach to longer multiword contexts.

dimensionality reduction, probability, representation, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Hierarchical Distributed Representations for Statistical Language Modeling

Blitzer, John, Pereira, Fernando, Weinberger, Kilian Q., Saul, Lawrence K.

Neural Information Processing SystemsDec-31-2005

Statistical language models estimate the probability of a word occurring in a given context. The most common language models rely on a discrete enumeration of predictive contexts (e.g., n-grams) and consequently fail to capture and exploit statistical regularities across these contexts. In this paper, we show how to learn hierarchical, distributed representations of word contexts that maximize the predictive value of a statistical language model. The representations are initialized by unsupervised algorithms for linear and nonlinear dimensionality reduction [14], then fed as input into a hierarchical mixture of experts, where each expert is a multinomial distribution overpredicted words [12]. While the distributed representations in our model are inspired by the neural probabilistic language model of Bengio et al. [2, 3], our particular architecture enables us to work with significantly larger vocabularies and training corpora. For example, on a large-scale bigram modeling task involving a sixty thousand word vocabulary anda training corpus of three million sentences, we demonstrate consistent improvement over class-based bigram models [10, 13]. We also discuss extensions of our approach to longer multiword contexts.

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Interpolating Earth-science Data using RBF Networks and Mixtures of Experts

Wan, Ernest, Bone, Don

Neural Information Processing SystemsDec-31-1997

We present a mixture of experts (ME) approach to interpolate sparse, spatially correlated earth-science data. Kriging is an interpolation method which uses a global covariation model estimated from the data to take account of the spatial dependence in the data. Based on the close relationship between kriging and the radial basis function (RBF) network (Wan & Bone, 1996), we use a mixture of generalized RBF networks to partition the input space into statistically correlated regions and learn the local covariation model of the data in each region. Applying the ME approach to simulated and real-world data, we show that it is able to achieve good partitioning of the input space, learn the local covariation models and improve generalization.

covariation model, grbf network, local covariation model, (13 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
Oceania > Australia > Australian Capital Territory > Canberra (0.05)
North America > United States > New York (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

Add feedback

Adaptively Growing Hierarchical Mixtures of Experts

Fritsch, Jürgen, Finke, Michael, Waibel, Alex

Neural Information Processing SystemsDec-31-1997

We propose a novel approach to automatically growing and pruning Hierarchical Mixtures of Experts. The constructive algorithm proposed here enables large hierarchies consisting of several hundred experts to be trained effectively. We show that HME's trained by our automatic growing procedure yield better generalization performance than traditional static and balanced hierarchies. Evaluation of the algorithm is performed (1) on vowel classification and (2) within a hybrid version of the JANUS r9] speech recognition system using a subset of the Switchboard large-vocabulary speaker-independent continuous speech recognition database.

hierarchical mixture, hme, probability, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > New York (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.90)

Add feedback

Interpolating Earth-science Data using RBF Networks and Mixtures of Experts

Wan, Ernest, Bone, Don

Neural Information Processing SystemsDec-31-1997

We present a mixture of experts (ME) approach to interpolate sparse, spatially correlated earth-science data. Kriging is an interpolation method which uses a global covariation model estimated from the data to take account of the spatial dependence in the data. Based on the close relationship between kriging and the radial basis function (RBF) network (Wan & Bone, 1996), we use a mixture of generalized RBF networks to partition the input space into statistically correlated regions and learn the local covariation model of the data in each region. Applying the ME approach to simulated and real-world data, we show that it is able to achieve good partitioning of the input space, learn the local covariation models and improve generalization.

covariation model, grbf network, local covariation model, (13 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
Oceania > Australia > Australian Capital Territory > Canberra (0.05)
North America > United States > New York (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

Add feedback

Adaptively Growing Hierarchical Mixtures of Experts

Fritsch, Jürgen, Finke, Michael, Waibel, Alex

Neural Information Processing SystemsDec-31-1997

We propose a novel approach to automatically growing and pruning Hierarchical Mixtures of Experts. The constructive algorithm proposed here enables large hierarchies consisting of several hundred experts to be trained effectively. We show that HME's trained by our automatic growing procedure yield better generalization performance than traditional static and balanced hierarchies. Evaluation of the algorithm is performed (1) on vowel classification and (2) within a hybrid version of the JANUS r9] speech recognition system using a subset of the Switchboard large-vocabulary speaker-independent continuous speech recognition database.

hierarchical mixture, hme, probability, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > New York (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.90)

Add feedback

Adaptively Growing Hierarchical Mixtures of Experts

Fritsch, Jürgen, Finke, Michael, Waibel, Alex

Neural Information Processing SystemsDec-31-1997

We propose a novel approach to automatically growing and pruning Hierarchical Mixtures of Experts. The constructive algorithm proposed hereenables large hierarchies consisting of several hundred experts to be trained effectively. We show that HME's trained by our automatic growing procedure yield better generalization performance thantraditional static and balanced hierarchies. Evaluation of the algorithm is performed (1) on vowel classification and (2) within a hybrid version of the JANUS r9] speech recognition systemusing a subset of the Switchboard large-vocabulary speaker-independent continuous speech recognition database.

artificial intelligence, hme, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Technology: