AITopics | Le, Quoc

Collaborating Authors

Le, Quoc

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Go Wide, Then Narrow: Efficient Training of Deep Thin Networks

Zhou, Denny, Ye, Mao, Chen, Chen, Meng, Tianjian, Tan, Mingxing, Song, Xiaodan, Le, Quoc, Liu, Qiang, Schuurmans, Dale

arXiv.org Machine LearningAug-17-2020

For deploying a deep learning model into production, it needs to be both accurate and compact to meet the latency and memory constraints. This usually results in a network that is deep (to ensure performance) and yet thin (to improve computational efficiency). In this paper, we propose an efficient method to train a deep thin network with a theoretic guarantee. Our method is motivated by model compression. It consists of three stages. First, we sufficiently widen the deep thin network and train it until convergence. Then, we use this well-trained deep wide network to warm up (or initialize) the original deep thin network. This is achieved by layerwise imitation, that is, forcing the thin network to mimic the intermediate outputs of the wide network from layer to layer. Finally, we further fine tune this already well-initialized deep thin network. The theoretical guarantee is established by using the neural mean field analysis. It demonstrates the advantage of our layerwise imitation approach over backpropagation. We also conduct large-scale empirical experiments to validate the proposed method. By training with our method, ResNet50 can outperform ResNet101, and BERT Base can be comparable with BERT Large, when ResNet101 and BERT Large are trained under the standard training procedures as in the literature.

deep learning, neural network, thin network, (16 more...)

arXiv.org Machine Learning

2007.00811

Country: North America > United States > Texas (0.14)

Genre: Research Report (0.50)

Industry: Education (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Can weight sharing outperform random architecture search? An investigation with TuNAS

Bender, Gabriel, Liu, Hanxiao, Chen, Bo, Chu, Grace, Cheng, Shuyang, Kindermans, Pieter-Jan, Le, Quoc

arXiv.org Machine LearningAug-13-2020

Efficient Neural Architecture Search methods based on weight sharing have shown good promise in democratizing Neural Architecture Search for computer vision models. There is, however, an ongoing debate whether these efficient methods are significantly better than random search. Here we perform a thorough comparison between efficient and random search methods on a family of progressively larger and more challenging search spaces for image classification and detection on ImageNet and COCO. While the efficacies of both methods are problem-dependent, our experiments demonstrate that there are large, realistic tasks where efficient search methods can provide substantial gains over random search. In addition, we propose and evaluate techniques which improve the quality of searched architectures and reduce the need for manual hyper-parameter tuning. Source code and experiment data are available at https://github.com/google-research/google-research/tree/master/tunas

architecture, artificial intelligence, neural network, (19 more...)

arXiv.org Machine Learning

2008.0612

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Backprop Evolution

Alber, Maximilian, Bello, Irwan, Zoph, Barret, Kindermans, Pieter-Jan, Ramachandran, Prajit, Le, Quoc

arXiv.org Machine LearningAug-8-2018

The back-propagation algorithm is the cornerstone of deep learning. Despite its importance, few variations of the algorithm have been attempted. This work presents an approach to discover new variations of the back-propagation equation. We use a domain specific lan- guage to describe update equations as a list of primitive functions. An evolution-based method is used to discover new propagation rules that maximize the generalization per- formance after a few epochs of training. We find several update equations that can train faster with short training times than standard back-propagation, and perform similar as standard back-propagation at convergence.

deep learning, equation, neural network, (19 more...)

arXiv.org Machine Learning

1808.02822

Country: Europe > Sweden (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Memory Augmented Policy Optimization for Program Synthesis with Generalization

Liang, Chen, Norouzi, Mohammad, Berant, Jonathan, Le, Quoc, Lao, Ni

arXiv.org Machine LearningJul-8-2018

This paper presents Memory Augmented Policy Optimization (MAPO): a novel policy optimization formulation that incorporates a memory buffer of promising trajectories to reduce the variance of policy gradient estimates for deterministic environments with discrete actions. The formulation expresses the expected return objective as a weighted sum of two terms: an expectation over a memory of trajectories with high rewards, and a separate expectation over the trajectories outside the memory. We propose 3 techniques to make an efficient training algorithm for MAPO: (1) distributed sampling from inside and outside memory with an actor-learner architecture; (2) a marginal likelihood constraint over the memory to accelerate training; (3) systematic exploration to discover high reward trajectories. MAPO improves the sample efficiency and robustness of policy gradient, especially on tasks with a sparse reward. We evaluate MAPO on weakly supervised program synthesis from natural language with an emphasis on generalization. On the WikiTableQuestions benchmark we improve the state-of-the-art by 2.5%, achieving an accuracy of 46.2%, and on the WikiSQL benchmark, MAPO achieves an accuracy of 74.9% with only weak supervision, outperforming several strong baselines with full supervision. Our code is open sourced at https://github.com/crazydonkey200/neural-symbolic-machines

logic programming, neural network, trajectory, (23 more...)

arXiv.org Machine Learning

1807.02322

Country: Asia > Middle East (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.87)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision

Liang, Chen, Berant, Jonathan, Le, Quoc, Forbus, Kenneth D., Lao, Ni

arXiv.org Artificial IntelligenceApr-23-2017

Harnessing the statistical power of neural networks to perform language understanding and symbolic reasoning is difficult, when it requires executing efficient discrete operations against a large knowledge-base. In this work, we introduce a Neural Symbolic Machine, which contains (a) a neural "programmer", i.e., a sequence-to-sequence model that maps language utterances to programs and utilizes a key-variable memory to handle compositionality (b) a symbolic "computer", i.e., a Lisp interpreter that performs program execution, and helps find good programs by pruning the search space. We apply REINFORCE to directly optimize the task reward of this structured prediction problem. To train with weak supervision and improve the stability of REINFORCE, we augment it with an iterative maximum-likelihood training process. NSM outperforms the state-of-the-art on the WebQuestionsSP dataset when trained from question-answer pairs only, without requiring any feature engineering or domain-specific knowledge.

computer, deep learning, neural network, (19 more...)

arXiv.org Artificial Intelligence

1611.0002

Country:

North America > United States (0.68)
Asia > Middle East > Israel (0.14)
Asia > Middle East > Qatar (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.89)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Latent Sequence Decompositions

Chan, William, Zhang, Yu, Le, Quoc, Jaitly, Navdeep

arXiv.org Machine LearningFeb-7-2017

We present the Latent Sequence Decompositions (LSD) framework. LSD decomposes sequences with variable lengthed output units as a function of both the input sequence and the output sequence. We present a training algorithm which samples valid extensions and an approximate decoding algorithm. We experiment with the Wall Street Journal speech recognition task. Our LSD model achieves 12.9% WER compared to a character baseline of 14.8% WER. When combined with a convolutional network on the encoder, we achieve 9.6% WER.

decomposition, deep learning, speech recognition, (16 more...)

arXiv.org Machine Learning

1610.03035

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.40)

Industry: Media (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)

Add feedback

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Shazeer, Noam, Mirhoseini, Azalia, Maziarz, Krzysztof, Davis, Andy, Le, Quoc, Hinton, Geoffrey, Dean, Jeff

arXiv.org Machine LearningJan-23-2017

The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant algorithmic and performance challenges. In this work, we address these challenges and finally realize the promise of conditional computation, achieving greater than 1000x improvements in model capacity with only minor losses in computational efficiency on modern GPU clusters. We introduce a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks. A trainable gating network determines a sparse combination of these experts to use for each example. We apply the MoE to the tasks of language modeling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora. We present model architectures in which a MoE with up to 137 billion parameters is applied convolutionally between stacked LSTM layers. On large language modeling and machine translation benchmarks, these models achieve significantly better results than state-of-the-art at lower computational cost.

deep learning, moe layer, neural network, (18 more...)

arXiv.org Machine Learning

1701.06538

Country: Europe (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Direct Optimization of Ranking Measures

Le, Quoc, Smola, Alexander

arXiv.org Artificial IntelligenceApr-25-2007

Web page ranking and collaborative filtering require the optimization of sophisticated performance measures. Current Support Vector approaches are unable to optimize them directly and focus on pairwise comparisons instead. We present a new approach which allows direct optimization of the relevant loss functions. This is achieved via structured estimation in Hilbert spaces. It is most related to Max-Margin-Markov networks optimization of multivariate performance measures. Key to our approach is that during training the ranking problem can be viewed as a linear assignment problem, which can be solved by the Hungarian Marriage algorithm. At test time, a sort operation is sufficient, as our algorithm assigns a relevance score to every (document, query) pair. Experiments show that the our algorithm is fast and that it works very well.

algorithm, artificial intelligence, optimization problem, (16 more...)

arXiv.org Artificial Intelligence

0704.3359

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Government (0.68)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.48)

Add feedback