AITopics | Deng, Yuntian

Weighted Gaussian Process Bandits for Non-stationary Environments

Deng, Yuntian, Zhou, Xingyu, Kim, Baekjin, Tewari, Ambuj, Gupta, Abhishek, Shroff, Ness

arXiv.org Artificial Intelligence, Jul-5-2021

In this paper, we consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box function is allowed to be time-varying within a reproducing kernel Hilbert space (RKHS). To this end, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is how to cope with infinite-dimensional feature maps. To address it, we leverage kernel approximation techniques to prove a sublinear regret bound, which is the first (frequentist) sublinear regret guarantee on weighted time-varying bandits with general nonlinear rewards. This result generalizes both non-stationary linear bandits and standard GP-UCB algorithms. Further, we derive a novel concentration inequality for weighted Gaussian process regression with general weights. We also provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains. These results are potentially of independent interest for applications such as news ranking and adaptive pricing, where weights can be adopted to capture the importance or quality of data. Finally, we conduct experiments to highlight the favorable gains of the proposed algorithm in many cases when compared to existing methods.

algorithm, artificial intelligence, banking & finance, (17 more...)

arXiv.org Artificial Intelligence

2107.02371

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.81)

Industry: Banking & Finance > Trading (0.68)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.47)

Residual Energy-Based Models for Text

Bakhtin, Anton, Deng, Yuntian, Gross, Sam, Ott, Myle, Ranzato, Marc'Aurelio, Szlam, Arthur

arXiv.org Machine Learning, Dec-21-2020

Current large-scale auto-regressive language models (Radford et al., 2019; Liu et al., 2018; Graves, 2013) display impressive fluency and can generate convincing text. In this work we start by asking the question: Can the generations of these models be reliably distinguished from real text by statistical discriminators? We find experimentally that the answer is affirmative when we have access to the training data for the model, and guardedly affirmative even if we do not. This suggests that the auto-regressive models can be improved by incorporating the (globally normalized) discriminators into the generative process. We give a formalism for this using the Energy-Based Model framework, and show that it indeed improves the results of the generative models, measured both in terms of perplexity and in terms of human evaluation.
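
The idea of folding a discriminator into the generative process can be illustrated with a small sketch. It assumes, as a labeled assumption not stated in the abstract, that the combined model scores a sequence as the language model's log-probability minus an energy, and that normalization is approximated over a finite set of candidate sequences. The function name `residual_rescore` and the input values are hypothetical.

```python
import math


def residual_rescore(lm_log_probs, energies):
    """Return normalized weights over a finite candidate set.

    Assumption: each candidate is scored as log p_LM(x) - E(x), i.e. the
    base language model is reweighted by exp(-E(x)). Normalizing over a
    finite candidate set is only an approximation of global normalization.
    """
    scores = [lp - e for lp, e in zip(lm_log_probs, energies)]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Two candidates the language model finds equally likely; the (hypothetical)
# energy function penalizes the second one.
candidate_weights = residual_rescore(
    lm_log_probs=[math.log(0.5), math.log(0.5)],
    energies=[0.0, math.log(2.0)],
)
```

Under these assumptions, the candidate with the lower energy receives the larger weight even though the base model scores both equally.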

deep learning, language model, neural network, (22 more...)

arXiv.org Machine Learning

2004.10188

Country: North America > United States > New York (0.14)

Genre:

Research Report (0.64)
Instructional Material (0.45)

Industry:

Leisure & Entertainment > Sports > Football (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Education (0.92)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Cascaded Text Generation with Markov Transformers

Deng, Yuntian, Rush, Alexander M.

arXiv.org Machine Learning, Jun-1-2020

The two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies. This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output. To parameterize this cascade, we introduce a Markov transformer, a variant of the popular fully autoregressive model that allows us to simultaneously decode with specific autoregressive context cutoffs. This approach requires only a small modification from standard autoregressive training, while showing competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.

arxiv preprint arxiv, deep learning, neural network, (20 more...)

arXiv.org Machine Learning

2006.01112

Country:

North America > Canada (0.14)
Asia > Vietnam (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)

AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference

Tambe, Thierry, Yang, En-Yu, Wan, Zishen, Deng, Yuntian, Reddi, Vijay Janapa, Rush, Alexander, Brooks, David, Wei, Gu-Yeon

arXiv.org Machine Learning, Oct-15-2019

Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low word sizes as their shrinking dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction models. We present AdaptivFloat, a floating-point inspired number representation format for deep learning that dynamically maximizes and optimally clips its available dynamic range, at a layer granularity, in order to create faithful encoding of neural network parameters. AdaptivFloat consistently produces higher inference accuracies compared to block floating-point, uniform, IEEE-like float or posit encodings at very low precision ($\leq$ 8-bit) across a diverse set of state-of-the-art neural network topologies. Notably, AdaptivFloat surpasses baseline FP32 performance by up to +0.3 in BLEU score and -0.75 in word error rate at weight bit widths that are $\leq$ 8-bit. Experimental results on a deep neural network (DNN) hardware accelerator, exploiting AdaptivFloat logic in its computational datapath, demonstrate per-operation energy and area that are 0.9$\times$ and 1.14$\times$, respectively, those of equivalent bit width integer-based accelerator variants.

adaptivfloat, deep learning, neural network, (21 more...)

arXiv.org Machine Learning

1909.13271

Country:

North America > United States (0.28)
North America > Canada > Quebec (0.14)

Genre: Research Report (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Real or Fake? Learning to Discriminate Machine from Human Generated Text

Bakhtin, Anton, Gross, Sam, Ott, Myle, Deng, Yuntian, Ranzato, Marc'Aurelio, Szlam, Arthur

arXiv.org Machine Learning, Jun-7-2019

Recent advances in generative modeling of text have demonstrated remarkable improvements in terms of fluency and coherency. In this work we investigate to what extent a machine can discriminate real from machine-generated text. This is important in itself for automatic detection of computer-generated stories, but can also serve as a tool for further improving text generation. We show that learning a dedicated scoring function to discriminate between real and fake text achieves higher precision than employing the likelihood of a generative model. The scoring functions generalize to generators other than those used for training, as long as these generators have comparable model complexity and are trained on similar datasets.

architecture, deep learning, neural network, (22 more...)

arXiv.org Machine Learning

1906.03351

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Visual Attention Model for Cross-Sectional Stock Return Prediction and End-to-End Multimodal Market Representation Learning

AAAI Conferences, May-15-2019

Technical and fundamental analysis are traditional tools used to analyze individual stocks; however, the finance literature has shown that the price movement of each individual stock correlates heavily with other stocks, especially those within the same sector. In this paper we propose a general-purpose market representation that incorporates fundamental and technical indicators and relationships between individual stocks. We treat the daily stock market as a ‘market image’ where rows (grouped by market sector) represent individual stocks and columns represent indicators. We apply a convolutional neural network over this market image to build market features in a hierarchical way. We use a recurrent neural network, with an attention mechanism over the market feature maps, to model temporal dynamics in the market. We show that our proposed model outperforms strong baselines in both short-term and long-term stock return prediction tasks. We also show another use for our market image: to construct concise and dense market embeddings suitable for downstream prediction tasks.

deep learning, market image, neural network, (21 more...)

AAAI Conferences

The Thirty-Second International Flairs Conference

Country:

North America > United States (0.14)
Asia > Middle East > Iran (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Latent Alignment and Variational Attention

Deng, Yuntian, Kim, Yoon, Chiu, Justin, Guo, Demi, Rush, Alexander

Neural Information Processing Systems, Dec-31-2018

Neural attention has become central to many state-of-the-art models in natural language processing and related domains. Attention networks are an easy-to-train and effective method for softly simulating alignment; however, the approach does not marginalize over latent alignments in a probabilistic sense. This property makes it difficult to compare attention to other alignment approaches, to compose it with probabilistic models, and to perform posterior inference conditioned on observed data. A related latent approach, hard attention, fixes these issues, but is generally harder to train and less accurate. This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference. We further propose methods for reducing the variance of gradients to make these approaches computationally feasible. Experiments show that for machine translation and visual question answering, inefficient exact latent variable models outperform standard neural attention, but these gains go away when using hard attention based training. On the other hand, variational attention retains most of the performance gain but with training speed comparable to neural attention.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bottom-Up Abstractive Summarization

Gehrmann, Sebastian, Deng, Yuntian, Rush, Alexander M.

arXiv.org Artificial Intelligence, Aug-31-2018

Neural network-based methods for abstractive summarization produce outputs that are more fluent than other techniques, but which can be poor at content selection. This work proposes a simple technique for addressing this issue: use a data-efficient content selector to over-determine phrases in a source document that should be part of the summary. We use this selector as a bottom-up attention step to constrain the model to likely phrases. We show that this approach improves the ability to compress text, while still generating fluent summaries. This two-step process is both simpler and higher performing than other end-to-end content selection models, leading to significant improvements on ROUGE for both the CNN-DM and NYT corpora. Furthermore, the content selector can be trained with as few as 1,000 sentences, making it easy to transfer a trained summarizer to a new domain.
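
A minimal sketch of how a content selector might constrain attention to likely phrases is given below. The abstract does not spell out the mechanism, so the hard threshold, the renormalization step, the fallback when nothing is selected, and the name `constrain_attention` are all assumptions made for illustration.

```python
def constrain_attention(attention, selection_probs, threshold=0.5):
    """Zero out attention on unselected source tokens and renormalize.

    attention: attention weights over source tokens (non-negative, sum to 1).
    selection_probs: hypothetical per-token probabilities from a content
        selector that the token belongs in the summary.
    threshold: assumed cutoff below which a token is masked out.
    """
    masked = [
        a if p >= threshold else 0.0
        for a, p in zip(attention, selection_probs)
    ]
    total = sum(masked)
    if total == 0.0:
        # Assumed fallback: if no token is selected, keep the original weights.
        return list(attention)
    return [m / total for m in masked]


# Four source tokens; the selector favors the first and third.
constrained = constrain_attention(
    attention=[0.1, 0.4, 0.3, 0.2],
    selection_probs=[0.9, 0.2, 0.8, 0.1],
)
```

Here the attention mass originally placed on the second and fourth tokens is redistributed to the two selected tokens in proportion to their original weights.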

deep learning, neural network, summarization, (18 more...)

arXiv.org Artificial Intelligence

1808.10792

Country: Europe > Germany (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports > Football (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Latent Alignment and Variational Attention

Deng, Yuntian, Kim, Yoon, Chiu, Justin, Guo, Demi, Rush, Alexander M.

arXiv.org Machine Learning, Jul-10-2018

Neural attention has become central to many state-of-the-art models in natural language processing and related domains. Attention networks are an easy-to-train and effective method for softly simulating alignment; however, the approach does not marginalize over latent alignments in a probabilistic sense. This property makes it difficult to compare attention to other alignment approaches, to compose it with probabilistic models, and to perform posterior inference conditioned on observed data. A related latent approach, hard attention, fixes these issues, but is generally harder to train and less accurate. This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference. We further propose methods for reducing the variance of gradients to make these approaches computationally feasible. Experiments show that for machine translation and visual question answering, inefficient exact latent variable models outperform standard neural attention, but these gains go away when using hard attention based training. On the other hand, variational attention retains most of the performance gain but with training speed comparable to neural attention.

deep learning, neural network, proceedings, (22 more...)

arXiv.org Machine Learning

1807.03756

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
