Forthcoming machine learning and AI seminars: May 2023 edition


This post lists the AI-related seminars scheduled to take place between 10 May and 30 June 2023. All events detailed here are free and open for anyone to attend virtually.

Natural Language Generation Problems and Challenges
Speaker: Konstantinos Diamantaras
Organised by: Chalmers AI Research Centre

Exhaustive Symbolic Regression (or how to find the best function for your data)
Speaker: Harry Desmond (University of Portsmouth)
Organised by: University of Lisbon

Multi-Fidelity Bayesian Optimization with Unreliable Information Sources
Speaker: Julien Martinelli (Aalto University)
Organised by: Finnish Center for Artificial Intelligence

Defending Against Neural Fake News

Neural Information Processing Systems

Recent progress in natural language generation has raised dual-use concerns. While applications like summarization and translation are positive, the underlying technology also might enable adversaries to generate neural fake news: targeted propaganda that closely mimics the style of real news. Modern computer security relies on careful threat modeling: identifying potential threats and vulnerabilities from an adversary's point of view, and exploring potential mitigations to these threats. Likewise, developing robust defenses against neural fake news requires us first to carefully investigate and characterize the risks of these models. We thus present a model for controllable text generation called Grover.

TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural Language Generation

Neural Information Processing Systems

Score function-based natural language generation (NLG) approaches such as REINFORCE generally suffer from low sample efficiency and training instability. This is mainly due to the non-differentiable nature of discrete-space sampling: these methods have to treat the discriminator as a black box and ignore its gradient information. To improve the sample efficiency and reduce the variance of REINFORCE, we propose a novel approach, TaylorGAN, which augments the gradient estimation with an off-policy update and a first-order Taylor expansion. This approach enables us to train NLG models from scratch with a smaller batch size -- without maximum likelihood pre-training -- and outperforms existing GAN-based methods on multiple metrics of quality and diversity.
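The first-order Taylor expansion mentioned in the abstract can be illustrated with a toy example. The sketch below is not the TaylorGAN estimator itself (it omits the off-policy update and any GAN training loop); it only shows how a differentiable reward, evaluated at one sampled token embedding, might be extrapolated to neighboring tokens. The names `taylor_reward_estimates`, `toy_reward`, `toy_reward_grad`, and the 2-D embeddings are hypothetical and chosen purely for illustration.

```python
# Toy illustration of a first-order Taylor estimate of rewards.
# Assumption: the reward is a differentiable function of a token embedding.

TARGET = [1.0, 1.0]  # hypothetical point where the toy reward peaks


def toy_reward(e):
    """Toy reward: negative squared distance to TARGET."""
    return -sum((x - t) ** 2 for x, t in zip(e, TARGET))


def toy_reward_grad(e):
    """Analytic gradient of toy_reward with respect to the embedding."""
    return [-2.0 * (x - t) for x, t in zip(e, TARGET)]


def taylor_reward_estimates(reward, grad, embeddings, sampled):
    """Estimate the reward of every token from one sampled token.

    Uses R(e') ~= R(e0) + grad R(e0) . (e' - e0), where e0 is the
    embedding of the sampled token.
    """
    e0 = embeddings[sampled]
    r0 = reward(e0)
    g = grad(e0)
    return [
        r0 + sum(gi * (ei - e0i) for gi, ei, e0i in zip(g, e, e0))
        for e in embeddings
    ]


# Hypothetical three-token vocabulary with 2-D embeddings.
EMBEDDINGS = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
estimates = taylor_reward_estimates(
    toy_reward, toy_reward_grad, EMBEDDINGS, sampled=0
)
print(estimates)  # the entry for the sampled token equals its true reward
```

Only one reward evaluation and one gradient are needed to score every token in this toy vocabulary, at the cost of an approximation error for tokens whose embeddings lie far from the sampled one.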

Review for NeurIPS paper: TaylorGAN: Neighbor-Augmented Policy Update Towards Sample-Efficient Natural Language Generation

Neural Information Processing Systems

This paper proposes a novel method for GAN-based natural language generation in which a first-order Taylor expansion is used to estimate the gradient of the reward function. The method greatly mitigates the high-variance problem of previous methods and improves sample efficiency. Experiments show that the proposed method achieves state-of-the-art results. The work is solid both in theory and in experiments.

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks Patrick Lewis

Neural Information Processing Systems

Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome this issue, but have so far only been investigated for extractive downstream tasks. We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG formulations, one which conditions on the same retrieved passages across the whole generated sequence, and another which can use different passages per token. We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state of the art on three open domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures. For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
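The difference between the two formulations can be made concrete with a small numerical sketch. Here `doc_probs` stands for the retriever's distribution over retrieved passages and `token_probs[z][i]` for the generator's probability of the i-th target token given passage z. These names and all numbers are made up for illustration; this is a toy calculation under those assumptions, not the authors' implementation.

```python
import math


def rag_sequence_prob(doc_probs, token_probs):
    """One passage for the whole sequence: sum_z p(z|x) * prod_i p(y_i | x, z, y_<i)."""
    return sum(pz * math.prod(tp) for pz, tp in zip(doc_probs, token_probs))


def rag_token_prob(doc_probs, token_probs):
    """A passage may differ per token: prod_i sum_z p(z|x) * p(y_i | x, z, y_<i)."""
    n_tokens = len(token_probs[0])
    return math.prod(
        sum(pz * tp[i] for pz, tp in zip(doc_probs, token_probs))
        for i in range(n_tokens)
    )


# Hypothetical setup: two retrieved passages, a two-token target sequence.
doc_probs = [0.5, 0.5]
token_probs = [
    [0.8, 0.2],  # passage 0 supports the first token well, the second poorly
    [0.2, 0.8],  # passage 1 does the opposite
]

seq_p = rag_sequence_prob(doc_probs, token_probs)
tok_p = rag_token_prob(doc_probs, token_probs)
print(seq_p, tok_p)
```

In this contrived case each passage supports a different token, so marginalizing per token assigns the target sequence a higher probability than committing to a single passage for the whole sequence.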

Neural Rule-Execution Tracking Machine For Transformer-Based Text Generation

Neural Information Processing Systems

Sequence-to-Sequence (Seq2Seq) neural text generation models, especially the pre-trained ones (e.g., BART and T5), have exhibited compelling performance on various natural language generation tasks. However, the black-box nature of these models limits their application in tasks where specific rules (e.g., controllable constraints, prior knowledge) need to be executed. Previous works either design specific model structures (e.g., Copy Mechanism corresponding to the rule "the generated output should include certain words in the source input") or implement specialized inference algorithms (e.g., Constrained Beam Search) to execute particular rules during text generation. These methods require careful case-by-case design and make it difficult to support multiple rules concurrently. In this paper, we propose a novel module named Neural Rule-Execution Tracking Machine (NRETM) that can be integrated into various transformer-based generators to leverage multiple rules simultaneously to guide the neural generation model for superior generation performance in a unified and scalable way. Extensive experiments on several benchmarks verify the effectiveness of our proposed model in both controllable and general text generation tasks.

Supplementary Material for Multi-modal Dependency Tree for Video Captioning

Neural Information Processing Systems

This paper introduces a novel video captioning method that generates sentences by constructing dependency trees. The proposed method offers a possible new way of generating fluent and relevant sentences for videos and may inspire more work that explicitly models the syntactic structure of sentences in natural language generation. It can also help develop more practical video processing systems, such as automatic video subtitling tools. However, such a technique is still affected by the biases in the training data. When the videos involve minorities or uncommonly seen subjects, it may produce undesired output or lead to an inaccurate understanding of the video content.

The Limitations of ChatGPT


ChatGPT can readily traverse the vast amounts of information on the internet to answer almost any ad-hoc question users pose. That it does so via natural language, in close to real-time, is indicative of the immense advancements of Generative Artificial Intelligence--and of Natural Language Generation, in particular. ChatGPT's practical utility spans most tasks associated with language, from creating annotated training datasets for data scientists to creating highly specific reports, emails, or papers for almost any facet of business or academia. Not surprisingly, vendors of all types are rushing to implement this language model to improve solutions for everything from Business Intelligence to content services. "It's still got its own set of limitations," admitted Abhishek Gupta, Principal Data Scientist and Engineer at Talentica.

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Neural Information Processing Systems

We explore how generating a chain of thought--a series of intermediate reasoning steps--significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain-of-thought prompting, where a few chain-of-thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a PaLM 540B with just eight chain-of-thought exemplars achieves state-of-the-art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.
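As a rough illustration of what "providing chain-of-thought demonstrations as exemplars" could look like in practice, the sketch below assembles a few-shot prompt in which each exemplar shows its intermediate reasoning before the final answer. The function `build_cot_prompt`, the "Q:"/"A:" layout, and the arithmetic exemplar are assumptions made for this example, not a format taken from the listing above; no model is called.

```python
def build_cot_prompt(exemplars, question):
    """Assemble a few-shot prompt whose exemplars include reasoning steps.

    Each exemplar is a dict with 'question', 'chain_of_thought', and
    'answer' keys (a hypothetical structure chosen for this sketch).
    """
    parts = []
    for ex in exemplars:
        parts.append(
            f"Q: {ex['question']}\n"
            f"A: {ex['chain_of_thought']} The answer is {ex['answer']}."
        )
    # The new question ends with an open "A:" so that a model would continue
    # with its own reasoning steps followed by an answer.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# A single made-up exemplar; the abstract reports using eight for GSM8K.
EXEMPLARS = [
    {
        "question": "A box holds 3 red pens and 4 blue pens. "
                    "How many pens are in the box?",
        "chain_of_thought": "There are 3 red pens and 4 blue pens. 3 + 4 = 7.",
        "answer": "7",
    }
]

prompt = build_cot_prompt(
    EXEMPLARS, "Tom has 5 apples and buys 6 more. How many apples does he have?"
)
print(prompt)
```

The only difference from standard few-shot prompting in this sketch is that the exemplar answers spell out the reasoning instead of giving the final answer alone.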