"As for why I tell a lot of stories, there's a joke about that. There was once a man who had a computer, and he asked it, 'Do you compute that you will ever be able to think like a human being?' And after assorted grindings and beepings, a slip of paper came out of the computer that said, 'That reminds me of a story . . .'"
– from ANGELS FEAR: TOWARDS AN EPISTEMOLOGY OF THE SACRED. Gregory Bateson & Mary Catherine Bateson. (Part III 'Metalogue').
The term "fake news" barely registered on most people's radar a few years ago, yet many observers now say that this pernicious form of disinformation -- weaponized to spread virally through social media -- could destabilize democracies around the world. But so-called "fake news" is nothing new: after all, the practice of spreading false information to influence public opinion has been common since at least ancient times. What is alarming today is that fake news will likely no longer be generated only by humans. There are automated methods to detect fake news created by humans, but with recent advances in AI, especially in the field of natural language generation (NLG), it will now be possible for machines to produce convincing disinformation -- written in the language and tone of established news sources, on a much larger scale, and to potentially far more devastating effect than ever before. So how do we catch this kind of machine-generated propaganda?
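One known family of detection heuristics (in the spirit of tools such as GLTR) exploits a statistical tell: machine-generated text tends to consist almost entirely of tokens that a language model itself rates as highly probable, while human writing mixes in more surprising word choices. The sketch below illustrates the idea with made-up per-token probabilities; a real detector would obtain these scores by querying an actual language model.

```python
# Toy sketch of probability-based detection of machine-generated text.
# The per-token probabilities are invented stand-ins for scores a real
# language model would assign to each token of a passage.

def fraction_high_probability(token_probs, threshold=0.1):
    """Fraction of tokens the (hypothetical) LM assigned prob >= threshold."""
    return sum(p >= threshold for p in token_probs) / len(token_probs)

def looks_machine_generated(token_probs, cutoff=0.8):
    # Flag text in which almost every token is high-probability.
    return fraction_high_probability(token_probs) >= cutoff

# Invented example scores: the "machine" passage is uniformly unsurprising,
# while the "human" passage contains several low-probability tokens.
machine_probs = [0.42, 0.31, 0.55, 0.29, 0.61, 0.38]
human_probs = [0.42, 0.02, 0.55, 0.004, 0.61, 0.01]

print(looks_machine_generated(machine_probs))  # True for these toy scores
print(looks_machine_generated(human_probs))    # False for these toy scores
```

The threshold and cutoff here are arbitrary illustration values; in practice they would be tuned on labeled examples of human and machine text.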
The adoption of artificial intelligence (AI) is rapidly taking hold across global business, according to a new McKinsey Global Survey on the topic.[1] AI, typically defined as the ability of a machine to perform cognitive functions associated with human minds (such as perceiving, reasoning, learning, and problem solving), includes a range of capabilities that enable AI to solve business problems. The survey asked about nine in particular.[2]
[1] The online survey was in the field from February 6 to February 16, 2018, and garnered responses from 2,135 participants representing the full range of regions, industries, company sizes, functional specialties, and tenures. To adjust for differences in response rates, the data are weighted by the contribution of each respondent's nation to global GDP.
[2] The nine capabilities are natural-language text understanding, natural-language speech understanding, natural-language generation, virtual agents or conversational interfaces, computer vision, machine learning, physical robotics, autonomous vehicles, and robotic process automation (RPA). Some would argue that RPA should not be classified as AI in and of itself, but in our experience, RPA systems are increasingly incorporating AI capabilities.
The best approach to NLP is a blend of Machine Learning (ML) and Fundamental Meaning (FM), which maximizes outcomes. Machine Learning alone is at the core of many NLP platforms; however, combining fundamental meaning with Machine Learning makes for more effective NLP-based chatbots. Machine Learning is used to train the bots, enabling continuous learning for natural language processing (NLP) and natural language generation (NLG). Both ML and FM have their own benefits as well as shortcomings, and the best features of both approaches are ideal for solving real-world business problems.
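The hybrid idea above can be sketched as a pipeline that tries hand-written "fundamental meaning" rules first and falls back to a learned classifier when no rule fires. Everything below is invented for illustration: the rule table, the intents, and the `ml_classify` stand-in are not from any real chatbot framework, which would use a trained statistical model in place of the canned fallback.

```python
# Minimal sketch of a hybrid rule-based (FM) + machine-learned (ML)
# intent classifier for a chatbot. Rules and intents are invented.

RULES = {
    "refund": ["refund", "money back"],
    "hours": ["opening hours", "open", "close"],
}

def rule_classify(utterance):
    # "Fundamental meaning" pass: deterministic keyword rules.
    text = utterance.lower()
    for intent, keywords in RULES.items():
        if any(kw in text for kw in keywords):
            return intent
    return None

def ml_classify(utterance):
    # Stand-in for a trained statistical model; a real bot would score
    # the utterance against learned intent representations.
    return "smalltalk"

def classify(utterance):
    # Rules take precedence; the ML model handles everything else.
    return rule_classify(utterance) or ml_classify(utterance)

print(classify("Can I get my money back?"))  # → refund (rule fires)
print(classify("Nice weather today!"))       # → smalltalk (ML fallback)
```

This ordering reflects the trade-off the passage describes: rules give precise, auditable behavior on known phrasings, while the learned model provides coverage for everything the rules miss.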
The exposure bias problem refers to the training-inference discrepancy caused by teacher forcing in maximum likelihood estimation (MLE) training for recurrent neural network language models (RNNLM). It has been regarded as a central problem for natural language generation (NLG) model training. Although many algorithms have been proposed to avoid teacher forcing and thereby remove exposure bias, there is little work showing how serious the exposure bias problem actually is. In this work, starting from the definition of exposure bias, we propose two simple and intuitive approaches to quantify exposure bias for MLE-trained language models. Experiments are conducted on both synthetic and real datasets. Surprisingly, our results indicate that exposure bias is either trivial (i.e., indistinguishable from the mismatch between model and data distribution) or not as significant as it is presumed to be (with a measured performance gap of 3%). With this work, we suggest re-evaluating the viewpoint that teacher forcing or exposure bias is a major drawback of MLE training.
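The training-inference discrepancy the abstract describes can be made concrete with a toy example. Here the "model" is a stand-in: a fixed next-character table with one deliberate misprediction, not a trained RNNLM. Under teacher forcing, every prediction conditions on the true prefix, so a mistake stays isolated; at inference, the model consumes its own outputs, so the same mistake pushes it off-distribution and changes everything after it.

```python
# Toy illustration of teacher forcing vs. free-running generation.
# NEXT is an invented stand-in "model" for the cyclic string "abcd".

NEXT = {"a": "b", "b": "c", "c": "d", "d": "a"}

def predict(prev_char):
    # The model deliberately mispredicts after "b" to simulate an error.
    return "x" if prev_char == "b" else NEXT.get(prev_char, "a")

def teacher_forced(target):
    # Training-time view: each prediction conditions on the TRUE prefix,
    # so one mistake never contaminates later steps.
    return [predict(c) for c in target[:-1]]

def free_running(start, steps):
    # Inference-time view: each prediction feeds back as the next input,
    # so the error after "b" derails the rest of the sequence.
    out, prev = [], start
    for _ in range(steps):
        prev = predict(prev)
        out.append(prev)
    return out

print(teacher_forced("abcd"))  # ['b', 'x', 'd'] — one isolated error
print(free_running("a", 3))    # ['b', 'x', 'a'] — error alters step 3 too
```

The gap between the two output sequences at the third step is exactly the kind of discrepancy the paper's quantification methods set out to measure.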
The launch comes after years of research and improvement in artificial intelligence (AI) and natural language generation (NLG). According to Statista, 2.14 billion people worldwide will buy goods and services online by 2021. In Canada, PayPal reports that eCommerce businesses are growing 28 times faster than those that are not selling online. As more consumers turn to the web to research and purchase items, product content becomes increasingly important in driving this growth. According to Salsify, 51% of the time a product listing with more bullet points will convert at a higher rate and outrank its top competitor.
How can we measure whether a natural language generation system produces both high quality and diverse outputs? Human evaluation captures quality but not diversity, as it does not catch models that simply plagiarize from the training set. On the other hand, statistical evaluation (i.e., perplexity) captures diversity but not quality, as models that occasionally emit low quality samples would be insufficiently penalized. In this paper, we propose a unified framework which evaluates both diversity and quality, based on the optimal error rate of predicting whether a sentence is human- or machine-generated. We demonstrate that this error rate can be efficiently estimated by combining human and statistical evaluation, using an evaluation metric which we call HUSE. On summarization and chit-chat dialogue, we show that (i) HUSE detects diversity defects which fool pure human evaluation and that (ii) techniques such as annealing for improving quality actually decrease HUSE due to decreased diversity.
One of the latest advancements in business development tools is the advent of augmented analytics. A report from Deloitte notes that "augmented analytics marks the next wave of disruption in the data analytics market". It is an approach that automates insights using machine learning and natural language generation. Gartner predicts that "by 2020, more than 40% of data science tasks will be automated", resulting in increased productivity and broader use by data scientists. According to Accenture, "1 out of 3 insurers globally now uses Big Data from IoT technologies, such as Fitbit, Samsung Gear or Apple watch to collect lifestyle data from insureds".
In this paper, we propose a novel pretraining-based encoder-decoder framework that generates the output sequence from the input sequence in a two-stage manner. For the encoder, we encode the input sequence into context representations using BERT. The decoder operates in two stages: in the first stage, a Transformer-based decoder generates a draft output sequence; in the second stage, we mask each word of the draft sequence and feed it to BERT, then, combining the input sequence with the draft representation produced by BERT, a Transformer-based decoder predicts the refined word for each masked position. To the best of our knowledge, our approach is the first to apply BERT to text generation tasks. As a first step in this direction, we evaluate our proposed method on the text summarization task. Experimental results show that our model achieves new state-of-the-art results on both the CNN/Daily Mail and New York Times datasets.
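The draft-then-refine control flow described above can be sketched as follows. The functions `encode`, `draft_decoder`, and `refine_step` are stand-ins invented for illustration, mimicking only the interfaces of the real components (a BERT encoder and two Transformer-based decoders), with canned outputs in place of learned behavior.

```python
# Hypothetical sketch of the two-stage draft-then-refine decoding loop.
# All three "model" functions are invented stand-ins with canned outputs.

def encode(tokens):
    # Stand-in for BERT encoding; here just passes tokens through.
    return list(tokens)

def draft_decoder(context):
    # Stand-in for the stage-1 Transformer decoder: emits a rough draft.
    return ["the", "cat", "sat", "maybe"]

def refine_step(context, draft, position):
    # Stage-2 stand-in: mask one draft position, as the real model would
    # before re-encoding with BERT, then predict a refined word there.
    masked = draft[:position] + ["[MASK]"] + draft[position + 1:]
    # A real system would run BERT on `masked` and decode from input +
    # draft representations; here we just "fix" the one weak word.
    return "down" if draft[position] == "maybe" else draft[position]

def two_stage_generate(input_tokens):
    context = encode(input_tokens)              # BERT encoder
    draft = draft_decoder(context)              # stage 1: draft sequence
    refined = [refine_step(context, draft, i)   # stage 2: refine each word
               for i in range(len(draft))]
    return refined

print(two_stage_generate(["a", "cat", "story"]))
# → ['the', 'cat', 'sat', 'down']
```

The key structural point is that stage 2 revisits every position of the draft with bidirectional context, which a single left-to-right decoding pass cannot do.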
Hierarchical neural story generation had the most realistic paragraphs of any text generation paper I came across last summer when I was doing research on this topic. I also find it fun that they used a subreddit (r/WritingPrompts) as their training set. The architecture is a mixture of self-attention layers and convolutional layers. The model first generates a prompt and then generates a full story conditioned on it. I tried to extend their work to prompt, outline, story, but the results were meh (similar quality to not bothering with the outline step).
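The two-step pipeline (sample a prompt, then sample a story conditioned on it) looks roughly like this. Both "models" here are stand-in functions with canned candidate outputs, invented for illustration, not the paper's actual convolutional/self-attention architecture.

```python
# Sketch of hierarchical two-step generation: prompt first, then a story
# conditioned on that prompt. Both samplers are invented stand-ins.

import random

PROMPTS = ["A dragon learns to read.", "The last library on Earth."]
OPENINGS = ["It began quietly.", "No one believed it at first."]

def sample_prompt(rng):
    # Stand-in for the prompt model trained on r/WritingPrompts posts.
    return rng.choice(PROMPTS)

def sample_story(prompt, rng):
    # Stand-in for the story model; conditioning on the prompt is
    # simulated by embedding it in the output.
    return f"{rng.choice(OPENINGS)} ({prompt})"

rng = random.Random(0)
prompt = sample_prompt(rng)
story = sample_story(prompt, rng)
print(prompt)
print(story)
```

The failed prompt → outline → story extension mentioned above would simply add a third stand-in sampler between these two, with the outline conditioned on the prompt and the story conditioned on both.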