 Large Language Model


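[Free] How to Use Google's GPT-3-Level Flan-T5

#artificialintelligence

"I want to use a natural language processing model comparable to GPT-3 for free." "I want to try a natural language processing model released by Google." If either of these applies to you, Flan-T5 is a good choice. This article explains how to use Google's Flan-T5.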


How far have we come with Language Models, part 2 (Artificial Intelligence)

#artificialintelligence

Abstract: In this article, we introduce and evaluate the concept of robosourcing for creating educational content. Robosourcing lies in the intersection of crowdsourcing and large language models, where instead of a crowd of humans, requests to large language models replace some of the work traditionally performed by the crowd. Robosourcing includes a human-in-the-loop to provide priming (input) as well as to evaluate and potentially adjust the generated artefacts; these evaluations could also be used to improve the large language models. We propose a system to outline the robosourcing process. We further study the feasibility of robosourcing in the context of education by conducting an evaluation of robosourced programming exercises, generated using OpenAI Codex.


Dark patterns in e-commerce: a dataset and its baseline evaluations

arXiv.org Artificial Intelligence

Dark patterns, which are user interface designs in online services, induce users to take unintended actions. Recently, dark patterns have been raised as an issue of privacy and fairness. Thus, a wide range of research on detecting dark patterns is eagerly awaited. In this work, we constructed a dataset for dark pattern detection and prepared its baseline detection performance with state-of-the-art machine learning methods. The original dataset was obtained from Mathur et al.'s study in 2019, which consists of 1,818 dark pattern texts from shopping sites. Then, we added negative samples, i.e., non-dark pattern texts, by retrieving texts from the same websites as Mathur et al.'s dataset. We also applied state-of-the-art machine learning methods to show the automatic detection accuracy as baselines, including BERT, RoBERTa, ALBERT, and XLNet. As a result of 5-fold cross-validation, we achieved the highest accuracy of 0.975 with RoBERTa. The dataset and baseline source codes are available at https://github.com/yamanalab/ec-darkpattern.


Measuring Progress on Scalable Oversight for Large Language Models

arXiv.org Artificial Intelligence

Developing safe and useful general-purpose AI systems will require us to make progress on scalable oversight: the problem of supervising systems that potentially outperform us on most skills relevant to the task at hand. Empirical work on this problem is not straightforward, since we do not yet have systems that broadly exceed our abilities. This paper discusses one of the major ways we think about this problem, with a focus on ways it can be studied empirically. We first present an experimental design centered on tasks for which human specialists succeed but unaided humans and current general AI systems fail. We then present a proof-of-concept experiment meant to demonstrate a key feature of this experimental design and show its viability with two question-answering tasks: MMLU and time-limited QuALITY. On these tasks, we find that human participants who interact with an unreliable large-language-model dialog assistant through chat -- a trivial baseline strategy for scalable oversight -- substantially outperform both the model alone and their own unaided performance. These results are an encouraging sign that scalable oversight will be tractable to study with present models and bolster recent findings that large language models can productively assist humans with difficult tasks.


A Generalist Agent

arXiv.org Artificial Intelligence

Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.


AI is better at answering questions if you get another AI to ask them

New Scientist

An artificial intelligence model that makes suggestions to another AI can get it to produce results that are as good as if the prompts came from humans. The technique could be used to improve the performance of AIs whose internal workings remain opaque. Large language models (LLMs) – neural networks that are trained on vast data sets of online text and can produce convincing language – are also operated using language.


Generative A.I. doesn't much impress Noam Chomsky

#artificialintelligence

But just how smart are these large language models? On the last day of the conference, I interviewed legendary linguist Noam Chomsky, now 93 years old, and Gary Marcus, an emeritus professor of cognitive science at New York University who has spent much of the past decade highlighting the limits of deep learning. Both were distinctly unimpressed with today's cutting-edge A.I. Chomsky's big disappointment is that these large language models don't tell us anything at all about how the human brain works. Chomsky has devoted much of his life to advancing the theory that there is a universal grammar, or at least a set of structural concepts, that underpin all human languages, and that this grammar is somehow hard-wired into the brain. Chomsky thinks this explains why human infants can master language so easily--whereas today's computer systems need to be fed what Chomsky rightly calls "astronomical amounts of data" and even then still don't actually understand language at all.


Is AI Art Really Art?

#artificialintelligence

Maureen F. McHugh's short story collection After the Apocalypse was merely prescient when published in 2011, but it appears positively prophetic a decade later with its narratives about respiratory virus pandemics, frayed social connections, and increased political violence. Few of her tales, however, are as haunting as "The Kingdom of the Blind," which will perhaps prove to be the most visionary of McHugh's stories. "The Kingdom of the Blind" takes as its subject artificial intelligence, grappling with the possibility that any consciousness which arises from soldering board and circuitry may be so alien that it's scarcely recognizable to us as a consciousness in the first place. The emergent process of consciousness as it develops in this AI is inscrutable and totally different from anything which resembles human thinking, posing a difficulty for the computer scientists who attempt to communicate with it. In sparse, elegant, and beautiful prose, McHugh's story describes how a massive interconnected computer program evolves a quality that could be described as "consciousness," and yet how the thought which animates this being proves impossible to describe.


La veille de la cybersécurité

#artificialintelligence

What do our creations think of us? Generative Pre-trained Transformer 3 is a language model released by OpenAI in 2020 that uses deep learning to produce text that seems like it could have been written by a human. Taken individually, the AI's lines don't smack much of poetry or strictly cohere, but in aggregate, they gesture at something more. What would it produce if asked to meditate on the human soul and to produce spiritual poetry like ours? What does it think of our religious beliefs?


How Large Language Models are Transforming Machine-Paraphrased Plagiarism

arXiv.org Artificial Intelligence

The recent success of large language models for text generation poses a severe threat to academic integrity, as plagiarists can generate realistic paraphrases indistinguishable from original work. However, the role of large autoregressive transformers in generating machine-paraphrased plagiarism and their detection is still developing in the literature. This work explores T5 and GPT-3 for machine-paraphrase generation on scientific articles from arXiv, student theses, and Wikipedia. We evaluate the detection performance of six automated solutions and one commercial plagiarism detection software and perform a human study with 105 participants regarding their detection performance and the quality of generated examples. Our results suggest that large models can rewrite text humans have difficulty identifying as machine-paraphrased (53% mean acc.). Human experts rate the quality of paraphrases generated by GPT-3 as high as original texts (clarity 4.0/5, fluency 4.2/5, coherence 3.8/5). The best-performing detection model (GPT-3) achieves a 66% F1-score in detecting paraphrases.