 Large Language Model


FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark

arXiv.org Artificial Intelligence

Pretrained Language Models (PLMs) have achieved tremendous success in natural language understanding tasks. While different learning schemes -- fine-tuning, zero-shot and few-shot learning -- have been widely explored and compared for languages such as English, there is comparatively little work in Chinese to fairly and comprehensively evaluate and compare these methods. This work first introduces the Chinese Few-shot Learning Evaluation Benchmark (FewCLUE), the first comprehensive small-sample evaluation benchmark in Chinese. It includes nine tasks, ranging from single-sentence and sentence-pair classification tasks to machine reading comprehension tasks. Given the high variance of few-shot learning performance, we provide multiple training/validation sets to facilitate a more accurate and stable evaluation of few-shot modeling. An unlabeled training set with up to 20,000 additional samples per task is provided, allowing researchers to explore better ways of using unlabeled samples. Next, we implement a set of state-of-the-art (SOTA) few-shot learning methods (including PET, ADAPET, LM-BFF, P-tuning and EFL), and compare their performance with fine-tuning and zero-shot learning schemes on the newly constructed FewCLUE benchmark. Our results show that: 1) all five few-shot learning methods exhibit better performance than fine-tuning or zero-shot learning; 2) among the five methods, PET is the best performing few-shot method; 3) few-shot learning performance is highly dependent on the specific task. Our benchmark and code are available at https://github.com/CLUEbenchmark/FewCLUE
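One way to read the point about multiple training/validation sets: a few-shot method is scored on each split separately, and the mean and spread across splits are reported instead of a single number. The sketch below is only an illustration of that idea, not the FewCLUE evaluation code; the function name `summarize_splits` and the accuracy values are hypothetical placeholders, not results from the paper.

```python
from statistics import mean, stdev


def summarize_splits(accuracies):
    """Return (mean, sample standard deviation) of per-split accuracies.

    `accuracies` holds one score per training/validation split.
    """
    if len(accuracies) < 2:
        raise ValueError("need at least two splits to estimate the spread")
    return mean(accuracies), stdev(accuracies)


# Placeholder scores for three splits; these are NOT results from the paper.
placeholder_scores = [0.50, 0.60, 0.70]
avg, spread = summarize_splits(placeholder_scores)
print(f"accuracy: {avg:.2f} +/- {spread:.2f}")
```

A large spread relative to the gap between two methods suggests that a comparison based on a single split would not be reliable.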


AI as New Electricity?

#artificialintelligence

Until April 2020, GPT-2 was the king of AI, with its stunning 1.5B parameters. It is not easy to work with. It takes 6 GB on disk, but that is not the problem. The problem is processing speed: a single inference on a CPU takes several minutes. On a GPU it would be at least ten times faster, provided you have an NVIDIA GPU with at least 24 GB of video RAM.
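The 6 GB figure is consistent with a simple back-of-the-envelope estimate: 1.5B parameters stored as 32-bit floats take 4 bytes each. The sketch below assumes that storage format (the text does not state it) and counts only raw weights, ignoring file-format overhead; the helper name `weights_size_gb` is made up for illustration.

```python
def weights_size_gb(num_parameters: int, bytes_per_parameter: int = 4) -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 10**9


GPT2_PARAMETERS = 1_500_000_000  # 1.5B parameters, as stated in the text

# Assumption: weights are stored as 32-bit floats (4 bytes each).
fp32_gb = weights_size_gb(GPT2_PARAMETERS)
# For comparison: the same weights stored as 16-bit floats (2 bytes each).
fp16_gb = weights_size_gb(GPT2_PARAMETERS, bytes_per_parameter=2)

print(f"fp32: {fp32_gb:.1f} GB, fp16: {fp16_gb:.1f} GB")
```

Under these assumptions the estimate comes out at 6.0 GB for 32-bit weights and 3.0 GB for 16-bit weights.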


OpenAI warns AI behind GitHub's Copilot may be susceptible to bias

#artificialintelligence

Last month, GitHub and OpenAI launched Copilot, a service that provides suggestions for whole lines of code inside development environments like Microsoft Visual Studio. Copilot is powered by an AI model called Codex that's trained on billions of lines of public code, and the companies claim Copilot works with a broad set of frameworks and languages and adapts to the edits developers make, matching their coding styles. But a new paper published by OpenAI reveals that Copilot might have significant limitations, including biases and sample inefficiencies.


Zero-shot Visual Question Answering using Knowledge Graph

arXiv.org Artificial Intelligence

Incorporating external knowledge into Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with different components for knowledge matching and extraction, feature learning, etc. However, such pipeline approaches suffer when some component does not perform well, which leads to error propagation and poor overall performance. Furthermore, the majority of existing approaches ignore the answer bias issue: in real-world applications, many answers may never have appeared during training (i.e., unseen answers). To bridge these gaps, in this paper, we propose a Zero-shot VQA algorithm using knowledge graphs and a mask-based learning mechanism for better incorporating external knowledge, and present new answer-based Zero-shot VQA splits for the F-VQA dataset. Experiments show that our method can achieve state-of-the-art performance in Zero-shot VQA with unseen answers, while also dramatically augmenting existing end-to-end models on the normal F-VQA task.


A Survey on Data Augmentation for Text Classification

arXiv.org Artificial Intelligence

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data, to regularizing the objective, to limiting the amount of data used in order to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for textual classification and aims to achieve a concise and comprehensive overview for researchers and practitioners (C3). Derived from the taxonomy, we divide more than 100 methods into 12 different groupings and provide state-of-the-art references expounding which methods are highly promising (C4). Finally, research perspectives that may constitute a building block for future work are given (C5).


How Much Can CLIP Benefit Vision-and-Language Tasks?

arXiv.org Artificial Intelligence

Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using a relatively small set of manually-annotated data (as compared to web-crawled data), to perceive the visual world. However, it has been observed that large-scale pretraining usually can result in better generalization performance, e.g., CLIP (Contrastive Language-Image Pre-training), trained on a massive amount of image-caption pairs, has shown a strong zero-shot capability on various vision tasks. To further study the advantage brought by CLIP, we propose to use CLIP as the visual encoder in various V&L models in two typical scenarios: 1) plugging CLIP into task-specific fine-tuning; 2) combining CLIP with V&L pre-training and transferring to downstream tasks. We show that CLIP significantly outperforms widely-used visual encoders trained with in-domain annotated data, such as BottomUp-TopDown. We achieve competitive or better results on diverse V&L tasks, while establishing new state-of-the-art results on Visual Question Answering, Visual Entailment, and V&L Navigation tasks. We release our code at https://github.com/clip-vil/CLIP-ViL.


A Classification of Artificial Intelligence Systems for Mathematics Education

arXiv.org Artificial Intelligence

This chapter provides an overview of the different Artificial Intelligence (AI) systems that are being used in contemporary digital tools for Mathematics Education (ME). It is aimed at researchers in AI and Machine Learning (ML), for whom we shed some light on the specific technologies that are being used in educational applications; and at researchers in ME, for whom we clarify: i) what the possibilities of the current AI technologies are, ii) what is still out of reach and iii) what is to be expected in the near future. We start our analysis by establishing a high-level taxonomy of AI tools that are found as components in digital ME applications. Then, we describe in detail how these AI tools, and in particular ML, are being used in two key applications, specifically AI-based calculators and intelligent tutoring systems. We finish the chapter with a discussion about student modeling systems and their relationship to artificial general intelligence.


Learn from Anywhere: Rethinking Generalized Zero-Shot Learning with Limited Supervision

arXiv.org Artificial Intelligence

A common problem with most zero- and few-shot learning approaches is that they suffer from bias towards seen classes, resulting in sub-optimal performance. Existing efforts aim to utilize unlabeled images from unseen classes (i.e., transductive zero-shot) during training to enable generalization. However, this limits their use in practical scenarios where data from target unseen classes is unavailable or infeasible to collect. In this work, we present a practical setting of inductive zero- and few-shot learning, where unlabeled images from other out-of-data classes, which do not belong to seen or unseen categories, can be used to improve generalization in any-shot learning. We leverage a formulation based on product-of-experts and introduce a new AUD module that enables us to use unlabeled samples from out-of-data classes, which are usually easily available and entail practically no annotation cost. In addition, we demonstrate the applicability of our model to a more practical and challenging setting, generalized zero-shot learning under limited supervision, where even base seen classes do not have sufficient annotated samples.


Can AI learn from any public code online?

#artificialintelligence

Just days after GitHub announced its new Copilot tool, which generates complementary code for programmers' projects, web developer Kyle Peacock tweeted an oddity he had noticed. "I love to learn new things and build things," the algorithm wrote, when asked to generate an About Me page. While the About Me page was supposedly generated for a fake person, that link goes to the GitHub profile of David Celis, who The Verge can confirm is not a figment of Copilot's imagination. Celis is a coder and GitHub user with popular repositories, and even formerly worked at the company. "I'm not surprised that my public repositories are a part of the training data for Copilot," Celis told The Verge, adding that he was amused by the algorithm reciting his name.


GitHub's new tool uses AI to craft code. Some developers are furious

#artificialintelligence

Copilot launched last week in an invite-only Technical Preview, promising to save time by responding to users' code with its own smart suggestions. Those suggestions are based on billions of lines of code that users have publicly contributed to GitHub, using an AI system called Codex from the research company OpenAI. GitHub describes Copilot as the AI equivalent of pair programming, in which two developers work together at a single computer. The idea is that one developer can bring new ideas or spot problems that the other developer might've missed, even if it requires more person-hours to do so.