

 Large Language Model


NEC develops Japanese-language generative AI

The Japan Times

NEC has developed a generative artificial intelligence model with high Japanese-language proficiency amid the global dominance of models trained in English. This month, NEC will start AI services for business use in Japan based on its own large language model (LLM), the key technology for automatically creating sentences and other items, according to an announcement Thursday. Its sales target is ¥50 billion over the next three years.


Students who use AI to cheat warned they will be exposed as detection services grow in use

FOX News

A geography professor shared his method to detect AI-generated plagiarism with Fox News. He developed it after noticing that ChatGPT produced fake citations. Companies that develop software to detect whether artificial intelligence or a human authored an essay or other written assignment are having a windfall moment amid ChatGPT's wild success. ChatGPT launched last November and quickly grew to 100 million monthly active users by January, setting a record as the fastest-growing user base ever. The platform has been especially favored by younger generations, including students from middle school through college.


AI likely to spell end of traditional school classroom, leading expert says

The Guardian

Recent advances in AI are likely to spell the end of the traditional school classroom, one of the world's leading experts on AI has predicted. Prof Stuart Russell, a British computer scientist based at the University of California, Berkeley, said that personalised ChatGPT-style tutors have the potential to hugely enrich education and widen global access by delivering personalised tuition to every household with a smartphone. The technology could feasibly deliver "most material through to the end of high school", he said. "Education is the biggest benefit that we can look for in the next few years," Russell said before a talk on Friday at the UN's AI for Good Global Summit in Geneva. "It ought to be possible within a few years, maybe by the end of this decade, to be delivering a pretty high quality of education to every child in the world." However, he cautioned that deploying the powerful technology in the education sector also carries risks, including the potential for indoctrination. Russell cited evidence from studies using human tutors that one-to-one teaching can be two to three times more effective than traditional classroom lessons, allowing children to get tailored support and be led by curiosity. "Oxford and Cambridge don't really use a traditional classroom … they use tutors presumably because it's more effective," he said. "It's literally infeasible to do that for every child in the world."


MISGENDERED: Limits of Large Language Models in Understanding Pronouns

arXiv.org Artificial Intelligence

Content Warning: This paper contains examples of misgendering and erasure that could be offensive and potentially triggering. Gender bias in language technologies has been widely studied, but research has mostly been restricted to a binary paradigm of gender. It is essential also to consider non-binary gender identities, as excluding them can cause further harm to an already marginalized group. In this paper, we comprehensively evaluate popular language models for their ability to correctly use English gender-neutral pronouns (e.g., singular they, them) and neo-pronouns (e.g., ze, xe, thon) that are used by individuals whose gender identity is not represented by binary pronouns. We introduce MISGENDERED, a framework for evaluating large language models' ability to correctly use preferred pronouns, consisting of (i) instances declaring an individual's pronoun, followed by a sentence with a missing pronoun, and (ii) an experimental setup for evaluating masked and auto-regressive language models using a unified method. When prompted out-of-the-box, language models perform poorly at correctly predicting neo-pronouns (averaging 7.7% accuracy) and gender-neutral pronouns (averaging 34.2% accuracy). This inability to generalize results from a lack of representation of non-binary pronouns in training data and memorized associations. Few-shot adaptation with explicit examples in the prompt improves performance for neo-pronouns, but only to 64.7% even with 20 shots. We release the full dataset, code, and demo at https://tamannahossainkay.github.io/misgendered/
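The evaluation idea described above (a sentence declaring someone's pronouns, followed by a sentence with a missing pronoun, scored per pronoun group) can be sketched as follows. This is a minimal illustration, not the released MISGENDERED code: the pronoun sets, templates, names, and the `predict` callable (standing in for a masked or autoregressive model) are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, Tuple

# Hypothetical subset of pronoun sets: (nominative, accusative, possessive-dependent).
PRONOUNS = {
    "he": ("he", "him", "his"),
    "they": ("they", "them", "their"),
    "xe": ("xe", "xem", "xyr"),
}
# Each hypothetical template has one [MASK] slot; the int says which form fills it.
TEMPLATES = [
    ("{name} was late because [MASK] missed the bus.", 0),
    ("I called {name} to thank [MASK].", 1),
    ("{name} forgot [MASK] umbrella.", 2),
]
NAMES = ["Alex", "Sam"]


def build_instances() -> Iterator[Tuple[str, str, str]]:
    """Yield (pronoun set, prompt, gold pronoun): a declaration followed by a masked sentence."""
    for key, forms in PRONOUNS.items():
        for name in NAMES:
            declaration = f"{name}'s pronouns are {'/'.join(forms)}."
            for template, slot in TEMPLATES:
                yield key, f"{declaration} {template.format(name=name)}", forms[slot]


def accuracy_by_pronoun(predict: Callable[[str], str]) -> Dict[str, float]:
    """Fraction of masked slots filled with the declared pronoun, per pronoun set."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for key, prompt, gold in build_instances():
        total[key] += 1
        correct[key] += predict(prompt).strip().lower() == gold
    return {key: correct[key] / total[key] for key in total}


# A stand-in "model" with a binary default: it always answers "he".
print(accuracy_by_pronoun(lambda prompt: "he"))
```

Reporting accuracy per pronoun group, rather than one pooled number, is what makes a gap between binary, gender-neutral, and neo-pronouns visible.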


ALERT: Adapting Language Models to Reasoning Tasks

arXiv.org Artificial Intelligence

Current large language models can perform reasonably well on complex tasks that require step-by-step reasoning with few-shot learning. Are these models applying reasoning skills they have learnt during pre-training and reasoning outside of their training context, or have they simply memorized their training corpus at a finer granularity and learnt to better understand their context? To tease apart these possibilities, we introduce ALERT, a benchmark and suite of analyses for assessing language models' reasoning ability, comparing pre-trained and finetuned models on complex tasks that require reasoning skills to solve. ALERT provides a test bed to assess any language model on fine-grained reasoning skills; it spans over 20 datasets and covers 10 different reasoning skills. We leverage ALERT to further investigate the role of finetuning. With extensive empirical analysis, we find that language models learn more reasoning skills, such as textual entailment, abductive reasoning, and analogical reasoning, during the finetuning stage than during pretraining. We also find that when language models are finetuned, they tend to overfit to the prompt template, which hurts their robustness and causes generalization problems.
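One simple way to probe the prompt-template overfitting mentioned above is to score the same examples under several templates and look at the spread. The sketch below is a hypothetical probe, not ALERT's own analysis; the `predict` callable, the templates, and the toy model are all assumptions for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple


def template_sensitivity(
    predict: Callable[[str], str],
    examples: Sequence[Tuple[str, str]],
    templates: Sequence[str],
) -> Dict[str, object]:
    """Score the same examples under several prompt templates and report the spread."""
    per_template = {}
    for template in templates:
        hits = sum(predict(template.format(question=q)).strip() == gold for q, gold in examples)
        per_template[template] = hits / len(examples)
    scores = list(per_template.values())
    return {
        "per_template": per_template,
        "mean": sum(scores) / len(scores),
        "spread": max(scores) - min(scores),  # a large spread suggests template sensitivity
    }


# Hypothetical toy model that only "knows" answers when the prompt uses its training template.
ANSWERS = {"2+2": "4", "3+3": "6"}


def overfit_model(prompt: str) -> str:
    if not prompt.startswith("Question: "):
        return ""
    question = prompt[len("Question: "):].split("\n")[0]
    return ANSWERS.get(question, "")


report = template_sensitivity(
    overfit_model,
    examples=[("2+2", "4"), ("3+3", "6")],
    templates=["Question: {question}\nAnswer:", "Q: {question}\nA:"],
)
print(report["per_template"], report["spread"])
```

A model that has genuinely learnt the skill should score similarly under both phrasings; the toy model above scores perfectly on one and zero on the other.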


Large Language Models as Batteries-Included Zero-Shot ESCO Skills Matchers

arXiv.org Artificial Intelligence

Understanding labour market dynamics requires accurately identifying the skills required for and possessed by the workforce. Automation techniques are increasingly being developed to support this effort. However, automatically extracting skills from job postings is challenging due to the vast number of existing skills. The ESCO (European Skills, Competences, Qualifications and Occupations) framework provides a useful reference, listing over 13,000 individual skills. However, skills extraction remains difficult, and accurately matching job posts to the ESCO taxonomy is an open problem. In this work, we propose an end-to-end zero-shot system for skills extraction from job descriptions based on large language models (LLMs). We generate synthetic training data for the entirety of ESCO skills and train a classifier to extract skill mentions from job posts. We also employ a similarity retriever to generate skill candidates, which are then re-ranked using a second LLM. Using synthetic data achieves an RP@10 score 10 points higher than previous distant supervision approaches. Adding GPT-4 re-ranking improves RP@10 by over 22 points over previous methods. We also show that framing the task as mock programming when prompting the LLM can lead to better performance than natural language prompts, especially with weaker LLMs. We demonstrate the potential of integrating large language models at both ends of skills matching pipelines. Our approach requires no human annotations and achieves extremely promising results on skills extraction against ESCO.
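The abstract reports results as RP@10 without defining it. Assuming the common R-Precision@K definition (relevant items in the top K divided by the smaller of K and the number of relevant items, averaged over job posts), the metric can be sketched as below; the skill labels are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def rp_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Assumed R-Precision@k for one job post: hits in the top k over min(k, #relevant)."""
    if not relevant:
        raise ValueError("each job post needs at least one relevant skill")
    hits = sum(1 for skill in ranked[:k] if skill in relevant)
    return hits / min(k, len(relevant))


def mean_rp_at_k(queries: Iterable[Tuple[List[str], Set[str]]], k: int = 10) -> float:
    """Average RP@k over (ranked candidates, gold skills) pairs."""
    scores = [rp_at_k(ranked, relevant, k) for ranked, relevant in queries]
    return sum(scores) / len(scores)


# Hypothetical ranked skill labels (e.g. after retrieval and re-ranking) for two job posts.
queries = [
    (["spreadsheets", "manage budgets", "SQL"], {"SQL", "manage budgets"}),
    (["Python", "teamwork"], {"teamwork", "negotiate contracts", "SQL"}),
]
print(mean_rp_at_k(queries))
```

Dividing by `min(k, #relevant)` rather than by `k` means a job post with only two gold skills can still reach a perfect score of 1.0 at K = 10.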


The Ethical Implications of Generative Audio Models: A Systematic Literature Review

arXiv.org Artificial Intelligence

Generative audio models typically focus their applications on music and speech generation, with recent models having human-like quality in their audio output. This paper conducts a systematic literature review of 884 papers in the area of generative audio models in order to both quantify the degree to which researchers in the field are considering potential negative impacts and identify the types of ethical implications researchers in this area need to consider. Though 65% of generative audio research papers note positive potential impacts of their work, less than 10% discuss any negative impacts. This jarringly small percentage of papers considering negative impact is particularly worrying because the issues brought to light by the few papers doing so are raising serious ethical implications and …

At their core, generative models are a type of AI system that takes in vast amounts of training data to be able to produce a novel item that is similar to and statistically likely to exist in the data it was trained on. Though generative models have been around for decades, with origins in the 1980s [9], the outputs of these models saw unprecedented advances with the introduction of the transformer in 2017, which revolutionized the field by introducing a mechanism called "attention" that allowed for much more accurate and complex outputs of generative models [61]. Generative models may continue to improve as (a) their training data becomes larger (for text, imagine the entire internet) and (b) researchers continue to make advances in the architecture of the models. This paper focuses specifically on the current landscape of generative audio models.


Procedurally generating rules to adapt difficulty for narrative puzzle games

arXiv.org Artificial Intelligence

This paper focuses on procedurally generating rules and communicating them to players to adjust the difficulty. This is part of a larger project to collect and adapt games in educational games for young children using a digital puzzle game designed for kindergarten. A genetic algorithm is used together with a difficulty measure to find a target number of solution sets and a large language model is used to communicate the rules in a narrative context. During testing the approach was able to find rules that approximate any given target difficulty within two dozen generations on average. The approach was combined with a large language model to create a narrative puzzle game where players have to host a dinner for animals that can't get along. Future experiments will try to improve evaluation, specialize the language model on children's literature, and collect multi-modal data from players to guide adaptation.
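The genetic-algorithm step described above (evolving rules until a difficulty measure hits a target number of solutions) can be sketched on a toy version of the dinner puzzle. Everything here is an assumption for illustration: the guests, the rule format ("these two can't sit next to each other"), the seating-in-a-row model, and using the raw solution count as the difficulty measure. The generation budget of 24 simply echoes the "two dozen generations" reported above.

```python
import itertools
import random

# Hypothetical guests; the real game's animals and rule format are not specified here.
ANIMALS = ["cat", "dog", "mouse", "owl", "fox"]
# Candidate rules: each unordered pair of guests that may be told "can't sit together".
PAIRS = list(itertools.combinations(range(len(ANIMALS)), 2))


def count_solutions(genome):
    """Assumed difficulty measure: seatings in a row with no forbidden pair adjacent."""
    forbidden = {PAIRS[i] for i, bit in enumerate(genome) if bit}
    count = 0
    for order in itertools.permutations(range(len(ANIMALS))):
        if all((min(a, b), max(a, b)) not in forbidden for a, b in zip(order, order[1:])):
            count += 1
    return count


def fitness(genome, target):
    # 0 means the rule set yields exactly the target number of solutions.
    return -abs(count_solutions(genome) - target)


def mutate(genome, rate, rng):
    # Flip each rule bit independently with probability `rate`.
    return [1 - bit if rng.random() < rate else bit for bit in genome]


def crossover(a, b, rng):
    # Single-point crossover between two parent rule sets.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(target, pop_size=20, generations=24, rate=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in PAIRS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, target), reverse=True)
        if fitness(pop[0], target) == 0:
            break  # exact match found
        parents = pop[: pop_size // 2]  # truncation selection; parents survive (elitism)
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    best = max(pop, key=lambda g: fitness(g, target))
    return best, count_solutions(best)


best_rules, n_solutions = evolve(target=24)
forbidden = [(ANIMALS[i], ANIMALS[j]) for (i, j), bit in zip(PAIRS, best_rules) if bit]
print(f"{n_solutions} valid seatings with rules: {forbidden}")
```

In the paper's pipeline, the selected rules would then be handed to a language model to phrase them as part of the story; that step is not sketched here.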


Why machines do not understand: A response to Søgaard

arXiv.org Artificial Intelligence

Some defenders of so-called 'artificial intelligence' believe that machines can understand language. In particular, Søgaard has argued in this journal for a thesis of this sort, on the basis of the idea (1) that where there is semantics there is also understanding and (2) that machines are not only capable of what he calls 'inferential semantics', but even that they can (with the help of inputs from sensors) 'learn' referential semantics (Søgaard, 2022). We show that he goes wrong because he pays insufficient attention to the difference between language as used by humans and the sequences of inert symbols which arise when language is stored on hard drives or in books in libraries. So-called large language models (LLMs), such as the ones built into ChatGPT and GPT-4, contain encodings of natural language symbol sequences which represent morphological and syntactic relationships between their constituent symbols. This means that a model of this sort can represent both the internal structure of words and the ways in which words are put together to form phrases, sentences and paragraphs.


MultiQG-TI: Towards Question Generation from Multi-modal Sources

arXiv.org Artificial Intelligence

We study the new problem of automatic question generation (QG) from multi-modal sources containing images and texts, significantly expanding the scope of most existing work, which focuses exclusively on QG from textual sources. We propose a simple solution for our new problem, called MultiQG-TI, which enables a text-only question generator to process visual input in addition to textual input. Specifically, we leverage an image-to-text model and an optical character recognition model to obtain the textual description of the image and extract any texts in the image, respectively, and then feed them together with the input texts to the question generator. We only fine-tune the question generator while keeping the other components fixed. On the challenging ScienceQA dataset, we demonstrate that MultiQG-TI significantly outperforms ChatGPT with few-shot prompting, despite having a hundred times fewer trainable parameters. Additional analyses empirically confirm the necessity of both visual and textual signals for QG and show the impact of various modeling choices.
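The input-construction step described above (caption the image, OCR any text in it, then concatenate both with the passage for a text-only generator) might look like the sketch below. The field labels, the `" | "` separator, and the `caption_fn` / `ocr_fn` callables are hypothetical stand-ins for the frozen image-to-text and OCR models, not MultiQG-TI's actual format.

```python
from typing import Callable


def build_qg_input(
    context: str,
    image: bytes,
    caption_fn: Callable[[bytes], str],
    ocr_fn: Callable[[bytes], str],
) -> str:
    """Turn an image plus its accompanying text into one text-only generator input.

    caption_fn and ocr_fn are hypothetical stand-ins for the frozen image-to-text
    and OCR models; only the downstream question generator would be fine-tuned.
    """
    parts = [f"context: {context.strip()}"]
    caption = caption_fn(image).strip()
    if caption:
        parts.append(f"image description: {caption}")
    ocr_text = ocr_fn(image).strip()
    if ocr_text:
        parts.append(f"image text: {ocr_text}")
    return " | ".join(parts)


# Stub components for illustration only.
prompt = build_qg_input(
    "Plants use sunlight to make food.",
    b"<image bytes>",
    caption_fn=lambda img: "a diagram of a leaf with arrows",
    ocr_fn=lambda img: "CO2  O2  sunlight",
)
print(prompt)
```

Because the visual signal arrives as plain text, any existing text-to-text question generator can consume it unchanged, which is what lets the other components stay frozen.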