Large Language Model
Who is Sam Altman? The tech leader behind artificial intelligence lab OpenAI
Fox News correspondent Matt Finn has the latest on the impact of AI technology that some say could outpace humans on 'Special Report.' Artificial intelligence will take center stage in the nation's capital on Tuesday, when tech CEO Sam Altman testifies for the first time before Congress regarding ChatGPT, his company's revolutionary chatbot. Altman's OpenAI, an AI research lab, revolutionized the technology last year when it released ChatGPT, a chatbot that's able to mimic human conversation based on prompts it is given. The company has gone on to release updated iterations of the chatbot since last November, which has sparked a race in Silicon Valley for other tech companies to build and release more powerful systems powered by artificial intelligence. Altman will appear before the Senate Judiciary subcommittee on privacy, technology, and the law on Tuesday morning amid pressure on government leaders to craft regulations for artificial intelligence.
Why use of AI is a major sticking point in the ongoing writers' strike
Using existing scripts to train AIs and deploying the technology to draft new scripts are major concerns in the ongoing Hollywood writers' strike. Could AI soon write your favourite Hollywood film or streaming show? That concern is one of the issues driving a US film and television writers' strike that has halted many productions nationwide. The Writers Guild of America (WGA), a labour union representing writers who primarily work in film and television, began the work strike this month after reaching an impasse in negotiations with the Alliance of Motion Picture and Television Producers that represents the US entertainment industry. Part of the disagreement revolves around a WGA proposal to ban the industry from using AIs such as ChatGPT to generate story ideas or scripts for films and shows. The union wants to ensure that such technologies do not undermine writers' compensation and writing credits. "The fear is that AI could be used to produce first drafts of shows, and then a small number of writers would work off of those scripts," says Virginia Doellgast at Cornell University in New York.
Look out, get-rich-quick schemes are coming to AI
But entrepreneurship and computer science experts say that is a misguided view of how artificial intelligence can help entrepreneurs. Nearly any moneymaking scheme devised solely by ChatGPT is bound to be generic, they said, because chatbots will regurgitate strategies that are widely known. Indeed, the tools are more useful helping people with actual business ideas do the technical work of starting a company, such as writing a business plan, creating an income statement or devising a marketing strategy.
ChatGPT misidentifies digital minister pushing AI use in Japan
ChatGPT failed to correctly identify digital minister Taro Kono, even as he advocates for more use of artificial intelligence to help overcome labor shortages caused by a population decline. "I asked ChatGPT who Kono Taro is and he came back with the wrong answer," Kono said in an interview with Bloomberg Television broadcast Monday. "So you need to be careful," he added. Kono asks that his name be written in Japanese style, with surname first. Asked how ChatGPT had identified him when he entered a query about himself, Kono said it had called him "prime minister of Japan."
Small Models are Valuable Plug-ins for Large Language Models
Xu, Canwen, Xu, Yichong, Wang, Shuohang, Liu, Yang, Zhu, Chenguang, McAuley, Julian
Large language models (LLMs) such as GPT-3 and GPT-4 are powerful, but their weights are often not publicly available and their immense size makes them difficult to tune on common hardware. As a result, effectively tuning these models with large-scale supervised data can be challenging. As an alternative, In-Context Learning (ICL) can only use a small number of supervised examples due to context length limits. In this paper, we propose Super In-Context Learning (SuperICL), which allows black-box LLMs to work with locally fine-tuned smaller models, resulting in superior performance on supervised tasks. Our experiments demonstrate that SuperICL can improve performance beyond state-of-the-art fine-tuned models while addressing the instability problem of in-context learning. Furthermore, SuperICL can enhance the capabilities of smaller models, such as multilinguality and interpretability.
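One way a black-box LLM could be made to work with a smaller local model is through the prompt itself. The sketch below assumes the small model's predicted label and confidence are attached to each demonstration and to the test input. The prompt template, the function names, and the keyword-based `small_model` (a stand-in for a locally fine-tuned classifier) are hypothetical; the abstract does not specify the actual format.

```python
from typing import Callable, List, Tuple


def small_model(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a fine-tuned small model.

    Returns a (predicted_label, confidence) pair using a toy keyword rule.
    """
    positive_words = {"great", "good", "love"}
    hits = sum(word.strip(".,!").lower() in positive_words for word in text.split())
    return ("positive", 0.9) if hits > 0 else ("negative", 0.6)


def build_prompt(
    demos: List[Tuple[str, str]],
    test_input: str,
    predictor: Callable[[str], Tuple[str, float]],
) -> str:
    """Pair every input with the small model's guess, then ask for the final label."""
    blocks = []
    for text, gold_label in demos:
        pred, conf = predictor(text)
        blocks.append(
            f"Input: {text}\n"
            f"Small model prediction: {pred} (confidence: {conf:.2f})\n"
            f"Label: {gold_label}"
        )
    # The test input gets the same context but an empty label for the LLM to fill in.
    pred, conf = predictor(test_input)
    blocks.append(
        f"Input: {test_input}\n"
        f"Small model prediction: {pred} (confidence: {conf:.2f})\n"
        f"Label:"
    )
    return "\n\n".join(blocks)


demos = [("I love this film.", "positive"), ("A dull, lifeless plot.", "negative")]
prompt = build_prompt(demos, "Great acting, weak script.", small_model)
print(prompt)
```

The resulting string would be sent to the LLM, which can agree with or override the small model's prediction.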
Pre-Training to Learn in Context
Gu, Yuxian, Dong, Li, Wei, Furu, Huang, Minlie
In-context learning, where pre-trained language models learn to perform tasks from task examples and instructions in their contexts, has attracted much attention in the NLP community. However, the ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context. To this end, we propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability by pre-training the model on a large collection of "intrinsic tasks" in the general plain-text corpus using the simple language modeling objective. PICL encourages the model to infer and perform tasks by conditioning on the contexts while maintaining task generalization of pre-trained models. We evaluate the in-context learning performance of the model trained with PICL on seven widely-used text classification datasets and the Super-NaturalInstructions benchmark, which contains 100+ NLP tasks formulated as text generation. Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x parameters. The code is publicly available at https://github.com/thu-coai/PICL.
Soft Prompt Decoding for Multilingual Dense Retrieval
Huang, Zhiqi, Zeng, Hansi, Zamani, Hamed, Allan, James
In this work, we explore a Multilingual Information Retrieval (MLIR) task, where the collection includes documents in multiple languages. We demonstrate that applying state-of-the-art approaches developed for cross-lingual information retrieval to MLIR tasks leads to sub-optimal performance. This is due to the heterogeneous and imbalanced nature of multilingual collections -- some languages are better represented in the collection and some benefit from large-scale training data. To address this issue, we present KD-SPD, a novel soft prompt decoding approach for MLIR that implicitly "translates" the representation of documents in different languages into the same embedding space. To address the challenges of data scarcity and imbalance, we introduce a knowledge distillation strategy. The teacher model is trained on rich English retrieval data, and by leveraging bi-text data, our distillation framework transfers its retrieval knowledge to the multilingual document encoder. Therefore, our approach does not require any multilingual retrieval training data. Extensive experiments on three MLIR datasets with a total of 15 languages demonstrate that KD-SPD significantly outperforms competitive baselines in all cases. We conduct extensive analyses to show that our method has less language bias and better zero-shot transfer ability towards new languages.
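The distillation idea can be illustrated with a toy objective: a student embedding of a non-English text is pulled toward the teacher's embedding of its English parallel text. The sketch below assumes a mean-squared-error loss and treats the student embeddings as free parameters. Both are simplifying assumptions made for illustration; the abstract does not state the loss used, and the vectors are invented.

```python
from typing import List

Vector = List[float]


def mse_distillation_loss(teacher: List[Vector], student: List[Vector]) -> float:
    """Mean squared difference between paired teacher and student embeddings."""
    total = 0.0
    count = 0
    for t_vec, s_vec in zip(teacher, student):
        for t, s in zip(t_vec, s_vec):
            total += (t - s) ** 2
            count += 1
    return total / count


def gradient_step(teacher: List[Vector], student: List[Vector], lr: float) -> List[Vector]:
    """One gradient-descent step on the MSE loss with respect to the student values."""
    n = sum(len(vec) for vec in student)
    return [
        [s - lr * (2.0 * (s - t) / n) for t, s in zip(t_vec, s_vec)]
        for t_vec, s_vec in zip(teacher, student)
    ]


# Invented toy data: teacher embeddings of two English sentences and
# student embeddings of their translations (bi-text pairs).
teacher_embeddings = [[1.0, 0.0], [0.0, 1.0]]
student_embeddings = [[0.5, 0.5], [0.5, 0.5]]

loss_before = mse_distillation_loss(teacher_embeddings, student_embeddings)
student_embeddings = gradient_step(teacher_embeddings, student_embeddings, lr=1.0)
loss_after = mse_distillation_loss(teacher_embeddings, student_embeddings)
print(loss_before, loss_after)
```

In a real system the gradient would flow into the encoder's parameters (or soft prompts) rather than into the embeddings directly.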
Similarity-weighted Construction of Contextualized Commonsense Knowledge Graphs for Knowledge-intense Argumentation Tasks
Plenz, Moritz, Opitz, Juri, Heinisch, Philipp, Cimiano, Philipp, Frank, Anette
Arguments often do not make explicit how a conclusion follows from its premises. To compensate for this lack, we enrich arguments with structured background knowledge to support knowledge-intense argumentation tasks. We present a new unsupervised method for constructing Contextualized Commonsense Knowledge Graphs (CCKGs) that selects contextually relevant knowledge from large knowledge graphs (KGs) efficiently and at high quality. Our work goes beyond context-insensitive knowledge extraction heuristics by computing semantic similarity between KG triplets and textual arguments. Using these triplet similarities as weights, we extract contextualized knowledge paths that connect a conclusion to its premise, while maximizing similarity to the argument. We combine multiple paths into a CCKG that we optionally prune to reduce noise and raise precision. Intrinsic evaluation of the quality of our graphs shows that our method is effective for (re)constructing human explanation graphs. Manual evaluations in a large-scale knowledge selection setup confirm high recall and precision of implicit CSK in the CCKGs. Finally, we demonstrate the effectiveness of CCKGs in a knowledge-intensive argument quality rating task, outperforming strong baselines and rivaling a GPT-3 based system.
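Extracting a path that maximizes similarity to the argument can be pictured as a weighted shortest-path search. The sketch below assumes each triplet's cost is `1 - similarity` and uses Dijkstra's algorithm on an undirected toy graph. The cost transformation, the triplets, and the similarity values are all invented for illustration and are not taken from the paper.

```python
import heapq
from typing import Dict, List, Tuple

# Toy knowledge graph: (head, relation, tail, similarity to the argument).
# Every triplet and score here is invented for illustration.
TRIPLETS = [
    ("smoking", "causes", "disease", 0.9),
    ("disease", "causes", "death", 0.8),
    ("smoking", "is_a", "habit", 0.3),
    ("habit", "related_to", "death", 0.2),
]

Graph = Dict[str, List[Tuple[str, str, float]]]


def build_graph(triplets: List[Tuple[str, str, str, float]]) -> Graph:
    """Turn triplets into an adjacency list; higher similarity means lower cost."""
    graph: Graph = {}
    for head, relation, tail, similarity in triplets:
        cost = 1.0 - similarity  # assumed transformation
        graph.setdefault(head, []).append((tail, relation, cost))
        graph.setdefault(tail, []).append((head, relation, cost))  # undirected
    return graph


def best_path(graph: Graph, source: str, target: str) -> List[str]:
    """Dijkstra's algorithm: lowest-cost concept sequence from source to target."""
    queue: List[Tuple[float, str, List[str]]] = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, _relation, edge_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + edge_cost, neighbor, path + [neighbor]))
    return []  # no connection found


graph = build_graph(TRIPLETS)
print(best_path(graph, "smoking", "death"))
```

Here the search prefers the route through "disease" because its triplets score higher against the (hypothetical) argument than the route through "habit".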
CQE: A Comprehensive Quantity Extractor
Almasian, Satya, Kazakova, Vivian, Göldner, Philip, Gertz, Michael
Quantities are essential in documents to describe factual information. They are ubiquitous in application domains such as finance, business, medicine, and science in general. Compared to other information extraction approaches, interestingly only a few works exist that describe methods for a proper extraction and representation of quantities in text. In this paper, we present such a comprehensive quantity extraction framework from text data. It efficiently detects combinations of values and units, the behavior of a quantity (e.g., rising or falling), and the concept a quantity is associated with. Our framework makes use of dependency parsing and a dictionary of units, and it provides for a proper normalization and standardization of detected quantities. Using a novel dataset for evaluation, we show that our open source framework outperforms other systems and -- to the best of our knowledge -- is the first to detect concepts associated with identified quantities. The code and data underlying our framework are available at https://github.com/vivkaz/CQE.
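To make the value-and-unit part of the task concrete, here is a deliberately simplified sketch. It swaps the framework's dependency parsing for a regular expression and uses a tiny hand-written unit dictionary, so it is not the CQE implementation or its API. The names `UNIT_DICTIONARY`, `QUANTITY_PATTERN`, and `extract_quantities` are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical unit dictionary: surface form -> normalized unit name.
UNIT_DICTIONARY = {
    "km": "kilometre",
    "kilometers": "kilometre",
    "kilometres": "kilometre",
    "kg": "kilogram",
    "kilograms": "kilogram",
    "%": "percent",
    "percent": "percent",
}

# A number (optional thousands separators and decimals) followed by a unit token.
QUANTITY_PATTERN = re.compile(r"(\d+(?:,\d{3})*(?:\.\d+)?)\s*(%|[A-Za-z]+)")


def extract_quantities(text: str) -> List[Tuple[float, str]]:
    """Return (value, normalized_unit) pairs for units found in the dictionary."""
    results = []
    for number, unit in QUANTITY_PATTERN.findall(text):
        normalized_unit = UNIT_DICTIONARY.get(unit.lower())
        if normalized_unit is None:
            continue  # the token after the number is not a known unit
        value = float(number.replace(",", ""))
        results.append((value, normalized_unit))
    return results


print(extract_quantities("Sales rose 5.5% while shipments covered 1,200 km."))
# prints [(5.5, 'percent'), (1200.0, 'kilometre')]
```

A regex like this cannot capture a quantity's behavior (rising or falling) or its associated concept, which is where the syntactic analysis described above comes in.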
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
Wu, Zhengxuan, Geiger, Atticus, Potts, Christopher, Goodman, Noah D.
Obtaining human-interpretable explanations of large, general-purpose language models is an urgent goal for AI safety. However, it is just as important that our interpretability methods are faithful to the causal dynamics underlying model behavior and able to robustly generalize to unseen inputs. Distributed Alignment Search (DAS) [23] is a powerful gradient descent method grounded in a theory of causal abstraction that uncovered perfect alignments between interpretable symbolic algorithms and small deep learning models fine-tuned for specific tasks. In the present paper, we scale DAS significantly by replacing the remaining brute-force search steps with learned parameters, an approach we call Boundless DAS. This enables us to efficiently search for interpretable causal structure in large language models while they follow instructions. We apply Boundless DAS to the Alpaca model (7B parameters), which, off the shelf, solves a simple numerical reasoning problem. With Boundless DAS, we discover that Alpaca does this by implementing a causal model with two interpretable boolean variables. Furthermore, we find that the alignment of neural representations with these variables is robust to changes in inputs and instructions. These findings mark a first step toward deeply understanding the inner workings of our largest and most widely deployed language models.
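A causal model with two boolean variables can be written down in a few lines. The sketch below assumes, purely for illustration, that the task is deciding whether a number lies between two bounds. It also shows an interchange-style intervention, where one variable is set to the value it would take on a different input. The task, the variable names, and the helper functions are hypothetical; this is not the Boundless DAS procedure, which learns where such variables live inside the network.

```python
from typing import Dict, Optional


def causal_model(
    x: float,
    lower: float,
    upper: float,
    overrides: Optional[Dict[str, bool]] = None,
) -> Dict[str, bool]:
    """High-level model: two boolean variables feed a single AND output."""
    overrides = overrides or {}
    above_lower = overrides.get("above_lower", x >= lower)
    below_upper = overrides.get("below_upper", x <= upper)
    return {
        "above_lower": above_lower,
        "below_upper": below_upper,
        "output": above_lower and below_upper,
    }


def interchange(base: Dict[str, float], source: Dict[str, float], variable: str) -> bool:
    """Run the base input with `variable` forced to its value on the source input."""
    source_value = causal_model(**source)[variable]
    return causal_model(**base, overrides={variable: source_value})["output"]


base = {"x": 5.0, "lower": 1.0, "upper": 8.0}    # in range, so the output is True
source = {"x": 9.0, "lower": 1.0, "upper": 8.0}  # exceeds the upper bound

print(causal_model(**base)["output"])              # True
print(interchange(base, source, "below_upper"))    # False: the swapped variable flips the answer
print(interchange(base, source, "above_lower"))    # True: this variable agrees on both inputs
```

If a neural network's behavior under the matching swap of internal representations tracks these counterfactual outputs, that is evidence the network implements the high-level variable.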