Large Language Model
Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs
Chen, Zhikai, Mao, Haitao, Li, Hang, Jin, Wei, Wen, Hongzhi, Wei, Xiaochi, Wang, Shuaiqiang, Yin, Dawei, Fan, Wenqi, Liu, Hui, Tang, Jiliang
Learning on Graphs has attracted immense attention due to its wide real-world applications. The most popular pipeline for learning on graphs with textual node attributes primarily relies on Graph Neural Networks (GNNs), and utilizes shallow text embeddings as initial node representations, which are limited in general knowledge and deep semantic understanding. In recent years, Large Language Models (LLMs) have been shown to possess extensive common knowledge and powerful semantic comprehension abilities that have revolutionized existing workflows for handling text data. In this paper, we aim to explore the potential of LLMs in graph machine learning, especially the node classification task, and investigate two possible pipelines: LLMs-as-Enhancers and LLMs-as-Predictors. The former leverages LLMs to enhance nodes' text attributes with their massive knowledge and then generates predictions through GNNs. The latter attempts to directly employ LLMs as standalone predictors. We conduct comprehensive and systematic studies on these two pipelines under various settings. From comprehensive empirical results, we make original observations and find new insights that open new possibilities and suggest promising directions to leverage LLMs for learning on graphs. Our code and datasets are available at https://github.com/CurryTang/Graph-LLM.
Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR
Bhogale, Kaushal Santosh, Sundaresan, Sai, Raman, Abhigyan, Javed, Tahir, Khapra, Mitesh M., Kumar, Pratyush
Improving ASR systems is necessary to make new LLM-based use-cases accessible to people across the globe. In this paper, we focus on Indian languages, and make the case that diverse benchmarks are required to evaluate and improve ASR systems for Indian languages. To address this, we collate Vistaar as a set of 59 benchmarks across various language and domain combinations, on which we evaluate 3 publicly available ASR systems and 2 commercial systems. We also train IndicWhisper models by fine-tuning the Whisper models on publicly available training datasets across 12 Indian languages, totalling 10.7K hours. We show that IndicWhisper improves significantly on the considered ASR systems on the Vistaar benchmark. Indeed, IndicWhisper has the lowest WER in 39 out of the 59 benchmarks, with an average reduction of 4.1 WER. We open-source all datasets, code and models.
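For readers unfamiliar with the metric, word error rate (WER) is commonly computed as the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the Vistaar evaluation code: the function name `word_error_rate` is hypothetical, whitespace tokenization is an assumption, and any text normalization the benchmark applies is not reproduced here. It also assumes the reported figures are WER values on a 0-100 scale.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One missing word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, lower is better, and a WER can exceed 100 when the hypothesis contains many inserted words.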
The Human-or-Machine Matter: Turing-Inspired Reflections on an Everyday Issue
In his seminal paper "Computing Machinery and Intelligence", Alan Turing introduced the "imitation game" as part of exploring the concept of machine intelligence. The Turing Test has since been the subject of much analysis, debate, refinement and extension. Here we sidestep the question of whether a particular machine can be labeled intelligent, or can be said to match human capabilities in a given context. Instead, we first draw attention to the seemingly simpler question a person may ask themselves in an everyday interaction: "Am I interacting with a human or with a machine?" We then shift the focus from seeking a method for eliciting the answer, and, rather, reflect upon the importance and significance of this Human-or-Machine question and the use one may make of a reliable answer thereto. Whereas Turing's original test is widely considered to be more of a thought experiment, the Human-or-Machine matter as discussed here has obvious practical relevance. While it is still unclear if and when machines will be able to mimic human behavior with high fidelity in everyday contexts, we argue that near-term exploration of the issues raised here can contribute to refinement of methods for developing computerized systems, and may also lead to new insights into fundamental characteristics of human behavior.
Emergent Analogical Reasoning in Large Language Models
Webb, Taylor, Holyoak, Keith J., Lu, Hongjing
The recent advent of large language models has reinvigorated debate over whether human cognitive capacities might emerge in such generic models given sufficient training data. Of particular interest is the ability of these models to reason about novel problems zero-shot, without any direct training. In human cognition, this capacity is closely tied to an ability to reason by analogy. Here, we performed a direct comparison between human reasoners and a large language model (the text-davinci-003 variant of GPT-3) on a range of analogical tasks, including a non-visual matrix reasoning task based on the rule structure of Raven's Standard Progressive Matrices. We found that GPT-3 displayed a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings; preliminary tests of GPT-4 indicated even better performance. Our results indicate that large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems.
What Socrates Can Teach Us About AI
If Socrates was the wisest person in Ancient Greece, then large language models must be the most foolish systems in the modern world. In his Apology, Plato tells the story of how Socrates's friend Chaerephon goes to visit the oracle at Delphi. Chaerephon asks the oracle whether there is anyone wiser than Socrates. The priestess responds that there isn't: Socrates is the wisest of them all. At first, Socrates seems puzzled.
Why it's time to clean up AI's carbon footprint
Technology never exists in a vacuum, and the rise of cryptocurrency in the last two or three years shows that. While plenty of people were making extraordinary amounts of money from investing in bitcoin and its competitors, there was consternation about the impact those get-rich-quick speculators had on the environment. Mining cryptocurrency was environmentally taxing. The core principle behind it was that you had to expend effort to get rich. To mint a bitcoin or another cryptocurrency, you had to first "mine" it.
Towards More Human-like AI Communication: A Review of Emergent Communication Research
In the initial phase of AI research following the second AI winter, the focus was on identifying new areas where AI could outperform humans, with famous examples including chess [Silver et al., 2018], Go [Silver et al., 2016], and StarCraft [Vinyals et al., 2019]. While this was a limited application to games, it set the tone for research to prioritize building AI agents with superhuman capabilities. However, over the last decade, the research community has witnessed a shift towards a human-centric approach that aims to leverage AI to aid humans in everyday tasks and relieve them of repetitive duties [Xu, 2019, Riedl, 2019, Shneiderman, 2021]. The interaction between humans and machines is a crucial aspect of human-centric AI [Mikolov et al., 2016], and it should take place in domains with which humans are already familiar and which require little to no training. Therefore, applications that involve niche practices, such as coding and mathematics, should be avoided in favor of language-based applications. In particular, human-machine communication should be grounded in natural language, which presents the challenge of teaching artificial agents to communicate in multiple languages. Recent advances in natural language processing (NLP) have led to the emergence of the transformer architecture [Vaswani et al., 2017], which has become the preferred approach for language-based applications, as exemplified by Language Models (LMs) such as GPT-3 [Brown et al., 2020], LLaMA [Touvron et al., 2023], and LaMDA [Thoppilan et al., 2022]. One of the challenges for language model architectures is their focus on predicting the next word in a sentence rather than comprehending the broader context and purpose of language usage. While humans use language as a tool for coordination and communication to thrive in a shared environment, artificial intelligence may struggle to fully understand the subtleties and complexities of language.
Mass-Editing Memory in a Transformer
Meng, Kevin, Sharma, Arnab Sen, Andonian, Alex, Belinkov, Yonatan, Bau, David
Recent work has shown exciting promise in updating large language models with new memories, so as to replace obsolete information or add specialized knowledge. However, this line of work is predominantly limited to updating single associations. We develop MEMIT, a method for directly updating a language model with many memories, demonstrating experimentally that it can scale up to thousands of associations for GPT-J (6B) and GPT-NeoX (20B), exceeding prior work by orders of magnitude. Our code and data are at https://memit.baulab.info.
DiactTOD: Learning Generalizable Latent Dialogue Acts for Controllable Task-Oriented Dialogue Systems
Wu, Qingyang, Gung, James, Shu, Raphael, Zhang, Yi
Dialogue act annotations are important to improve response generation quality in task-oriented dialogue systems. However, it can be challenging to use dialogue acts to control response generation in a generalizable way because different datasets and tasks may have incompatible annotations. While alternative methods that utilize latent action spaces or reinforcement learning do not require explicit annotations, they may lack interpretability or face difficulties defining task-specific rewards. In this work, we present a novel end-to-end latent dialogue act model (DiactTOD) that represents dialogue acts in a latent space. DiactTOD, when pre-trained on a large corpus, is able to predict and control dialogue acts to generate controllable responses using these latent representations in a zero-shot fashion. Our approach demonstrates state-of-the-art performance across a wide range of experimental settings on the MultiWOZ dataset, including zero-shot, few-shot, and full data fine-tuning with both end-to-end and policy optimization configurations.
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Lai, Viet Dac, Van Nguyen, Chien, Ngo, Nghia Trung, Nguyen, Thuat, Dernoncourt, Franck, Rossi, Ryan A., Nguyen, Thien Huu
A key technology for the development of large language models (LLMs) is instruction tuning, which helps align the models' responses with human expectations to realize impressive learning abilities. The two major approaches to instruction tuning are supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are currently applied to produce the best commercial LLMs (e.g., ChatGPT). To improve the accessibility of LLMs for research and development efforts, various instruction-tuned open-source LLMs have also been introduced recently, e.g., Alpaca and Vicuna, to name a few. However, existing open-source LLMs have only been instruction-tuned for English and a few popular languages, thus hindering their impact and accessibility for many other languages in the world. Among the few very recent works exploring instruction tuning for LLMs in multiple languages, SFT has been used as the only approach. This has left a significant gap for fine-tuned LLMs based on RLHF in diverse languages and raised important questions on how RLHF can boost the performance of multilingual instruction tuning. To overcome this issue, we present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages. Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research. We also present benchmark datasets to enable the evaluation of generative LLMs in multiple languages. Our experiments demonstrate the advantages of RLHF for multilingual instruction tuning over SFT for different base models and datasets. Our framework and resources are released at https://github.com/nlp-uoregon/Okapi.