AITopics

Learning on graphs has attracted immense attention due to its wide real-world applications. The most popular pipeline for learning on graphs with textual node attributes relies primarily on Graph Neural Networks (GNNs) and uses shallow text embeddings as initial node representations, which limits general knowledge and deep semantic understanding. In recent years, Large Language Models (LLMs) have been shown to possess extensive common knowledge and powerful semantic comprehension abilities that have revolutionized existing workflows for handling text data. In this paper, we aim to explore the potential of LLMs in graph machine learning, especially the node classification task, and investigate two possible pipelines: LLMs-as-Enhancers and LLMs-as-Predictors. The former leverages LLMs to enhance nodes' text attributes with their massive knowledge and then generates predictions through GNNs. The latter attempts to directly employ LLMs as standalone predictors. We conduct comprehensive and systematic studies on these two pipelines under various settings. From comprehensive empirical results, we make original observations and find new insights that open new possibilities and suggest promising directions to leverage LLMs for learning on graphs. Our code and datasets are available at https://github.com/CurryTang/Graph-LLM.

large language model, machine learning, natural language

2307.03393

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Bhogale, Kaushal Santosh, Sundaresan, Sai, Raman, Abhigyan, Javed, Tahir, Khapra, Mitesh M., Kumar, Pratyush

Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR

Improving ASR systems is necessary to make new LLM-based use-cases accessible to people across the globe. In this paper, we focus on Indian languages, and make the case that diverse benchmarks are required to evaluate and improve ASR systems for Indian languages. To address this, we collate Vistaar as a set of 59 benchmarks across various language and domain combinations, on which we evaluate 3 publicly available ASR systems and 2 commercial systems. We also train IndicWhisper models by fine-tuning the Whisper models on publicly available training datasets across 12 Indian languages, totalling 10.7K hours. We show that IndicWhisper significantly improves on the considered ASR systems on the Vistaar benchmark. Indeed, IndicWhisper has the lowest WER in 39 out of the 59 benchmarks, with an average reduction of 4.1 WER. We open-source all datasets, code and models.
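
For readers unfamiliar with the metric, WER (word error rate) is conventionally the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition. The function name `word_error_rate` and the plain whitespace tokenization are assumptions made for illustration; they are not taken from the Vistaar code, which may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is a plain
    whitespace split, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word and one dropped word against six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the output contains many inserted words; reported figures are usually this ratio multiplied by 100.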

benchmark, large language model, machine learning

2305.15386

Country:

Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
Asia > Indonesia > Bali (0.04)
Europe > Czechia > South Moravian Region > Brno (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.48)

Harel, David, Marron, Assaf

The Human-or-Machine Matter: Turing-Inspired Reflections on an Everyday Issue

In his seminal paper "Computing Machinery and Intelligence", Alan Turing introduced the "imitation game" as part of exploring the concept of machine intelligence. The Turing Test has since been the subject of much analysis, debate, refinement and extension. Here we sidestep the question of whether a particular machine can be labeled intelligent, or can be said to match human capabilities in a given context. Instead, we first draw attention to the seemingly simpler question a person may ask themselves in an everyday interaction: "Am I interacting with a human or with a machine?" We then shift the focus from seeking a method for eliciting the answer, and, rather, reflect upon the importance and significance of this Human-or-Machine question and the use one may make of a reliable answer thereto. Whereas Turing's original test is widely considered to be more of a thought experiment, the Human-or-Machine matter as discussed here has obvious practical relevance. While it is still unclear if and when machines will be able to mimic human behavior with high fidelity in everyday contexts, we argue that near-term exploration of the issues raised here can contribute to refinement of methods for developing computerized systems, and may also lead to new insights into fundamental characteristics of human behavior.

large language model, machine learning, natural language

2305.04312

Country:

Asia > Middle East > Israel (0.04)
North America > United States > Hawaii (0.04)
Asia > China (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine (1.00)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Webb, Taylor, Holyoak, Keith J., Lu, Hongjing

Emergent Analogical Reasoning in Large Language Models

The recent advent of large language models has reinvigorated debate over whether human cognitive capacities might emerge in such generic models given sufficient training data. Of particular interest is the ability of these models to reason about novel problems zero-shot, without any direct training. In human cognition, this capacity is closely tied to an ability to reason by analogy. Here, we performed a direct comparison between human reasoners and a large language model (the text-davinci-003 variant of GPT-3) on a range of analogical tasks, including a non-visual matrix reasoning task based on the rule structure of Raven's Standard Progressive Matrices. We found that GPT-3 displayed a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings; preliminary tests of GPT-4 indicated even better performance. Our results indicate that large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems.

large language model, machine learning, natural language

2212.09196

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.27)
Europe > Austria > Vienna (0.14)
North America > United States > Hawaii (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Government (0.93)
Health & Medicine > Therapeutic Area (0.92)
Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

TIME - Tech, Aug-1-2023, 11:00:00 GMT

What Socrates Can Teach Us About AI

If Socrates was the wisest person in Ancient Greece, then large language models must be the most foolish systems in the modern world. In his Apology, Plato tells the story of how Socrates's friend Chaerephon goes to visit the oracle at Delphi. Chaerephon asks the oracle whether there is anyone wiser than Socrates. The priestess responds that there isn't: Socrates is the wisest of them all. At first, Socrates seems puzzled.

bullshitter, language model, socrate

Country:

Europe > Greece (0.25)
North America > United States > California (0.05)

Industry: Media > News (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

The Guardian, Aug-1-2023, 10:40:33 GMT

Why it's time to clean up AI's carbon footprint

Technology never exists in a vacuum, and the rise of cryptocurrency in the last two or three years shows that. While plenty of people were making extraordinary amounts of money from investing in bitcoin and its competitors, there was consternation about the impact those get-rich-quick speculators had on the environment. Mining cryptocurrency was environmentally taxing. The core principle behind it was that you had to expend effort to get rich. To mint a bitcoin or another cryptocurrency, you had to first "mine" it.

carbon footprint, environmental impact, footprint

Country:

North America > United States > California > San Francisco County > San Francisco (0.06)
Oceania > Australia (0.05)
North America > United States > New York (0.05)

Genre: Research Report (0.31)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.80)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.78)

Towards More Human-like AI Communication: A Review of Emergent Communication Research

Brandizzi, Nicolò

In the initial phase of AI research following the second AI winter, the focus was on identifying new areas where AI could outperform humans, with famous examples including chess [Silver et al., 2018], Go [Silver et al., 2016], and StarCraft [Vinyals et al., 2019]. While this was a limited application to games, it set the tone for research to prioritize building AI agents with superhuman capabilities. However, over the last decade, the research community has witnessed a shift towards a human-centric approach that aims to leverage AI to aid humans in everyday tasks and relieve them of repetitive duties [Xu, 2019, Riedl, 2019, Shneiderman, 2021]. The interaction between humans and machines is a crucial aspect of human-centric AI [Mikolov et al., 2016], and it should take place in domains with which humans are already familiar and which require little to no training. Therefore, applications that involve niche practices, such as coding and mathematics, should be avoided in favor of language-based applications. In particular, human-machine communication should be grounded in natural language, which presents the challenge of teaching artificial agents to communicate in multiple languages. Recent advances in natural language processing (NLP) have led to the emergence of the transformer architecture [Vaswani et al., 2017], which has become the preferred approach for language-based applications, as exemplified by Language Models (LMs) such as GPT-3 [Brown et al., 2020], LLaMA [Touvron et al., 2023], and LaMDA [Thoppilan et al., 2022]. One of the challenges for language model architectures is their focus on predicting the next word in a sentence rather than comprehending the broader context and purpose of language usage. While humans use language as a tool for coordination and communication to thrive in a shared environment, artificial intelligence may struggle to fully understand the subtleties and complexities of language.

agent, communication, learning

doi: 10.1109/ACCESS.2023.3339656

2308.02541

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > France (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.92)
Leisure & Entertainment > Games > Computer Games (0.88)
Education > Curriculum > Subject-Specific Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)

Mass-Editing Memory in a Transformer

Meng, Kevin, Sharma, Arnab Sen, Andonian, Alex, Belinkov, Yonatan, Bau, David

Recent work has shown exciting promise in updating large language models with new memories, so as to replace obsolete information or add specialized knowledge. However, this line of work is predominantly limited to updating single associations. We develop MEMIT, a method for directly updating a language model with many memories, demonstrating experimentally that it can scale up to thousands of associations for GPT-J (6B) and GPT-NeoX (20B), exceeding prior work by orders of magnitude. Our code and data are at https://memit.baulab.info.

conference paper, knowledge, language model

2210.07229

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Norfolk County > Wellesley (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

DiactTOD: Learning Generalizable Latent Dialogue Acts for Controllable Task-Oriented Dialogue Systems

Wu, Qingyang, Gung, James, Shu, Raphael, Zhang, Yi

Dialogue act annotations are important to improve response generation quality in task-oriented dialogue systems. However, it can be challenging to use dialogue acts to control response generation in a generalizable way because different datasets and tasks may have incompatible annotations. While alternative methods that utilize latent action spaces or reinforcement learning do not require explicit annotations, they may lack interpretability or face difficulties defining task-specific rewards. In this work, we present a novel end-to-end latent dialogue act model (DiactTOD) that represents dialogue acts in a latent space. DiactTOD, when pre-trained on a large corpus, is able to predict and control dialogue acts to generate controllable responses using these latent representations in a zero-shot fashion. Our approach demonstrates state-of-the-art performance across a wide range of experimental settings on the MultiWOZ dataset, including zero-shot, few-shot, and full data fine-tuning with both end-to-end and policy optimization configurations.

large language model, machine learning, natural language

2308.00878

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Lai, Viet Dac, Van Nguyen, Chien, Ngo, Nghia Trung, Nguyen, Thuat, Dernoncourt, Franck, Rossi, Ryan A., Nguyen, Thien Huu

Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

A key technology for the development of large language models (LLMs) is instruction tuning, which helps align the models' responses with human expectations to realize impressive learning abilities. The two major approaches to instruction tuning are supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are currently applied to produce the best commercial LLMs (e.g., ChatGPT). To improve the accessibility of LLMs for research and development efforts, various instruction-tuned open-source LLMs have also been introduced recently, e.g., Alpaca and Vicuna, to name a few. However, existing open-source LLMs have only been instruction-tuned for English and a few popular languages, limiting their impact and accessibility for many other languages in the world. Among the few very recent works that explore instruction tuning for LLMs in multiple languages, SFT has been the only approach used. This has left a significant gap for fine-tuned LLMs based on RLHF in diverse languages and raised important questions on how RLHF can boost the performance of multilingual instruction tuning. To overcome this issue, we present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages. Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research. We also present benchmark datasets to enable the evaluation of generative LLMs in multiple languages. Our experiments demonstrate the advantages of RLHF for multilingual instruction over SFT for different base models and datasets. Our framework and resources are released at https://github.com/nlp-uoregon/Okapi.

large language model, machine learning, natural language

2307.16039

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.64)

Industry: Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)