AITopics | Witbrock, Michael

Witbrock, Michael

Enhancing Logical Reasoning of Large Language Models through Logic-Driven Data Augmentation

Bao, Qiming, Peng, Alex Yuxuan, Deng, Zhenyun, Zhong, Wanjun, Gendron, Gael, Pistotti, Timothy, Tan, Neset, Young, Nathan, Chen, Yang, Zhu, Yonghua, Denny, Paul, Witbrock, Michael, Liu, Jiamou

arXiv.org Artificial Intelligence, Oct-14-2023

Combining large language models with logical reasoning enhances their capacity to address problems in a robust and reliable manner. Nevertheless, the intricate nature of logical reasoning poses challenges to gathering reliable data from the web for building comprehensive training datasets, subsequently affecting performance on downstream tasks. To address this, we introduce a novel logic-driven data augmentation approach, AMR-LDA. AMR-LDA converts the original text into an Abstract Meaning Representation (AMR) graph, a structured semantic representation that encapsulates the logical structure of the sentence, upon which operations are performed to generate logically modified AMR graphs. The modified AMR graphs are subsequently converted back into text to create augmented data. Notably, our methodology is architecture-agnostic: it enhances generative large language models, such as GPT-3.5 and GPT-4, through prompt augmentation, and discriminative large language models through fine-tuning with contrastive learning on logic-driven augmented data. Empirical evidence underscores the efficacy of our proposed method, with improved performance across seven downstream tasks, such as logical reasoning reading comprehension, textual entailment, and natural language inference. Furthermore, our method ranked first on the ReClor leaderboard (https://eval.ai/web/challenges/challenge-page/503/leaderboard/1347). The source code and data are publicly available at https://github.com/Strong-AI-Lab/Logical-Equivalence-driven-AMR-Data-Augmentation-for-Representation-Learning.
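The idea of generating a logically modified but equivalent sentence can be illustrated with a toy example. The sketch below is not the authors' method: it does not build AMR graphs, and it assumes a hypothetical rule format (a pair of signed literals) with illustrative function names. Contraposition is used as one well-known equivalence law; the abstract does not say which operations AMR-LDA applies.

```python
from itertools import product

# A literal is (proposition, is_positive); a rule is (antecedent, consequent),
# read as "if antecedent then consequent". This toy format is an assumption.

def contrapositive(rule):
    """Rewrite 'if A then B' as the equivalent 'if not B then not A'."""
    (a, a_pos), (b, b_pos) = rule
    return ((b, not b_pos), (a, not a_pos))

def holds(rule, assignment):
    """Evaluate the implication under a truth assignment {proposition: bool}."""
    (a, a_pos), (b, b_pos) = rule
    a_val = assignment[a] == a_pos
    b_val = assignment[b] == b_pos
    return (not a_val) or b_val

def equivalent(rule1, rule2):
    """Truth-table check over every proposition mentioned in either rule."""
    names = sorted({lit[0] for rule in (rule1, rule2) for lit in rule})
    return all(
        holds(rule1, dict(zip(names, values))) == holds(rule2, dict(zip(names, values)))
        for values in product([False, True], repeat=len(names))
    )

def to_text(rule):
    """Render a rule as an English sentence."""
    def render(literal):
        name, positive = literal
        return name if positive else f"it is not the case that {name}"
    antecedent, consequent = rule
    return f"If {render(antecedent)}, then {render(consequent)}."

original = (("it rains", True), ("the ground is wet", True))
augmented = contrapositive(original)
print(to_text(original))
print(to_text(augmented))
print(equivalent(original, augmented))
```

The truth-table check is what distinguishes an equivalence-preserving rewrite from a merely plausible one: the converse of the same rule would fail it.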

large language model, machine learning, natural language, (20 more...)

arXiv: 2305.12599

Country:

North America > United States > Texas (0.14)
North America > United States > Louisiana (0.14)
Europe > Spain > Canary Islands (0.14)

Genre: Research Report (1.00)

Industry: Education > Assessment & Standards (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval

Hartill, Tim, Benavides-Prado, Diana, Witbrock, Michael, Riddle, Patricia J.

arXiv.org Artificial Intelligence, Oct-12-2023

When provided with sufficient explanatory context, smaller Language Models have been shown to exhibit strong reasoning ability on challenging short-answer question-answering tasks where the questions are unseen in training. We evaluate two methods for further improvement in this setting. Both methods focus on combining rationales generated by a larger Language Model with longer contexts created from a multi-hop dense retrieval system. The first method (RR) involves training a Rationale Ranking model to score both generated rationales and retrieved contexts with respect to relevance and truthfulness. We then use the scores to derive combined contexts from both knowledge sources using a number of combinatory strategies. For the second method (RATD) we utilise retrieval-augmented training datasets developed by Hartill et al. (2023) to train a smaller Reasoning model such that it becomes proficient at utilising relevant information from longer text sequences that may be only partially evidential and frequently contain many irrelevant sentences. We find that both methods significantly improve results. Our single best Reasoning model materially improves upon strong comparable prior baselines for unseen evaluation datasets (StrategyQA 58.9 → 61.7 acc., CommonsenseQA 63.6 → 72.7 acc., ARC-DA 31.6 → 52.1 F1, IIRC 25.5 → 27.3 F1) and a version utilising our prior knowledge of each type of question in selecting a context combination strategy does even better. Our proposed models also generally outperform direct prompts against much larger models (BLOOM 175B and StableVicuna 13B) in both few-shot chain-of-thought and standard few-shot settings.

computational linguistic, large language model, machine learning, (19 more...)

arXiv: 2308.04711

Country:

Europe (0.67)
North America > United States > Texas > Harris County > Houston (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.67)
Health & Medicine (0.47)
Government (0.46)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.67)

Disentanglement of Latent Representations via Causal Interventions

Gendron, Gaël, Witbrock, Michael, Dobbie, Gillian

arXiv.org Artificial Intelligence, Sep-22-2023

The process of generating data such as images is controlled by independent and unknown factors of variation. The retrieval of these variables has been studied extensively in the disentanglement, causal representation learning, and independent component analysis fields. Recently, approaches merging these domains together have shown great success. Instead of directly representing the factors of variation, the problem of disentanglement can be seen as finding the interventions on one image that yield a change to a single factor. Following this assumption, we introduce a new method for disentanglement inspired by causal dynamics that combines causality theory with vector-quantized variational autoencoders. Our model considers the quantized vectors as causal variables and links them in a causal graph. It performs causal interventions on the graph and generates atomic transitions affecting a unique factor of variation in the image. We also introduce a new task of action retrieval that consists of finding the action responsible for the transition between two images. We test our method on standard synthetic and real-world disentanglement datasets. We show that it can effectively disentangle the factors of variation and perform precise interventions on high-level semantic attributes of an image without affecting its quality, even with imbalanced data distributions.

artificial intelligence, machine learning, variation, (17 more...)

doi: 10.24963/ijcai.2023/361

arXiv: 2302.00869

Country:

North America > Canada (0.94)
Europe (0.68)
North America > United States > California > Los Angeles County > Long Beach (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Teaching Smaller Language Models To Generalise To Unseen Compositional Questions

Hartill, Tim, Tan, Neset, Witbrock, Michael, Riddle, Patricia J.

arXiv.org Artificial Intelligence, Aug-20-2023

We equip a smaller Language Model to generalise to answering challenging compositional questions that have not been seen in training. To do so we propose a combination of multitask supervised pretraining on up to 93 tasks designed to instill diverse reasoning abilities, and a dense retrieval system that aims to retrieve a set of evidential paragraph fragments. Recent progress in question-answering has been achieved either through prompting methods against very large pretrained Language Models in zero or few-shot fashion, or by fine-tuning smaller models, sometimes in conjunction with information retrieval. We focus on the less explored question of the extent to which zero-shot generalisation can be enabled in smaller models with retrieval against a corpus within which sufficient information to answer a particular question may not exist. We establish strong baselines in this setting for diverse evaluation datasets (StrategyQA, CommonsenseQA, IIRC, DROP, Musique and ARC-DA), and show that performance can be significantly improved by adding retrieval-augmented training datasets which are designed to expose our models to a variety of heuristic reasoning strategies such as weighing partial evidence or ignoring an irrelevant context.

computational linguistic, information retrieval, question answering, (15 more...)

arXiv: 2308.00946

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Education (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.66)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.49)
(2 more...)

Neuromodulation Gated Transformer

Knowles, Kobe, Bensemann, Joshua, Benavides-Prado, Diana, Yogarajan, Vithya, Witbrock, Michael, Dobbie, Gillian, Chen, Yang

arXiv.org Artificial Intelligence, May-11-2023

We introduce a novel architecture, the Neuromodulation Gated Transformer (NGT), which implements neuromodulation in transformers via a multiplicative effect. We compare it to baselines and show that it results in the best average performance on the SuperGLUE benchmark validation sets. Cellular neuromodulation is a biological mechanism involving neurons, where their intrinsic properties are continuously modified in a context-dependent manner according to stimuli, i.e., biochemicals called neuromodulators (Bargmann & Marder, 2013; Marder et al., 2014; Shine et al., 2021; Vecoven et al., 2020); it allows for the regulation of a population of neurons (Katz & Edwards, 1999). It has achieved notable success in the continual learning domain (Beaulieu et al., 2020; Ellefsen et al., 2015; Velez & Clune, 2017). Transformers (Vaswani et al., 2017) are architectures that eliminate recurrence by relying entirely on attention.
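A multiplicative effect of this kind can be pictured as a context-dependent gate that rescales each hidden activation. The following is a minimal sketch under stated assumptions, not the NGT architecture: the sigmoid gate, the weight matrix `W_gate`, and all function names are hypothetical choices made for illustration.

```python
import math

def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def gated_activations(hidden, context, W_gate):
    """Scale each hidden unit by a gate computed from the context vector.

    hidden:  activations to be modulated (length n)
    context: signal that drives the modulation (length m)
    W_gate:  hypothetical n x m weight matrix for the gate
    """
    gate = [sigmoid(z) for z in matvec(W_gate, context)]
    return [h * g for h, g in zip(hidden, gate)]

hidden = [2.0, -4.0]
context = [1.0, 1.0]
# With all-zero gate weights every gate equals sigmoid(0) = 0.5.
print(gated_activations(hidden, context, [[0.0, 0.0], [0.0, 0.0]]))
```

Because the gate multiplies rather than adds, a gate value near zero suppresses a unit regardless of its activation, which is the property that distinguishes this form of modulation from an additive bias.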

artificial intelligence, machine learning, natural language, (17 more...)

arXiv: 2305.03232

Country: North America > United States > Minnesota (0.29)

Genre: Research Report (0.65)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Adversarial Inverse Reinforcement Learning for Mean Field Games

Chen, Yang, Zhang, Libo, Liu, Jiamou, Witbrock, Michael

arXiv.org Artificial Intelligence, Apr-17-2023

Mean field games (MFGs) provide a mathematically tractable framework for modelling large-scale multi-agent systems by leveraging mean field theory to simplify interactions among agents. This framework enables applying inverse reinforcement learning (IRL) to predict behaviours of large populations by recovering reward signals from demonstrated behaviours. However, existing IRL methods for MFGs are powerless to reason about uncertainties in demonstrated behaviours of individual agents. This paper proposes a novel framework, Mean-Field Adversarial IRL (MF-AIRL), which is capable of tackling uncertainties in demonstrations. We build MF-AIRL upon maximum entropy IRL and a new equilibrium concept. We evaluate our approach on simulated tasks with imperfect demonstrations. Experimental results demonstrate the superiority of MF-AIRL over existing methods in reward recovery.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv: 2104.14654

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Input-length-shortening and text generation via attention values

Tan, Neşet Özkan, Peng, Alex Yuxuan, Bensemann, Joshua, Bao, Qiming, Hartill, Tim, Gahegan, Mark, Witbrock, Michael

arXiv.org Artificial Intelligence, Mar-13-2023

Identifying words that impact a task's performance more than others is a challenge in natural language processing. Transformer models have recently addressed this issue by incorporating an attention mechanism that assigns greater attention (i.e., relevance) scores to some words than others. Because of the attention mechanism's high computational cost, transformer models usually have an input-length limitation caused by hardware constraints. This limitation applies to many transformers, including the well-known Bidirectional Encoder Representations from Transformers (BERT) model. In this paper, we examined BERT's attention assignment mechanism, focusing on two questions: (1) How can attention be employed to reduce input length? (2) How can attention be used as a control mechanism for conditional text generation? We investigated these questions in the context of a text classification task. We discovered that BERT's early layers assign more critical attention scores for text classification tasks compared to later layers. We demonstrated that the first layer's attention sums could be used to filter tokens in a given sequence, considerably decreasing the input length while maintaining good test accuracy. We also applied filtering, which uses a compute-efficient semantic similarities algorithm, and discovered that retaining approximately 6% of the original sequence is sufficient to obtain 86.5% accuracy. Finally, we showed that we could generate data in a stable manner that is indistinguishable from the original data by using only a small percentage (10%) of the tokens with high attention scores according to BERT's first layer.
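Filtering tokens by attention sums can be sketched in a few lines. The example below is illustrative only: it assumes a single attention matrix with heads already averaged (rows are query positions, columns are key positions), scores each token by the total attention it receives, and keeps the top fraction in original order. The abstract does not specify how the sums are taken, so this scoring rule and the name `filter_tokens` are assumptions.

```python
def filter_tokens(tokens, attention, keep_ratio):
    """Keep the tokens that receive the most attention, preserving word order.

    tokens:     list of n token strings
    attention:  n x n nested list; attention[q][k] is the weight that
                query position q places on key position k (assumed layout)
    keep_ratio: fraction of tokens to retain, e.g. 0.1 for 10%
    """
    n = len(tokens)
    # Column sums: total attention each token receives from all positions.
    received = [sum(attention[q][k] for q in range(n)) for k in range(n)]
    keep = max(1, round(keep_ratio * n))
    top = sorted(range(n), key=lambda i: received[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(top)]

tokens = ["the", "movie", "was", "great"]
# Toy attention rows; every position attends mostly to "movie" and "great".
attention = [[0.1, 0.4, 0.1, 0.4] for _ in tokens]
print(filter_tokens(tokens, attention, keep_ratio=0.5))
```

Sorting the surviving indices before returning keeps the shortened input readable as a subsequence of the original text.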

machine learning, natural language, text classification, (20 more...)

arXiv: 2303.07585

Country:

Oceania > New Zealand (0.17)
North America > United States > Louisiana (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Media > Film (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Learning Density-Based Correlated Equilibria for Markov Games

Zhang, Libo, Chen, Yang, Takisaka, Toru, Khoussainov, Bakh, Witbrock, Michael, Liu, Jiamou

arXiv.org Artificial Intelligence, Feb-15-2023

Correlated Equilibrium (CE) is a well-established solution concept that captures coordination among agents and enjoys good algorithmic properties. In real-world multi-agent systems, in addition to being in an equilibrium, agents' policies are often expected to meet requirements with respect to safety and fairness. Such additional requirements can often be expressed in terms of the state density, which measures the state-visitation frequencies during the course of a game. However, existing CE notions or CE-finding approaches cannot explicitly specify a CE with particular properties concerning state density; they do so implicitly by either modifying reward functions or using value functions as the selection criteria. The resulting CE may thus not fully fulfil the state-density requirements. In this paper, we propose Density-Based Correlated Equilibria (DBCE), a new notion of CE that explicitly takes state density as a selection criterion. Concretely, we instantiate DBCE by specifying different state-density requirements motivated by real-world applications. To compute DBCE, we put forward the Density Based Correlated Policy Iteration algorithm for the underlying control problem. We perform experiments on various games where results demonstrate the advantage of our CE-finding approach over existing methods in scenarios with state-density concerns.
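To make the notion of state density concrete, the sketch below estimates state-visitation frequencies from sampled trajectories and checks a simple upper bound on one state. It is an illustration only: the undiscounted empirical estimate, the threshold-style requirement, and the function names are assumptions, and it does not implement DBCE or the Density Based Correlated Policy Iteration algorithm.

```python
from collections import Counter

def state_density(trajectories):
    """Empirical state-visitation frequencies over all sampled time steps."""
    counts = Counter(state for trajectory in trajectories for state in trajectory)
    total = sum(counts.values())
    return {state: count / total for state, count in counts.items()}

def satisfies_upper_bound(density, state, max_density):
    """Check a hypothetical safety-style requirement: visit `state` rarely."""
    return density.get(state, 0.0) <= max_density

# Two short trajectories over states "a", "b", "c" (toy data).
trajectories = [["a", "b", "a"], ["a", "c"]]
density = state_density(trajectories)
print(density)
print(satisfies_upper_bound(density, "c", max_density=0.25))
```

A requirement phrased this way constrains where the joint policy spends its time rather than how much reward it collects, which is why it cannot always be recovered by reshaping rewards.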

agent, artificial intelligence, requirement, (16 more...)

arXiv: 2302.08001

Genre: Research Report (0.70)

Industry: Leisure & Entertainment > Games (0.68)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Rapid Connectionist Speaker Adaptation

Witbrock, Michael, Haffner, Patrick

arXiv.org Artificial Intelligence, Nov-14-2022

We present SVCnet, a system for modelling speaker variability. Encoder Neural Networks specialized for each speech sound produce low-dimensionality models of acoustical variation, and these models are further combined into an overall model of voice variability. A training procedure is described which minimizes the dependence of this model on which sounds have been uttered. Using the trained model (SVCnet) and a brief, unconstrained sample of a new speaker's voice, the system produces a Speaker Voice Code that can be used to adapt a recognition system to the new speaker without retraining. A system which combines SVCnet with an MS-TDNN recognizer is described.

artificial intelligence, machine learning, variation, (15 more...)

doi: 10.1109/ICASSP.1992.225874

arXiv: 2211.08978

Country: North America > United States (0.69)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.31)

Semantic Construction Grammar: Bridging the NL / Logic Divide

Schneider, Dave, Witbrock, Michael

arXiv.org Artificial Intelligence, Dec-9-2021

In this paper, we discuss Semantic Construction Grammar (SCG), a system developed over the past several years to facilitate translation between natural language and logical representations. Crucially, SCG is designed to support a variety of different methods of representation, ranging from those that are fairly close to the NL structure (e.g. so-called 'logical forms'), to those that are quite different from the NL structure, with higher-order and high-arity relations. Semantic constraints and checks on representations are integral to the process of NL understanding with SCG, and are easily carried out due to the SCG's integration with Cyc's Knowledge Base and inference engine.

artificial intelligence, expert system, natural language, (19 more...)

doi: 10.1145/2740908.2741710

arXiv: 2112.05256

Country: North America > United States (0.47)

Genre:

Research Report (0.50)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.69)
