AITopics

2312.06037

Country:

North America > United States > Georgia > Clarke County > Athens (0.14)
Europe > Switzerland (0.04)
North America > United States > Virginia (0.04)
(14 more...)

Genre:

Research Report (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Information Technology > Security & Privacy (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(4 more...)

Ferry, Quentin RV., Ching, Joshua, Kawai, Takashi

Emergence and Function of Abstract Representations in Self-Supervised Transformers

arXiv.org Artificial Intelligence, Dec-8-2023

Human intelligence relies in part on our brains' ability to create abstract mental models that succinctly capture the hidden blueprint of our reality. Such abstract world models notably allow us to rapidly navigate novel situations by generalizing prior knowledge, a trait deep learning systems have historically struggled to replicate. However, the recent shift from supervised to self-supervised objectives, combined with expressive transformer-based architectures, has yielded powerful foundation models that appear to learn versatile representations that can support a wide range of downstream tasks. This promising development raises the intriguing possibility of such models developing in silico abstract world models. We test this hypothesis by studying the inner workings of small-scale transformers trained to reconstruct partially masked visual scenes generated from a simple blueprint. We show that the network develops intermediate abstract representations, or abstractions, that encode all semantic features of the dataset. These abstractions manifest as low-dimensional manifolds where the embeddings of semantically related tokens transiently converge, thus allowing for the generalization of downstream computations. Using precise manipulation experiments, we demonstrate that abstractions are central to the network's decision-making process. Our research also suggests that these abstractions are compositionally structured, exhibiting features like contextual independence and part-whole relationships that mirror the compositional nature of the dataset. Finally, we introduce a Language-Enhanced Architecture (LEA) designed to encourage the network to articulate its computations. We find that LEA develops an abstraction-centric language that can be easily interpreted, allowing us to more readily access and steer the network's decision-making process.

abstraction, representation, self-supervised transformer, (12 more...)

2312.05361

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Hu, Zhiting, Shu, Tianmin

Language Models, Agent Models, and World Models: The LAW for Machine Reasoning and Planning

arXiv.org Artificial Intelligence, Dec-8-2023

Despite their tremendous success in many applications, large language models often fall short of consistent reasoning and planning in various (language, embodied, and social) scenarios, due to inherent limitations in their inference, learning, and modeling capabilities. In this position paper, we present a new perspective of machine reasoning, LAW, that connects the concepts of Language models, Agent models, and World models, for more robust and versatile reasoning capabilities. In particular, we propose that world and agent models are a better abstraction of reasoning, that introduces the crucial elements of deliberate human-like reasoning, including beliefs about the world and other agents, anticipation of consequences, goals/rewards, and strategic planning. Crucially, language models in LAW serve as a backend to implement the system or its elements and hence provide the computational power and adaptability. We review the recent studies that have made relevant progress and discuss future research directions towards operationalizing the LAW framework.

arxiv preprint arxiv, reasoning, world model, (13 more...)

2312.0523

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology (0.67)
Leisure & Entertainment > Games (0.46)
Education (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Hubert, Nicolas, Monnin, Pierre, Paulheim, Heiko

Beyond Transduction: A Survey on Inductive, Few Shot, and Zero Shot Link Prediction in Knowledge Graphs

arXiv.org Artificial Intelligence, Dec-8-2023

Knowledge graphs (KGs) comprise entities interconnected by relations of different semantic meanings. KGs are being used in a wide range of applications. However, they inherently suffer from incompleteness, i.e., entities or facts about entities are missing. Consequently, a large body of work focuses on the completion of missing information in KGs, which is commonly referred to as link prediction (LP). This task has traditionally and extensively been studied in the transductive setting, where all entities and relations in the testing set are observed during training. Recently, several works have tackled the LP task under more challenging settings, where entities and relations in the test set may be unobserved during training, or appear in only a few facts. These works are known as inductive, few-shot, and zero-shot link prediction. In this work, we conduct a systematic review of existing works in this area. A thorough analysis leads us to point out the undesirable existence of diverging terminologies and task definitions for the aforementioned settings, which further limits the possibility of comparison between recent works. We consequently aim at dissecting each setting thoroughly, attempting to reveal its intrinsic characteristics. A unifying nomenclature is ultimately proposed to refer to each of them in a simple and consistent manner.
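The distinction drawn in this abstract between the transductive setting and the harder settings can be illustrated with a minimal sketch. It assumes triples are plain (head, relation, tail) string tuples, and the function names `vocabularies`, `is_transductive`, and `unseen_in_test` are hypothetical helpers introduced only for illustration; they are not taken from the survey, and the survey's own nomenclature is finer-grained than this check.

```python
from typing import Iterable, List, Set, Tuple

# Assumption: a fact is a (head entity, relation, tail entity) tuple of strings.
Triple = Tuple[str, str, str]


def vocabularies(train: Iterable[Triple]) -> Tuple[Set[str], Set[str]]:
    """Collect the entities and relations observed in the training triples."""
    entities: Set[str] = set()
    relations: Set[str] = set()
    for head, relation, tail in train:
        entities.update((head, tail))
        relations.add(relation)
    return entities, relations


def unseen_in_test(
    train: Iterable[Triple], test: Iterable[Triple]
) -> Tuple[Set[str], Set[str]]:
    """Return the test entities and relations that never occur in training."""
    entities, relations = vocabularies(train)
    new_entities: Set[str] = set()
    new_relations: Set[str] = set()
    for head, relation, tail in test:
        new_entities.update(e for e in (head, tail) if e not in entities)
        if relation not in relations:
            new_relations.add(relation)
    return new_entities, new_relations


def is_transductive(train: Iterable[Triple], test: Iterable[Triple]) -> bool:
    """True when every test entity and relation was observed during training."""
    new_entities, new_relations = unseen_in_test(list(train), list(test))
    return not new_entities and not new_relations


if __name__ == "__main__":
    train: List[Triple] = [("a", "knows", "b"), ("b", "knows", "c")]
    print(is_transductive(train, [("a", "knows", "c")]))
    print(unseen_in_test(train, [("d", "likes", "a")]))
```

A split for which `is_transductive` returns False contains unseen entities or relations, which is the situation the abstract groups under inductive, few-shot, and zero-shot link prediction.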

link prediction, prediction, relation, (12 more...)

2312.04997

Country:

Europe > Austria > Vienna (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
(42 more...)

Genre:

Overview (1.00)
Research Report (0.81)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
(4 more...)

arXiv.org Machine Learning, Dec-8-2023

Construction of Hierarchical Neural Architecture Search Spaces based on Context-free Grammars

Schrodi, Simon, Stoll, Danny, Ru, Binxin, Sukthanker, Rhea, Brox, Thomas, Hutter, Frank

The discovery of neural architectures from simple building blocks is a long-standing goal of Neural Architecture Search (NAS). Hierarchical search spaces are a promising step towards this goal but lack a unifying search space design framework and typically only search over some limited aspect of architectures. In this work, we introduce a unifying search space design framework based on context-free grammars that can naturally and compactly generate expressive hierarchical search spaces that are 100s of orders of magnitude larger than common spaces from the literature. By enhancing and using their properties, we effectively enable search over the complete architecture and can foster regularity. Further, we propose an efficient hierarchical kernel design for a Bayesian Optimization search strategy to efficiently search over such huge spaces. We demonstrate the versatility of our search space design framework and show that our search strategy can be superior to existing NAS approaches. Code is available at https://github.com/automl/hierarchical_nas_construction.

machine learning, natural language, sequential3, (19 more...)

2211.01842

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)
Oceania > Australia > New South Wales (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Stumme, Gerd, Dürrschnabel, Dominik, Hanika, Tom

Towards Ordinal Data Science

arXiv.org Artificial Intelligence, Dec-6-2023

Order is one of the main instruments to measure the relationship between objects in (empirical) data. However, compared to methods that use numerical properties of objects, the number of ordinal methods developed is rather small. One reason for this is the limited availability of computational resources in the last century that would have been required for ordinal computations. Another reason -- particularly important for this line of research -- is that order-based methods are often seen as too mathematically rigorous for applying them to real-world data. In this paper, we will therefore discuss different means for measuring and 'calculating' with ordinal structures -- a specific class of directed graphs -- and show how to infer knowledge from them. Our aim is to establish Ordinal Data Science as a fundamentally new research agenda. Besides cross-fertilization with other cornerstone machine learning and knowledge representation methods, a broad range of disciplines will benefit from this endeavor, including psychology, sociology, economics, web science, knowledge engineering, and scientometrics.

lattice, relation, stumme, (16 more...)

doi: 10.4230/TGDK.1.1.6

2307.09477

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Hungary > Budapest > Budapest (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
(35 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Transportation > Passenger (1.00)
Transportation > Air (1.00)
Information Technology > Services (1.00)
(5 more...)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
(5 more...)

Shaj, Vaisakh, Zadeh, Saleh Gholam, Demir, Ozan, Douat, Luiz Ricardo, Neumann, Gerhard

Multi Time Scale World Models

Intelligent agents use internal world models to reason and make predictions about different courses of their actions at many scales [22]. Devising learning paradigms and architectures that allow machines to learn world models that operate at multiple levels of temporal abstractions while dealing with complex uncertainty predictions is a major technical hurdle [18]. In this work, we propose a probabilistic formalism to learn multi-time scale world models which we call the Multi Time Scale State Space (MTS3) model. Our model uses a computationally efficient inference scheme on multiple time scales for highly accurate long-horizon predictions and uncertainty estimates over several seconds into the future. Our experiments, which focus on action conditional long horizon future predictions, show that MTS3 outperforms recent methods on several system identification benchmarks including complex simulated and real-world dynamical systems. Code is available at this repository: https://github.com/ALRhub/MTS3.

diag, prediction, trajectory, (14 more...)

2310.18534

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Suglia, Alessandro, Konstas, Ioannis, Lemon, Oliver

Visually Grounded Language Learning: a review of language games, datasets, tasks, and models

In recent years, several machine learning models have been proposed. They are trained with a language modelling objective on large-scale text-only data. With such pretraining, they can achieve impressive results on many Natural Language Understanding and Generation tasks. However, many facets of meaning cannot be learned by "listening to the radio" only. In the literature, many Vision+Language (V+L) tasks have been defined with the aim of creating models that can ground symbols in the visual modality. In this work, we provide a systematic literature review of several tasks and models proposed in the V+L field. We rely on Wittgenstein's idea of 'language games' to categorise such tasks into 3 different families: 1) discriminative games, 2) generative games, and 3) interactive games. Our analysis of the literature provides evidence that future work should be focusing on interactive games where communication in Natural Language is important to resolve ambiguities about object referents and action plans and that physical embodiment is essential to understand the semantics of situations and events. Overall, these represent key requirements for developing grounded meanings in neural models.

agent, proceedings, representation, (9 more...)

2312.02431

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(7 more...)

Genre:

Research Report (1.00)
Overview (1.00)
Workflow (0.92)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

Zhang, Xingyuan, Becker-Ehmck, Philip, van der Smagt, Patrick, Karl, Maximilian

Action Inference by Maximising Evidence: Zero-Shot Imitation from Observation with World Models

Unlike most reinforcement learning agents which require an unrealistic amount of environment interactions to learn a new behaviour, humans excel at learning quickly by merely observing and imitating others. This ability highly depends on the fact that humans have a model of their own embodiment that allows them to infer the most likely actions that led to the observed behaviour. In this paper, we propose Action Inference by Maximising Evidence (AIME) to replicate this behaviour using world models. AIME consists of two distinct phases. In the first phase, the agent learns a world model from its past experience to understand its own body by maximising the ELBO. In the second phase, the agent is given some observation-only demonstrations of an expert performing a novel task and tries to imitate the expert's behaviour. AIME achieves this by defining a policy as an inference model and maximising the evidence of the demonstration under the policy and world model. Our method is "zero-shot" in the sense that it does not require further training for the world model or online interactions with the environment after being given the demonstration. We empirically validate the zero-shot imitation performance of our method on the Walker and Cheetah embodiments of the DeepMind Control Suite and find it outperforms the state-of-the-art baselines. Code is available at: https://github.com/argmax-ai/aime.

dataset, embodiment, world model, (15 more...)

2312.02019

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
Europe > Hungary > Budapest > Budapest (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Maximising Quantum-Computing Expressive Power through Randomised Circuits

Yang, Yingli, Zhang, Zongkang, Wang, Anbang, Xu, Xiaosi, Wang, Xiaoting, Li, Ying

In the noisy intermediate-scale quantum era, variational quantum algorithms (VQAs) have emerged as a promising avenue to obtain quantum advantage. However, the success of VQAs depends on the expressive power of parameterised quantum circuits, which is constrained by the limited gate number and the presence of barren plateaus. In this work, we propose and numerically demonstrate a novel approach for VQAs, utilizing randomised quantum circuits to generate the variational wavefunction. We parameterize the distribution function of these random circuits using artificial neural networks and optimize it to find the solution. This random-circuit approach presents a trade-off between the expressive power of the variational wavefunction and time cost, in terms of the sampling cost of quantum circuits. Given a fixed gate number, we can systematically increase the expressive power by extending the quantum-computing time. With a sufficiently large permissible time cost, the variational wavefunction can approximate any quantum state with arbitrary accuracy. Furthermore, we establish explicit relationships between expressive power, time cost, and gate number for variational quantum eigensolvers. These results highlight the promising potential of the random-circuit approach in achieving a high expressive power in quantum computing.

neural network, time cost, variational wavefunction, (16 more...)

2312.01947

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Hungary (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre:

Research Report (0.84)
Overview (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)