AITopics

2304.14326

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Education > Educational Setting > Online (0.55)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.34)

The Closeness of In-Context Learning and Weight Shifting for Softmax Regression

Li, Shuai, Song, Zhao, Xia, Yu, Yu, Tong, Zhou, Tianyi

Large language models (LLMs) are known for their exceptional performance in natural language processing, making them highly effective in many human life-related or even job-related tasks. The attention mechanism in the Transformer architecture is a critical component of LLMs, as it allows the model to selectively focus on specific input parts. The softmax unit, which is a key part of the attention mechanism, normalizes the attention scores. Hence, the performance of LLMs in various NLP tasks depends significantly on the crucial role played by the attention mechanism with the softmax unit. In-context learning, as one of the celebrated abilities of recent LLMs, is an important concept in querying LLMs such as ChatGPT. Without further parameter updates, Transformers can learn to predict based on few in-context examples. However, the reason why Transformers becomes in-context learners is not well understood. Recently, several works [ASA+22,GTLV22,ONR+22] have studied the in-context learning from a mathematical perspective based on a linear regression formulation $\min_x\| Ax - b \|_2$, which show Transformers' capability of learning linear functions in context. In this work, we study the in-context learning based on a softmax regression formulation $\min_{x} \| \langle \exp(Ax), {\bf 1}_n \rangle^{-1} \exp(Ax) - b \|_2$ of Transformer's attention mechanism. We show the upper bounds of the data transformations induced by a single self-attention layer and by gradient-descent on a $\ell_2$ regression loss for softmax prediction function, which imply that when training self-attention-only Transformers for fundamental regression tasks, the models learned by gradient-descent and Transformers show great similarity.

large language model, machine learning, natural language, (18 more...)

2304.13276

Country:

Europe > Italy > Tuscany > Florence (0.04)
North America > United States > Michigan (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Biemer, Colan F., Cooper, Seth

Level Assembly as a Markov Decision Process

Many games feature a progression of levels that doesn't adapt to the player. This can be problematic because some players may get stuck if the progression is too difficult, while others may find it boring if the progression is too slow to get to more challenging levels. This can be addressed by building levels based on the player's performance and preferences. In this work, we formulate the problem of generating levels for a player as a Markov Decision Process (MDP) and use adaptive dynamic programming (ADP) to solve the MDP before assembling a level. We tested with two case studies and found that using an ADP outperforms two baselines. Furthermore, we experimented with player proxies and switched them in the middle of play, and we show that a simple modification prior to running ADP results in quick adaptation. By using ADP, which searches the entire MDP, we produce a dynamic progression of levels that adapts to the player.

artificial intelligence, machine learning, mdp, (16 more...)

2304.13922

Country: North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)

Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

Haarnoja, Tuomas, Moran, Ben, Lever, Guy, Huang, Sandy H., Tirumala, Dhruva, Wulfmeier, Markus, Humplik, Jan, Tunyasuvunakool, Saran, Siegel, Noah Y., Hafner, Roland, Bloesch, Michael, Hartikainen, Kristian, Byravan, Arunkumar, Hasenclever, Leonard, Tassa, Yuval, Sadeghi, Fereshteh, Batchelor, Nathan, Casarini, Federico, Saliceti, Stefano, Game, Charles, Sreendra, Neil, Patel, Kushal, Gwira, Marlon, Huber, Andrea, Hurley, Nicole, Nori, Francesco, Hadsell, Raia, Heess, Nicolas

We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. We first trained individual skills in isolation and then composed those skills end-to-end in a self-play setting. The resulting policy exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and transitions between them in a smooth, stable, and efficient manner - well beyond what is intuitively expected from the robot. The agents also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. The full range of behaviors emerged from a small set of simple rewards. Our agents were trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer, despite significant unmodeled effects and variations across robot instances. Although the robots are inherently fragile, minor hardware modifications together with basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way. Indeed, even though the agents were optimized for scoring, in experiments they walked 156% faster, took 63% less time to get up, and kicked 24% faster than a scripted baseline, while efficiently combining the skills to achieve the longer term objectives. Examples of the emergent behaviors and full 1v1 matches are available on the supplementary website.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

2304.13653

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(3 more...)

Genre:

Research Report (0.64)
Instructional Material (0.46)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Li, Xinchen, Guvenc, Levent, Aksun-Guvenc, Bilin

Decision Making for Autonomous Vehicles

When autonomous vehicles also use on-board units which are vehicular to everything communication modems, they become connected and autonomous vehicles (CAV) [1]. Due to their increasing availability, CAVs have been the focus of both academic and industry research for a while and, as a result, there are is lot of research on autonomous driving function controls [2], [3], [4], [5], [6], [7], [8], [9],;10], [11] and their higher level decision-making algorithms [12], [13], [14], [15]. Planning and decision making are the core functions for an autonomous vehicle for driving on the road safely and efficiently under different traffic scenarios. As discussed in [16], the decision making and planning algorithms for autonomous vehicles are aiming at solving significant problems in autonomous driving, like (a) determining the future path, (b) utilizing observations of the surrounding environment from the perception system, (c) acting properly when interacting with other road users, (d) instructing lowlevel controller of the vehicle and (e) ensuring autonomous driving is safe and efficient. Therefore, the planning and decision making affects the autonomous vehicle decisively. Depending on the traffic scenario, autonomous driving functions are designed for highway driving, off-road driving and urban driving. The research for highway driving and off-road driving have been going on for a long time and with many results on planning and decision making. Yet, due to the complexity of the urban traffic scenario, the decision making and planning for urban traffic environment has always been very challenging with many unsolved problems. The complexity of the urban traffic scenario is manifested in the following aspects which are discussed next.

artificial intelligence, machine learning, vehicle, (16 more...)

2304.13908

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)

On the Generalization of Learned Structured Representations

Dittadi, Andrea

Despite tremendous progress over the past decade, deep learning methods generally fall short of human-level systematic generalization. It has been argued that explicitly capturing the underlying structure of data should allow connectionist systems to generalize in a more predictable and systematic manner. Indeed, evidence in humans suggests that interpreting the world in terms of symbol-like compositional entities may be crucial for intelligent behavior and high-level reasoning. Another common limitation of deep learning systems is that they require large amounts of training data, which can be expensive to obtain. In representation learning, large datasets are leveraged to learn generic data representations that may be useful for efficient learning of arbitrary downstream tasks. This thesis is about structured representation learning. We study methods that learn, with little or no supervision, representations of unstructured data that capture its hidden structure. In the first part of the thesis, we focus on representations that disentangle the explanatory factors of variation of the data. We scale up disentangled representation learning to a novel robotic dataset, and perform a systematic large-scale study on the role of pretrained representations for out-of-distribution generalization in downstream robotic tasks. The second part of this thesis focuses on object-centric representations, which capture the compositional structure of the input in terms of symbol-like entities, such as objects in visual scenes. Object-centric learning methods learn to form meaningful entities from unstructured input, enabling symbolic information processing on a connectionist substrate. In this study, we train a selection of methods on several common datasets, and investigate their usefulness for downstream tasks and their ability to generalize out of distribution.

artificial intelligence, conv layer interpolation scale 2, machine learning, (19 more...)

2304.13001

Country:

Asia (0.92)
Europe > Germany > Baden-Württemberg (0.27)
Europe > Denmark > Capital Region (0.27)
North America > United States > Minnesota (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Energy > Oil & Gas (0.92)
Leisure & Entertainment > Games > Computer Games (0.92)
Education (0.67)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Partially Observable Mean Field Multi-Agent Reinforcement Learning Based on Graph-Attention

Yang, Min, Liu, Guanjun, Zhou, Ziyuan

Traditional multi-agent reinforcement learning algorithms are difficultly applied in a large-scale multi-agent environment. The introduction of mean field theory has enhanced the scalability of multi-agent reinforcement learning in recent years. This paper considers partially observable multi-agent reinforcement learning (MARL), where each agent can only observe other agents within a fixed range. This partial observability affects the agent's ability to assess the quality of the actions of surrounding agents. This paper focuses on developing a method to capture more effective information from local observations in order to select more effective actions. Previous work in this field employs probability distributions or weighted mean field to update the average actions of neighborhood agents, but it does not fully consider the feature information of surrounding neighbors and leads to a local optimum. In this paper, we propose a novel multi-agent reinforcement learning algorithm, Partially Observable Mean Field Multi-Agent Reinforcement Learning based on Graph--Attention (GAMFQ) to remedy this flaw. GAMFQ uses a graph attention module and a mean field module to describe how an agent is influenced by the actions of other agents at each time step. This graph attention module consists of a graph attention encoder and a differentiable attention mechanism, and this mechanism outputs a dynamic graph to represent the effectiveness of neighborhood agents against central agents. The mean--field module approximates the effect of a neighborhood agent on a central agent as the average effect of effective neighborhood agents. We evaluate GAMFQ on three challenging tasks in the MAgents framework. Experiments show that GAMFQ outperforms baselines including the state-of-the-art partially observable mean-field reinforcement learning algorithms.

agent, algorithm, survey article, (14 more...)

2304.12653

Country:

Europe > United Kingdom (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(4 more...)

Genre:

Research Report (0.82)
Overview (0.67)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Dynamic Datasets and Market Environments for Financial Reinforcement Learning

Liu, Xiao-Yang, Xia, Ziyi, Yang, Hongyang, Gao, Jiechao, Zha, Daochen, Zhu, Ming, Wang, Christina Dan, Wang, Zhaoran, Guo, Jian

The financial market is a particularly challenging playground for deep reinforcement learning due to its unique feature of dynamic datasets. Building high-quality market environments for training financial reinforcement learning (FinRL) agents is difficult due to major factors such as the low signal-to-noise ratio of financial data, survivorship bias of historical data, and model overfitting. In this paper, we present FinRL-Meta, a data-centric and openly accessible library that processes dynamic datasets from real-world markets into gym-style market environments and has been actively maintained by the AI4Finance community. First, following a DataOps paradigm, we provide hundreds of market environments through an automatic data curation pipeline. Second, we provide homegrown examples and reproduce popular research papers as stepping stones for users to design new trading strategies. We also deploy the library on cloud platforms so that users can visualize their own results and assess the relative performance via community-wise competitions. Third, we provide dozens of Jupyter/Python demos organized into a curriculum and a documentation website to serve the rapidly growing community. The open-source codes for the data curation pipeline are available at https://github.com/AI4Finance-Foundation/FinRL-Meta

artificial intelligence, machine learning, reinforcement learning, (12 more...)

2304.13174

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Virginia (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre:

Instructional Material (1.00)
Overview (0.93)
Research Report > New Finding (0.45)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance > Trading (1.00)
Information Technology > Services (0.87)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

A optimization framework for herbal prescription planning based on deep reinforcement learning

Yang, Kuo, Yu, Zecong, Su, Xin, He, Xiong, Wang, Ning, Zheng, Qiguang, Yu, Feidie, Liu, Zhuang, Wen, Tiancai, Zhou, Xuezhong

Treatment planning for chronic diseases is a critical task in medical artificial intelligence, particularly in traditional Chinese medicine (TCM). However, generating optimized sequential treatment strategies for patients with chronic diseases in different clinical encounters remains a challenging issue that requires further exploration. In this study, we proposed a TCM herbal prescription planning framework based on deep reinforcement learning for chronic disease treatment (PrescDRL). PrescDRL is a sequential herbal prescription optimization model that focuses on long-term effectiveness rather than achieving maximum reward at every step, thereby ensuring better patient outcomes. We constructed a high-quality benchmark dataset for sequential diagnosis and treatment of diabetes and evaluated PrescDRL against this benchmark. Our results showed that PrescDRL achieved a higher curative effect, with the single-step reward improving by 117% and 153% compared to doctors. Furthermore, PrescDRL outperformed the benchmark in prescription prediction, with precision improving by 40.5% and recall improving by 63%. Overall, our study demonstrates the potential of using artificial intelligence to improve clinical intelligent diagnosis and treatment in TCM.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2304.12828

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Therapeutic Area > Oncology (0.68)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.49)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Wagenmaker, Andrew, Foster, Dylan J.

Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory

arXiv.org Artificial IntelligenceApr-24-2023

We consider the development of adaptive, instance-dependent algorithms for interactive decision making (bandits, reinforcement learning, and beyond) that, rather than only performing well in the worst case, adapt to favorable properties of real-world instances for improved performance. We aim for instance-optimality, a strong notion of adaptivity which asserts that, on any particular problem instance, the algorithm under consideration outperforms all consistent algorithms. Instance-optimality enjoys a rich asymptotic theory originating from the work of \citet{lai1985asymptotically,graves1997asymptotically}, but non-asymptotic guarantees have remained elusive outside of certain special cases. Even for problems as simple as tabular reinforcement learning, existing algorithms do not attain instance-optimal performance until the number of rounds of interaction is doubly exponential in the number of states. In this paper, we take the first step toward developing a non-asymptotic theory of instance-optimal decision making with general function approximation. We introduce a new complexity measure, the Allocation-Estimation Coefficient (AEC), and provide a new algorithm, $\mathsf{AE}^2$, which attains non-asymptotic instance-optimal performance at a rate controlled by the AEC. Our results recover the best known guarantees for well-studied problems such as finite-armed and linear bandits and, when specialized to tabular reinforcement learning, attain the first instance-optimal regret bounds with polynomial dependence on all problem parameters, improving over prior work exponentially. We complement these results with lower bounds that show that i) existing notions of statistical complexity are insufficient to derive non-asymptotic guarantees, and ii) under certain technical conditions, boundedness of the AEC is necessary to learn an instance-optimal allocation of decisions in finite time.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2304.12466

Country:

North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)