AITopics

2406.19051

Genre: Research Report (0.85)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Yuan, Wei, Wang, Guanyang

Markov chain Monte Carlo without evaluating the target: an auxiliary variable approach

arXiv.org Machine LearningJun-27-2024

In sampling tasks, it is common for target distributions to be known up to a normalising constant. However, in many situations, evaluating even the unnormalised distribution can be costly or infeasible. This issue arises in scenarios such as sampling from the Bayesian posterior for tall datasets and the 'doubly-intractable' distributions. In this paper, we begin by observing that seemingly different Markov chain Monte Carlo (MCMC) algorithms, such as the exchange algorithm, PoissonMH, and TunaMH, can be unified under a simple common procedure. We then extend this procedure into a novel framework that allows the use of auxiliary variables in both the proposal and acceptance-rejection steps. We develop the theory of the new framework, applying it to existing algorithms to simplify and extend their results. Several new algorithms emerge from this framework, with improved performance demonstrated on both synthetic and real datasets.

acceptance rate, algorithm, poissonmh, (16 more...)

2406.05242

Country:

North America > United States > Florida > Hillsborough County > University (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

arXiv.org Machine LearningJun-27-2024

Local to Global: Learning Dynamics and Effect of Initialization for Transformers

Makkuva, Ashok Vardhan, Bondaschi, Marco, Ekbote, Chanakya, Girish, Adway, Nagle, Alliot, Kim, Hyeji, Gastpar, Michael

In recent years, transformer-based models have revolutionized deep learning, particularly in sequence modeling. To better understand this phenomenon, there is a growing interest in using Markov input processes to study transformers. However, our current understanding in this regard remains limited with many fundamental questions about how transformers learn Markov chains still unanswered. In this paper, we address this by focusing on first-order Markov chains and single-layer transformers, providing a comprehensive characterization of the learning dynamics in this context. Specifically, we prove that transformer parameters trained on next-token prediction loss can either converge to global or local minima, contingent on the initialization and the Markovian data properties, and we characterize the precise conditions under which this occurs. To the best of our knowledge, this is the first result of its kind highlighting the role of initialization. We further demonstrate that our theoretical findings are corroborated by empirical evidence. Based on these insights, we provide guidelines for the initialization of transformer parameters and demonstrate their effectiveness. Finally, we outline several open problems in this arena. Code is available at: https://github.com/Bond1995/Markov.

critical point, initialization, minima, (16 more...)

2406.03072

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Kwon, Jeongyeol, Mannor, Shie, Caramanis, Constantine, Efroni, Yonathan

RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation

arXiv.org Artificial IntelligenceJun-26-2024

In many real-world decision problems there is partially observed, hidden or latent information that remains fixed throughout an interaction. Such decision problems can be modeled as Latent Markov Decision Processes (LMDPs), where a latent variable is selected at the beginning of an interaction and is not disclosed to the agent. In the last decade, there has been significant progress in solving LMDPs under different structural assumptions. However, for general LMDPs, there is no known learning algorithm that provably matches the existing lower bound (Kwon et al., 2021). We introduce the first sample-efficient algorithm for LMDPs without any additional structural assumptions. Our result builds off a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective, that has been overlooked in the context of exploration in partially observed environments. Specifically, we establish a novel off-policy evaluation lemma and introduce a new coverage coefficient for LMDPs. Then, we show how these can be used to derive near-optimal guarantees of an optimistic exploration algorithm. These results, we believe, can be valuable for a wide range of interactive learning problems beyond LMDPs, and especially, for partially observed environments.

algorithm, international conference, lmdp, (12 more...)

2406.01389

Country:

North America > United States > Wisconsin (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine (0.67)
Education (0.54)
Energy (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Saha, Anish, Ramakrishnan, A. G.

Automatic Speech Recognition for Hindi

arXiv.org Artificial IntelligenceJun-26-2024

Automatic speech recognition (ASR) is a key area in computational linguistics, focusing on developing technologies that enable computers to convert spoken language into text. This field combines linguistics and machine learning. ASR models, which map speech audio to transcripts through supervised learning, require handling real and unrestricted text. Text-to-speech systems directly work with real text, while ASR systems rely on language models trained on large text corpora. High-quality transcribed data is essential for training predictive models. The research involved two main components: developing a web application and designing a web interface for speech recognition. The web application, created with JavaScript and Node.js, manages large volumes of audio files and their transcriptions, facilitating collaborative human correction of ASR transcripts. It operates in real-time using a client-server architecture. The web interface for speech recognition records 16 kHz mono audio from any device running the web app, performs voice activity detection (VAD), and sends the audio to the recognition engine. VAD detects human speech presence, aiding efficient speech processing and reducing unnecessary processing during non-speech intervals, thus saving computation and network bandwidth in VoIP applications. The final phase of the research tested a neural network for accurately aligning the speech signal to hidden Markov model (HMM) states. This included implementing a novel backpropagation method that utilizes prior statistics of node co-activations.

activity detection, detection, transcript, (14 more...)

2406.18135

Country:

Europe > Austria > Vienna (0.14)
Asia > India > Karnataka > Bengaluru (0.14)
Asia > Indonesia > Bali (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

arXiv.org Artificial IntelligenceJun-26-2024

Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes

Wang, He, Shi, Laixi, Chi, Yuejie

In reinforcement learning (RL), agents aim to learn an optim al policy that maximizes the expected total rewards, by actively interacting with an unknown environment. However, online data collection may be prohibitively expensive or potentially risky in many real-wor ld applications, e.g., autonomous driving ( Gu et al., 2022), healthcare ( Yu et al., 2021), and wireless security ( Uprety and Rawat, 2020). This motivates the study of offline RL, which leverages existing historical data (aka batch data) collected in the past to improve policy learning, and has attracted growing attention ( Levine et al., 2020). Nonetheless, the performance of the learned policy invoking standard offline RL techniques co uld drop dramatically when the deployed environment shifts from the one experienced by the historical da ta even slightly, necessitating the development of robust RL algorithms that are resilient against environm ental uncertainty. In response, recent years have witnessed a surge of interest s in distributionally robust offline RL ( Zhou et al., 2021b; Yang et al., 2022; Shi and Chi, 2022; Blanchet et al., 2024). In particular, given only historical data from a nominal environment, distributionally robust offline RL aims to learn a policy that optimizes the worst-case performance when the environment falls into some prescribed uncertaint y set around the nominal one. Such a framework ensures that the performance of the lea rned policy does not fail drastically, provided that the distribution shift between the nominal and deployment environments is not exce ssively large. Nevertheless, most existing provable algorithms in distri butionally robust offline RL only focus on the tabular setting with finite state and action spaces ( Zhou et al., 2021b; Yang et al., 2022; Shi and Chi, 2022), where the sample complexity scales linearly with the size of the state-action space, which is prohibitive when the problem is high-dimensional.

arxiv preprint arxiv, inequality, probability, (11 more...)

2403.12946

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.34)
Transportation > Ground > Road (0.34)
Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)

arXiv.org Machine LearningJun-26-2024

Unbiased least squares regression via averaged stochastic gradient descent

Kahalé, Nabil

We consider an on-line least squares regression problem with optimal solution $\theta^*$ and Hessian matrix H, and study a time-average stochastic gradient descent estimator of $\theta^*$. For $k\ge2$, we provide an unbiased estimator of $\theta^*$ that is a modification of the time-average estimator, runs with an expected number of time-steps of order k, with O(1/k) expected excess risk. The constant behind the O notation depends on parameters of the regression and is a poly-logarithmic function of the smallest eigenvalue of H. We provide both a biased and unbiased estimator of the expected excess risk of the time-average estimator and of its unbiased counterpart, without requiring knowledge of either H or $\theta^*$. We describe an "average-start" version of our estimators with similar properties. Our approach is based on randomized multilevel Monte Carlo. Our numerical experiments confirm our theoretical findings.

equation follow, estimator, unbiased estimator, (15 more...)

2406.18623

Country:

North America > United States > New York (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Ruhdorfer, Constantin, Bortoletto, Matteo, Penzkofer, Anna, Bulling, Andreas

The Overcooked Generalisation Challenge

arXiv.org Artificial IntelligenceJun-25-2024

We introduce the Overcooked Generalisation Challenge (OGC) - the first benchmark to study agents' zero-shot cooperation abilities when faced with novel partners and levels in the Overcooked-AI environment. This perspective starkly contrasts a large body of previous work that has trained and evaluated cooperating agents only on the same level, failing to capture generalisation abilities required for real-world human-AI cooperation. Our challenge interfaces with state-of-the-art dual curriculum design (DCD) methods to generate auto-curricula for training general agents in Overcooked. It is the first cooperative multi-agent environment specially designed for DCD methods and, consequently, the first benchmarked with state-of-the-art methods. It is fully GPU-accelerated, built on the DCD benchmark suite minimax, and freely available under an open-source license: https://git.hcics.simtech.uni-stuttgart.de/public-projects/OGC. We show that current DCD algorithms struggle to produce useful policies in this novel challenge, even if combined with recent network architectures that were designed for scalability and generalisability. The OGC pushes the boundaries of real-world human-AI cooperation by enabling the research community to study the impact of generalisation on cooperating agents.

agent, layout, proceedings, (9 more...)

2406.17949

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.24)
North America > Canada > Quebec > Montreal (0.14)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
(11 more...)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.34)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Yang-Zhao, Samuel, Ng, Kee Siong

Privacy Preserving Reinforcement Learning for Population Processes

arXiv.org Artificial IntelligenceJun-25-2024

We consider the problem of privacy protection in Reinforcement Learning (RL) algorithms that operate over population processes, a practical but understudied setting that includes, for example, the control of epidemics in large populations of dynamically interacting individuals. In this setting, the RL algorithm interacts with the population over $T$ time steps by receiving population-level statistics as state and performing actions which can affect the entire population at each time step. An individual's data can be collected across multiple interactions and their privacy must be protected at all times. We clarify the Bayesian semantics of Differential Privacy (DP) in the presence of correlated data in population processes through a Pufferfish Privacy analysis. We then give a meta algorithm that can take any RL algorithm as input and make it differentially private. This is achieved by taking an approach that uses DP mechanisms to privatize the state and reward signal at each time step before the RL algorithm receives them as input. Our main theoretical result shows that the value-function approximation error when applying standard RL algorithms directly to the privatized states shrinks quickly as the population size and privacy budget increase. This highlights that reasonable privacy-utility trade-offs are possible for differentially private RL algorithms in population processes. Our theoretical findings are validated by experiments performed on a simulated epidemic control problem over large population sizes.

algorithm, differential privacy, privacy, (16 more...)

2406.17649

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Czechia > Prague (0.04)
(2 more...)

Genre: Research Report > New Finding (0.87)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.69)
Health & Medicine > Therapeutic Area > Immunology (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceJun-25-2024

Reinforcement Learning from Delayed Observations via World Models

Karamzade, Armin, Kim, Kyungmin, Kalsi, Montek, Fox, Roy

In standard reinforcement learning settings, agents typically assume immediate feedback about the effects of their actions after taking them. However, in practice, this assumption may not hold true due to physical constraints and can significantly impact the performance of learning algorithms. In this paper, we address observation delays in partially observable environments. We propose leveraging world models, which have shown success in integrating past observations and learning dynamics, to handle observation delays. By reducing delayed POMDPs to delayed MDPs with world models, our methods can effectively handle partial observability, where existing approaches achieve sub-optimal performance or degrade quickly as observability decreases. Experiments suggest that one of our methods can outperform a naive model-based approach by up to 250%. Moreover, we evaluate our methods on visual delayed environments, for the first time showcasing delay-aware reinforcement learning continuous control with visual observations.

agent, observation delay, world model, (14 more...)

2403.12309

Country:

North America > United States > California > Orange County > Irvine (0.04)
Europe > Poland > Masovia Province > Warsaw (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
(2 more...)