AITopics

Genre: Summary/Review (0.37)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Neural Information Processing SystemsOct-7-2024, 09:41:33 GMT

Reviews: Temporal Regularization for Markov Decision Process

This paper is very interesting. One previous assumption in TD learning is that reward are close with states in proximity of the state space, which has been pointed out by many papers is not realistic and have problems for spatial value function regularization. Instead, this paper make the assumption that rewards are close for states. Overall this paper has a very good motivation, and the literature review shows that the author is knowledgable of this field. This paper could open a novel area of temporal regularization that received inadequate attention before.

function approximation, markov decision process, temporal regularization, (8 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.52)

Neural Information Processing SystemsOct-7-2024, 08:33:21 GMT

Reviews: Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes

This is an excellent theoretical contribution. The analysis is quite heavy and has many subtleties. I do not have enough time to read the appended proofs; also, the subject of the paper is not in my area of research. The comments below are based on the impression I got after reading carefully the first 8 pages of the paper and glancing through the rest in the supplementary file. Summary: This paper is about reinforcement learning in weakly-communicating MDP under the average-reward criterion.

algorithm, artificial intelligence, machine learning, (13 more...)

Industry: Energy > Oil & Gas > Upstream (0.52)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Neural Information Processing SystemsOct-7-2024, 05:26:35 GMT

Reviews: On Markov Chain Gradient Descent

POST REBUTTAL: I do think that the edit to the proof suggested by the authors could work, but would lead to some exorbitant constant C4, a subject not addressed by the authors. Still, I have increased my score from "clear reject" to "accept" in the light of the fact that I am now happy with the validity of the proofs.

application, markov chain gradient descent, review, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.43)

Transformers learn variable-order Markov chains in-context

Zhou, Ruida, Tian, Chao, Diggavi, Suhas

Large language models have demonstrated impressive in-context learning (ICL) capability. However, it is still unclear how the underlying transformers accomplish it, especially in more complex scenarios. Toward this goal, several recent works studied how transformers learn fixed-order Markov chains (FOMC) in context, yet natural languages are more suitably modeled by variable-order Markov chains (VOMC), i.e., context trees (CTs). In this work, we study the ICL of VOMC by viewing language modeling as a form of data compression and focus on small alphabets and low-order VOMCs. This perspective allows us to leverage mature compression algorithms, such as context-tree weighting (CTW) and prediction by partial matching (PPM) algorithms as baselines, the former of which is Bayesian optimal for a class of CTW priors. We empirically observe a few phenomena: 1) Transformers can indeed learn to compress VOMC in-context, while PPM suffers significantly; 2) The performance of transformers is not very sensitive to the number of layers, and even a two-layer transformer can learn in-context quite well; and 3) Transformers trained and tested on non-CTW priors can significantly outperform the CTW algorithm. To explain these phenomena, we analyze the attention map of the transformers and extract two mechanisms, on which we provide two transformer constructions: 1) A construction with $D+2$ layers that can mimic the CTW algorithm accurately for CTs of maximum order $D$, 2) A 2-layer transformer that utilizes the feed-forward network for probability blending. One distinction from the FOMC setting is that a counting mechanism appears to play an important role. We implement these synthetic transformer layers and show that such hybrid transformers can match the ICL performance of transformers, and more interestingly, some of them can perform even better despite the much-reduced parameter sets.

algorithm, sequence, transformer, (13 more...)

2410.05493

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Texas > Brazos County > College Station (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.91)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Choudhury, Sanjiban, Sodhi, Paloma

Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback

While large language models (LLMs) show impressive decision-making abilities, current methods lack a mechanism for automatic self-improvement from errors during task execution. We propose LEAP, an iterative fine-tuning framework that continually improves LLM agents using feedback from AI expert teachers. Our key insight is to equip the expert teachers with a privileged state -- information that is available during training but hidden at test time. This allows even weak experts to provide precise guidance, significantly improving the student agent's performance without access to privileged information at test time. We evaluate LEAP on diverse decision-making benchmarks, including text-based games (ALFWorld), web navigation (WebShop), and interactive coding (Intercode Bash). Our experiments show that LEAP (1) outperforms behavior cloning and ReAct baselines (2) enables weak student models (e.g., Llama3-8B) to exceed the performance of strong teacher models (GPT4-o), and (3) allows weak models to self-improve using privileged versions of themselves. We also provide a theoretical analysis showing that LEAP's success hinges on balancing privileged information with the student's realizability, which we empirically validate. Our code is available at https://leap-llm.github.io

agent, student, timestep, (15 more...)

2410.05434

Country: North America > United States (0.04)

Genre: Research Report > New Finding (0.45)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

A Review of Artificial Intelligence based Biological-Tree Construction: Priorities, Methods, Applications and Trends

Zang, Zelin, Xu, Yongjie, Duan, Chenrui, Wu, Jinlin, Li, Stan Z., Lei, Zhen

classical single-cell lineage inference, gene-based phylogenetic tree construction method, single cell trajectory inference method, (14 more...)

Biological tree analysis serves as a pivotal tool in uncovering the evolutionary and differentiation relationships among organisms, genes, and cells. Its applications span diverse fields including phylogenetics, developmental biology, ecology, and medicine. Traditional tree inference methods, while foundational in early studies, face increasing limitations in processing the large-scale, complex datasets generated by modern high-throughput technologies. Recent advances in deep learning offer promising solutions, providing enhanced data processing and pattern recognition capabilities. However, challenges remain, particularly in accurately representing the inherently discrete and non-Euclidean nature of biological trees. In this review, we first outline the key biological priors fundamental to phylogenetic and differentiation tree analyses, facilitating a deeper interdisciplinary understanding between deep learning researchers and biologists. We then systematically examine the commonly used data formats and databases, serving as a comprehensive resource for model testing and development. We provide a critical analysis of traditional tree generation methods, exploring their underlying biological assumptions, technical characteristics, and limitations. Current developments in deep learning-based tree generation are reviewed, highlighting both recent advancements and existing challenges. Furthermore, we discuss the diverse applications of biological trees across various biological domains. Finally, we propose potential future directions and trends in leveraging deep learning for biological tree research, aiming to guide further exploration and innovation in this field.

2410.04815

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Indonesia > Bali (0.04)
North America > United States > California (0.04)
(10 more...)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Levin, Ilya, Belomestny, Denis, Naumov, Alexey, Samsonov, Sergey

UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms

Policy evaluation is an important instrument for the comparison of different algorithms in Reinforcement Learning (RL). Yet even a precise knowledge of the value function $V^{\pi}$ corresponding to a policy $\pi$ does not provide reliable information on how far is the policy $\pi$ from the optimal one. We present a novel model-free upper value iteration procedure $({\sf UVIP})$ that allows us to estimate the suboptimality gap $V^{\star}(x) - V^{\pi}(x)$ from above and to construct confidence intervals for $V^\star$. Our approach relies on upper bounds to the solution of the Bellman optimality equation via martingale approach. We provide theoretical guarantees for ${\sf UVIP}$ under general assumptions and illustrate its performance on a number of benchmark RL problems.

algorithm, conference paper, icomp 2024, (15 more...)

2105.02135

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Xiong, Guojun, Dinesha, Ujwal, Mukherjee, Debajoy, Li, Jian, Shakkottai, Srinivas

DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback

arXiv.org Machine LearningOct-7-2024

Restless multi-armed bandits (RMAB) has been widely used to model constrained sequential decision making problems, where the state of each restless arm evolves according to a Markov chain and each state transition generates a scalar reward. However, the success of RMAB crucially relies on the availability and quality of reward signals. Unfortunately, specifying an exact reward function in practice can be challenging and even infeasible. In this paper, we introduce Pref-RMAB, a new RMAB model in the presence of preference signals, where the decision maker only observes pairwise preference feedback rather than scalar reward from the activated arms at each decision epoch. Preference feedback, however, arguably contains less information than the scalar reward, which makes Pref-RMAB seemingly more difficult. To address this challenge, we present a direct online preference learning (DOPL) algorithm for Pref-RMAB to efficiently explore the unknown environments, adaptively collect preference data in an online manner, and directly leverage the preference feedback for decision-makings. We prove that DOPL yields a sublinear regret. To our best knowledge, this is the first algorithm to ensure $\tilde{\mathcal{O}}(\sqrt{T\ln T})$ regret for RMAB with preference feedback. Experimental results further demonstrate the effectiveness of DOPL.

algorithm, preference feedback, ref -rmab, (16 more...)

arXiv.org Machine Learning

2410.05527

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
Oceania > New Zealand (0.04)
North America > United States > Texas (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (1.00)
Instructional Material > Online (0.61)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Kartik Ahuja, William Zame, Mihaela van der Schaar

DPSCREEN: Dynamic Personalized Screening

Neural Information Processing SystemsOct-6-2024, 19:46:36 GMT

Screening is important for the diagnosis and treatment of a wide variety of diseases. A good screening policy should be personalized to the features of the patient and to the dynamic history of the patient (including the history of screening). The growth of electronic health records data has led to the development of many models to predict the onset and progression of different diseases. However, there has been limited work to address the personalized screening for these different diseases. In this work, we develop the first framework to construct screening policies for a large class of disease models.

cancer, screening, screening policy, (14 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts (0.04)
(5 more...)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (0.94)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)
Information Technology > Decision Support Systems (0.68)