Contextual bandits with entropy-based human feedback
Seraj, Raihan, Meng, Lili, Sylvain, Tristan
In recent years, preference-based human feedback mechanisms have become essential for enhancing model performance across diverse applications, including conversational AI systems such as ChatGPT. However, existing approaches often neglect critical aspects, such as model uncertainty and the variability in feedback quality. To address these challenges, we introduce an entropy-based human feedback framework for contextual bandits, which dynamically balances exploration and exploitation by soliciting expert feedback only when model entropy exceeds a predefined threshold.

This work investigates how explicit human feedback can enhance contextual bandit (CB) performance. Building on successful integrations of human guidance in reinforcement learning (Christiano et al., 2017; MacGlashan et al., 2017) and conversational AI (Achiam et al., 2023), we distinguish two primary feedback paradigms: (1) action-based feedback, where experts directly prescribe optimal actions for specific contexts (Osa et al., 2018; Li et al., 2023), and (2) preference-based feedback, where humans compare pairs of learner-generated actions to express relative preferences (Christiano et al., 2017; Saha et al., 2023). While action-based methods require precise expert knowledge, we focus on preference feedback for its practical advantages in scalable data collection.
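A minimal sketch of the entropy-gating loop described above, assuming a softmax policy over per-action linear reward estimates and a simulated oracle standing in for the human's pairwise preference; all names, sizes, and the threshold value are illustrative assumptions, not the paper's implementation:

import numpy as np

rng = np.random.default_rng(0)
n_actions, dim, tau = 4, 8, 1.0              # tau: entropy threshold (assumed value)
W_true = rng.normal(size=(n_actions, dim))   # hidden reward model, simulation only
W = np.zeros((n_actions, dim))               # learner's per-action linear estimates

def policy(x):
    # Softmax over estimated rewards; a stand-in for the model's action distribution.
    logits = W @ x
    z = np.exp(logits - logits.max())
    return z / z.sum()

def entropy(p):
    return float(-np.sum(p * np.log(p + 1e-12)))

for t in range(2000):
    x = rng.normal(size=dim)                 # observe a context
    p = policy(x)
    if entropy(p) > tau:
        # Uncertain: show the expert two sampled candidate actions and take the
        # preferred one (preference feedback, simulated here by the true model).
        a1, a2 = rng.choice(n_actions, size=2, replace=False, p=p)
        a = a1 if W_true[a1] @ x >= W_true[a2] @ x else a2
    else:
        a = int(np.argmax(p))                # confident: exploit the estimate
    r = W_true[a] @ x + 0.1 * rng.normal()   # observe reward
    W[a] += 0.05 * (r - W[a] @ x) * x        # SGD update for the chosen arm

With entropy gating, expert queries concentrate early (the zero-initialized model is near-uniform) and taper off as the reward estimates sharpen.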
Generalizing Multi-Step Inverse Models for Representation Learning to Finite-Memory POMDPs
Wu, Lili, Evans, Ben, Islam, Riashat, Seraj, Raihan, Efroni, Yonathan, Lamb, Alex
Discovering an informative, or agent-centric, state representation that encodes only the relevant information while discarding the irrelevant is a key challenge towards scaling reinforcement learning algorithms and efficiently applying them to downstream tasks. Prior works studied this problem in high-dimensional Markovian environments, where the current observation may be a complex object but is sufficient to decode the informative state. In this work, we consider the problem of discovering the agent-centric state in the more challenging high-dimensional non-Markovian setting, where the state can be decoded from a sequence of past observations. We establish that generalized inverse models can be adapted to learn agent-centric state representations for this task. Our results include asymptotic theory in the deterministic dynamics setting as well as counter-examples for alternative intuitive algorithms. We complement these findings with a thorough empirical study of the agent-centric state discovery abilities of the different alternatives we put forward. Particularly notable is our analysis of past actions, where we show that these can be a double-edged sword: they make the algorithms more successful when used correctly and cause dramatic failure when used incorrectly.
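Since the abstract's central device is the multi-step inverse model, here is a hedged sketch of that objective adapted to the finite-memory setting: an encoder reads a short window of past observations, and a classifier predicts the first action taken between two encoded time points. The architecture, window length, and all sizes are assumptions for illustration, not the paper's setup:

import torch
import torch.nn as nn

obs_dim, mem, latent, n_actions = 16, 4, 8, 3

encoder = nn.Sequential(              # phi: window of `mem` observations -> latent
    nn.Flatten(), nn.Linear(obs_dim * mem, 64), nn.ReLU(), nn.Linear(64, latent))
head = nn.Sequential(                 # predicts a_t from (phi_t, phi_{t+k})
    nn.Linear(2 * latent, 64), nn.ReLU(), nn.Linear(64, n_actions))
opt = torch.optim.Adam([*encoder.parameters(), *head.parameters()], lr=1e-3)

def multistep_inverse_loss(win_t, win_tk, a_t):
    # win_t, win_tk: (B, mem, obs_dim) observation windows at times t and t+k;
    # a_t: (B,) first action of the gap. Cross-entropy on that action.
    z_t, z_tk = encoder(win_t), encoder(win_tk)
    logits = head(torch.cat([z_t, z_tk], dim=-1))
    return nn.functional.cross_entropy(logits, a_t)

# One illustrative step on random tensors; a real run would sample the gap k
# and draw windows from trajectories collected in the environment.
win_t = torch.randn(32, mem, obs_dim)
win_tk = torch.randn(32, mem, obs_dim)
a_t = torch.randint(0, n_actions, (32,))
loss = multistep_inverse_loss(win_t, win_tk, a_t)
opt.zero_grad(); loss.backward(); opt.step()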
PcLast: Discovering Plannable Continuous Latent States
Koul, Anurag, Sujit, Shivakanth, Chen, Shaoru, Evans, Ben, Wu, Lili, Xu, Byron, Chari, Rajan, Islam, Riashat, Seraj, Raihan, Efroni, Yonathan, Molu, Lekan, Dudik, Miro, Langford, John, Lamb, Alex
Goal-conditioned planning benefits from learned low-dimensional representations of rich, high-dimensional observations. While compact latent representations, typically learned from variational autoencoders or inverse dynamics, enable goal-conditioned planning, they ignore state affordances, thus hampering sample-efficient planning. In this paper, we learn a representation that associates reachable states together for effective onward planning. We first learn a latent representation with multi-step inverse dynamics (to remove distracting information), and then transform this representation to associate reachable states together in $\ell_2$ space. Our proposals are rigorously tested in various simulation testbeds. Numerical results in reward-based and reward-free settings show significant improvements in sample efficiency and yield layered state abstractions that enable computationally efficient hierarchical planning.
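A hedged sketch of the second stage described above: given latents from a multi-step-inverse encoder, learn a map f so that states reachable within a few steps land close in $\ell_2$ while unrelated pairs stay apart. The hinge loss, margin, and sizes are illustrative assumptions, not the paper's exact objective:

import torch
import torch.nn as nn

latent = 8
f = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(), nn.Linear(64, latent))
opt = torch.optim.Adam(f.parameters(), lr=1e-3)

def reachability_loss(z, z_near, z_rand, margin=1.0):
    # z, z_near: latents a few environment steps apart on the same trajectory;
    # z_rand: latents of unrelated states. Pull near pairs together in l2 and
    # push random pairs at least `margin` apart (hinge).
    d_pos = (f(z) - f(z_near)).pow(2).sum(-1)
    d_neg = (f(z) - f(z_rand)).pow(2).sum(-1)
    return (d_pos + torch.relu(margin - d_neg)).mean()

# Illustrative step on random tensors standing in for encoder outputs:
z, z_near, z_rand = (torch.randn(64, latent) for _ in range(3))
loss = reachability_loss(z, z_near, z_rand)
opt.zero_grad(); loss.backward(); opt.step()

After training, nearest-neighbor queries in the transformed space approximate reachability, which is what makes the representation useful for onward and hierarchical planning.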
AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval
Yan, Qi, Seraj, Raihan, He, Jiawei, Meng, Lili, Sylvain, Tristan
Machine-based prediction of real-world events is garnering attention due to its potential for informed decision-making. Whereas traditional forecasting predominantly hinges on structured data like time-series, recent breakthroughs in language models enable predictions using unstructured text. In particular, Zou et al. (2022) unveil AutoCast, a new benchmark that employs news articles for answering forecasting queries. Nevertheless, existing methods still trail behind human performance. The cornerstone of accurate forecasting, we argue, lies in identifying a concise yet rich subset of news snippets from a vast corpus. With this motivation, we introduce AutoCast++, a zero-shot ranking-based context retrieval system tailored to sift through expansive news document collections for event forecasting. Our approach first re-ranks articles based on zero-shot question-passage relevance, homing in on semantically pertinent news. Following this, the chosen articles are subjected to zero-shot summarization to attain succinct context. Leveraging a pre-trained language model, we conduct both the relevance evaluation and article summarization without needing domain-specific training. Notably, recent articles can sometimes be at odds with preceding ones due to new facts or unanticipated incidents, leading to fluctuating temporal dynamics. To tackle this, our re-ranking mechanism gives preference to more recent articles, and we further regularize the multi-passage representation learning to align with human forecaster responses made on different dates. Empirical results underscore marked improvements across multiple metrics, improving performance on multiple-choice questions (MCQ) by 48% and on true/false (TF) questions by up to 8%.
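To make the recency-aware re-ranking step concrete, here is a minimal sketch: score each article for question-passage relevance with any zero-shot scorer, discount older articles, and keep the top-k for summarization. The exponential decay schedule, the relevance callable, and all field names are assumptions for illustration, not the paper's implementation:

from dataclasses import dataclass
from datetime import date
from math import exp

@dataclass
class Article:
    text: str
    published: date

def rerank(question, articles, relevance, query_date, half_life_days=30.0, k=5):
    # relevance(question, text) -> float is any zero-shot question-passage
    # scorer (e.g. a pre-trained LM); recency is folded in multiplicatively
    # via a half-life decay on article age.
    def score(a):
        age = max((query_date - a.published).days, 0)
        return relevance(question, a.text) * exp(-age * 0.693 / half_life_days)
    return sorted(articles, key=score, reverse=True)[:k]

# Toy usage with a keyword-overlap scorer standing in for the language model:
def overlap(q, t):
    return len(set(q.lower().split()) & set(t.lower().split()))

docs = [Article("election results announced", date(2022, 1, 10)),
        Article("sports scores from last night", date(2022, 3, 1))]
top = rerank("who won the election", docs, overlap, date(2022, 3, 5))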
Tsetlin Machine for Solving Contextual Bandit Problems
Seraj, Raihan, Sharma, Jivitesh, Granmo, Ole-Christoffer
This paper introduces an interpretable contextual bandit algorithm using Tsetlin Machines, which solve complex pattern recognition tasks using propositional logic. The proposed bandit learning algorithm relies on straightforward bit manipulation, thus simplifying computation and interpretation. We then present a mechanism for performing Thompson sampling with the Tsetlin Machine, given its non-parametric nature. Our empirical analysis shows that the Tsetlin Machine as a base contextual bandit learner outperforms other popular base learners on eight out of nine datasets. We further analyze the interpretability of our learner, investigating how arms are selected based on propositional expressions that model the context.
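Because the Tsetlin Machine is non-parametric, there is no posterior to sample from directly; a common workaround, used here purely as a hedged stand-in rather than the paper's exact mechanism, is bootstrapped Thompson sampling: maintain an ensemble of learners, feed each a Poisson-reweighted copy of every observation, and act greedily with respect to one randomly drawn member per round. Simple linear models stand in for the Tsetlin Machine learners in this sketch:

import numpy as np

rng = np.random.default_rng(1)
n_actions, dim, n_models = 3, 5, 10
W_true = rng.normal(size=(n_actions, dim))       # simulation ground truth only
ensemble = np.zeros((n_models, n_actions, dim))  # bootstrap replicas of the learner

for t in range(2000):
    x = rng.normal(size=dim)                     # observe a context
    m = rng.integers(n_models)                   # drawing a member ~ posterior sample
    a = int(np.argmax(ensemble[m] @ x))          # act greedily for that member
    r = W_true[a] @ x + 0.1 * rng.normal()       # observe reward
    for i in range(n_models):                    # online bootstrap: Poisson weights
        w = rng.poisson(1.0)
        if w:
            err = r - ensemble[i, a] @ x
            ensemble[i, a] += 0.05 * w * err * x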