Farahmand, Amir-massoud
Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients
Kemertas, Mete, Jepson, Allan D., Farahmand, Amir-massoud
We design a novel algorithm for optimal transport by drawing on the entropic optimal transport, mirror descent, and conjugate gradients literatures. Our scalable and GPU-parallelizable algorithm computes the Wasserstein distance with extreme precision, reaching relative error rates of $10^{-8}$ without numerical stability issues. Empirically, the algorithm converges to high-precision solutions more quickly in terms of wall-clock time than a variety of algorithms, including the log-domain stabilized Sinkhorn algorithm. We provide careful ablations with respect to algorithm and problem parameters, and present benchmarks on upsampled MNIST images, comparing against various recent algorithms on high-dimensional problems. The results suggest that our algorithm can be a useful addition to the practitioner's optimal transport toolkit.
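For reference, here is a minimal NumPy/SciPy sketch of the log-domain stabilized Sinkhorn baseline mentioned above (not the paper's mirror-descent/conjugate-gradient algorithm); the regularization strength, grid, and iteration count are illustrative assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_log(a, b, C, eps=1e-2, n_iters=1000):
    """Log-domain (stabilized) Sinkhorn for entropic OT.

    a, b : marginal probability vectors; C : cost matrix of shape (len(a), len(b)).
    Returns the transport plan P and the transport cost <P, C>.
    """
    log_a, log_b = np.log(a), np.log(b)
    f, g = np.zeros_like(a), np.zeros_like(b)  # dual potentials
    for _ in range(n_iters):
        # log-sum-exp updates keep the iterations numerically stable for small eps
        f = eps * (log_a - logsumexp((g[None, :] - C) / eps, axis=1))
        g = eps * (log_b - logsumexp((f[:, None] - C) / eps, axis=0))
    P = np.exp((f[:, None] + g[None, :] - C) / eps)
    return P, float(np.sum(P * C))

# Toy usage: two random histograms on a 1-D grid with squared-distance cost.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2
a = rng.random(50); a /= a.sum()
b = rng.random(50); b /= b.sum()
P, cost = sinkhorn_log(a, b, C)
print(cost)
```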
$\lambda$-AC: Learning latent decision-aware models for reinforcement learning in continuous state-spaces
Voelcker, Claas A, Ahmadian, Arash, Abachi, Romina, Gilitschenski, Igor, Farahmand, Amir-massoud
The idea of decision-aware model learning, that models should be accurate where it matters for decision-making, has gained prominence in model-based reinforcement learning. While promising theoretical results have been established, the empirical performance of algorithms leveraging a decision-aware loss has been lacking, especially in continuous control problems. In this paper, we present a study of the components necessary for decision-aware reinforcement learning models and showcase design choices that enable well-performing algorithms. To this end, we provide a theoretical and empirical investigation into prominent algorithmic ideas in the field. We highlight that empirical design decisions established in the MuZero line of work are vital to achieving good performance for related algorithms, and we showcase differences in behavior between different instantiations of value-aware algorithms in stochastic environments. Using these insights, we propose the Latent Model-Based Decision-Aware Actor-Critic framework ($\lambda$-AC) for decision-aware model-based reinforcement learning in continuous state-spaces and highlight important design choices in different environments.
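A minimal PyTorch sketch of a value-aware latent-model objective of the kind this line of work studies, assuming a simple encoder/latent-dynamics/value-head decomposition; the network sizes and the detached, MuZero-style value target are illustrative assumptions, not the exact $\lambda$-AC objective.

```python
import torch
import torch.nn as nn

# Illustrative module sizes; the real architecture is a design choice of the paper.
OBS_DIM, ACT_DIM, LATENT_DIM = 8, 2, 16

encoder = nn.Sequential(nn.Linear(OBS_DIM, LATENT_DIM), nn.Tanh())
dynamics = nn.Sequential(nn.Linear(LATENT_DIM + ACT_DIM, LATENT_DIM), nn.Tanh())
value_head = nn.Linear(LATENT_DIM, 1)

def decision_aware_model_loss(s, a, s_next):
    """Train the latent model to be accurate in *value space*: the value
    predicted from the imagined next latent should match the value predicted
    from the encoded real next state (target detached)."""
    z = encoder(s)
    z_next_pred = dynamics(torch.cat([z, a], dim=-1))
    v_pred = value_head(z_next_pred)
    with torch.no_grad():
        v_target = value_head(encoder(s_next))
    return ((v_pred - v_target) ** 2).mean()

# Toy usage on a random batch.
s, a, s_next = torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM), torch.randn(32, OBS_DIM)
loss = decision_aware_model_loss(s, a, s_next)
loss.backward()
```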
Value Gradient weighted Model-Based Reinforcement Learning
Voelcker, Claas, Liao, Victor, Garg, Animesh, Farahmand, Amir-massoud
Model-based reinforcement learning (MBRL) is a sample-efficient technique to obtain control policies, yet unavoidable modeling errors often lead to performance deterioration. The model in MBRL is often fitted solely to reconstruct dynamics, state observations in particular, so the impact of model error on the policy is not captured by the training objective. This leads to a mismatch between the intended goal of MBRL, enabling good policy and value learning, and the target of the loss function employed in practice, future state prediction. Naive intuition would suggest that value-aware model learning would fix this problem and, indeed, several solutions to this objective mismatch problem have been proposed based on theoretical analysis. However, they tend to be inferior in practice to commonly used maximum likelihood (MLE) based approaches. In this paper, we propose Value-gradient weighted Model Learning (VaGraM), a novel method for value-aware model learning which improves the performance of MBRL in challenging settings, such as small model capacity and the presence of distracting state dimensions. We analyze both MLE and value-aware approaches, demonstrate how they fail to account for exploration and the behavior of function approximation when learning value-aware models, and highlight the additional goals that must be met to stabilize optimization in the deep learning setting. We verify our analysis by showing that our loss function is able to achieve high returns on the MuJoCo benchmark suite while being more robust than maximum likelihood based approaches.
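A minimal PyTorch sketch of a value-gradient weighted model loss in the spirit described above, assuming a deterministic one-step model; the elementwise gradient weighting and the network sizes are illustrative assumptions rather than the exact VaGraM loss.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 8, 2  # illustrative sizes

value_fn = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, 1))
model = nn.Linear(OBS_DIM + ACT_DIM, OBS_DIM)  # deterministic one-step model

def value_gradient_weighted_loss(s, a, s_next):
    """Weight each dimension of the one-step prediction error by the gradient
    of the current value estimate at the observed next state, so errors in
    state dimensions the value function is insensitive to are down-weighted."""
    s_next = s_next.detach().requires_grad_(True)
    v = value_fn(s_next).sum()
    (grad_v,) = torch.autograd.grad(v, s_next)          # dV/ds' at the real next state
    s_next_pred = model(torch.cat([s, a], dim=-1))
    weighted_err = grad_v.detach() * (s_next_pred - s_next.detach())
    return (weighted_err ** 2).sum(dim=-1).mean()

# Toy usage on a random batch.
s, a, s_next = torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM), torch.randn(32, OBS_DIM)
loss = value_gradient_weighted_loss(s, a, s_next)
loss.backward()
```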
Operator Splitting Value Iteration
Rakhsha, Amin, Wang, Andrew, Ghavamzadeh, Mohammad, Farahmand, Amir-massoud
We introduce new planning and reinforcement learning algorithms for discounted MDPs that utilize an approximate model of the environment to accelerate the convergence of the value function. Inspired by the splitting approach in numerical linear algebra, we introduce Operator Splitting Value Iteration (OS-VI) for both policy evaluation and control problems. OS-VI achieves a much faster convergence rate when the model is accurate enough. We also introduce a sample-based version of the algorithm, called OS-Dyna.
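A minimal tabular sketch of the matrix-splitting idea applied to policy evaluation, assuming access to the true transition matrix P and an approximate model P_hat; the exact OS-VI operator, its control variant, and the sampled OS-Dyna algorithm are detailed in the paper.

```python
import numpy as np

def splitting_policy_evaluation(r, P, P_hat, gamma=0.9, n_iters=50):
    """Tabular policy evaluation with a matrix-splitting iteration.

    Solves V = r + gamma * P @ V by splitting P into an approximate model
    P_hat (solved exactly each iteration) plus the residual (P - P_hat):
        V_{k+1} = (I - gamma * P_hat)^{-1} (r + gamma * (P - P_hat) @ V_k)
    When P_hat is close to P, the effective contraction factor is much
    smaller than gamma, so convergence is much faster than value iteration.
    """
    n = len(r)
    A = np.eye(n) - gamma * P_hat
    V = np.zeros(n)
    for _ in range(n_iters):
        V = np.linalg.solve(A, r + gamma * (P - P_hat) @ V)
    return V

# Toy usage: a random 5-state Markov reward process and a perturbed model of it.
rng = np.random.default_rng(0)
P = rng.random((5, 5)); P /= P.sum(axis=1, keepdims=True)
P_hat = np.clip(P + 0.05 * rng.standard_normal((5, 5)), 1e-3, None)
P_hat /= P_hat.sum(axis=1, keepdims=True)
r = rng.random(5)
V = splitting_policy_evaluation(r, P, P_hat)
print(np.max(np.abs(V - np.linalg.solve(np.eye(5) - 0.9 * P, r))))  # near zero
```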
The act of remembering: a study in partially observable reinforcement learning
Icarte, Rodrigo Toro, Valenzano, Richard, Klassen, Toryn Q., Christoffersen, Phillip, Farahmand, Amir-massoud, McIlraith, Sheila A.
Reinforcement Learning (RL) agents typically learn memoryless policies---policies that only consider the last observation when selecting actions. Learning memoryless policies is efficient and optimal in fully observable environments. However, some form of memory is necessary when RL agents are faced with partial observability. In this paper, we study a lightweight approach to tackle partial observability in RL. We provide the agent with an external memory and additional actions to control what, if anything, is written to the memory. At every step, the current memory state is part of the agent's observation, and the agent selects a tuple of actions: one action that modifies the environment and another that modifies the memory. When the external memory is sufficiently expressive, optimal memoryless policies yield globally optimal solutions. Unfortunately, previous attempts to use external memory in the form of binary memory have produced poor results in practice. Here, we investigate alternative forms of memory in support of learning effective memoryless policies. Our novel forms of memory outperform binary and LSTM-based memory in well-established partially observable domains.
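A minimal sketch of the external-memory idea, assuming a classic gym-style environment interface and a simple overwrite-style binary memory; the alternative memory forms studied in the paper are richer than this.

```python
import numpy as np

class ExternalMemoryWrapper:
    """Augment a partially observable environment with k external memory bits.

    The agent observes (env_observation, memory) and acts with a tuple
    (env_action, memory_action); here the memory action simply overwrites the
    memory bits, which is only the basic binary-memory variant.
    """

    def __init__(self, env, num_bits=3):
        self.env = env
        self.memory = np.zeros(num_bits, dtype=np.int8)

    def reset(self):
        obs = self.env.reset()
        self.memory[:] = 0
        return (obs, self.memory.copy())

    def step(self, env_action, memory_action):
        self.memory[:] = memory_action          # write to the external memory
        obs, reward, done, info = self.env.step(env_action)
        return (obs, self.memory.copy()), reward, done, info
```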
Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities
Mei, Jincheng, Pan, Yangchen, White, Martha, Farahmand, Amir-massoud, Yao, Hengshuai
Model-based reinforcement learning (MBRL) can significantly improve sample efficiency, particularly when carefully choosing the states from which to sample hypothetical transitions. Such prioritization has been empirically shown to be useful for both experience replay (ER) and Dyna-style planning. However, there is still little theoretical understanding in RL of such prioritization strategies and why they help. In this work, we revisit prioritized ER and, in an ideal setting, show an equivalence to minimizing a cubic loss, providing theoretical insight into why it improves upon uniform sampling. This ideal setting, however, cannot be realized in practice, due to insufficient coverage of the sample space and outdated priorities of training samples. This motivates our model-based approach, which does not suffer from these limitations. Our key idea is to actively search for high-priority states using gradient ascent. Under certain conditions, we prove that the distribution of hypothetical experiences generated from these states provides a diverse set of states, sampled approximately in proportion to their true priorities. Our experiments on both benchmark and application-oriented domains show that our approach achieves superior performance over both the model-free prioritized ER method and several closely related model-based baselines.
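A minimal PyTorch sketch of searching for high-priority states by gradient ascent, assuming a deterministic learned model, a known reward function, and squared TD error as the priority; these choices are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn

OBS_DIM = 4  # illustrative

value_fn = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, 1))

def priority(s, model, reward_fn, gamma=0.99):
    """Squared TD error under a learned deterministic model, used as the priority."""
    s_next = model(s)
    td = reward_fn(s) + gamma * value_fn(s_next) - value_fn(s)
    return (td ** 2).squeeze(-1)

def search_high_priority_states(s_init, model, reward_fn, steps=20, lr=0.1):
    """Hill-climb states by gradient ascent on the priority, starting from
    states drawn from the replay buffer, to choose simulation start states."""
    s = s_init.clone().requires_grad_(True)
    for _ in range(steps):
        p = priority(s, model, reward_fn).sum()
        (g,) = torch.autograd.grad(p, s)
        s = (s + lr * g).detach().requires_grad_(True)
    return s.detach()

# Toy usage with a stand-in linear model and a quadratic reward.
model = nn.Linear(OBS_DIM, OBS_DIM)
reward_fn = lambda s: -(s ** 2).sum(dim=-1, keepdim=True)
start_states = torch.randn(16, OBS_DIM)
high_priority_states = search_high_priority_states(start_states, model, reward_fn)
```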
Hill Climbing on Value Estimates for Search-control in Dyna
Pan, Yangchen, Yao, Hengshuai, Farahmand, Amir-massoud, White, Martha
Dyna is an architecture for model-based reinforcement learning (RL), where simulated experience from a model is used to update policies or value functions. A key component of Dyna is search-control, the mechanism for generating the states and actions from which the agent queries the model, which remains largely unexplored. In this work, we propose to generate such states by using trajectories obtained from hill climbing (HC) on the current estimate of the value function. This has the effect of propagating value from high-value regions and of preemptively updating value estimates of the regions that the agent is likely to visit next. We derive a noisy stochastic projected gradient ascent algorithm for hill climbing, and highlight a connection to Langevin dynamics. We demonstrate empirically on four classical domains that our algorithm, HC-Dyna, can obtain significant sample-efficiency improvements. We study the properties of different sampling distributions for search-control, and find that there appears to be a benefit specifically from using the samples generated by climbing on current value estimates from low-value to high-value regions.
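A minimal PyTorch sketch of the search-control idea: noisy gradient ascent on a value estimate with projection onto a box of valid states; the step size, noise scale, and box bounds are illustrative assumptions.

```python
import torch
import torch.nn as nn

OBS_DIM = 4  # illustrative

value_fn = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, 1))

def hill_climb_states(s0, steps=30, step_size=0.05, noise_scale=0.01,
                      low=-1.0, high=1.0):
    """Generate search-control states by noisy gradient ascent on the value
    estimate, projecting back onto a box of valid states after each step
    (a Langevin-dynamics-like update)."""
    states, s = [], s0.clone()
    for _ in range(steps):
        s = s.detach().requires_grad_(True)
        v = value_fn(s).sum()
        (g,) = torch.autograd.grad(v, s)
        s = s + step_size * g + noise_scale * torch.randn_like(s)
        s = s.clamp(low, high)          # projection onto the valid state box
        states.append(s.detach())
    return torch.cat(states, dim=0)     # candidate states for model queries

# Toy usage: start the climb from a batch of states drawn from a replay buffer.
search_control_states = hill_climb_states(torch.randn(8, OBS_DIM))
```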
Improving Skin Condition Classification with a Visual Symptom Checker trained using Reinforcement Learning
Akrout, Mohamed, Farahmand, Amir-massoud, Jarmain, Tory, Abid, Latif
We present a visual symptom checker that combines a pre-trained Convolutional Neural Network (CNN) with a Reinforcement Learning (RL) agent acting as a Question Answering (QA) model. This method enables us not only to increase the classification confidence and accuracy of the visual symptom checker, but also to decrease the average number of relevant questions asked to narrow down the differential diagnosis. By incorporating the CNN output, in the form of classification probabilities, into the state of the simulated patient's environment, a DQN-based RL agent learns to ask about the symptom that maximizes its expected return. We demonstrate that our RL approach increases the accuracy by more than 20% compared to the CNN alone, and by up to 10% compared to the decision tree model. We finally show that the RL approach not only outperforms the decision tree approach but also narrows down the diagnosis faster in terms of the average number of questions asked.
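A minimal sketch of the kind of state representation and greedy question selection described above, assuming illustrative problem sizes, a simple encoding of symptom answers, and a masking rule against re-asking; it is not the paper's exact architecture.

```python
import torch
import torch.nn as nn

NUM_CONDITIONS, NUM_SYMPTOMS = 10, 20  # illustrative sizes

# State: CNN class probabilities concatenated with the symptom-answer vector
# (-1 = not asked yet, 0 = absent, 1 = present). Action: which symptom to ask about.
q_net = nn.Sequential(
    nn.Linear(NUM_CONDITIONS + NUM_SYMPTOMS, 128), nn.ReLU(),
    nn.Linear(128, NUM_SYMPTOMS),
)

def make_state(cnn_probs, symptom_answers):
    return torch.cat([cnn_probs, symptom_answers], dim=-1)

def select_symptom(cnn_probs, symptom_answers, asked_mask):
    """Greedy DQN action: ask about the unasked symptom with the highest Q-value."""
    q = q_net(make_state(cnn_probs, symptom_answers))
    q = q.masked_fill(asked_mask, float("-inf"))   # never re-ask a symptom
    return int(q.argmax())

# Toy usage.
cnn_probs = torch.softmax(torch.randn(NUM_CONDITIONS), dim=-1)
answers = -torch.ones(NUM_SYMPTOMS)
asked = torch.zeros(NUM_SYMPTOMS, dtype=torch.bool)
print(select_symptom(cnn_probs, answers, asked))
```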
Iterative Value-Aware Model Learning
Farahmand, Amir-massoud
This paper introduces a model-based reinforcement learning (MBRL) framework that incorporates the underlying decision problem in learning the transition model of the environment. This is in contrast with conventional approaches to MBRL that learn the model of the environment, for example by finding the maximum likelihood estimate, without taking into account the decision problem. The Value-Aware Model Learning (VAML) framework argues that this might not be a good idea, especially if the true model of the environment does not belong to the model class from which we are estimating the model. The original VAML framework, however, may result in an optimization problem that is difficult to solve. This paper introduces a new class of MBRL algorithms, called Iterative VAML, that benefits from the structure of how planning is performed (i.e., through approximate value iteration) to devise a simpler optimization problem. The paper theoretically analyzes Iterative VAML and provides a finite-sample error upper bound guarantee for it.
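A minimal PyTorch sketch of an Iterative-VAML-style model loss, assuming a deterministic learned model so that the expectation of the value under the model reduces to a point prediction; the network sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 8, 2  # illustrative sizes

value_fn = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, 1))
model = nn.Linear(OBS_DIM + ACT_DIM, OBS_DIM)  # deterministic model for simplicity

def iter_vaml_loss(s, a, s_next):
    """Fit the model so that the *value* of its predicted next state matches the
    value of the observed next state under the current value estimate; only the
    model's parameters would be passed to the optimizer at this step."""
    s_next_pred = model(torch.cat([s, a], dim=-1))
    with torch.no_grad():
        v_target = value_fn(s_next)
    return ((value_fn(s_next_pred) - v_target) ** 2).mean()

# Toy usage on a random batch; in the full algorithm this model-fitting step
# alternates with an approximate value iteration step that updates value_fn.
s, a, s_next = torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM), torch.randn(32, OBS_DIM)
loss = iter_vaml_loss(s, a, s_next)
loss.backward()
```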