Farsad, Nariman
Multi-Task Reinforcement Learning Enables Parameter Scaling
McLean, Reginald, Chatzaroulas, Evangelos, Terry, Jordan, Woungang, Isaac, Farsad, Nariman, Castro, Pablo Samuel
Multi-task reinforcement learning (MTRL) aims to endow a single agent with the ability to perform well on multiple tasks. Recent works have focused on developing novel sophisticated architectures to improve performance, often resulting in larger models; it is unclear, however, whether the performance gains are a consequence of the architecture design itself or the extra parameters. We argue that the gains are mostly due to scale, by demonstrating that naïvely scaling up a simple MTRL baseline to match parameter counts outperforms the more sophisticated architectures, and that these gains come primarily from scaling the critic rather than the actor. Additionally, we explore the training stability advantages that come with task diversity, demonstrating that increasing the number of tasks can help mitigate plasticity loss. Our findings suggest that MTRL's simultaneous training across multiple tasks provides a natural framework for beneficial parameter scaling in reinforcement learning, challenging the need for complex architectural innovations.
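As a rough illustration of what "naïvely scaling a simple baseline" can mean in practice, the sketch below widens the MLPs of a generic actor-critic agent and counts parameters. The architecture, widths, and dimensions are illustrative assumptions of this note, not the paper's configuration.

```python
# Minimal sketch (not the paper's code): naively scaling an MTRL
# actor-critic baseline by widening its MLPs. Here the critic receives
# most of the extra parameters while the actor stays compact.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden, depth):
    """Plain MLP: the 'simple baseline' being scaled up."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

obs_dim, act_dim, num_tasks = 39, 4, 10   # illustrative sizes
task_onehot = num_tasks                   # task ID appended to the observation

# Baseline-width actor vs. a critic-scaled variant (widths are assumptions).
actor = mlp(obs_dim + task_onehot, 2 * act_dim, hidden=400, depth=3)
critic = mlp(obs_dim + task_onehot + act_dim, 1, hidden=4096, depth=3)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(f"actor: {n_params(actor):,}  critic: {n_params(critic):,}")
```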
Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics
Alakuijala, Minttu, McLean, Reginald, Woungang, Isaac, Farsad, Nariman, Kaski, Samuel, Marttinen, Pekka, Yuan, Kai
Natural language is often the easiest and most convenient modality for humans to specify tasks for robots. However, learning to ground language to behavior typically requires impractical amounts of diverse, language-annotated demonstrations collected on each target robot. In this work, we aim to separate the problem of what to accomplish from how to accomplish it, as the former can benefit from substantial amounts of external observation-only data, and only the latter depends on a specific robot embodiment. To this end, we propose Video-Language Critic, a reward model that can be trained on readily available cross-embodiment data using contrastive learning and a temporal ranking objective, and use it to score behavior traces from a separate reinforcement learning actor. When trained on Open X-Embodiment data, our reward model enables 2x more sample-efficient policy training on Meta-World tasks than a sparse reward alone, despite a significant domain gap. Using in-domain data but in a challenging task generalization setting on Meta-World, we further demonstrate more sample-efficient training than is possible with prior language-conditioned reward models that are trained with binary classification, use static images, or do not leverage the temporal information present in video data.
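The sketch below illustrates, under assumptions of mine, the two training signals the abstract describes: a contrastive loss over matched and mismatched video-caption pairs, and a temporal ranking loss that pushes scores to increase over the course of a successful video. The margin, batching, and score shapes are placeholders, not the paper's implementation.

```python
# Illustrative sketch of the two objectives, with random scores standing
# in for a video-language scoring model.
import torch
import torch.nn.functional as F

def contrastive_loss(scores):
    """scores[i, j]: score of video i against caption j.
    Matching pairs lie on the diagonal (InfoNCE-style)."""
    targets = torch.arange(scores.size(0))
    return F.cross_entropy(scores, targets)

def temporal_ranking_loss(frame_scores, margin=0.1):
    """frame_scores[t]: score of the video prefix ending at frame t
    against the matching caption; later prefixes should score higher."""
    diffs = frame_scores[:-1] - frame_scores[1:] + margin
    return F.relu(diffs).mean()

scores = torch.randn(8, 8)        # 8 videos x 8 captions
frame_scores = torch.randn(16)    # one video, 16 prefixes
loss = contrastive_loss(scores) + temporal_ranking_loss(frame_scores)
```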
Model-Based Machine Learning for Communications
Shlezinger, Nir, Farsad, Nariman, Eldar, Yonina C., Goldsmith, Andrea J.
Traditional communication systems design is dominated by methods that are based on statistical models. These statistical-model-based algorithms, which we henceforth refer to as model-based methods, rely on mathematical models that describe the transmission process, signal propagation, receiver noise, interference, and many other components of the system that affect the end-to-end signal transmission and reception. Such mathematical models use parameters that vary over time as the channel conditions, the environment, network traffic, or network topology change. Therefore, for optimal operation, many of the algorithms used in communication systems rely on the underlying mathematical models as well as the estimation of the model parameters. However, there are cases where this approach fails, in particular when the mathematical models for one or more of the system components are highly complex, hard to estimate, poorly understood, fail to capture the underlying physics of the system, or do not lend themselves to computationally efficient algorithms.
Inference from Stationary Time Sequences via Learned Factor Graphs
Shlezinger, Nir, Farsad, Nariman, Eldar, Yonina C., Goldsmith, Andrea J.
The design of methods for inference from time sequences has traditionally relied on statistical models that describe the relation between a latent desired sequence and the observed one. A broad family of model-based algorithms has been derived to carry out inference at controllable complexity using recursive computations over the factor graph representing the underlying distribution. An alternative model-agnostic approach utilizes machine learning (ML) methods. Here we propose a framework that combines model-based inference algorithms and data-driven ML tools for stationary time sequences. In the proposed approach, neural networks are developed to separately learn specific components of a factor graph describing the distribution of the time sequence, rather than the complete inference task. By exploiting stationary properties of this distribution, the resulting approach can be applied to sequences of varying temporal duration. Additionally, this approach facilitates the use of compact neural networks which can be trained with small training sets, or, alternatively, can be used to improve upon existing deep inference systems. We present an inference algorithm based on learned stationary factor graphs, referred to as StaSPNet, which learns to implement the sum-product scheme from labeled data, and can be applied to sequences of different lengths. Our experimental results demonstrate the ability of the proposed StaSPNet to learn to carry out accurate inference from small training sets for sleep stage detection using the Sleep-EDF dataset, as well as for symbol detection in digital communications with unknown channels.
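A minimal sketch of the idea, assuming a hidden-state factorization with a known stationary transition factor: only the per-timestep observation factor is learned (here a deterministic stand-in function), while the standard sum-product (forward-backward) recursion is reused unchanged for sequences of any length. This is an illustration of the concept, not the StaSPNet code.

```python
# Sum-product over a stationary factor graph where one factor is learned.
import numpy as np

S = 4                                  # number of latent states
trans = np.full((S, S), 1.0 / S)       # stationary transition factor (assumed)

def learned_factor(y_t):
    """Stand-in for a trained NN mapping observation y_t to positive
    per-state factors phi(y_t, s_t)."""
    return np.exp(-0.5 * (y_t - np.arange(S)) ** 2)

def sum_product_marginals(y):
    T = len(y)
    phi = np.stack([learned_factor(yt) for yt in y])   # (T, S)
    fwd = np.zeros((T, S)); bwd = np.ones((T, S))
    fwd[0] = phi[0] / phi[0].sum()
    for t in range(1, T):                              # forward messages
        m = fwd[t - 1] @ trans * phi[t]
        fwd[t] = m / m.sum()
    for t in range(T - 2, -1, -1):                     # backward messages
        m = trans @ (bwd[t + 1] * phi[t + 1])
        bwd[t] = m / m.sum()
    post = fwd * bwd
    return post / post.sum(axis=1, keepdims=True)      # per-time marginals

# Works for any sequence length without retraining the factor.
print(sum_product_marginals(np.random.randn(10)).shape)   # (10, 4)
```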
Data-Driven Symbol Detection via Model-Based Machine Learning
Farsad, Nariman, Shlezinger, Nir, Goldsmith, Andrea J., Eldar, Yonina C.
The design of symbol detectors in digital communication systems has traditionally relied on statistical channel models that describe the relation between the transmitted symbols and the observed signal at the receiver. Here we review a data-driven framework for symbol detection design which combines machine learning (ML) and model-based algorithms. In this hybrid approach, well-known channel-model-based algorithms such as the Viterbi method, BCJR detection, and multiple-input multiple-output (MIMO) soft interference cancellation (SIC) are augmented with ML-based algorithms to remove their channel-model dependence, allowing the receiver to learn to implement these algorithms solely from data. The resulting data-driven receivers are most suitable for systems where the underlying channel models are poorly understood, highly complex, or fail to capture the underlying physics. Our approach is unique in that it only replaces the channel-model-based computations with dedicated neural networks that can be trained from a small amount of data, while keeping the general algorithm intact. Our results demonstrate that these techniques can yield near-optimal performance of model-based algorithms without knowing the exact channel input-output statistical relationship and in the presence of channel state information uncertainty.
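A minimal sketch of the hybrid principle, under my own assumptions about shapes and training: a small classifier learns posteriors p(s|y) from labeled pairs of channel outputs and states, and Bayes' rule converts them into the likelihoods a model-based detector consumes; the surrounding detection algorithm is left untouched.

```python
# Sketch (not the paper's code): an NN supplies the channel-model-based
# quantity, the rest of the receiver algorithm stays intact.
import torch
import torch.nn as nn

n_states = 4                       # e.g., constellation x memory combinations
net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, n_states))

def likelihoods(y, state_prior):
    """Convert learned posteriors p(s|y) to quantities proportional
    to p(y|s) via Bayes' rule with an empirical state prior."""
    post = torch.softmax(net(y.unsqueeze(-1)), dim=-1)   # p(s|y)
    return post / state_prior                            # ~ p(y|s)

# Toy usage: a batch of received samples and a uniform prior.
y = torch.randn(32)
prior = torch.full((n_states,), 1.0 / n_states)
lik = likelihoods(y, prior)        # (32, n_states), feeds Viterbi/BCJR/SIC
```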
ViterbiNet: A Deep Learning Based Viterbi Algorithm for Symbol Detection
Shlezinger, Nir, Farsad, Nariman, Eldar, Yonina C., Goldsmith, Andrea J.
Symbol detection plays an important role in the implementation of digital receivers. In this work, we propose ViterbiNet, which is a data-driven symbol detector that does not require channel state information (CSI). ViterbiNet is obtained by integrating deep neural networks (DNNs) into the Viterbi algorithm. We identify the specific parts of the Viterbi algorithm that are channel-model-based, and design a DNN to implement only those computations, leaving the rest of the algorithm structure intact. We then propose a meta-learning based approach to train ViterbiNet online based on recent decisions, allowing the receiver to track dynamic channel conditions without requiring new training samples for every coherence block. Our numerical evaluations demonstrate that the performance of ViterbiNet, which is ignorant of the CSI, approaches that of the CSI-based Viterbi algorithm, and is capable of tracking time-varying channels without needing instantaneous CSI or additional training data. Moreover, unlike conventional Viterbi detection, ViterbiNet is robust to CSI uncertainty, and it can be reliably implemented in complex channel models with constrained computational burden. More broadly, our results demonstrate the conceptual benefit of designing communication systems that integrate DNNs into established algorithms.
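To make the division of labor concrete, here is a plain Viterbi dynamic program that consumes per-state log-likelihoods from an external source; in ViterbiNet that source is the trained DNN, while a recursion like the one below stays as in the classical algorithm. The random metrics and uniform trellis here are stand-ins, not the paper's setup.

```python
# Classical Viterbi pass over learned branch metrics.
import numpy as np

def viterbi(log_lik, log_trans):
    """log_lik: (T, S) per-state log-likelihoods (from a DNN in ViterbiNet).
    log_trans: (S, S) log transition weights of the trellis."""
    T, S = log_lik.shape
    score = log_lik[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans      # (prev state, next state)
        back[t] = cand.argmax(axis=0)          # best predecessor per state
        score = cand.max(axis=0) + log_lik[t]
    path = [int(score.argmax())]               # backtrack the best path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

S = 4
log_lik = np.log(np.random.rand(50, S))        # stand-in for DNN outputs
log_trans = np.log(np.full((S, S), 1.0 / S))   # uniform trellis (assumption)
print(viterbi(log_lik, log_trans)[:10])
```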