Li, Dexun
Diversity Induced Environment Design via Self-Play
Li, Dexun, Li, Wenjun, Varakantham, Pradeep
Recent work on designing an appropriate distribution of environments has shown promise for training effective, generally capable agents. Its success is due in part to a form of adaptive curriculum learning that generates environment instances (or levels) at the frontier of the agent's capabilities. However, such environment design frameworks often struggle to find effective levels in challenging design spaces and require costly interactions with the environment. In this paper, we introduce diversity into the Unsupervised Environment Design (UED) framework. Specifically, we propose a task-agnostic method to identify observed/hidden states that are representative of a given level. The outcome of this method is then used to characterize the diversity between two levels, which, as we show, can be crucial to effective performance. In addition, to improve sampling efficiency, we incorporate a self-play technique that allows the environment generator to automatically generate environments that benefit the training agent. Quantitatively, our approach, Diversity-induced Environment Design via Self-Play (MBeDED), shows compelling performance over existing methods.
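As a concrete illustration of the kind of selection rule such a framework might use, here is a minimal numpy sketch that scores candidate levels by regret plus diversity against a buffer of previously kept levels. The even-spaced subsampling of states, the pairwise-distance diversity metric, and all names (representative_states, level_diversity, select_levels, regret_fn) are illustrative assumptions, not the paper's actual method.

import numpy as np

def representative_states(trajectory_states, k=8):
    # Stand-in for the paper's task-agnostic method: subsample k states
    # evenly along the level's trajectory.
    idx = np.linspace(0, len(trajectory_states) - 1, k).astype(int)
    return np.stack([trajectory_states[i] for i in idx])

def level_diversity(states_a, states_b):
    # Diversity between two levels as the mean pairwise L2 distance
    # between their representative states (illustrative metric only).
    d = np.linalg.norm(states_a[:, None, :] - states_b[None, :, :], axis=-1)
    return d.mean()

def select_levels(candidates, buffer, regret_fn, beta=0.5, n=4):
    # Score each candidate (level, states) pair by regret plus its mean
    # diversity against previously kept levels, then keep the top n.
    scores = []
    for level, states in candidates:
        div = np.mean([level_diversity(states, s) for s in buffer]) if buffer else 0.0
        scores.append(regret_fn(level) + beta * div)
    order = np.argsort(scores)[::-1]
    return [candidates[i][0] for i in order[:n]]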
Hidden State Approximation in Recurrent Neural Networks Using Continuous Particle Filtering
Li, Dexun
Using historical data to predict future events has many applications in the real world, such as stock price prediction and robot localization. Over the past decades, convolutional long short-term memory (LSTM) networks have achieved extraordinary success with sequential data in related fields. However, traditional recurrent neural networks (RNNs) maintain their hidden states deterministically. In this paper, we use particles to approximate the distribution of the latent state and show how this scheme extends to a more complex form, i.e., the encoder-decoder mechanism. With the proposed continuous, differentiable scheme, our model is capable of adaptively extracting valuable information and updating the latent state according to Bayes' rule. Our empirical studies demonstrate the effectiveness of our method on prediction tasks.
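To make the particle-based hidden state concrete, below is a minimal PyTorch sketch that keeps K weighted particles over a GRU hidden state and reweights them with a Gaussian observation likelihood as a stand-in for the Bayes-rule update. The choice of cell, the unit-variance likelihood, and the omission of resampling are simplifying assumptions; the paper's continuous differentiable scheme is only gestured at here.

import torch
import torch.nn as nn

class ParticleGRUCell(nn.Module):
    # Maintains K weighted particles over the hidden state of a shared
    # GRU cell (a sketch, not the paper's exact formulation).
    def __init__(self, input_dim, hidden_dim, n_particles=32):
        super().__init__()
        self.cell = nn.GRUCell(input_dim, hidden_dim)
        self.obs_head = nn.Linear(hidden_dim, input_dim)  # predicts the observation
        self.k = n_particles

    def forward(self, x, particles, log_w):
        # x: (B, D) observation; particles: (K, B, H); log_w: (K, B).
        K, B, H = particles.shape
        # Transition: propagate every particle through the shared GRU cell.
        nxt = self.cell(x.repeat(K, 1), particles.reshape(K * B, H)).reshape(K, B, H)
        # Measurement update (Bayes' rule): reweight particles by how well
        # they explain the observation under a unit-variance Gaussian.
        err = self.obs_head(nxt) - x.unsqueeze(0)
        log_w = log_w - 0.5 * (err ** 2).sum(-1)
        log_w = log_w - torch.logsumexp(log_w, dim=0, keepdim=True)  # renormalize
        return nxt, log_w

    def read_out(self, particles, log_w):
        # Posterior-mean hidden state under the particle weights.
        return (log_w.exp().unsqueeze(-1) * particles).sum(0)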
Towards Soft Fairness in Restless Multi-Armed Bandits
Li, Dexun, Varakantham, Pradeep
Restless multi-armed bandits (RMAB) is a framework for allocating limited resources under uncertainty. It is an extremely useful model for monitoring beneficiaries and executing timely interventions to ensure maximum benefit in public health settings (e.g., ensuring patients take medicines in tuberculosis settings, ensuring pregnant mothers listen to automated calls about good pregnancy practices). Due to the limited resources, certain communities or regions are typically starved of interventions, which can have follow-on effects. To avoid starvation of interventions across individuals/regions/communities, we first provide a soft fairness constraint and then provide an approach to enforce it in RMABs. The soft fairness constraint requires that an algorithm never probabilistically favor one arm over another if the long-term cumulative reward of choosing the latter arm is higher. Our approach incorporates a softmax-based value iteration method in the RMAB setting to design selection algorithms that satisfy the proposed fairness constraint. Our method, referred to as SoftFair, also provides theoretical performance guarantees and is asymptotically optimal. Finally, we demonstrate the utility of our approach on simulated benchmarks and show that the soft fairness constraint can be handled without a significant sacrifice in value.
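A rough numpy sketch of the two ingredients the abstract names: a softmax (log-sum-exp) backup in place of the hard max in per-arm value iteration, and a selection rule that samples arms with probability increasing in their action advantage, so no arm is ever deterministically starved. The transition/reward interfaces, the temperature tau, and the without-replacement sampling are assumptions for illustration, not the SoftFair algorithm itself.

import numpy as np

def soft_value_iteration(P, R, gamma=0.95, tau=1.0, iters=200):
    # Softmax value iteration for a single arm (illustrative).
    # P[a]: (S, S) transition matrix for action a in {0: passive, 1: active};
    # R: (S,) per-state reward. Returns Q-values of shape (S, 2).
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = np.stack([R + gamma * P[a] @ V for a in (0, 1)], axis=1)
        # Soft (log-sum-exp) backup in place of a hard max, computed stably.
        m = Q.max(axis=1)
        V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
    return Q

def soft_fair_selection(arm_states, per_arm_Q, budget, tau=1.0, rng=None):
    # Sample `budget` arms with probability increasing in the advantage of
    # acting, so lower-value arms keep a nonzero selection probability.
    rng = rng or np.random.default_rng()
    adv = np.array([Q[s, 1] - Q[s, 0] for s, Q in zip(arm_states, per_arm_Q)])
    p = np.exp((adv - adv.max()) / tau)
    p /= p.sum()
    return rng.choice(len(adv), size=budget, replace=False, p=p)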
CLAIM: Curriculum Learning Policy for Influence Maximization in Unknown Social Networks
Li, Dexun, Lowalekar, Meghna, Varakantham, Pradeep
Influence maximization is the problem of finding a small subset of nodes in a network that can maximize the diffusion of information. Recently, it has also found application in HIV prevention, substance abuse prevention, micro-finance adoption, etc., where the goal is to identify a set of peer leaders in a real-world physical social network who can disseminate information to a large group of people. Unlike online social networks, real-world networks are not completely known, and collecting information about the network is costly as it involves surveying multiple people. In this paper, we focus on this problem of network discovery for influence maximization. Existing work in this direction proposes a reinforcement learning framework. As environment interactions in real-world settings are costly, it is important for reinforcement learning algorithms to require as few environment interactions as possible, i.e., to be sample-efficient. In this work, we propose CLAIM - Curriculum LeArning Policy for Influence Maximization - to improve the sample efficiency of RL methods. We conduct experiments on real-world datasets and show that our approach can outperform the current best approach.
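A minimal sketch of curriculum pacing for such a setting: order training tasks from easy to hard and advance once the agent clears a success threshold on the current task. The easy-to-hard heuristic (smaller graphs and larger query budgets first), the networkx-style graph interface, and the agent.run_episode/agent.update API are hypothetical, not CLAIM's actual curriculum.

def make_curriculum(graphs, budgets):
    # Hypothetical heuristic: treat smaller graphs with larger query
    # budgets as easier, and schedule those first.
    tasks = [(g, b) for g in graphs for b in budgets]
    return sorted(tasks, key=lambda t: (t[0].number_of_nodes(), -t[1]))

def train_with_curriculum(agent, tasks, episodes_per_task=50, advance_at=0.8):
    # Advance to the next task once the running success rate on the
    # current task crosses the threshold (simple curriculum pacing).
    for graph, budget in tasks:
        wins = 0
        for ep in range(episodes_per_task):
            reward = agent.run_episode(graph, budget)  # assumed interface
            agent.update()                             # assumed interface
            wins += int(reward > 0)
            if ep >= 10 and wins / (ep + 1) >= advance_at:
                break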