Wang, Hao

Efficient Average Reward Reinforcement Learning Using Constant Shifting Values

AAAI Conferences

There are two classes of average reward reinforcement learning (RL) algorithms: model-based ones that explicitly maintain MDP models and model-free ones that do not learn such models. Though model-free algorithms are known to be more efficient, they often cannot converge to optimal policies due to the perturbation of parameters. In this paper, a novel model-free algorithm is proposed, which makes use of constant shifting values (CSVs) estimated from prior knowledge. To encourage exploration during the learning process, the algorithm constantly subtracts the CSV from the rewards. A terminating condition is proposed to handle the unboundedness of Q-values caused by such subtraction. The convergence of the proposed algorithm is proved under very mild assumptions. Furthermore, linear function approximation is investigated to generalize our method to handle large-scale tasks. Extensive experiments on representative MDPs and the popular game Tetris show that the proposed algorithms significantly outperform the state-of-the-art ones.
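
The idea of subtracting a constant shifting value from every reward can be illustrated with a minimal tabular sketch. This is not the paper's algorithm: the update rule is an assumed undiscounted Q-learning step applied to the shifted reward, the transition table format and the names `shifted_q_learning` and `greedy_policy` are hypothetical, and a fixed step budget stands in for the terminating condition, which the abstract does not specify.

```python
import random


def shifted_q_learning(transitions, csv, steps=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning on rewards shifted by a constant (illustrative only).

    transitions[s][a] = (next_state, reward) describes a deterministic toy MDP;
    this format is an assumption made for the sketch.
    """
    rng = random.Random(seed)
    q = [[0.0] * len(actions) for actions in transitions]
    state = 0
    # A fixed step budget replaces the paper's terminating condition,
    # so Q-values may keep growing or shrinking while the loop runs.
    for _ in range(steps):
        n_actions = len(q[state])
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])  # exploit
        next_state, reward = transitions[state][action]
        # Undiscounted target computed on the shifted reward (reward - csv).
        target = (reward - csv) + max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Return the index of the highest-valued action in each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


# Toy two-state MDP: staying in state 1 via action 1 pays the most per step.
toy_mdp = [
    [(0, 1.0), (1, 0.0)],
    [(0, 0.0), (1, 2.0)],
]
q_values = shifted_q_learning(toy_mdp, csv=1.5)
print(greedy_policy(q_values))
```

In this sketch, a shift below the best achievable average reward makes the Q-values along the best cycle grow without bound, which is the kind of unboundedness the abstract says the terminating condition is meant to handle.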

Collaborative Topic Regression with Social Regularization for Tag Recommendation

AAAI Conferences

Recently, tag recommendation (TR) has become a very hot research topic in data mining and related areas. However, neither co-occurrence based methods which only use the item-tag matrix nor content based methods which only use the item content information can achieve satisfactory performance in real TR applications. Hence, how to effectively combine the item-tag matrix, item content information, and other auxiliary information into the same recommendation framework is the key challenge for TR. In this paper, we first adapt the collaborative topic regression (CTR) model, which has been successfully applied for article recommendation, to combine both item-tag matrix and item content information for TR. Furthermore, by extending CTR we propose a novel hierarchical Bayesian model, called CTR with social regularization (CTR-SR), to seamlessly integrate the item-tag matrix, item content information, and social networks between items into the same principled model. Experiments on real data demonstrate the effectiveness of our proposed models.

Online Egocentric Models for Citation Networks

AAAI Conferences

With the emergence of large-scale evolving (time-varying) networks, dynamic network analysis (DNA) has become a very hot research topic in recent years. Although a lot of DNA methods have been proposed by researchers from different communities, most of them can only model snapshot data recorded at a very rough temporal granularity. Recently, some models have been proposed for DNA which can be used to model large-scale citation networks at a fine temporal granularity. However, they suffer from a significant decrease of accuracy over time because the learned parameters or node features are static (fixed) during the prediction process for evolving citation networks. In this paper, we propose a novel model, called online egocentric model (OEM), to learn time-varying parameters and node features for evolving citation networks. Experimental results on real-world citation networks show that our OEM can not only prevent the prediction accuracy from decreasing over time but also uncover the evolution of topics in citation networks.