Multi-agent active perception with prediction rewards
Multi-agent active perception is a task where a team of agents cooperatively gathers observations to compute a joint estimate of a hidden variable. The task is decentralized and the joint estimate can only be computed after the task ends by fusing observations of all agents. The objective is to maximize the accuracy of the estimate. The accuracy is quantified by a centralized prediction reward determined by a centralized decision-maker who perceives the observations gathered by all agents after the task ends. In this paper, we model multi-agent active perception as a decentralized partially observable Markov decision process (Dec-POMDP) with a convex centralized prediction reward. We prove that by introducing individual prediction actions for each agent, the problem is converted into a standard Dec-POMDP with a decentralized prediction reward. The loss due to decentralization is bounded, and we give a sufficient condition for when it is zero. Our results allow application of any Dec-POMDP solution algorithm to multi-agent active perception problems, and enable planning to reduce uncertainty without explicit computation of joint estimates. We demonstrate the empirical usefulness of our results by applying a standard Dec-POMDP algorithm to multi-agent active perception problems, showing increased scalability in the planning horizon.
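The conversion sketched in this abstract can be illustrated numerically. The reward matrix `R`, the two-state belief, the mixture-based belief fusion, and all names below are hypothetical toy choices, not the paper's actual domains: a convex centralized prediction reward is a maximum of linear functions of the final fused belief, and decentralization replaces the single maximizing prediction with one prediction action per agent, each chosen from that agent's local information only.

```python
import numpy as np

# Toy illustration (hypothetical numbers): hidden state s in {0, 1}.
# R[s, a] is the reward for prediction action a when the true state is s,
# so rho(b) = max_a sum_s b[s] * R[s, a] is convex (piecewise-linear) in b.
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])

def centralized_reward(b):
    """Centralized decision-maker sees the fused belief b and picks
    the single best prediction action."""
    return max(float(b @ R[:, a]) for a in range(R.shape[1]))

def decentralized_reward(local_beliefs, weights):
    """Each agent picks its own prediction action from its local belief;
    the picked actions are then scored under the fused belief.
    Hypothetical fusion rule: a weighted mixture of local beliefs."""
    fused = sum(w * b for w, b in zip(weights, local_beliefs))
    picks = [int(np.argmax(b @ R)) for b in local_beliefs]
    return float(np.mean([fused @ R[:, a] for a in picks]))

b1, b2 = np.array([0.9, 0.1]), np.array([0.6, 0.4])
fused = 0.5 * b1 + 0.5 * b2                     # [0.75, 0.25]
centralized_reward(fused)                       # 0.75
decentralized_reward([b1, b2], [0.5, 0.5])      # 0.75: both agents agree here
```

When the local beliefs disagree on the best prediction, the decentralized value drops below the centralized one; the loss bound in the abstract concerns exactly this gap, and it vanishes when all agents' local predictions coincide with the centralized one.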
- Europe > Netherlands > South Holland > Delft (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Hamburg (0.04)
PreferThinker: Reasoning-based Personalized Image Preference Assessment
Xu, Shengqi, Zhou, Xinpeng, Zhang, Yabo, Liu, Ming, Liang, Tao, Zhang, Tianyu, Bai, Yalong, Wu, Zuxuan, Zuo, Wangmeng
Personalized image preference assessment aims to evaluate an individual user's image preferences by relying only on a small set of reference images as prior information. Existing methods mainly focus on general preference assessment, training models with large-scale data to tackle well-defined tasks such as text-image alignment. However, these approaches struggle to handle personalized preference because user-specific data are scarce and not easily scalable, and individual tastes are often diverse and complex. To overcome these challenges, we introduce a common preference profile that serves as a bridge across users, allowing large-scale user data to be leveraged for training profile prediction and capturing complex personalized preferences. Building on this idea, we propose a reasoning-based personalized image preference assessment framework that follows a predict-then-assess paradigm: it first predicts a user's preference profile from reference images, and then provides interpretable, multi-dimensional scores and assessments of candidate images based on the predicted profile. To support this, we first construct a large-scale Chain-of-Thought (CoT)-style personalized assessment dataset annotated with diverse user preference profiles and high-quality CoT-style reasoning, enabling explicit supervision of structured reasoning. Next, we adopt a two-stage training strategy: a cold-start supervised fine-tuning phase to empower the model with structured reasoning capabilities, followed by reinforcement learning to incentivize the model to explore more reasonable assessment paths and enhance generalization. Furthermore, we propose a similarity-aware prediction reward to encourage better prediction of the user's preference profile, which facilitates more reasonable assessment exploration. Extensive experiments demonstrate the superiority of the proposed method.
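The abstract does not spell out the form of the similarity-aware prediction reward, so the following is only a plausible minimal sketch under the assumption that the predicted and annotated preference profiles are embedded as vectors and compared by cosine similarity; the function name and the rescaling to [0, 1] are my own illustrative choices, not the authors' definition.

```python
import numpy as np

def similarity_aware_reward(pred_profile, true_profile):
    """Hypothetical RL reward: cosine similarity between the predicted
    and annotated user-preference profile embeddings, rescaled from
    [-1, 1] to [0, 1] so it can be combined with other reward terms."""
    p = np.asarray(pred_profile, dtype=float)
    t = np.asarray(true_profile, dtype=float)
    cos = p @ t / (np.linalg.norm(p) * np.linalg.norm(t))
    return 0.5 * (cos + 1.0)

similarity_aware_reward([1.0, 0.0], [1.0, 0.0])  # 1.0 (perfect match)
similarity_aware_reward([1.0, 0.0], [0.0, 1.0])  # 0.5 (orthogonal profiles)
```

A denser reward of this kind gives the policy gradient signal even when the final assessment score is wrong, which is one common motivation for auxiliary prediction rewards in RL fine-tuning.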
- Europe > Netherlands > South Holland > Delft (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Hamburg (0.04)
Review for NeurIPS paper: Multi-agent active perception with prediction rewards
Weaknesses: The paper is well written and easy to follow. The problem of active perception is also interesting. There are a few areas where more clarification is needed, as pointed out below:
- The authors have highlighted a number of previous models for the problem of active perception, such as Dec-ρPOMDP and POMDP-IR. Given the focus on converting this problem to a decentralized framework, it is not clearly conveyed why decentralizing the problem is significant. There are hints available in the paper, such as reduced communication overhead, but no empirical evidence is presented to justify decentralized approaches over these previous approaches (e.g., how much communication overhead is reduced).
- The technical approach presented by the authors is elegant and simple, but it is essentially a heuristic approach. The bound provided in Theorem 1 would seem to be loose in the worst case (and its value in the experiments is not shown).
Review for NeurIPS paper: Multi-agent active perception with prediction rewards
This paper addresses the problem of multi-agent active perception, a somewhat nascent area, and proposes a new reformulation of Dec-ρPOMDPs into a Dec-POMDP through the addition of a final-stage "prediction action." The reviewers appreciated the novelty of this contribution as well as the theoretical analysis and loss bounds. The original reviews raised a number of questions, however, and the author response addressed many of these. Nevertheless, there remain some issues that undercut the significance of the contribution, including the somewhat incremental combination/adaptation of existing techniques and the fact that the claimed scalability is not demonstrated very convincingly in the experiments, among others. On my reading of the paper, I largely concur; I do not reiterate the positive contributions noted in the other reviews, but point out some concerns about importance/impact: 1.
Dynamic feature selection in medical predictive monitoring by reinforcement learning
Chen, Yutong, Gao, Jiandong, Wu, Ji
In this paper, we investigate dynamic feature selection within a multivariate time-series scenario, a common occurrence in clinical predictive monitoring where each feature corresponds to a bio-test result. Many existing feature selection methods fall short in effectively leveraging time-series information, primarily because they are designed for static data. Our approach addresses this limitation by enabling the selection of time-varying feature subsets for each patient. Specifically, we employ reinforcement learning to optimize a policy under maximum cost restrictions. The prediction model is subsequently updated using synthetic data generated by the trained policy. Our method can seamlessly integrate with non-differentiable prediction models. We conducted experiments on a sizable clinical dataset encompassing regression and classification tasks. The results demonstrate that our approach outperforms strong feature selection baselines, particularly when subjected to stringent cost limitations. Code will be released once the paper is accepted.
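The cost-constrained selection step described in this abstract can be sketched as follows. This is not the authors' code (none has been released); the per-feature costs, the budget, and the greedy score-per-cost rule are illustrative assumptions standing in for the trained RL policy, which would produce the selection directly.

```python
import numpy as np

# Hypothetical per-feature acquisition costs (e.g., prices of bio-tests)
# and a per-step budget; a trained policy would supply the scores.
costs = np.array([1.0, 2.0, 0.5, 3.0])
budget = 3.0

def select_features(scores, costs, budget):
    """Greedy knapsack-style stand-in for the learned policy: order
    features by score-per-cost and take them until the budget is spent."""
    order = np.argsort(-scores / costs)
    chosen, spent = [], 0.0
    for i in order:
        if spent + costs[i] <= budget:
            chosen.append(int(i))
            spent += costs[i]
    return sorted(chosen)

select_features(np.array([0.9, 0.1, 0.8, 0.2]), costs, budget)  # [0, 2]
```

In the paper's setting this selection would be repeated at every time step, so the chosen subset can vary over a patient's stay; the greedy rule above is only a stand-in for that learned time-varying behavior.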
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- Health & Medicine > Therapeutic Area (0.93)
- Health & Medicine > Diagnostic Medicine (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)