DDO-RM for LLM Preference Optimization: A Minimal Held-Out Benchmark against DPO
Zhang, Tiantian, Zuo, Jierui, Wang, Wenping
This paper reorganizes the manuscript around the DPO versus DDO-RM preference-optimization project and focuses on two parts: the algorithmic view and a preliminary held-out benchmark. The benchmark asks a narrow question: even in a minimal pairwise chosen-versus-rejected setting, can a reward-guided decision-distribution update outperform a direct pairwise objective? We compare Direct Preference Optimization (DPO) against DDO-RM on EleutherAI/pythia-410m using HuggingFaceH4/ultrafeedback_binarized, evaluate on the held-out test_prefs split, and report results for seeds 42, 13, and 3407. Algorithmically, DDO-RM treats each prompt as a finite decision problem over candidate responses. Instead of optimizing only a binary chosen-rejected relation, it forms a policy distribution over candidates, centers reward-model scores under that distribution, and distills a reward-guided target distribution back into the policy. In the current public benchmark, DDO-RM improves mean pair accuracy from 0.5238 to 0.5602, AUC from 0.5315 to 0.5382, and mean margin from 0.1377 to 0.5353 relative to DPO. These results are encouraging but still preliminary: the study covers one model family, one dataset, one held-out evaluation split, and three seeds.
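The decision-distribution update described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the paper's exact objective: the softmax policy, the exponential tilt by `beta`, and the KL distillation loss are all assumptions consistent with the abstract's description (form a policy distribution over candidates, center reward-model scores under it, and distill a reward-guided target back into the policy).

```python
import numpy as np

def ddo_rm_target(policy_logits, rewards, beta=1.0):
    """Hedged sketch of a reward-guided decision-distribution update.
    `policy_logits` are the policy's scores for the candidate responses
    to one prompt; `rewards` are reward-model scores for the same
    candidates. The tilt strength `beta` is an assumed hyperparameter."""
    # Policy distribution over candidates (softmax, numerically stable).
    p = np.exp(policy_logits - np.max(policy_logits))
    p /= p.sum()
    # Center reward scores under the policy distribution:
    # subtract the policy-weighted baseline to get advantages.
    baseline = float(np.dot(p, rewards))
    advantages = np.asarray(rewards) - baseline
    # Reward-guided target: exponentially tilt the policy toward
    # higher-advantage candidates, then renormalize.
    target = p * np.exp(beta * advantages)
    target /= target.sum()
    return p, target

def distill_loss(target, policy):
    """KL(target || policy): minimized by moving the policy toward
    the reward-guided target distribution."""
    return float(np.sum(target * (np.log(target) - np.log(policy))))
```

Under this sketch, candidates scored above the policy-weighted baseline gain probability mass in the target, so distilling the target back into the policy pushes it toward higher-reward responses rather than optimizing only the binary chosen-versus-rejected relation.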
Bayesian Inference of Psychometric Variables From Brain and Behavior in Implicit Association Tests
Kothe, Christian A., Mullen, Sean, Bronstein, Michael V., Hanada, Grant, Cicconet, Marcelo, McInnes, Aaron N., Mullen, Tim, Aafjes, Marc, Sponheim, Scott R., Widge, Alik S.
Objective. We establish a principled method for inferring mental-health-related psychometric variables from neural and behavioral data using the Implicit Association Test (IAT) as the data-generation engine, aiming to overcome the limited predictive performance (typically under 0.7 AUC) of the gold-standard D-score method, which relies solely on reaction times. Approach. We propose a sparse hierarchical Bayesian model that leverages multi-modal data to predict experiences related to mental illness symptoms in new participants. The model is a multivariate generalization of the D-score with trainable parameters, engineered for parameter efficiency in the small-cohort regime typical of IAT studies. Data from two IAT variants were analyzed: a suicidality-related E-IAT ($n=39$) and a psychosis-related PSY-IAT ($n=34$). Main Results. Our approach overcomes high inter-individual variability and low within-session effect size in the dataset, reaching AUCs of 0.73 (E-IAT) and 0.76 (PSY-IAT) in the best modality configurations, though corrected 95% confidence intervals are wide ($\pm 0.18$) and results are marginally significant after FDR correction ($q=0.10$). Restricting the E-IAT to MDD participants improves AUC to 0.79 $[0.62, 0.97]$ (significant at $q=0.05$). Performance is on par with the best reference methods (shrinkage LDA and EEGNet) for each task, even when the latter were adapted to the task while the proposed method was not. Accuracy was substantially above near-chance D-scores (0.50-0.53 AUC) in both tasks, with more consistent cross-task performance than any single reference method. Significance. Our framework shows promise for enhancing IAT-based assessment of experiences related to entrapment and psychosis, and potentially other mental health conditions, though further validation on larger and independent cohorts will be needed to establish clinical utility.
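The reaction-time-only baseline that this model generalizes can be sketched as follows. This illustrates the classic IAT D-score, the difference in mean reaction times between incongruent and congruent blocks divided by the pooled standard deviation; the function name and simplified single-block form are illustrative assumptions, not the paper's implementation (published D-score algorithms also include trial filtering and error-penalty steps omitted here).

```python
import statistics

def d_score(rt_congruent, rt_incongruent):
    """Simplified sketch of the IAT D-score (reaction times in seconds):
    (mean incongruent RT - mean congruent RT) / pooled SD of all trials.
    A positive score indicates slower responses on incongruent pairings,
    the effect the IAT is designed to measure."""
    pooled_sd = statistics.stdev(rt_congruent + rt_incongruent)
    mean_diff = statistics.mean(rt_incongruent) - statistics.mean(rt_congruent)
    return mean_diff / pooled_sd
```

The paper's model can be read as a multivariate, trainable generalization of this single scalar: instead of one ratio over reaction times, it fits parameters over multi-modal (neural and behavioral) features within a sparse hierarchical Bayesian framework.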
Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning (Supplementary Material)
We use the open-source MIMIC-CXR repository to extract the impression and findings sections of each report. Following [9], we keep only sequences of alphanumeric characters, drop all other characters and symbols, and remove reports containing fewer than 3 tokens. Following common practice in ViT [5], we split each radiograph into patches of size 16 × 16, which results in 196 visual tokens per image. The instance-level projection layer is a two-layer Multi-Layer Perceptron (MLP) with Batch Normalization [10] and a ReLU activation function. Additionally, we apply a frozen Batch Normalization layer after the MLP to obtain instance-level embeddings.
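The preprocessing steps above can be sketched minimally. The alphanumeric-token filter and the patch-count arithmetic follow the text directly; the regex pattern and the 224 × 224 input resolution are assumptions (224 is the standard ViT input size consistent with the reported 196 tokens), and the function names are illustrative.

```python
import re

def clean_report(text, min_tokens=3):
    """Keep only sequences of alphanumeric characters, dropping all
    other characters and symbols; discard reports with fewer than
    `min_tokens` tokens (returns None for discarded reports)."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return tokens if len(tokens) >= min_tokens else None

def num_visual_tokens(image_size=224, patch_size=16):
    """ViT-style patching: splitting an image_size x image_size
    radiograph into patch_size x patch_size patches yields
    (image_size // patch_size) ** 2 visual tokens, e.g. (224/16)^2 = 196."""
    return (image_size // patch_size) ** 2
```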