 Shi, Yingdan


HODDI: A Dataset of High-Order Drug-Drug Interactions for Computational Pharmacovigilance

arXiv.org Artificial Intelligence

Drug-side effect research is vital for understanding adverse reactions arising in complex multi-drug therapies. However, the scarcity of higher-order datasets that capture the combinatorial effects of multiple drugs severely limits progress in this field. Existing resources such as TWOSIDES primarily focus on pairwise interactions. To fill this critical gap, we introduce HODDI, the first Higher-Order Drug-Drug Interaction Dataset, constructed from U.S. Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) records spanning the past decade, to advance computational pharmacovigilance. HODDI contains 109,744 records involving 2,506 unique drugs and 4,569 unique side effects, specifically curated to capture multi-drug interactions and their collective impact on adverse effects. Comprehensive statistical analyses demonstrate HODDI's extensive coverage and robust analytical metrics, making it a valuable resource for studying higher-order drug relationships. Evaluating HODDI with multiple models, we found that a simple Multi-Layer Perceptron (MLP) can outperform graph models, while hypergraph models demonstrate superior performance in capturing complex multi-drug interactions, further validating HODDI's effectiveness. Our findings highlight the inherent value of higher-order information in drug-side effect prediction and position HODDI as a benchmark dataset for advancing research in pharmacovigilance, drug safety, and personalized medicine. The dataset and code are available at https://github.com/TIML-Group/HODDI.


Redefining Machine Unlearning: A Conformal Prediction-Motivated Approach

arXiv.org Artificial Intelligence

Machine unlearning seeks to systematically remove specified data from a trained model, effectively achieving a state as though the data had never been encountered during training. While metrics such as Unlearning Accuracy (UA) and Membership Inference Attack (MIA) provide a baseline for assessing unlearning performance, they fall short of evaluating the completeness and reliability of forgetting. This is because the ground truth labels remain potential candidates within the scope of uncertainty quantification, leaving gaps in the evaluation of true forgetting. In this paper, we identify critical limitations in existing unlearning metrics and propose enhanced evaluation metrics inspired by conformal prediction. Our metrics can effectively capture the extent to which ground truth labels are excluded from the prediction set. Furthermore, we observe that many existing machine unlearning methods do not achieve satisfactory forgetting performance when evaluated with our new metrics. To address this, we propose an unlearning framework that integrates conformal prediction insights into the Carlini & Wagner adversarial attack loss. Extensive experiments on the image classification task demonstrate that our enhanced metrics offer deeper insights into unlearning effectiveness, and that our unlearning framework significantly improves the forgetting quality of unlearning methods.
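To make the idea of a ground truth label being "excluded from the prediction set" concrete, here is a minimal sketch of split conformal prediction applied to forgotten samples. It is an illustration under stated assumptions, not the paper's metric: the nonconformity score `1 - p(label)`, the function names, and the `true_label_retention` summary are all hypothetical choices made for this example.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold on the assumed score 1 - p(true label)."""
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank ceil((n + 1) * (1 - alpha)), clipped to n.
    rank = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[rank - 1]


def prediction_set(probs, q_hat):
    """All labels whose score does not exceed the calibrated threshold."""
    return {k for k, p in enumerate(probs) if 1.0 - p <= q_hat}


def true_label_retention(forget_probs, forget_labels, q_hat):
    """Fraction of forgotten samples whose true label is still in the set.

    Hypothetical summary statistic: lower values mean the true labels
    were pushed out of the prediction sets more often.
    """
    kept = sum(
        y in prediction_set(probs, q_hat)
        for probs, y in zip(forget_probs, forget_labels)
    )
    return kept / len(forget_labels)


# Toy calibration data: class probabilities from a two-class model.
cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
cal_labels = [0, 0, 0, 0]
q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

# Two "forgotten" samples, both with true label 0.
forget_probs = [[0.2, 0.8], [0.7, 0.3]]
forget_labels = [0, 0]
retention = true_label_retention(forget_probs, forget_labels, q_hat)
```

In this toy run the first forgotten sample has its true label outside the prediction set and the second still contains it. A plain accuracy check can miss this second case, because a label may remain a plausible candidate even when it is no longer the top prediction.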


DySuse: Susceptibility Estimation in Dynamic Social Networks

arXiv.org Artificial Intelligence

As a fundamental task in the field of social influence computing, influence estimation focuses on mining such complex and rich information from the macro perspective to support many social applications such as viral marketing (Zhou et al., 2019a), social recommendation (Chen and Wong, 2021), etc. Given a social network and an initial set of seed users, traditional influence estimation (Wu and Wang, 2020) aims at predicting how many users are influenced by the initial set of seed users (i.e., influence spread), while neglecting individual susceptibility. Susceptibility estimation focuses on predicting the probability of an individual user being influenced from the microscopic perspective and has extensive applications in practice. For instance, in marketing activities, enterprises can use susceptibility estimation to identify the potential users who are most likely to purchase

Influence estimation has been studied along with influence maximization for years, and most work estimates the influence by repetitively simulating the influence diffusion process. To simulate the influence diffusion process, the Independent Cascade (IC) diffusion model and the Linear Threshold (LT) diffusion model have been proposed by Kempe et al. (2003). As two simple but fundamental diffusion models, the IC model assumes that a user may be influenced by one of its neighbors, and each influenced user has a certain probability of influencing its neighbors. In contrast, in the LT model, a user will be influenced once the total influence of all its neighbors exceeds a threshold. Based on these two diffusion models, Kempe et al. (2003) generalize them to a model called the triggering (TR) diffusion model.
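The simulation-based approach mentioned above can be sketched as a Monte Carlo estimate under the Independent Cascade (IC) model. This is a minimal illustration on a static graph, not the DySuse method: the adjacency format (`user -> list of (neighbor, probability)`), the function names, and the toy graph are assumptions made for this example.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade diffusion; return the influenced users."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # A newly influenced user gets one chance per neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_susceptibility(graph, seeds, runs=1000, seed=0):
    """Per-user probability of being influenced, averaged over many runs."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(runs):
        for user in simulate_ic(graph, seeds, rng):
            counts[user] = counts.get(user, 0) + 1
    return {user: c / runs for user, c in counts.items()}


# Toy graph with deterministic edge probabilities for easy checking.
graph = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)]}
susceptibility = estimate_susceptibility(graph, ["a"], runs=50)

# Influence spread (macro view) is the sum of susceptibilities (micro view).
spread = sum(susceptibility.values())
```

The per-user dictionary corresponds to the microscopic view (susceptibility), while its sum gives the expected influence spread that traditional influence estimation targets. With realistic edge probabilities the estimate needs many runs to stabilise, which is the cost of repeated simulation.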