Chen, Ying
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
An, Ruichuan, Yang, Sihan, Lu, Ming, Zhang, Renrui, Zeng, Kai, Luo, Yulin, Cao, Jiajun, Liang, Hao, Chen, Ying, She, Qi, Zhang, Shanghang, Zhang, Wentao
Current vision-language models (VLMs) show exceptional abilities across diverse tasks, such as visual question answering. To enhance user experience, recent studies investigate VLM personalization to understand user-provided concepts. However, they mainly focus on single-concept personalization, neglecting the existence and interplay of multiple concepts, which limits real-world applicability. This paper proposes the first multi-concept personalization paradigm, MC-LLaVA. Specifically, MC-LLaVA employs a multi-concept instruction tuning strategy, effectively integrating multiple concepts in a single training step. To reduce the costs of joint training, we propose a personalized textual prompt that uses visual token information to initialize concept tokens. Additionally, we introduce a personalized visual prompt during inference, aggregating location confidence maps for enhanced recognition and grounding capabilities. To advance multi-concept personalization research, we further contribute a high-quality instruction tuning dataset. We carefully collect images with multiple characters and objects from movies and manually generate question-answer samples for multi-concept scenarios, offering superior diversity. Comprehensive qualitative and quantitative experiments demonstrate that MC-LLaVA can achieve impressive multi-concept personalized responses, paving the way for VLMs to become better user-specific assistants. The code and dataset will be publicly available at https://github.com/arctanxarc/MC-LLaVA.
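The abstract does not give implementation details, but a minimal PyTorch sketch of the concept-token initialization idea, with assumed dimensions and an assumed projection layer (not the authors' exact design), might look like this:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: initialize a new learnable "concept token" from the
# visual features of a user's reference images, rather than from random noise.
# All shapes and the projection layer are illustrative assumptions.

embed_dim = 4096          # LLM token-embedding width (assumed)
vision_dim = 1024         # vision-encoder feature width (assumed)

# Visual features of N reference images for one concept, e.g. from a ViT encoder.
concept_visual_feats = torch.randn(5, vision_dim)

# Project pooled visual information into the LLM embedding space.
visual_to_text = nn.Linear(vision_dim, embed_dim)
init_embedding = visual_to_text(concept_visual_feats.mean(dim=0))

# Register the concept as a trainable token embedding, initialized from vision.
concept_token = nn.Parameter(init_embedding.detach().clone())
print(concept_token.shape)  # torch.Size([4096])
```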
Squeeze Out Tokens from Sample for Finer-Grained Data Governance
Lin, Weixiong, Ju, Chen, Wang, Haicheng, Hu, Shengchao, Xiao, Shuai, Chen, Mengting, Jiao, Yuheng, Yao, Mingshuai, Lan, Jinsong, Liu, Qingwen, Chen, Ying
Widely observed data scaling laws, in which error falls off as a power of the training set size, demonstrate the diminishing returns of unselective data expansion. Hence, data governance has been proposed to downsize datasets by pruning non-informative samples. Yet isolating the impact of a specific sample on overall model performance is challenging, owing to the vast computation required to try out all sample combinations. Current data governors circumvent this complexity by estimating sample contributions through heuristic-derived scalar scores and discarding low-value samples. Despite thorough sample sieving, the retained samples still intrinsically contain substantial undesired tokens, underscoring the potential for further compression and purification. In this work, we upgrade data governance from a 'sieving' approach to a 'juicing' one. Instead of scanning for the least-flawed samples, our dual-branch DataJuicer applies finer-grained intra-sample governance: it squeezes out informative tokens and boosts image-text alignment. Specifically, the vision branch retains salient image patches and extracts relevant object classes, while the text branch incorporates these classes to enhance captions. Consequently, DataJuicer yields more refined datasets through finer-grained governance. Extensive experiments across datasets demonstrate that DataJuicer significantly outperforms the existing DataSieve approach in image-text retrieval, classification, and dense visual reasoning.
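A hedged sketch of the intra-sample "squeezing" idea, assuming per-patch saliency scores and a keep ratio are given (the paper's actual scoring and caption-enhancement modules are not shown):

```python
import torch

# Illustrative sketch of finer-grained, intra-sample governance: instead of
# keeping or dropping a whole image-text pair, keep only the most informative
# image patches. Saliency scores and the keep ratio are assumed inputs.

def squeeze_patches(patch_tokens: torch.Tensor,
                    saliency: torch.Tensor,
                    keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep the top-`keep_ratio` fraction of patch tokens ranked by saliency."""
    num_keep = max(1, int(patch_tokens.size(0) * keep_ratio))
    keep_idx = saliency.topk(num_keep).indices.sort().values  # preserve spatial order
    return patch_tokens[keep_idx]

patches = torch.randn(196, 768)   # e.g. 14x14 ViT patch tokens (assumed)
scores = torch.rand(196)          # per-patch saliency (assumed given)
kept = squeeze_patches(patches, scores, keep_ratio=0.25)
print(kept.shape)                 # torch.Size([49, 768])
```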
Hybrid Quantum Neural Networks with Amplitude Encoding: Advancing Recovery Rate Predictions
Chen, Ying, Griffin, Paul, Recchia, Paolo, Zhou, Lei, Zhang, Hongrui
Recovery rate prediction plays a pivotal role in bond investment strategies, enhancing risk assessment, optimizing portfolio allocation, improving pricing accuracy, and supporting effective credit risk management. However, forecasting faces challenges like high-dimensional features, small sample sizes, and overfitting. We propose a hybrid Quantum Machine Learning model incorporating Parameterized Quantum Circuits (PQC) within a neural network framework. PQCs inherently preserve unitarity, avoiding computationally costly orthogonality constraints, while amplitude encoding enables exponential data compression, reducing qubit requirements logarithmically. Applied to a global dataset of 1,725 observations (1996-2023), our method achieved superior accuracy (RMSE 0.228) compared to classical neural networks (0.246) and quantum models with angle encoding (0.242), with efficient computation times. This work highlights the potential of hybrid quantum-classical architectures in advancing recovery rate forecasting.
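The logarithmic qubit requirement of amplitude encoding follows from packing a d-dimensional feature vector into the amplitudes of ceil(log2 d) qubits. A minimal NumPy sketch of this arithmetic and the required normalization, with an illustrative feature count rather than the paper's circuit:

```python
import numpy as np

# Amplitude encoding stores a d-dimensional feature vector in the amplitudes
# of ceil(log2(d)) qubits. This sketch shows only the qubit arithmetic and
# the unit-norm state preparation, not the authors' parameterized circuit.

features = np.random.rand(100)                # illustrative feature count
dim = int(2 ** np.ceil(np.log2(features.size)))
n_qubits = int(np.log2(dim))                  # 100 features -> 128 amplitudes -> 7 qubits

state = np.zeros(dim)
state[:features.size] = features
state /= np.linalg.norm(state)                # quantum states must have unit norm

print(n_qubits, np.isclose(np.sum(state ** 2), 1.0))  # 7 True
```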
CLIP-DQA: Blindly Evaluating Dehazed Images from Global and Local Perspectives Using CLIP
Zeng, Yirui, Fu, Jun, Amirpour, Hadi, Wang, Huasheng, Yue, Guanghui, Liu, Hantao, Chen, Ying, Zhou, Wei
Blind dehazed image quality assessment (BDQA), which aims to accurately predict the visual quality of dehazed images without any reference information, is essential for the evaluation, comparison, and optimization of image dehazing algorithms. Existing learning-based BDQA methods have achieved remarkable success, but the small scale of DQA datasets limits their performance. To address this issue, we propose to adapt Contrastive Language-Image Pre-Training (CLIP), pre-trained on large-scale image-text pairs, to the BDQA task. Specifically, inspired by the fact that the human visual system understands images based on hierarchical features, we take global and local information of the dehazed image as the input to CLIP. To accurately map this hierarchical information into a quality score, we tune both the vision branch and the language branch of CLIP with prompt learning. Experimental results on two authentic DQA datasets demonstrate that our proposed approach, named CLIP-DQA, achieves more accurate quality predictions than existing BDQA methods. The code is available at https://github.com/JunFu1995/CLIP-DQA.
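One common way to turn CLIP similarities into a scalar quality score, which a global/local prompt-learning adaptation of this kind plausibly builds on, is a softmax over a pair of learnable antonym prompts (the notation below is ours, not the paper's):

\[
q = \frac{\exp\!\big(\cos(f_v, t_{+})/\tau\big)}
         {\exp\!\big(\cos(f_v, t_{+})/\tau\big) + \exp\!\big(\cos(f_v, t_{-})/\tau\big)},
\]

where f_v fuses the global and local image features from the vision branch, t_{+} and t_{-} are learnable "high-quality"/"low-quality" prompt embeddings from the language branch, and \tau is the CLIP temperature.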
SELMA3D challenge: Self-supervised learning for 3D light-sheet microscopy image segmentation
Chen, Ying, Al-Maskari, Rami, Horvath, Izabela, Ali, Mayar, Hoher, Luciano, Yang, Kaiyuan, Lin, Zengming, Zhai, Zhiwei, Shen, Mengzhe, Xun, Dejin, Wang, Yi, Xu, Tony, Goubran, Maged, Wu, Yunheng, Mori, Kensaku, Paetzold, Johannes C., Erturk, Ali
Recent innovations in light-sheet microscopy, paired with developments in tissue clearing techniques, enable the 3D imaging of large mammalian tissues with cellular resolution. Combined with the progress in large-scale data analysis driven by deep learning, these innovations empower researchers to rapidly investigate the morphological and functional properties of diverse biological samples. Segmentation, a crucial preliminary step in the analysis process, can be automated using domain-specific deep learning models with expert-level performance. However, these models exhibit high sensitivity to domain shifts, leading to a significant drop in accuracy when applied to data outside their training distribution. To address this limitation, and inspired by the recent success of self-supervised learning in training generalizable models, we organized the SELMA3D Challenge during the MICCAI 2024 conference. SELMA3D provides a vast collection of light-sheet images from cleared mouse and human brains, comprising 35 large 3D images, each with over 1000^3 voxels, and 315 annotated small patches for fine-tuning, preliminary testing, and final testing. The dataset encompasses diverse biological structures, including vessel-like and spot-like structures. Five teams participated in all phases of the challenge, and their proposed methods are reviewed in this paper. Quantitative and qualitative results from most participating teams demonstrate that self-supervised learning on large datasets improves segmentation model performance and generalization. We will continue to support and extend SELMA3D as an inaugural MICCAI challenge focused on self-supervised learning for 3D microscopy image segmentation.
Analyzing Country-Level Vaccination Rates and Determinants of Practical Capacity to Administer COVID-19 Vaccines
Hegde, Sharika J., Ng, Max T. M., Rios, Marcos, Mahmassani, Hani S., Chen, Ying, Smilowitz, Karen
The development, manufacturing, transportation, and administration of COVID-19 vaccines proved to be an extreme logistics operation of global magnitude. Global vaccination levels, however, remain a key concern in preventing the emergence of new strains and minimizing the impact of the pandemic's disruption of daily life. In this paper, country-level vaccination rates are analyzed through a queuing framework to extract service rates that represent the practical capacity of a country to administer vaccines. These rates are further characterized through regression and interpretable machine learning methods using country-level demographic, governmental, and socio-economic variables. Model results show that participation in multi-governmental collaborations such as COVAX may improve the ability to vaccinate. Similarly, improved transportation and accessibility variables, such as roads per area for low-income countries and rail lines per area for high-income countries, can improve rates. It was also found that, for low-income countries specifically, improvements in basic and health infrastructure (as measured through spending on healthcare, numbers of doctors and hospital beds per 100k people, the share of the population with access to electricity, life expectancy, and vehicles per 1000 people) resulted in higher vaccination rates. Among high-income countries, those with larger 65-plus populations struggled to vaccinate at high rates, indicating potential accessibility issues for the elderly. This study finds that improving basic and health infrastructure, focusing on last-mile accessibility, particularly for the elderly, and fostering global partnerships can improve logistical operations of this scale. Such structural impediments and inequities in global health care must be addressed in preparation for future global public health crises.
Group Shapley with Robust Significance Testing and Its Application to Bond Recovery Rate Prediction
Wang, Jingyi, Chen, Ying, Giudici, Paolo
We propose Group Shapley, a metric that extends the classical individual-level Shapley value framework to evaluate the importance of feature groups, addressing the structured nature of predictors commonly found in business and economic data. More importantly, we develop a significance testing procedure based on a three-cumulant chi-square approximation and establish the asymptotic properties of the test statistics for Group Shapley values. Our approach effectively handles challenging scenarios, including sparse or skewed distributions and small sample sizes, outperforming alternative tests such as the Wald test. Simulations confirm that the proposed test maintains robust empirical size and demonstrates enhanced power under diverse conditions. To illustrate the method's practical relevance in advancing Explainable AI, we apply our framework to bond recovery rate prediction using a global dataset (1996-2023) comprising 2,094 observations and 98 features, grouped into 16 subgroups and five broader categories: bond characteristics, firm fundamentals, industry-specific factors, market-related variables, and macroeconomic indicators. Our results identify the market-related variables group as the most influential. Furthermore, Lorenz curves and Gini indices reveal that Group Shapley assigns feature importance more equitably than individual Shapley values.
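A natural way to write a group-level Shapley value is to treat each feature group as a single player in the coalition game; the form below is our notation for that idea and does not reproduce the paper's estimator or its three-cumulant significance test:

\[
\phi_{G_j} = \sum_{S \subseteq \mathcal{G} \setminus \{G_j\}}
\frac{|S|!\,(m-|S|-1)!}{m!}\,\Big[v\big(S \cup \{G_j\}\big) - v(S)\Big],
\]

where \mathcal{G} = \{G_1, \ldots, G_m\} partitions the features into m groups and v(\cdot) is the prediction model's value function evaluated on coalitions of whole groups.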
AdaMixup: A Dynamic Defense Framework for Membership Inference Attack Mitigation
Chen, Ying, Chen, Jiajing, Weng, Yijie, Chang, ChiaHua, Yu, Dezhi, Lin, Guanbiao
Membership inference attacks have emerged as a significant privacy concern in the training of deep learning models, where attackers can infer whether a data point was part of the training set based on the model's outputs. To address this challenge, we propose a novel defense mechanism, AdaMixup. AdaMixup employs adaptive mixup techniques to enhance the model's robustness against membership inference attacks by dynamically adjusting the mixup strategy during training. This method not only improves the model's privacy protection but also maintains high performance. Experimental results across multiple datasets demonstrate that AdaMixup significantly reduces the risk of membership inference attacks while achieving a favorable trade-off between defensive efficiency and model accuracy. This research provides an effective solution for data privacy protection and lays the groundwork for future advancements in mixup training methods.
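For context, a minimal PyTorch sketch of one mixup training step is shown below; standard mixup is used, and the adaptive adjustment of the mixing parameter during training, which is the paper's contribution, is only indicated by the `alpha` argument:

```python
import torch
import torch.nn.functional as F

# Minimal sketch of one mixup training step. The adaptive schedule for
# `alpha` is not reproduced here; a fixed value stands in for it.

def mixup_step(model, x, y, alpha: float):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]          # convex combination of inputs
    logits = model(x_mix)
    # Interpolate the loss between the two label sets with the same weight.
    return lam * F.cross_entropy(logits, y) + (1.0 - lam) * F.cross_entropy(logits, y[perm])

model = torch.nn.Linear(32, 10)                      # toy classifier (illustrative)
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = mixup_step(model, x, y, alpha=0.4)
loss.backward()
```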
Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection
Chen, Jiaqi, Zhu, Xiaoye, Liu, Tianyang, Chen, Ying, Chen, Xinhui, Yuan, Yiwen, Leong, Chak Tou, Li, Zuchao, Long, Tang, Zhang, Lei, Yan, Chenyu, Mei, Guanghao, Zhang, Jie, Zhang, Lefei
Large Language Models (LLMs) have revolutionized text generation, making the detection of machine-generated text increasingly challenging. Although past methods achieve good performance in detecting purely machine-generated text, they perform poorly at distinguishing machine-revised text (rewriting, expansion, and polishing), which may differ only slightly from its original human prompt. Because the content of such text may originate from human prompts, detecting machine-revised text often involves identifying distinctive machine styles, e.g., wording favored by LLMs. However, existing methods struggle to detect machine-style phrasing hidden within the content contributed by humans. We propose the "Imitate Before Detect" (ImBD) approach, which first imitates the machine-style token distribution and then compares the distribution of the text under test with this machine-style distribution to determine whether the text has been machine-revised. To this end, we introduce style preference optimization (SPO), which aligns a scoring LLM with the text-style preferences of machines. The aligned scoring model is then used to calculate the style-conditional probability curvature (Style-CPC), quantifying the log-probability difference between the original and conditionally sampled texts for effective detection. We conduct extensive comparisons across various scenarios, encompassing text revisions by six LLMs, four distinct text domains, and three machine revision types. Compared to existing state-of-the-art methods, our method yields a 13% increase in AUC for detecting text revised by open-source LLMs, and improves performance by 5% and 19% for detecting GPT-3.5- and GPT-4o-revised text, respectively. Notably, our method surpasses the commercially trained GPT-Zero with just 1,000 samples and five minutes of SPO, demonstrating its efficiency and effectiveness.
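Based on the description above, a curvature score of the following general form is plausibly what Style-CPC computes; the notation and the normalization are our assumption, not the paper's exact definition:

\[
\mathrm{Style\text{-}CPC}(x) =
\frac{\log p_{\theta}(x) - \tfrac{1}{k}\sum_{i=1}^{k}\log p_{\theta}(\tilde{x}_i)}{\tilde{\sigma}},
\]

where p_{\theta} is the SPO-aligned scoring model, \tilde{x}_1, \ldots, \tilde{x}_k are conditionally sampled variants of the text x, and \tilde{\sigma} is the standard deviation of their log probabilities; larger scores suggest machine revision.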
Practical Performative Policy Learning with Strategic Agents
Chen, Qianyi, Chen, Ying, Li, Bo
This paper studies the performative policy learning problem, where agents adjust their features in response to a released policy to improve their potential outcomes, inducing an endogenous distribution shift. There has been growing interest in training machine learning models in strategic environments, including strategic classification and performative prediction. However, existing approaches often rely on restrictive parametric assumptions: micro-level utility models in strategic classification and macro-level data distribution maps in performative prediction, severely limiting scalability and generalizability. We approach this problem as a complex causal inference task, relaxing parametric assumptions on both micro-level agent behavior and the macro-level data distribution. Leveraging bounded rationality, we uncover a practical low-dimensional structure in distribution shifts and construct an effective mediator on the causal path from the deployed model to the shifted data. We then propose a gradient-based policy optimization algorithm with a differentiable classifier as a substitute for the high-dimensional distribution map. Our algorithm efficiently utilizes batch feedback and limited manipulation patterns, achieving high sample efficiency compared to methods reliant on bandit feedback or zero-order optimization. We also provide theoretical guarantees for algorithmic convergence. Extensive and challenging experiments in high-dimensional settings demonstrate our method's practical efficacy.
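A hedged PyTorch sketch of the core mechanism, with illustrative architectures and a toy objective rather than the paper's specification, shows how the policy gradient can flow through a differentiable agent-response model:

```python
import torch
import torch.nn as nn

# Sketch: because the agents' strategic feature shift is modeled by a
# differentiable response module (assumed pre-fitted from batch feedback),
# the policy loss can be backpropagated through the induced shift.
# All shapes, modules, and the loss are illustrative assumptions.

response = nn.Sequential(nn.Linear(10 + 1, 32), nn.ReLU(), nn.Linear(32, 10))
response.requires_grad_(False)                 # mediator kept fixed during policy updates
policy = nn.Linear(10, 1)                      # scoring policy with parameters theta
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

x = torch.randn(64, 10)                        # pre-deployment agent features (toy)
y = torch.randint(0, 2, (64, 1)).float()       # potential outcomes (toy)

for _ in range(100):
    score = policy(x)                                        # released policy acts on agents
    x_shifted = x + response(torch.cat([x, score], dim=1))   # strategic feature shift
    loss = nn.functional.binary_cross_entropy_with_logits(policy(x_shifted), y)
    opt.zero_grad()
    loss.backward()                                          # gradient flows through the shift
    opt.step()
```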