AITopics | Chen, Yukun

Collaborating Authors

Chen, Yukun

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

REFINE: Inversion-Free Backdoor Defense via Model Reprogramming

Chen, Yukun, Shao, Shuo, Huang, Enhao, Li, Yiming, Chen, Pin-Yu, Qin, Zhan, Ren, Kui

arXiv.org Artificial IntelligenceFeb-22-2025

Backdoor attacks on deep neural networks (DNNs) have emerged as a significant security threat, allowing adversaries to implant hidden malicious behaviors during the model training phase. Pre-processing-based defense, which is one of the most important defense paradigms, typically focuses on input transformations or backdoor trigger inversion (BTI) to deactivate or eliminate embedded backdoor triggers during the inference process. However, these methods suffer from inherent limitations: transformation-based defenses often fail to balance model utility and defense performance, while BTI-based defenses struggle to accurately reconstruct trigger patterns without prior knowledge. In this paper, we propose REFINE, an inversion-free backdoor defense method based on model reprogramming. REFINE consists of two key components: \textbf{(1)} an input transformation module that disrupts both benign and backdoor patterns, generating new benign features; and \textbf{(2)} an output remapping module that redefines the model's output domain to guide the input transformations effectively. By further integrating supervised contrastive loss, REFINE enhances the defense capabilities while maintaining model utility. Extensive experiments on various benchmark datasets demonstrate the effectiveness of our REFINE and its resistance to potential adaptive attacks.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2502.18508

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation

Luo, Jing, Luo, Run, Chen, Longze, Zhu, Liang, Ao, Chang, Li, Jiaming, Chen, Yukun, Cheng, Xin, Yang, Wen, Su, Jiayuan, Li, Chengming, Yang, Min

arXiv.org Artificial IntelligenceOct-2-2024

While closed-source Large Language Models (LLMs) demonstrate strong mathematical problem-solving abilities, open-source models continue to struggle with such tasks. To bridge this gap, we propose a data augmentation approach and introduce PersonaMathQA, a dataset derived from MATH and GSM8K, on which we train the PersonaMath models. Our approach consists of two stages: the first stage is learning from Persona Diversification, and the second stage is learning from Reflection. In the first stage, we regenerate detailed chain-of-thought (CoT) solutions as instructions using a closed-source LLM and introduce a novel personadriven data augmentation technique to enhance the dataset's quantity and diversity. In the second stage, we incorporate reflection to fully leverage more challenging and valuable questions. Evaluation of our PersonaMath models on MATH and GSM8K reveals that the PersonaMath-7B model (based on LLaMA-2-7B) achieves an accuracy of 24.2% on MATH and 68.7% on GSM8K, surpassing all baseline methods and achieving state-of-the-art performance. Notably, our dataset contains only 70.3K data points--merely 17.8% of MetaMathQA and 27% of MathInstruct--yet our model outperforms these baselines, demonstrating the high quality and diversity of our dataset, which enables more efficient model training. "There are a thousand Hamlets in a thousand people's eyes" Among these tasks, solving math problems stands out as particularly challenging due to its complexity and the requirement for multi-step reasoning to reach a solution. While some closed-source models, such as GPT-4o (OpenAI, 2024a), Claude 3.5 Sonnet (Anthropic, 2024), and Gemini 1.5 Pro (Reid et al., 2024), have demonstrated strong math-solving capabilities, current open-source models (e.g., LLaMA (Touvron et al., 2023; Dubey et al., 2024)) continue to struggle in this area. Therefore, enhancing the math problem-solving abilities of open-source models is a prominent desiderata. A widely adopted and effective approach for improving the math-solving capabilities of open-source models is fine-tuning, owing to the accessibility of their weights (Yuan et al., 2023; Yue et al., 2023; The method consists of two stages: Stage 1 (top) and Stage 2 (bottom). Stage 1 focuses on using closed-source LLMs to automatically generate detailed CoT solutions and apply our persona-driven rewriting method to rephrase the questions.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2410.01504

Country:

North America > Canada (0.28)
Asia > Middle East > UAE (0.14)
Europe > Middle East > Malta (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

RLHFuse: Efficient RLHF Training for Large Language Models with Inter- and Intra-Stage Fusion

Zhong, Yinmin, Zhang, Zili, Wu, Bingyang, Liu, Shengyu, Chen, Yukun, Wan, Changyi, Hu, Hanpeng, Xia, Lei, Ming, Ranchen, Zhu, Yibo, Jin, Xin

arXiv.org Artificial IntelligenceSep-25-2024

Reinforcement Learning from Human Feedback (RLHF) enhances the alignment between LLMs and human preference. The workflow of RLHF typically involves several models and tasks in a series of distinct stages. Existing RLHF training systems view each task as the smallest execution unit thus overlooking the opportunities for subtask-level optimizations. Due to the intrinsic nature of RLHF training, i.e., the data skewness in the generation stage, and the pipeline bubbles in the training stage, existing RLHF systems suffer from low GPU utilization in production deployments. RLHFuse breaks the traditional view of RLHF workflow as a composition of individual tasks, splitting each task into finer-grained subtasks, and performing stage fusion to improve GPU utilization. RLHFuse contains two key ideas. First, for generation and inference tasks, RLHFuse splits them into sample-level subtasks, enabling efficient inter-stage fusion to mitigate the original generation bottleneck dominated by long-tailed samples. Second, for training tasks, RLHFuse breaks them into subtasks of micro-batches. By leveraging the intuition that pipeline execution can be essentially complemented by another pipeline, RLHFuse performs intra-stage fusion to concurrently execute these subtasks in the training stage with a fused pipeline schedule, resulting in fewer pipeline bubbles. In addition, RLHFuse incorporates a series of system optimizations tailored for each stage of RLHF, making it efficient and scalable for our internal product usage. We evaluate RLHFuse on various popular LLMs and the results show that RLHFuse increases the training throughput by up to 3.7x, compared to existing state-of-the-art systems.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2409.13221

Country: North America > United States (0.14)

Genre:

Workflow (0.69)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Towards the Better Ranking Consistency: A Multi-task Learning Framework for Early Stage Ads Ranking

Wang, Xuewei, Jin, Qiang, Huang, Shengyu, Zhang, Min, Liu, Xi, Zhao, Zhengli, Chen, Yukun, Zhang, Zhengyu, Yang, Jiyan, Wen, Ellie, Chordia, Sagar, Chen, Wenlin, Huang, Qin

arXiv.org Artificial IntelligenceJul-12-2023

Dividing ads ranking system into retrieval, early, and final stages is a common practice in large scale ads recommendation to balance the efficiency and accuracy. The early stage ranking often uses efficient models to generate candidates out of a set of retrieved ads. The candidates are then fed into a more computationally intensive but accurate final stage ranking system to produce the final ads recommendation. As the early and final stage ranking use different features and model architectures because of system constraints, a serious ranking consistency issue arises where the early stage has a low ads recall, i.e., top ads in the final stage are ranked low in the early stage. In order to pass better ads from the early to the final stage ranking, we propose a multi-task learning framework for early stage ranking to capture multiple final stage ranking components (i.e. ads clicks and ads quality events) and their task relations. With our multi-task learning framework, we can not only achieve serving cost saving from the model consolidation, but also improve the ads recall and ranking consistency. In the online A/B testing, our framework achieves significantly higher click-through rate (CTR), conversion rate (CVR), total value and better ads-quality (e.g. reduced ads cross-out rate) in a large scale industrial ads ranking system.

artificial intelligence, machine learning, multi-task learning framework, (14 more...)

arXiv.org Artificial Intelligence

2307.11096

Country: North America > United States > California (0.15)

Genre: Research Report (0.50)

Industry: Marketing (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Boxhead: A Dataset for Learning Hierarchical Representations

Chen, Yukun, Träuble, Frederik, Dittadi, Andrea, Bauer, Stefan, Schölkopf, Bernhard

arXiv.org Machine LearningOct-7-2021

Disentanglement is hypothesized to be beneficial towards a number of downstream tasks. However, a common assumption in learning disentangled representations is that the data generative factors are statistically independent. As current methods are almost solely evaluated on toy datasets where this ideal assumption holds, we investigate their performance in hierarchical settings, a relevant feature of real-world data. In this work, we introduce Boxhead, a dataset with hierarchically structured ground-truth generative factors. We use this novel dataset to evaluate the performance of state-of-the-art autoencoder-based disentanglement models and observe that hierarchical models generally outperform single-layer VAEs in terms of disentanglement of hierarchically arranged factors.

artificial intelligence, learning hierarchical representation, neural network, (2 more...)

arXiv.org Machine Learning

2110.03628

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.76)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.76)

Add feedback

Aggregated Wasserstein Metric and State Registration for Hidden Markov Models

Chen, Yukun, Ye, Jianbo, Li, Jia

arXiv.org Machine LearningNov-19-2017

We propose a framework, named Aggregated Wasserstein, for computing a dissimilarity measure or distance between two Hidden Markov Models with state conditional distributions being Gaussian. For such HMMs, the marginal distribution at any time position follows a Gaussian mixture distribution, a fact exploited to softly match, aka register, the states in two HMMs. We refer to such HMMs as Gaussian mixture model-HMM (GMM-HMM). The registration of states is inspired by the intrinsic relationship of optimal transport and the Wasserstein metric between distributions. Specifically, the components of the marginal GMMs are matched by solving an optimal transport problem where the cost between components is the Wasserstein metric for Gaussian distributions. The solution of the optimization problem is a fast approximation to the Wasserstein metric between two GMMs. The new Aggregated Wasserstein distance is a semi-metric and can be computed without generating Monte Carlo samples. It is invariant to relabeling or permutation of states. The distance is defined meaningfully even for two HMMs that are estimated from data of different dimensionality, a situation that can arise due to missing variables. This distance quantifies the dissimilarity of GMM-HMMs by measuring both the difference between the two marginal GMMs and that between the two transition matrices. Our new distance is tested on tasks of retrieval, classification, and t-SNE visualization of time series. Experiments on both synthetic and real data have demonstrated its advantages in terms of accuracy as well as efficiency in comparison with existing distances based on the Kullback-Leibler divergence.

artificial intelligence, experiment, machine learning, (18 more...)

arXiv.org Machine Learning

1711.05792

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

A Distance for HMMs based on Aggregated Wasserstein Metric and State Registration

Chen, Yukun, Ye, Jianbo, Li, Jia

arXiv.org Machine LearningAug-4-2016

We propose a framework, named Aggregated Wasserstein, for computing a dissimilarity measure or distance between two Hidden Markov Models with state conditional distributions being Gaussian. For such HMMs, the marginal distribution at any time spot follows a Gaussian mixture distribution, a fact exploited to softly match, aka register, the states in two HMMs. We refer to such HMMs as Gaussian mixture model-HMM (GMM-HMM). The registration of states is inspired by the intrinsic relationship of optimal transport and the Wasserstein metric between distributions. Specifically, the components of the marginal GMMs are matched by solving an optimal transport problem where the cost between components is the Wasserstein metric for Gaussian distributions. The solution of the optimization problem is a fast approximation to the Wasserstein metric between two GMMs. The new Aggregated Wasserstein distance is a semi-metric and can be computed without generating Monte Carlo samples. It is invariant to relabeling or permutation of the states. This distance quantifies the dissimilarity of GMM-HMMs by measuring both the difference between the two marginal GMMs and the difference between the two transition matrices. Our new distance is tested on the tasks of retrieval and classification of time series. Experiments on both synthetic data and real data have demonstrated its advantages in terms of accuracy as well as efficiency in comparison with existing distances based on the Kullback-Leibler divergence.

artificial intelligence, machine learning, wasserstein distance, (14 more...)

arXiv.org Machine Learning

1608.01747

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback