AITopics | Yu, Haining

Yu, Haining

The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Safety Analysis

Pan, Wenbo, Liu, Zhichao, Chen, Qiguang, Zhou, Xiangyang, Yu, Haining, Jia, Xiaohua

arXiv.org Artificial Intelligence, Feb-17-2025

Large Language Models' safety-aligned behaviors, such as refusing harmful queries, can be represented by linear directions in activation space. Previous research modeled safety behavior with a single direction, limiting mechanistic understanding to an isolated safety feature. In this work, we discover that safety-aligned behavior is jointly controlled by multi-dimensional directions. Namely, we study the vector space of representation shifts during safety fine-tuning on Llama 3 8B for refusing jailbreaks. By studying orthogonal directions in the space, we first find that a dominant direction governs the model's refusal behavior, while multiple smaller directions represent distinct and interpretable features like hypothetical narrative and role-playing. We then measure how different directions promote or suppress the dominant direction, showing the important role of secondary directions in shaping the model's refusal representation. Finally, we demonstrate that removing certain trigger tokens in harmful queries can mitigate these directions to bypass the learned safety capability, providing new insights on understanding safety alignment vulnerability from a multi-dimensional perspective. Code and artifacts are available at https://github.com/BMPixel/safety-residual-space.
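As a rough illustration of what "a linear direction in activation space" means, the sketch below projects an activation vector onto a direction and then removes that component. It is not the authors' code (see the repository linked in the abstract for that); the vectors `refusal_direction` and `activation` are made-up three-dimensional examples, and the helper names are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def project(h, direction):
    """Scalar component of activation h along the given direction."""
    return dot(h, unit(direction))

def ablate(h, direction):
    """Remove the component of h that lies along the given direction."""
    d = unit(direction)
    coeff = dot(h, d)
    return [a - coeff * b for a, b in zip(h, d)]

# Hypothetical toy values, chosen only for illustration.
refusal_direction = [3.0, 4.0, 0.0]
activation = [1.0, 2.0, 5.0]

score = project(activation, refusal_direction)
ablated = ablate(activation, refusal_direction)
```

After ablation, the projection of `ablated` onto `refusal_direction` is zero up to floating-point error, while the component orthogonal to that direction is unchanged. In this picture, the secondary directions the abstract describes would be further vectors orthogonal to the dominant one.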

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.09674

Country: Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Breaking the Context Bottleneck on Long Time Series Forecasting

Ma, Chao, Hou, Yikai, Li, Xiang, Sun, Yinggang, Yu, Haining, Fang, Zhou, Qu, Jiaxing

arXiv.org Artificial Intelligence, Dec-21-2024

Long-term time-series forecasting is essential for planning and decision-making in economics, energy, and transportation, where long foresight is required. To obtain such long foresight, models must be both efficient and effective in processing long sequences. Recent advancements have enhanced the efficiency of these models; however, the challenge of effectively leveraging longer sequences persists. This is primarily due to the tendency of these models to overfit when presented with extended inputs, necessitating the use of shorter input lengths to maintain tolerable error margins. In this work, we investigate the multiscale modeling method and propose the Logsparse Decomposable Multiscaling (LDM) framework for the efficient and effective processing of long sequences. We demonstrate that by decoupling patterns at different scales in time series, we can enhance predictability by reducing non-stationarity, improve efficiency through a compact long input representation, and simplify the architecture by providing clear task assignments. Experimental results demonstrate that LDM not only outperforms all baselines in long-term forecasting benchmarks, but also reduces both training time and memory costs.
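To make "decoupling patterns at different scales" concrete, here is a generic moving-average decomposition, offered only as a sketch. It is not the LDM framework, whose details the abstract does not give; the function names, window sizes, and toy series are all assumptions made for illustration.

```python
def moving_average(series, window):
    """Trailing moving average with the same length as the input."""
    out = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def decompose(series, windows):
    """Split a series into components, coarsest scale first.

    Each pass smooths the current residual and subtracts the smooth part,
    so the returned components sum back to the original series.
    """
    components = []
    residual = list(series)
    for w in sorted(windows, reverse=True):
        smooth = moving_average(residual, w)
        components.append(smooth)
        residual = [r - s for r, s in zip(residual, smooth)]
    components.append(residual)  # finest-scale remainder
    return components

# Hypothetical toy series: a slow upward drift plus a period-2 wiggle.
series = [0.1 * t + (1.0 if t % 2 == 0 else -1.0) for t in range(12)]
parts = decompose(series, windows=[4, 2])
reconstructed = [sum(values) for values in zip(*parts)]
```

Because each component is subtracted from the running residual, `reconstructed` matches `series` up to floating-point error. A forecaster could then model the slow and fast components separately.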

data mining, forecasting, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2412.16572

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Energy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Do Contemporary CATE Models Capture Real-World Heterogeneity? Findings from a Large-Scale Benchmark

Yu, Haining, Sun, Yizhou

arXiv.org Machine Learning, Oct-9-2024

We present unexpected findings from a large-scale benchmark study evaluating Conditional Average Treatment Effect (CATE) estimation algorithms. By running 16 modern CATE models across 43,200 datasets, we find that: (a) 62% of CATE estimates have a higher Mean Squared Error (MSE) than a trivial zero-effect predictor, rendering them ineffective; (b) in datasets with at least one useful CATE estimate, 80% still have higher MSE than a constant-effect model; and (c) orthogonality-based models outperform other models only 30% of the time, despite widespread optimism about their performance. These findings expose significant limitations in current CATE models and suggest ample opportunities for further research. Our findings stem from a novel application of observational sampling, originally developed to evaluate Average Treatment Effect (ATE) estimates from observational methods with experimental data. To adapt observational sampling for CATE evaluation, we introduce a statistical parameter, $Q$, which equals MSE minus a constant and therefore preserves the ranking of models by their MSE. We then derive a family of sample statistics, collectively called $\hat{Q}$, that can be computed from real-world data. We prove that $\hat{Q}$ is a consistent estimator of $Q$ under mild technical conditions. When used in observational sampling, $\hat{Q}$ is unbiased and asymptotically selects the model with the smallest MSE. To ensure the benchmark reflects real-world heterogeneity, we handpick datasets where outcomes come from the field rather than simulation. By combining the new observational sampling method, new statistics, and real-world datasets, the benchmark provides a unique perspective on CATE estimator performance and uncovers gaps in capturing real-world heterogeneity.
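The two baselines and the ranking argument can be illustrated with a small sketch. It is not the paper's estimator: it assumes the true effects are known (in real data they are not, which is why the paper derives $\hat{Q}$), and the numbers, model names, and the constant `c` are all hypothetical.

```python
from statistics import mean

def mse(pred, truth):
    """Mean squared error between predicted and true effects."""
    return mean((p - t) ** 2 for p, t in zip(pred, truth))

def rank_by(scores):
    """Model names ordered from lowest to highest score."""
    return sorted(scores, key=scores.get)

# Hypothetical true per-unit treatment effects (unknown in practice).
true_cate = [0.1, -0.2, 0.3, 0.0, 0.2]
n = len(true_cate)

predictions = {
    "zero_effect": [0.0] * n,                   # trivial baseline: no effect
    "constant_effect": [mean(true_cate)] * n,   # baseline: same effect for all
    "noisy_model": [0.9, -1.0, 1.2, -0.7, 0.8], # made-up heterogeneous model
}

mse_scores = {name: mse(pred, true_cate) for name, pred in predictions.items()}

# Subtracting the same constant from every model's MSE leaves the order intact.
c = 0.05  # arbitrary illustrative constant
q_scores = {name: score - c for name, score in mse_scores.items()}

same_ranking = rank_by(mse_scores) == rank_by(q_scores)
```

In this toy case the noisy model has a higher MSE than both baselines, which is the kind of outcome findings (a) and (b) describe. Because `q_scores` differs from `mse_scores` only by a shared constant, the two rankings coincide, so selecting the model with the smallest $Q$ also selects the one with the smallest MSE.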

artificial intelligence, machine learning, xgb, (17 more...)

arXiv.org Machine Learning

2410.07021

Country: North America > United States (0.28)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

QDA-SQL: Questions Enhanced Dialogue Augmentation for Multi-Turn Text-to-SQL

Sun, Yinggang, Guo, Ziming, Yu, Haining, Liu, Chuanyi, Li, Xiang, Wang, Bingxuan, Yu, Xiangzhan, Zhao, Tiancheng

arXiv.org Artificial Intelligence, Jun-15-2024

Fine-tuning large language models (LLMs) for specific domain tasks has achieved great success in Text-to-SQL tasks. However, these fine-tuned models often struggle with multi-turn Text-to-SQL tasks when questions are ambiguous or unanswerable. LLMs therefore need to be enhanced to handle multiple types of questions in multi-turn Text-to-SQL tasks. To address this, we propose a novel data augmentation method, called QDA-SQL, which uses LLMs to generate multiple types of multi-turn Q&A pairs. QDA-SQL incorporates validation and correction mechanisms to handle complex multi-turn Text-to-SQL tasks. Experimental results demonstrate that QDA-SQL enables fine-tuned models to exhibit higher performance on SQL statement accuracy and enhances their ability to handle complex, unanswerable questions in multi-turn Text-to-SQL tasks. The generation script and test set are released at https://github.com/mcxiaoxiao/QDA-SQL.

large language model, natural language, sql, (18 more...)

arXiv.org Artificial Intelligence

2406.10593

Country: Asia > China (0.14)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Scheduling Live Interactive Narratives with Mixed-Integer Linear Programming

Azad, Sasha (Disney Research) | Xu, Jingyang (Decision Science, Walt Disney Parks and Resorts) | Yu, Haining (Decision Science, Walt Disney Parks and Resorts) | Li, Boyang (Disney Research)

AAAI Conferences, Oct-1-2017

A live interactive narrative (LIN) is an experience where multiple players take on fictional roles and interact with real-world objects and actors to participate in a pre-authored narrative. Temporal properties of LINs are important to their viability and aesthetic quality and hence deserve special design consideration. In this paper, we tackle the largely overlooked problem of scheduling a multiplayer interactive narrative and propose the Live Interactive Narrative Scheduling Problem (LINSP), which handles reasoning under temporal uncertainty, resource scheduling, and non-linear plot choices. We present a mixed-integer linear programming formulation of the problem and empirically evaluate its scalability over large narrative instances.

mixed-integer linear programming, scheduling live interactive narrative

AAAI Conferences

Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
