
Collaborating Authors

 Huang, Haoran


DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation

arXiv.org Artificial Intelligence

Data analysis is a crucial analytical process for generating in-depth studies and conclusive insights that comprehensively answer a given user query over tabular data. In this work, we aim to propose new resources and benchmarks to inspire future research on this crucial yet challenging and under-explored task. However, collecting expert-curated data analysis annotations can be prohibitively expensive. We propose to automatically generate high-quality answer annotations by leveraging the code-generation capabilities of LLMs with a multi-turn prompting technique. We construct the DACO dataset, containing (1) 440 databases (of tabular data) collected from real-world scenarios, (2) ~2k query-answer pairs that can serve as weak supervision for model training, and (3) a concentrated but high-quality test set with human-refined annotations that serves as our main evaluation benchmark. We train a 6B supervised fine-tuning (SFT) model on the DACO dataset and find that the SFT model learns reasonable data analysis capabilities. To further align the models with human preference, we use reinforcement learning to encourage generating analysis perceived by humans as helpful, and design a set of dense rewards to propagate the sparse human preference reward to intermediate code generation steps. Human annotators judge our DACO-RL algorithm to produce more helpful answers than the SFT model in 57.72% of cases, validating the effectiveness of the proposed algorithm. Data and code are released at https://github.com/shirley-wu/daco
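
The annotation pipeline described in this abstract alternates code generation with execution feedback. Below is a minimal Python sketch of such a multi-turn loop under stated assumptions: `llm_generate` is a hypothetical text-completion callable (not part of the DACO release), the prompts are illustrative, and DACO's actual turn structure and sandboxing may differ.

```python
# A minimal sketch of multi-turn prompting for data-analysis annotation.
# `llm_generate` is a hypothetical callable (prompt -> text); the exact
# prompts and turn structure used in DACO may differ.
import io
import contextlib
import pandas as pd

def analyze(df: pd.DataFrame, query: str, llm_generate, max_turns: int = 3) -> str:
    """Alternate between code generation and execution, then summarize."""
    history = [f"Table columns: {list(df.columns)}", f"User query: {query}"]
    for _ in range(max_turns):
        code = llm_generate("\n".join(history) + "\nWrite pandas code over `df`:")
        buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(buf):
                exec(code, {"df": df, "pd": pd})  # sandboxing omitted for brevity
            history.append(f"Code:\n{code}\nOutput:\n{buf.getvalue()}")
        except Exception as err:
            history.append(f"Code:\n{code}\nError: {err}")  # let the model self-correct
    # Final turn: turn the accumulated execution evidence into findings.
    return llm_generate("\n".join(history) + "\nSummarize the analysis for the query:")
```

Feeding execution output (or errors) back into the prompt is what lets the model refine its analysis over turns before the final summarization step.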


Improving Generalization of Alignment with Human Preferences through Group Invariant Learning

arXiv.org Artificial Intelligence

The success of AI assistants based on large language models (LLMs) hinges crucially on Reinforcement Learning from Human Feedback (RLHF), which enables the generation of responses more aligned with human preferences. As universal AI assistants, they are increasingly expected to perform consistently across various domains. However, previous work shows that Reinforcement Learning (RL) often exploits shortcuts to attain high rewards and overlooks challenging samples. This focus on quick reward gains undermines both training stability and the model's ability to generalize to new, unseen data. In this work, we propose a novel approach that learns a consistent policy via RL across various data groups or domains. Given the challenges associated with acquiring group annotations, our method automatically classifies data into different groups, deliberately maximizing performance variance. Then, we optimize the policy to perform well on challenging groups. Lastly, leveraging the established groups, our approach adaptively adjusts the exploration space, allocating more learning capacity to more challenging data and preventing the model from over-optimizing on simpler data. Experimental results indicate that our approach significantly enhances training stability and model generalization.
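
As a rough illustration of the group-then-reweight idea described above, the sketch below assigns samples to groups and upweights the worst-performing ones, group-DRO style. The specifics are assumptions made for illustration: the reward-quantile grouping and the `num_groups` and `temperature` parameters stand in for the paper's automatic grouping and adaptive exploration mechanisms.

```python
# A minimal sketch: cluster samples into groups with sharply differing reward,
# then reweight the policy loss toward higher-loss (harder) groups.
import torch

def group_weighted_loss(per_sample_loss: torch.Tensor,
                        per_sample_reward: torch.Tensor,
                        num_groups: int = 4,
                        temperature: float = 1.0) -> torch.Tensor:
    # Assign each sample to a group by reward quantile (a simple proxy for
    # the paper's learned, variance-maximizing grouping).
    qs = torch.quantile(per_sample_reward,
                        torch.linspace(0, 1, num_groups + 1, dtype=per_sample_reward.dtype))
    group_ids = torch.bucketize(per_sample_reward, qs[1:-1])
    # Average loss per group; empty groups contribute zero.
    group_losses = torch.stack([
        per_sample_loss[group_ids == g].mean() if (group_ids == g).any()
        else per_sample_loss.new_tensor(0.0)
        for g in range(num_groups)
    ])
    # Harder (higher-loss) groups receive larger weights; weights are detached
    # so the reweighting itself is not differentiated through.
    weights = torch.softmax(group_losses.detach() / temperature, dim=0)
    return (weights * group_losses).sum()
```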


Secrets of RLHF in Large Language Models Part I: PPO

arXiv.org Artificial Intelligence

Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Their primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include reward models to measure human preferences, Proximal Policy Optimization (PPO) to optimize policy model outputs, and process supervision to improve step-by-step reasoning capabilities. However, due to the challenges of reward design, environment interaction, and agent training, coupled with the huge trial-and-error cost of large language models, there is a significant barrier for AI researchers to drive the development of technical alignment and the safe deployment of LLMs. The stable training of RLHF remains a puzzle. In this first report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising the PPO algorithm impact policy agent training. We identify policy constraints as the key factor for the effective implementation of the PPO algorithm. Therefore, we explore PPO-max, an advanced version of the PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLM alignment. Therefore, we are eager to release technical reports, reward models, and PPO code, aiming to make modest contributions to the advancement of LLMs.
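
The policy constraint highlighted in the abstract is commonly realized by combining PPO's ratio clipping with a KL penalty toward the frozen SFT reference model. The sketch below shows that generic combination in Python; it is not the PPO-max recipe itself, and the `clip_eps` and `kl_coef` values are placeholders.

```python
# A minimal sketch of a PPO-style policy loss with an explicit policy
# constraint: ratio clipping plus a per-token KL penalty against a frozen
# reference model. Illustrative only, not the PPO-max recipe.
import torch

def ppo_policy_loss(logp_new: torch.Tensor,   # log-probs under current policy
                    logp_old: torch.Tensor,   # log-probs at rollout time
                    logp_ref: torch.Tensor,   # log-probs under frozen SFT reference
                    advantages: torch.Tensor,
                    clip_eps: float = 0.2,
                    kl_coef: float = 0.05) -> torch.Tensor:
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    pg_loss = -torch.min(unclipped, clipped).mean()   # standard PPO-clip objective
    kl_penalty = (logp_new - logp_ref).mean()         # keeps the policy near the reference
    return pg_loss + kl_coef * kl_penalty
```

Both terms pull in the same direction: clipping bounds the per-update policy change, while the KL term keeps the policy from drifting far from the reference distribution.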


Query Answering with Inconsistent Existential Rules under Stable Model Semantics

AAAI Conferences

Classical inconsistency-tolerant query answering relies on selecting maximal components of an ABox/database that are consistent with the ontology. However, some rules in ontologies might be unreliable if they are extracted via ontology learning or written by unskilled knowledge engineers. In this paper we present a framework for handling inconsistent existential rules under stable model semantics, which is defined by a notion called rule repairs that selects maximal components of the existential rules. Surprisingly, for R-acyclic existential rules with R-stratified negation or guarded existential rules with stratified negation, both the data complexity and combined complexity of query answering under the rule repair semantics remain the same as those under the conventional query answering semantics. This leads us to propose several approaches that handle the rule repair semantics by calling answer set programming solvers. An experimental evaluation shows that these approaches scale well for query answering under rule repairs on realistic cases.
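
To make the rule-repair notion concrete, the sketch below enumerates subset-maximal sets of rules that pass a consistency check against the database. It is a brute-force illustration only: `is_consistent` is a placeholder for a real check (for instance, one delegated to an answer set programming solver), and the string encodings of rules and facts are assumptions.

```python
# A minimal sketch of rule repairs: subset-maximal sets of rules that are
# consistent with the database. Brute-force enumeration, feasible only for
# small rule sets; the consistency check itself is a placeholder.
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List

def rule_repairs(rules: Iterable[str],
                 database: List[str],
                 is_consistent: Callable[[FrozenSet[str], List[str]], bool]) -> List[FrozenSet[str]]:
    rules = list(rules)
    repairs: List[FrozenSet[str]] = []
    # Check larger subsets first so maximality can be tested against repairs found so far.
    for size in range(len(rules), -1, -1):
        for subset in map(frozenset, combinations(rules, size)):
            if any(subset < repair for repair in repairs):
                continue  # strictly contained in an existing repair: not maximal
            if is_consistent(subset, database):
                repairs.append(subset)
    return repairs
```

In practice, as the abstract notes, the consistency check would be delegated to an ASP solver rather than implemented by hand.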