Liu, Zichen
Understanding R1-Zero-Like Training: A Critical Perspective
Liu, Zichen, Chen, Changyu, Li, Wenjun, Qi, Penghui, Pang, Tianyu, Du, Chao, Lee, Wee Sun, Lin, Min
DeepSeek-R1-Zero has shown that reinforcement learning (RL) at scale can directly enhance the reasoning capabilities of LLMs without supervised fine-tuning. In this work, we critically examine R1-Zero-like training by analyzing its two core components: base models and RL. We investigate a wide range of base models, including DeepSeek-V3-Base, to understand how pretraining characteristics influence RL performance. Our analysis reveals that DeepSeek-V3-Base already exhibits an "Aha moment", while Qwen2.5 base models demonstrate strong reasoning capabilities even without prompt templates, suggesting potential pretraining biases. Additionally, we identify an optimization bias in Group Relative Policy Optimization (GRPO), which artificially increases response length (especially for incorrect outputs) during training. To address this, we introduce Dr. GRPO, an unbiased optimization method that improves token efficiency while maintaining reasoning performance. Leveraging these insights, we present a minimalist R1-Zero recipe that achieves 43.3% accuracy on AIME 2024 with a 7B base model, establishing a new state-of-the-art. Our code is available at https://github.com/sail-sg/understand-r1-zero.
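As a rough illustration of the normalization bias discussed in this abstract, the sketch below contrasts a GRPO-style group-relative advantage (standard-deviation normalization plus per-response length normalization in the loss) with an unbiased variant that drops both terms. Function names and the exact loss form are illustrative assumptions, not the released Dr. GRPO implementation.

```python
# Hypothetical sketch, assuming binary/scalar rewards per sampled response.
import numpy as np

def grpo_advantages(rewards):
    # GRPO: center by the group mean and divide by the group std,
    # which the paper identifies as a source of optimization bias.
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

def unbiased_advantages(rewards):
    # Dr. GRPO-style variant: only center by the group mean.
    r = np.asarray(rewards, dtype=np.float64)
    return np.asarray(rewards, dtype=np.float64) - np.mean(rewards)

def policy_loss(logprobs_per_token, advantages, use_length_norm):
    # logprobs_per_token: one 1-D array per sampled response.
    # Dividing each response's token sum by its own length favors long
    # incorrect responses; the unbiased variant uses no per-response norm.
    losses = []
    for lp, adv in zip(logprobs_per_token, advantages):
        token_sum = -(adv * lp).sum()
        losses.append(token_sum / len(lp) if use_length_norm else token_sum)
    return float(np.mean(losses))
```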
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs
Dou, Longxu, Liu, Qian, Zhou, Fan, Chen, Changyu, Wang, Zili, Jin, Ziqi, Liu, Zichen, Zhu, Tongyao, Du, Cunxiao, Yang, Penghui, Wang, Haonan, Liu, Jiaheng, Zhao, Yongchi, Feng, Xiachong, Mao, Xin, Yeung, Man Tsung, Pipatanakul, Kunat, Koto, Fajri, Thu, Min Si, Kydlíček, Hynek, Liu, Zeyi, Lin, Qunshu, Sripaisarnmongkol, Sittipong, Sae-Khow, Kridtaphad, Thongchim, Nirattisai, Konkaew, Taechawat, Borijindargoon, Narong, Dao, Anh, Maneegard, Matichon, Artkaew, Phakphum, Yong, Zheng-Xin, Nguyen, Quan, Phatthiyaphaibun, Wannaphong, Tran, Hoang H., Zhang, Mike, Chen, Shiqi, Pang, Tianyu, Du, Chao, Wan, Xinyi, Lu, Wei, Lin, Min
Sailor2 is a family of cutting-edge multilingual language models for South-East Asian (SEA) languages, available in 1B, 8B, and 20B sizes to suit diverse applications. Building on Qwen2.5, Sailor2 undergoes continuous pre-training on 500B tokens (400B SEA-specific and 100B replay tokens) to support 13 SEA languages while retaining proficiency in Chinese and English. The Sailor2-20B model achieves a 50-50 win rate against GPT-4o across SEA languages. We also deliver a comprehensive cookbook on how to develop multilingual models efficiently, covering five key aspects: data curation, pre-training, post-training, model customization, and evaluation. We hope that the Sailor2 models (Apache 2.0 license) will drive language development in the SEA region and that the Sailor2 cookbook will inspire researchers to build more inclusive LLMs for other under-served languages.
Selective Visual Prompting in Vision Mamba
Yao, Yifeng, Liu, Zichen, Cui, Zhenyu, Peng, Yuxin, Zhou, Jiahuan
Pre-trained Vision Mamba (Vim) models have demonstrated exceptional performance across various computer vision tasks in a computationally efficient manner, attributed to their unique design of selective state space models. To further extend their applicability to diverse downstream vision tasks, Vim models can be adapted using the efficient fine-tuning technique known as visual prompting. However, existing visual prompting methods are predominantly tailored for Vision Transformer (ViT)-based models that leverage global attention, neglecting the distinctive sequential token-wise compression and propagation characteristics of Vim. Specifically, existing prompt tokens prefixed to the sequence are insufficient to effectively activate the input and forget gates across the entire sequence, hindering the extraction and propagation of discriminative information. To address this limitation, we introduce a novel Selective Visual Prompting (SVP) method specifically for the efficient fine-tuning of Vim. To prevent the loss of discriminative information during state space propagation, SVP employs lightweight selective prompters for token-wise prompt generation, ensuring adaptive activation of the update and forget gates within Mamba blocks to promote discriminative information propagation. Moreover, considering that Vim propagates both shared cross-layer information and specific inner-layer information, we further refine SVP with a dual-path structure: Cross-Prompting and Inner-Prompting. Cross-Prompting utilizes shared parameters across layers, while Inner-Prompting employs distinct parameters, promoting the propagation of both shared and specific information, respectively. Extensive experimental results on various large-scale benchmarks demonstrate that our proposed SVP significantly outperforms state-of-the-art methods. Our code is available at https://github.com/zhoujiahuan1991/AAAI2025-SVP.
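As a loose illustration of the dual-path, token-wise prompting idea described above, the sketch below generates per-token prompts from a shared cross-layer prompter and per-layer inner prompters and adds them to the token features before each block. The injection point, dimensions, and the linear form of the prompters are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch, assuming linear token-wise prompters and additive injection.
import numpy as np

rng = np.random.default_rng(0)
d, num_layers = 192, 4

W_cross = rng.normal(scale=0.02, size=(d, d))                     # shared across layers
W_inner = [rng.normal(scale=0.02, size=(d, d)) for _ in range(num_layers)]  # layer-specific

def token_wise_prompts(x, layer):
    # x: (seq_len, d) token features; prompts are generated per token,
    # unlike a fixed set of prompt tokens prefixed to the sequence.
    cross = x @ W_cross            # Cross-Prompting: shared cross-layer information
    inner = x @ W_inner[layer]     # Inner-Prompting: specific inner-layer information
    return cross + inner

x = rng.normal(size=(16, d))
for layer in range(num_layers):
    x = x + token_wise_prompts(x, layer)   # would then feed the Vim/Mamba block
```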
Sample-Efficient Alignment for LLMs
Liu, Zichen, Chen, Changyu, Du, Chao, Lee, Wee Sun, Lin, Min
We study methods for efficiently aligning large language models (LLMs) with human preferences given budgeted online feedback. We first formulate the LLM alignment problem in the frame of contextual dueling bandits. This formulation, which subsumes recent paradigms such as online RLHF and online DPO, inherently calls for sample-efficient algorithms that incorporate online active exploration. Leveraging insights from bandit theory, we introduce a unified algorithm based on Thompson sampling and highlight its applications in two distinct LLM alignment scenarios. The practical agent that efficiently implements this algorithm, named SEA (Sample-Efficient Alignment), is empirically validated through extensive experiments across three model scales (1B, 2.8B, 6.9B) and three preference learning algorithms (DPO, IPO, SLiC). The results demonstrate that SEA achieves highly sample-efficient alignment with the oracle's preferences, outperforming recent active exploration methods for LLMs. Additionally, we release the implementation of SEA together with an efficient codebase designed for online alignment of LLMs, aiming to accelerate future research in this field.
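The sketch below illustrates the Thompson-sampling principle in a dueling setting: draw a plausible reward function and act greedily with respect to it when choosing which pair of responses to send to the preference oracle. The ensemble of linear reward heads over fixed response features is a stand-in for a learned reward model; all names and details are illustrative, not the SEA codebase.

```python
# Hypothetical sketch, assuming precomputed response features and an
# ensemble of bootstrapped linear reward heads as the posterior proxy.
import numpy as np

rng = np.random.default_rng(0)

class EnsembleReward:
    def __init__(self, n_heads, dim):
        self.heads = rng.normal(size=(n_heads, dim))

    def sample_head(self):
        # Sampling a head approximates sampling from the reward posterior.
        return self.heads[rng.integers(len(self.heads))]

def select_duel(candidate_features, model):
    # Pick each arm greedily under an independently sampled reward function.
    first = int(np.argmax(candidate_features @ model.sample_head()))
    second = int(np.argmax(candidate_features @ model.sample_head()))
    if second == first:  # ensure two distinct responses are compared
        scores = candidate_features @ model.sample_head()
        scores[first] = -np.inf
        second = int(np.argmax(scores))
    return first, second  # send this pair to the preference oracle

features = rng.normal(size=(8, 16))   # 8 candidate responses, 16-d features
i, j = select_duel(features, EnsembleReward(n_heads=4, dim=16))
```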
Bootstrapping Language Models with DPO Implicit Rewards
Chen, Changyu, Liu, Zichen, Du, Chao, Pang, Tianyu, Liu, Qian, Sinha, Arunesh, Varakantham, Pradeep, Lin, Min
Human alignment in large language models (LLMs) is an active area of research. A recent groundbreaking work, direct preference optimization (DPO), has greatly simplified the process compared with past work on reinforcement learning from human feedback (RLHF) by bypassing the reward learning stage of RLHF. After training, DPO provides an implicit reward model. In this work, we make the novel observation that this implicit reward model can itself be used in a bootstrapping fashion to further align the LLM. Our approach is to use the rewards from a current LLM to construct a preference dataset, which is then used in subsequent DPO rounds. We incorporate refinements that debias the length of the responses and improve the quality of the preference dataset to further strengthen our approach. Our approach, named self-alignment with DPO ImpliCit rEwards (DICE), shows great improvements in alignment: it outperforms Gemini Pro on AlpacaEval 2, reaching a 27.55% length-controlled win rate against GPT-4 Turbo, with only 8B parameters and no external feedback. Our code is available at https://github.com/sail-sg/dice.
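To make the bootstrapping step concrete, the sketch below scores responses with the standard DPO implicit reward, r(x, y) = beta * (log pi_theta(y|x) - log pi_ref(y|x)), and turns the scores into a chosen/rejected pair for the next DPO round. The length-debiasing term and the coefficient shown here are simple illustrative choices, not the paper's exact refinement.

```python
# Minimal sketch, assuming summed response log-likelihoods are available
# from the current policy and the frozen reference model.
BETA = 0.1
LENGTH_PENALTY = 0.001  # hypothetical debiasing coefficient

def implicit_reward(logp_policy, logp_ref, num_tokens):
    raw = BETA * (logp_policy - logp_ref)      # DPO implicit reward
    return raw - LENGTH_PENALTY * num_tokens   # discourage favoring long responses

def build_preference_pair(responses):
    # responses: list of dicts with keys text, logp_policy, logp_ref, num_tokens.
    scored = sorted(
        responses,
        key=lambda r: implicit_reward(r["logp_policy"], r["logp_ref"], r["num_tokens"]),
        reverse=True,
    )
    # Highest-scoring response becomes "chosen", lowest becomes "rejected";
    # the resulting pairs feed the next round of DPO training.
    return {"chosen": scored[0]["text"], "rejected": scored[-1]["text"]}
```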
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning
Huang, Shengyi, Gallouédec, Quentin, Felten, Florian, Raffin, Antonin, Dossa, Rousslan Fernand Julien, Zhao, Yanxiao, Sullivan, Ryan, Makoviychuk, Viktor, Makoviichuk, Denys, Danesh, Mohamad H., Roumégous, Cyril, Weng, Jiayi, Chen, Chufan, Rahman, Md Masudur, Araújo, João G. M., Quan, Guorui, Tan, Daniel, Klein, Timo, Charakorn, Rujikorn, Towers, Mark, Berthelot, Yann, Mehta, Kinal, Chakraborty, Dipam, KG, Arjun, Charraut, Valentin, Ye, Chang, Liu, Zichen, Alegre, Lucas N., Nikulin, Alexander, Hu, Xiao, Liu, Tianlin, Choi, Jongwook, Yi, Brent
In many Reinforcement Learning (RL) papers, learning curves are useful indicators to measure the effectiveness of RL algorithms. However, the complete raw data of the learning curves are rarely available. As a result, it is usually necessary to reproduce the experiments from scratch, which can be time-consuming and error-prone. We present Open RL Benchmark, a set of fully tracked RL experiments, including not only the usual data such as episodic return, but also all algorithm-specific and system metrics. Open RL Benchmark is community-driven: anyone can download, use, and contribute to the data. At the time of writing, more than 25,000 runs have been tracked, for a cumulative duration of more than 8 years. Open RL Benchmark covers a wide range of RL libraries and reference implementations. Special care is taken to ensure that each experiment is precisely reproducible by providing not only the full parameters, but also the versions of the dependencies used to generate it. In addition, Open RL Benchmark comes with a command-line interface (CLI) for easily fetching data and generating figures to present the results. In this document, we include two case studies to demonstrate the usefulness of Open RL Benchmark in practice. To the best of our knowledge, Open RL Benchmark is the first RL benchmark of its kind, and the authors hope that it will improve and facilitate the work of researchers in the field.
Locality Sensitive Sparse Encoding for Learning World Models Online
Liu, Zichen, Du, Chao, Lee, Wee Sun, Lin, Min
Acquiring an accurate world model online for model-based reinforcement learning (MBRL) is challenging due to data nonstationarity, which typically causes catastrophic forgetting for neural networks (NNs). From the online learning perspective, a Follow-The-Leader (FTL) world model is desirable, which optimally fits all previous experiences at each round. Unfortunately, NN-based models need re-training on all accumulated data at every interaction step to achieve FTL, which is computationally expensive for lifelong agents. In this paper, we revisit models that can achieve FTL with incremental updates. Specifically, our world model is a linear regression model supported by nonlinear random features. The linear part ensures efficient FTL updates, while the nonlinear random features enable fitting complex environments. To best trade off model capacity and computation efficiency, we introduce a locality sensitive sparse encoding, which allows us to conduct efficient sparse updates even with very high dimensional nonlinear features. We validate the representation power of our encoding and verify that it allows efficient online learning under data covariate shift. We also show, in the Dyna MBRL setting, that our world models learned online using a single pass of trajectory data either surpass or match the performance of deep world models trained with replay and other continual learning methods.
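The sketch below illustrates the FTL-with-incremental-updates idea for a linear model on sparse nonlinear features: accumulate least-squares sufficient statistics online, touching only the coordinates that the sparse encoding activates. The encoding here (random projection followed by top-k selection) and the scalar target are simplified stand-ins for the locality sensitive sparse encoding and vector-valued dynamics in the paper.

```python
# Hypothetical sketch, assuming an 8-d state and a scalar prediction target.
import numpy as np

rng = np.random.default_rng(0)
D, K = 4096, 32                      # feature dimension, active features per input
P = rng.normal(size=(D, 8))          # random projection for the nonlinear features

def sparse_encode(s):
    z = np.tanh(P @ s)               # nonlinear random features
    idx = np.argsort(-np.abs(z))[:K] # keep only the K most active units (sparsity)
    return idx, z[idx]

A = np.zeros((D, D))                 # running Phi^T Phi
b = np.zeros(D)                      # running Phi^T y

def ftl_update(s, y):
    # One incremental update; only a K x K block of A and K entries of b change.
    idx, v = sparse_encode(s)
    A[np.ix_(idx, idx)] += np.outer(v, v)
    b[idx] += y * v

def ftl_solve():
    # The FTL solution fits all data seen so far (small ridge term for stability).
    return np.linalg.solve(A + 1e-3 * np.eye(D), b)

for _ in range(100):                 # single pass over a stream of transitions
    s = rng.normal(size=8)
    ftl_update(s, y=float(np.sin(s).sum()))
w = ftl_solve()
```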
AnyHome: Open-Vocabulary Generation of Structured and Textured 3D Homes
Wen, Zehao, Liu, Zichen, Sridhar, Srinath, Fu, Rao
We introduce AnyHome, a framework that translates open-vocabulary descriptions, ranging from simple labels to elaborate paragraphs, into well-structured and textured 3D indoor scenes at house scale. Inspired by cognition theories, AnyHome employs an amodal structured representation to capture 3D spatial cues from textual narratives and then uses egocentric inpainting to enrich these scenes. To this end, we begin by using specially designed template prompts for Large Language Models (LLMs), which enable precise control over the textual input. We then utilize intermediate representations to maintain the spatial structure's consistency, ensuring that the 3D scenes align closely with the textual description. Next, we apply a Score Distillation Sampling process to refine the placement of objects. Lastly, an egocentric inpainting process is incorporated to enhance the realism and appearance of the scenes. AnyHome stands out due to its hierarchical structured representation combined with the versatility of open-vocabulary text interpretation. This allows for extensive customization of indoor scenes at various levels of granularity. We demonstrate that AnyHome can reliably generate a range of diverse indoor scenes, characterized by their detailed spatial structures and textures, all corresponding to the free-form textual inputs.
Adversarial Demonstration Attacks on Large Language Models
Wang, Jiongxiao, Liu, Zichen, Park, Keun Hee, Jiang, Zhuojun, Zheng, Zhaoheng, Wu, Zhuofeng, Chen, Muhao, Xiao, Chaowei
With the emergence of more powerful large language models (LLMs), such as ChatGPT and GPT-4, in-context learning (ICL) has gained significant prominence in leveraging these models for specific tasks by utilizing data-label pairs as precondition prompts. While incorporating demonstrations can greatly enhance the performance of LLMs across various tasks, it may introduce a new security concern: attackers can manipulate only the demonstrations without changing the input to perform an attack. In this paper, we investigate the security concern of ICL from an adversarial perspective, focusing on the impact of demonstrations. We propose a novel attack method named advICL, which aims to manipulate only the demonstration without changing the input to mislead the models. Our results demonstrate that as the number of demonstrations increases, the robustness of in-context learning decreases. Additionally, we identify an intrinsic property of demonstrations: they can be prepended to different inputs. This enables a more practical threat model in which an attacker can attack a test input example even without knowing or manipulating it. To this end, we propose a transferable version of advICL, named Transferable-advICL. Our experiments show that the adversarial demonstrations generated by Transferable-advICL can successfully attack unseen test input examples. We hope that our study reveals the critical security risks associated with ICL and underscores the need for extensive research on the robustness of ICL, particularly given its increasing significance in the advancement of LLMs.
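The sketch below illustrates the threat model in its simplest form: greedily substitute words in the demonstrations only, keeping the test input untouched, and retain a substitution whenever it lowers the model's confidence in the correct label. This is a generic greedy word-substitution loop for illustration, not the advICL algorithm itself; `classify` and `synonyms` are placeholder callables.

```python
# Hypothetical sketch, assuming classify(prompt) returns {label: probability}
# and synonyms(word) returns a list of candidate replacement words.
def attack_demonstrations(demos, test_input, true_label, classify, synonyms):
    demos = list(demos)  # demonstration strings; only these are perturbed

    def confidence():
        return classify("\n".join(demos) + "\n" + test_input)[true_label]

    for i in range(len(demos)):
        words = demos[i].split()
        for j in range(len(words)):
            best_word, best_conf = words[j], confidence()
            for sub in synonyms(words[j]):
                words[j] = sub
                demos[i] = " ".join(words)
                conf = confidence()
                if conf < best_conf:          # keep substitutions that hurt the model
                    best_word, best_conf = sub, conf
            words[j] = best_word
            demos[i] = " ".join(words)
    return demos  # adversarial demonstrations; test_input itself is unchanged
```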
TreeMAN: Tree-enhanced Multimodal Attention Network for ICD Coding
Liu, Zichen, Liu, Xuyuan, Wen, Yanlong, Zhao, Guoqing, Xia, Fen, Yuan, Xiaojie
ICD coding is designed to assign the disease codes to electronic health records (EHRs) upon discharge, which is crucial for billing and clinical statistics. In an attempt to improve the effectiveness and efficiency of manual coding, many methods have been proposed to automatically predict ICD codes from clinical notes. However, most previous works ignore the decisive information contained in the structured medical data of EHRs, which is hard to capture from noisy clinical notes. In this paper, we propose a Tree-enhanced Multimodal Attention Network (TreeMAN) to fuse tabular features and textual features into multimodal representations by enhancing the text representations with tree-based features via the attention mechanism. Tree-based features are constructed according to decision trees learned from structured multimodal medical data, which capture decisive information for ICD coding. The same multi-label classifier used by previous text-based models can then be applied to the multimodal representations to predict ICD codes. Experiments on two MIMIC datasets show that our method outperforms prior state-of-the-art ICD coding approaches. The code is available at https://github.com/liu-zichen/TreeMAN.
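As a loose illustration of the fusion step described above, the sketch below embeds the decision-tree leaves reached by an admission's structured data, lets the text tokens attend to those tree-based features, and feeds the enhanced representation to a sigmoid multi-label head. Dimensions, the softmax-attention form, and the pooling choice are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch, assuming leaf indices from pre-trained decision trees and
# a precomputed (T, d) text representation of the clinical note.
import numpy as np

rng = np.random.default_rng(0)
d = 64

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_and_predict(text_tokens, leaf_indices, leaf_embeddings, W_out):
    tree_feats = leaf_embeddings[leaf_indices]                # (L, d) tree-based features
    attn = softmax(text_tokens @ tree_feats.T / np.sqrt(d))   # (T, L) text-to-tree attention
    enhanced = text_tokens + attn @ tree_feats                # text enhanced by tabular signal
    pooled = enhanced.max(axis=0)                             # (d,) document representation
    return 1.0 / (1.0 + np.exp(-(pooled @ W_out)))            # per-code probabilities

text = rng.normal(size=(120, d))                  # 120 note tokens
leaf_emb = rng.normal(size=(500, d))              # embedding table for tree leaves
probs = fuse_and_predict(text, np.array([3, 47, 210]), leaf_emb, rng.normal(size=(d, 50)))
```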