Sun, Haoran
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
Zhang, Zhaowei, Bai, Fengshuo, Chen, Qizhi, Ma, Chengdong, Wang, Mingzhi, Sun, Haoran, Zheng, Zilong, Yang, Yaodong
How to align large language models (LLMs) with user preferences from a static, general dataset has been studied extensively. However, user preferences are usually personalized, changing, and diverse, varying with culture, values, or time. As a result, the preferences users actually hold in practice often do not coincide with those the model developers trained for. Since we cannot collect enough data and retrain for every demand, efficient real-time preference adaptation methods built on the backbone LLMs at test time are important to study. To this end, we introduce Amulet, a novel, training-free framework that formulates the decoding process of every token as a separate online learning problem guided by simple user-provided prompts, thus enabling real-time optimization to satisfy users' personalized preferences. To reduce the computational cost this per-token optimization incurs, we additionally provide a closed-form solution for each iteration step of the optimization process, reducing the added computation time to a negligible level. Detailed experimental results demonstrate that Amulet achieves significant performance improvements across rich combinations of LLMs, datasets, and user preferences, while maintaining acceptable computational efficiency.
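The framework above steers every decoding step toward preferences stated in a user prompt. As a minimal illustrative sketch only, and not Amulet's actual formulation or closed-form update, the snippet below shows one generic way to adjust per-token logits with a preference prompt by contrasting prompt-conditioned and unconditioned logits; the placeholder model name, the steering rule, and the weight alpha are assumptions introduced here.

```python
# Illustrative sketch of prompt-guided per-token steering at decode time.
# This is NOT the Amulet algorithm; the contrastive steering rule and the
# weight `alpha` below are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def steered_decode(question: str, preference_prompt: str,
                   alpha: float = 1.0, max_new_tokens: int = 32) -> str:
    """Greedy decoding where each token's logits are nudged toward the
    distribution conditioned on a user-provided preference prompt."""
    base_ids = tok(question, return_tensors="pt").input_ids
    pref_ids = tok(preference_prompt + question, return_tensors="pt").input_ids
    out = []
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits_base = model(base_ids).logits[:, -1, :]
            logits_pref = model(pref_ids).logits[:, -1, :]
            # Shift the base distribution toward the preference-conditioned one.
            steered = logits_base + alpha * (logits_pref - logits_base)
            next_id = steered.argmax(dim=-1, keepdim=True)
            base_ids = torch.cat([base_ids, next_id], dim=-1)
            pref_ids = torch.cat([pref_ids, next_id], dim=-1)
            out.append(next_id.item())
            if next_id.item() == tok.eos_token_id:
                break
    return tok.decode(out, skip_special_tokens=True)
```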
Probing Perceptual Constancy in Large Vision Language Models
Sun, Haoran, Yu, Suyang, Li, Yijiang, Gao, Qingying, Lyu, Haiyun, Deng, Hokin, Luo, Dezhi
Perceptual constancy is the ability to maintain stable perceptions of objects despite changes in sensory input, such as variations in distance, angle, or lighting. This ability is crucial for recognizing visual information in a dynamic world, making it essential for Vision-Language Models (VLMs). However, whether VLMs are currently, or even in principle, capable of mastering this ability remains underexplored. In this study, we evaluated 33 VLMs using 253 experiments across three domains: color, size, and shape constancy. The experiments included single-image and video adaptations of classic cognitive tasks, along with novel tasks in in-the-wild conditions, to evaluate the models' recognition of object properties under varying conditions. We found significant variability in VLM performance, with models' performance on shape constancy clearly dissociated from that on color and size constancy.
Game Theory Meets Large Language Models: A Systematic Survey
Sun, Haoran, Wu, Yusen, Cheng, Yukun, Chu, Xu
Game theory establishes a fundamental framework for analyzing strategic interactions among rational decision-makers. The rapid advancement of large language models (LLMs) has sparked extensive research exploring the intersection of these two fields. Specifically, game-theoretic methods are being applied to evaluate and enhance LLM capabilities, while LLMs themselves are reshaping classic game models. This paper presents a comprehensive survey of the intersection of these fields, exploring a bidirectional relationship from three perspectives: (1) Establishing standardized game-based benchmarks for evaluating LLM behavior; (2) Leveraging game-theoretic methods to improve LLM performance through algorithmic innovations; (3) Characterizing the societal impacts of LLMs through game modeling. Across these three aspects, we also highlight how LLMs' advanced language understanding impacts equilibrium analysis for traditional game models, which in turn extends the study of game theory. Finally, we identify key challenges and future research directions, assessing their feasibility based on the current state of the field. By bridging theoretical rigor with emerging AI capabilities, this survey aims to foster interdisciplinary collaboration and drive progress in this evolving research area.
Curiosity-Driven Reinforcement Learning from Human Feedback
Sun, Haoran, Chai, Yekun, Wang, Shuohuan, Sun, Yu, Wu, Hua, Wang, Haifeng
Reinforcement learning from human feedback (RLHF) has proven effective in aligning large language models (LLMs) with human preferences, but often at the cost of reduced output diversity. This trade-off between diversity and alignment quality remains a significant challenge. Drawing inspiration from curiosity-driven exploration in reinforcement learning, we introduce curiosity-driven RLHF (CD-RLHF), a framework that incorporates intrinsic rewards for novel states, alongside traditional sparse extrinsic rewards, to optimize both output diversity and alignment quality. We demonstrate the effectiveness of CD-RLHF through extensive experiments on a range of tasks, including text summarization and instruction following. Our approach achieves significant gains in diversity on multiple diversity-oriented metrics while maintaining alignment with human preferences comparable to standard RLHF. We make our code publicly available at https://github.com/ernie-research/CD-RLHF.
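The abstract describes adding an intrinsic reward for novel states on top of the sparse extrinsic RLHF reward. The sketch below is only a generic illustration of that idea, not the CD-RLHF implementation: the random-network-distillation-style novelty bonus and the mixing weight beta are assumptions introduced here for illustration.

```python
# Illustrative sketch of mixing a curiosity-style intrinsic reward with a
# sparse extrinsic reward. The RND-style novelty bonus and the coefficient
# `beta` are assumptions for illustration, not taken from the CD-RLHF paper.
import torch
import torch.nn as nn

class NoveltyBonus(nn.Module):
    """Random-network-distillation-style bonus: states whose features the
    predictor has not yet learned to reproduce receive a larger reward."""
    def __init__(self, state_dim: int, feat_dim: int = 128):
        super().__init__()
        self.target = nn.Linear(state_dim, feat_dim)      # fixed random net
        self.predictor = nn.Linear(state_dim, feat_dim)   # trained online
        for p in self.target.parameters():
            p.requires_grad_(False)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Prediction error of the online net against the frozen random target.
        return (self.predictor(state) - self.target(state)).pow(2).mean(-1)

def mixed_reward(extrinsic: torch.Tensor, state: torch.Tensor,
                 bonus: NoveltyBonus, beta: float = 0.05) -> torch.Tensor:
    """Total reward = sparse extrinsic reward + beta * intrinsic novelty."""
    return extrinsic + beta * bonus(state).detach()
```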
Vision Language Models Know Law of Conservation without Understanding More-or-Less
Luo, Dezhi, Lyu, Haiyun, Gao, Qingying, Sun, Haoran, Li, Yijiang, Deng, Hokin
Conservation is a critical milestone of cognitive development, considered to be supported both by the understanding of quantitative concepts and by the reversibility of mental operations. To assess whether this critical component of human intelligence has emerged in Vision Language Models, we have curated the ConserveBench, a battery of 365 cognitive experiments across four dimensions of physical quantities: volume, solid quantity, length, and number. The former two involve only transformational tasks, whereas the latter two involve non-transformational tasks that assess the understanding of quantitative concepts alone. Surprisingly, we find that while Vision Language Models are generally capable of conserving, they tend to fail at the non-transformational tasks, success at which is typically considered evidence of the ability to conserve. This implies that the law of conservation, at least in concrete domains, may exist without a corresponding conceptual understanding of quantity.
Vision Language Models See What You Want but not What You See
Gao, Qingying, Li, Yijiang, Lyu, Haiyun, Sun, Haoran, Luo, Dezhi, Deng, Hokin
Knowing others' intentions and taking others' perspectives are two core components of human intelligence, typically considered instantiations of theory of mind. Equipping machines with these abilities is an important step towards building human-level artificial intelligence. Here we investigate intentionality understanding and perspective-taking in Vision Language Models; for this purpose, we have created the IntentBench and PerspectBench datasets, which contain over 400 cognitive experiments grounded in real-world scenarios and classic cognitive tasks. Surprisingly, on these two datasets we find that VLMs achieve high performance on intentionality understanding but lower performance on perspective-taking. This challenges the common belief in the cognitive science literature that perspective-taking in the corresponding modality is necessary for intentionality understanding.
Medical Multimodal Foundation Models in Clinical Diagnosis and Treatment: Applications, Challenges, and Future Directions
Sun, Kai, Xue, Siyan, Sun, Fuchun, Sun, Haoran, Luo, Yu, Wang, Ling, Wang, Siyuan, Guo, Na, Liu, Lei, Zhao, Tian, Wang, Xinzhou, Yang, Lei, Jin, Shuo, Yan, Jun, Dong, Jiahong
Recent advancements in deep learning have significantly revolutionized the field of clinical diagnosis and treatment, offering novel approaches to improve diagnostic precision and treatment efficacy across diverse clinical domains, thus driving the pursuit of precision medicine. The growing availability of multi-organ and multimodal datasets has accelerated the development of large-scale Medical Multimodal Foundation Models (MMFMs). These models, known for their strong generalization capabilities and rich representational power, are increasingly being adapted to address a wide range of clinical tasks, from early diagnosis to personalized treatment strategies. This review offers a comprehensive analysis of recent developments in MMFMs, focusing on three key aspects: datasets, model architectures, and clinical applications. We also explore the challenges and opportunities in optimizing multimodal representations and discuss how these advancements are shaping the future of healthcare by enabling improved patient outcomes and more efficient clinical workflows.
Multilingual Large Language Models: A Systematic Survey
Zhu, Shaolin, Supryadi, Xu, Shaoyang, Sun, Haoran, Pan, Leiyu, Cui, Menglong, Du, Jiangcun, Jin, Renren, Branco, António, Xiong, Deyi
This paper provides a comprehensive survey of the latest research on multilingual large language models (MLLMs). MLLMs are not only able to understand and generate language across linguistic boundaries but also represent an important advancement in artificial intelligence. We first discuss the architecture and pre-training objectives of MLLMs, highlighting the key components and methodologies that contribute to their multilingual capabilities. We then discuss the construction of multilingual pre-training and alignment datasets, underscoring the importance of data quality and diversity in enhancing MLLM performance. An important focus of this survey is the evaluation of MLLMs. We present a detailed taxonomy and roadmap covering the assessment of MLLMs' cross-lingual knowledge, reasoning, alignment with human values, safety, interpretability, and specialized applications. Specifically, we extensively discuss multilingual evaluation benchmarks and datasets, and explore the use of LLMs themselves as multilingual evaluators. To move MLLMs from black boxes toward white boxes, we also address the interpretability of multilingual capabilities, cross-lingual transfer, and language bias within these models. Finally, we provide a comprehensive review of real-world applications of MLLMs across diverse domains, including biology, medicine, computer science, mathematics, and law. We showcase how these models have driven innovation and improvements in these specialized fields, while also highlighting the challenges and opportunities in deploying MLLMs within diverse language communities and application scenarios. The papers covered in this survey are listed and publicly available at https://github.com/tjunlp-lab/Awesome-Multilingual-LLMs-Papers.
Model and Deep learning based Dynamic Range Compression Inversion
Sun, Haoran, Fourer, Dominique, Maaref, Hichem
Dynamic Range Compression (DRC) is a fundamental process in audio signal processing that changes the dynamic range of a signal. This technique is widely used at various stages of audio production, such as recording, mixing, and mastering, to control the loudness of an audio signal and prevent clipping or distortion [1]. However, applying DRC often alters the audio's timbre and perceived quality, making its inversion a challenging task. Inverting DRC is therefore of great interest in the context of audio reverse engineering [2], since it aims at recovering the original dynamic range and audio quality of a signal. This task has many potential applications, such as signal restoration, remixing, and enhanced creative control. DRC inversion is challenging because it often requires side information, an explicit DRC model, and prior knowledge of the DRC parameters to be handled efficiently. Only a few studies directly address the problem of DRC inversion. In [3], the authors cast DRC inversion as a rate-distortion optimization problem using a coder-decoder framework that minimizes both the side information and the reconstruction error when combined with a specific estimator applied to the compressed signal. In [4], the authors propose a specific DRC model that yields a promising approximate reconstruction but requires exact knowledge of the DRC parameters of the compressed signal.
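For concreteness, the sketch below implements a textbook hard-knee static compression curve and its exact inversion when the threshold and ratio are known; it is not the DRC model of [3] or [4], and the parameter values are arbitrary examples. It illustrates why blind inversion is the hard case: without the exact parameters, the expansion step cannot be recovered.

```python
# Minimal sketch of a standard hard-knee static compression curve and its
# inversion, given for illustration only; it is not the specific DRC model
# or inversion scheme discussed above. Parameter values are arbitrary.
import numpy as np

def drc_static_gain_db(level_db: np.ndarray, threshold_db: float = -20.0,
                       ratio: float = 4.0) -> np.ndarray:
    """Gain (in dB) applied by a hard-knee compressor: levels above the
    threshold are attenuated so the output rises 1/ratio dB per input dB."""
    over = np.maximum(level_db - threshold_db, 0.0)
    return -over * (1.0 - 1.0 / ratio)

def invert_static_gain_db(compressed_db: np.ndarray, threshold_db: float = -20.0,
                          ratio: float = 4.0) -> np.ndarray:
    """Exact inversion of the static curve above when its parameters are
    known, which is why blind inversion (unknown parameters) is difficult."""
    over_out = np.maximum(compressed_db - threshold_db, 0.0)
    return compressed_db + over_out * (ratio - 1.0)
```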
CogDevelop2K: Reversed Cognitive Development in Multimodal Large Language Models
Li, Yijiang, Gao, Qingying, Sun, Haoran, Lyu, Haiyun, Luo, Dezhi, Deng, Hokin
Are Multi-modal Large Language Models (MLLMs) stochastic parrots? Do they genuinely understand? This paper explores, in MLLMs, the core cognitive abilities that human intelligence builds upon to perceive, comprehend, and reason. To this end, we propose CogDevelop2K, a comprehensive benchmark that spans 12 sub-concepts, from primitive knowledge like object permanence and boundary to more complex abilities like intentionality understanding, structured along the developmental trajectory of the human mind. We evaluate 46 MLLMs on our benchmarks. Surprisingly, we observe a reversed cognitive developmental trajectory compared to humans. We further evaluate the influence of evaluation strategies and prompting techniques. The project website is available at https://growing-ai-like-a-child.github.io/.