AITopics | Deng, Naihao

Deng, Naihao

Are Human Interactions Replicable by Generative Agents? A Case Study on Pronoun Usage in Hierarchical Interactions

Deng, Naihao, Mihalcea, Rada

arXiv.org Artificial Intelligence, Jan-25-2025

As Large Language Models (LLMs) advance in their capabilities, researchers have increasingly employed them for social simulation. In this paper, we investigate whether interactions among LLM agents resemble those of humans. Specifically, we focus on the difference in pronoun usage between leaders and non-leaders, examining whether the simulation leads to human-like pronoun usage patterns during the LLMs' interactions. Our evaluation reveals significant discrepancies between LLM-based simulations and human pronoun usage, with prompt-based or specialized agents failing to demonstrate human-like pronoun usage patterns. In addition, we reveal that even if LLMs understand the human pronoun usage patterns, they fail to demonstrate them in the actual interaction process. Our study highlights the limitations of social simulations based on LLM agents, urging caution in using such social simulations in practitioners' decision-making processes.

large language model, machine learning, natural language, (18 more...)

2501.15283

Country:

Asia > Thailand (0.14)
Europe > Ireland (0.14)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Towards Better Understanding Table Instruction Tuning: Decoupling the Effects from Data versus Models

Deng, Naihao, Zhang, Sheng, Zhu, Henghui, Chang, Shuaichen, Zhang, Jiani, Li, Alexander Hanbo, Hang, Chung-Wei, Kobayashi, Hideo, Hu, Yiqun, Ng, Patrick

arXiv.org Artificial Intelligence, Jan-24-2025

Recent advances in natural language processing have leveraged instruction tuning to enhance Large Language Models (LLMs) for table-related tasks. However, previous works train different base models with different training data, lacking an apples-to-apples comparison across the resulting table LLMs. To address this, we fine-tune base models from the Mistral, OLMo, and Phi families on existing public training datasets. Our replication achieves performance on par with or surpassing existing table LLMs, establishing new state-of-the-art performance on HiTab, a table question-answering dataset. More importantly, through systematic out-of-domain evaluation, we decouple the contributions of training data and the base model, providing insight into their individual impacts. In addition, we assess the effects of table-specific instruction tuning on general-purpose benchmarks, revealing trade-offs between specialization and generalization.

large language model, machine learning, natural language, (20 more...)

2501.14717

Country:

Asia (0.93)
Europe (0.68)
North America > United States > Texas (0.14)
North America > Mexico > Mexico City (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Leisure & Entertainment > Sports (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Rethinking Table Instruction Tuning

Deng, Naihao, Mihalcea, Rada

arXiv.org Artificial Intelligence, Jan-24-2025

Recent advances in table understanding have focused on instruction-tuning large language models (LLMs) for table-related tasks. However, existing research has overlooked the impact of hyperparameter choices and lacks a comprehensive evaluation of the out-of-domain table understanding ability and the general capabilities of these table LLMs. In this paper, we evaluate these abilities in existing table LLMs and reveal significant declines in both out-of-domain table understanding and general capabilities compared to their base models. Through systematic analysis, we show that hyperparameters, such as learning rate, can significantly influence both table-specific and general capabilities. Contrary to existing table instruction-tuning works, we demonstrate that smaller learning rates and fewer training instances can enhance table understanding while preserving general capabilities. Based on our findings, we introduce TAMA, a TAble LLM instruction-tuned from LLaMA 3.1 8B Instruct, which achieves performance on par with, or surpassing, GPT-3.5 and GPT-4 on table tasks, while maintaining strong out-of-domain generalization and general capabilities. Our findings highlight the potential for reduced data annotation costs and more efficient model development through careful hyperparameter selection.

large language model, machine learning, natural language, (18 more...)

2501.14693

Country:

Asia (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Sports (0.67)
Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Table as Thought: Exploring Structured Thoughts in LLM Reasoning

Sun, Zhenjie, Deng, Naihao, Yu, Haofei, You, Jiaxuan

arXiv.org Artificial Intelligence, Jan-3-2025

Large language models' reasoning abilities benefit from methods that organize their thought processes, such as chain-of-thought prompting, which employs a sequential structure to guide the reasoning process step-by-step. However, existing approaches focus primarily on organizing the sequence of thoughts, leaving structure in individual thought steps underexplored. To address this gap, we propose Table as Thought, a framework inspired by cognitive neuroscience theories on human thought. Table as Thought organizes reasoning within a tabular schema, where rows represent sequential thought steps and columns capture critical constraints and contextual information to enhance reasoning. The reasoning process iteratively populates the table until self-verification ensures completeness and correctness. Our experiments show that Table as Thought excels in planning tasks and demonstrates a strong potential for enhancing LLM performance in mathematical reasoning compared to unstructured thought baselines. This work provides a novel exploration of refining thought representation within LLMs, paving the way for advancements in reasoning and AI cognition.

large language model, machine learning, natural language, (21 more...)

2501.02152

Country: North America > United States (0.67)

Genre:

Workflow (0.87)
Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Chumor 2.0: Towards Benchmarking Chinese Humor Understanding

He, Ruiqi, He, Yushu, Bai, Longju, Liu, Jiarui, Sun, Zhenjie, Tang, Zenghao, Wang, He, Xia, Hanchen, Mihalcea, Rada, Deng, Naihao

arXiv.org Artificial Intelligence, Dec-23-2024

Existing humor datasets and evaluations predominantly focus on English, leaving limited resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, the first Chinese humor explanation dataset that exceeds the size of existing humor datasets. Chumor is sourced from Ruo Zhi Ba, a Chinese Reddit-like platform known for sharing intellectually challenging and culturally specific jokes. We test ten LLMs through direct and chain-of-thought prompting, revealing that Chumor poses significant challenges to existing LLMs, with their accuracy slightly above random and far below human performance. In addition, our analysis highlights that human-annotated humor explanations are significantly better than those generated by GPT-4o and ERNIE-4-turbo. We release Chumor at https://huggingface.co/datasets/dnaihao/Chumor; our project page is at https://dnaihao.github.io/Chumor-dataset/, our leaderboard is at https://huggingface.co/spaces/dnaihao/Chumor, and our codebase is at https://github.com/dnaihao/Chumor-dataset.

explanation, large language model, machine learning, (21 more...)

2412.17729

Country:

Europe (1.00)
North America > Canada (0.68)
Asia > China (0.68)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.67)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

$R^3$: "This is My SQL, Are You With Me?" A Consensus-Based Multi-Agent System for Text-to-SQL Tasks

Xia, Hanchen, Jiang, Feng, Deng, Naihao, Wang, Cunxiang, Zhao, Guojiang, Mihalcea, Rada, Zhang, Yue

arXiv.org Artificial Intelligence, Jul-10-2024

Large Language Models (LLMs) have demonstrated strong performance on various tasks. To unleash their power on the Text-to-SQL task, we propose $R^3$ (Review-Rebuttal-Revision), a consensus-based multi-agent system for Text-to-SQL tasks. $R^3$ outperforms the existing single LLM Text-to-SQL systems as well as the multi-agent Text-to-SQL systems by 1.3% to 8.1% on Spider and Bird. Surprisingly, we find that for Llama-3-8B, $R^3$ outperforms chain-of-thought prompting by over 20%, even outperforming GPT-3.5 on the development set of Spider.

large language model, machine learning, natural language, (18 more...)

2402.14851

Country:

Asia (0.46)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba

He, Ruiqi, He, Yushu, Bai, Longju, Liu, Jiarui, Sun, Zhenjie, Tang, Zenghao, Wang, He, Xia, Hanchen, Deng, Naihao

arXiv.org Artificial Intelligence, Jun-18-2024

Existing humor datasets and evaluations predominantly focus on English, lacking resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, a dataset sourced from Ruo Zhi Ba (RZB), a Chinese Reddit-like platform dedicated to sharing intellectually challenging and culturally specific jokes. We annotate explanations for each joke and evaluate human explanations against two state-of-the-art LLMs, GPT-4o and ERNIE Bot, through A/B testing by native Chinese speakers. Our evaluation shows that Chumor is challenging even for SOTA LLMs, and the human explanations for Chumor jokes are significantly better than explanations generated by the LLMs.

explanation, large language model, machine learning, (19 more...)

2406.12754

Country:

Europe (1.00)
Asia > China (0.69)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs

Deng, Naihao, Sun, Zhenjie, He, Ruiqi, Sikka, Aman, Chen, Yulong, Ma, Lin, Zhang, Yue, Mihalcea, Rada

arXiv.org Artificial Intelligence, Jun-5-2024

Recent years have witnessed an explosion of Large Language Models (LLMs), with impressive performance on various Natural Language Processing (NLP) tasks (Brown et al., 2020; Touvron et al., 2023; Team et al., 2023). Research to date has examined the performance of LLMs for various aspects and abilities (Bang et al., 2023b; Bubeck et al., 2023; Akter et al., 2023), but their effectiveness on structured data such as tables is less explored. Unlike unstructured text, tables are systematically organized structures of a large amount of information. This characteristic makes tabular ...

Specifically, we investigate several research questions, including the effectiveness of image-based representation of tabular data and how different text-based or image-based prompt methods affect LLMs' performance on table-related tasks. In addition, we provide analysis and hypothesis of LLMs' behaviors. Our findings include: LLMs maintain decent performance when we use image-based table representations. Sometimes, image-based table representations can make LLMs perform better.

large language model, machine learning, natural language, (20 more...)

2402.12424

Country:

North America > United States (1.00)
Europe (0.67)
Asia > Middle East > UAE (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.66)

Industry: Consumer Products & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Task-Adaptive Tokenization: Enhancing Long-Form Text Generation Efficacy in Mental Health and Beyond

Liu, Siyang, Deng, Naihao, Sabour, Sahand, Jia, Yilin, Huang, Minlie, Mihalcea, Rada

arXiv.org Artificial Intelligence, Nov-13-2023

We propose task-adaptive tokenization as a way to adapt the generation pipeline to the specifics of a downstream task and enhance long-form generation in mental health. Inspired by insights from cognitive science, our task-adaptive tokenizer samples variable segmentations from multiple outcomes, with sampling probabilities optimized based on task-specific data. We introduce a strategy for building a specialized vocabulary and a vocabulary merging protocol that allows for the integration of task-specific tokens into the pre-trained model's tokenization step. Through extensive experiments on psychological question-answering tasks in both Chinese and English, we find that our task-adaptive tokenization approach brings a significant improvement in generation performance while using up to 60% fewer tokens. Preliminary experiments point to promising results when using our tokenization approach with very large language models.

large language model, machine learning, natural language, (19 more...)

2310.05317

Country:

Europe (1.00)
North America > United States > Michigan (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models

He, Yinghui, Wu, Yufan, Jia, Yilin, Mihalcea, Rada, Chen, Yulong, Deng, Naihao

arXiv.org Artificial Intelligence, Oct-25-2023

Theory of Mind (ToM) is the ability to reason about one's own and others' mental states. ToM plays a critical role in the development of intelligence, language understanding, and cognitive processes. While previous work has primarily focused on first and second-order ToM, we explore higher-order ToM, which involves recursive reasoning on others' beliefs. We introduce HI-TOM, a Higher Order Theory of Mind benchmark. Our experimental evaluation using various Large Language Models (LLMs) indicates a decline in performance on higher-order ToM tasks, demonstrating the limitations of current LLMs. We conduct a thorough analysis of different failure cases of LLMs, and share our thoughts on the implications of our findings on the future of NLP.

higher-order theory, large language model, natural language, (3 more...)

2310.16755

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
