AITopics | Wei, Zhongyu

Plotting

Wei, Zhongyu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning

Du, Mengfei, Wu, Binhao, Zhang, Jiwen, Fan, Zhihao, Li, Zejun, Luo, Ruipu, Huang, Xuanjing, Wei, Zhongyu

arXiv.org Artificial IntelligenceApr-2-2024

Vision-and-Language navigation (VLN) requires an agent to navigate in unseen environment by following natural language instruction. For task completion, the agent needs to align and integrate various navigation modalities, including instruction, observation and navigation history. Existing works primarily concentrate on cross-modal attention at the fusion stage to achieve this objective. Nevertheless, modality features generated by disparate uni-encoders reside in their own spaces, leading to a decline in the quality of cross-modal fusion and decision. To address this problem, we propose a Dual-levEL AligNment (DELAN) framework by cross-modal contrastive learning. This framework is designed to align various navigation-related modalities before fusion, thereby enhancing cross-modal interaction and action decision-making. Specifically, we divide the pre-fusion alignment into dual levels: instruction-history level and landmark-observation level according to their semantic correlations. We also reconstruct a dual-level instruction for adaptation to the dual-level alignment. As the training signals for pre-fusion alignment are extremely limited, self-supervised contrastive learning strategies are employed to enforce the matching between different modalities. Our approach seamlessly integrates with the majority of existing models, resulting in improved navigation performance on various VLN benchmarks, including R2R, R4R, RxR and CVDN.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2404.01994

Country:

Asia > China (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

ALaRM: Align Language Models via Hierarchical Rewards Modeling

Lai, Yuhang, Wang, Siyuan, Liu, Shujun, Huang, Xuanjing, Wei, Zhongyu

arXiv.org Artificial IntelligenceMar-16-2024

We introduce ALaRM, the first framework modeling hierarchical rewards in reinforcement learning from human feedback (RLHF), which is designed to enhance the alignment of large language models (LLMs) with human preferences. The framework addresses the limitations of current alignment approaches, which often struggle with the inconsistency and sparsity of human supervision signals, by integrating holistic rewards with aspect-specific rewards. This integration enables more precise and consistent guidance of language models towards desired outcomes, particularly in complex and open text generation tasks. By employing a methodology that filters and combines multiple rewards based on their consistency, the framework provides a reliable mechanism for improving model alignment. We validate our approach through applications in long-form question answering and machine translation tasks, employing gpt-3.5-turbo for pairwise comparisons, and demonstrate improvements over existing baselines. Our work underscores the effectiveness of hierarchical rewards modeling in refining LLM training processes for better human preference alignment. We release our code at https://ALaRM-fdu.github.io.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2403.06754

Country:

Europe > Croatia (0.14)
Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

Zhang, Jiwen, Wu, Jihao, Teng, Yihua, Liao, Minghui, Xu, Nuo, Xiao, Xiao, Wei, Zhongyu, Tang, Duyu

arXiv.org Artificial IntelligenceMar-5-2024

Large language model (LLM) leads to a surge of autonomous GUI agents for smartphone, which completes a task triggered by natural language through predicting a sequence of actions of API. Even though the task highly relies on past actions and visual observations, existing studies typical consider little semantic information carried out by intermediate screenshots and screen operations. To address this, this work presents Chain-of-Action-Thought (dubbed CoAT), which takes the description of the previous actions, the current screen, and more importantly the action thinking of what actions should be performed and the outcomes led by the chosen action. We demonstrate that, in a zero-shot setting upon an off-the-shell LLM, CoAT significantly improves the goal progress compared to standard context modeling. To further facilitate the research in this line, we construct a benchmark Android-In-The-Zoo (AitZ), which contains 18,643 screen-action pairs together with chain-of-action-thought annotations. Experiments show that fine-tuning a 200M model on our AitZ dataset achieves on par performance with CogAgent-Chat-18B.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2403.02713

Country:

North America > United States (0.14)
North America > Mexico (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

SoMeLVLM: A Large Vision Language Model for Social Media Processing

Zhang, Xinnong, Kuang, Haoyu, Mou, Xinyi, Lyu, Hanjia, Wu, Kun, Chen, Siming, Luo, Jiebo, Huang, Xuanjing, Wei, Zhongyu

arXiv.org Artificial IntelligenceFeb-20-2024

The growth of social media, characterized by its multimodal nature, has led to the emergence of diverse phenomena and challenges, which calls for an effective approach to uniformly solve automated tasks. The powerful Large Vision Language Models make it possible to handle a variety of tasks simultaneously, but even with carefully designed prompting methods, the general domain models often fall short in aligning with the unique speaking style and context of social media tasks. In this paper, we introduce a Large Vision Language Model for Social Media Processing (SoMeLVLM), which is a cognitive framework equipped with five key capabilities including knowledge & comprehension, application, analysis, evaluation, and creation. SoMeLVLM is designed to understand and generate realistic social media behavior. We have developed a 654k multimodal social media instruction-tuning dataset to support our cognitive framework and fine-tune our model. Our experiments demonstrate that SoMeLVLM achieves state-of-the-art performance in multiple social media tasks. Further analysis shows its significant advantages over baselines in terms of cognitive abilities.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.13022

Country:

Europe (0.67)
North America > United States > Minnesota (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.64)

Industry:

Media > News (0.69)
Information Technology (0.67)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

CauESC: A Causal Aware Model for Emotional Support Conversation

Chen, Wei, Lin, Hengxu, Zhang, Qun, Zhang, Xiaojin, Bai, Xiang, Huang, Xuanjing, Wei, Zhongyu

arXiv.org Artificial IntelligenceJan-31-2024

Emotional Support Conversation aims at reducing the seeker's emotional distress through supportive response. Existing approaches have two limitations: (1) They ignore the emotion causes of the distress, which is important for fine-grained emotion understanding; (2) They focus on the seeker's own mental state rather than the emotional dynamics during interaction between speakers. To address these issues, we propose a novel framework CauESC, which firstly recognizes the emotion causes of the distress, as well as the emotion effects triggered by the causes, and then understands each strategy of verbal grooming independently and integrates them skillfully. Experimental results on the benchmark dataset demonstrate the effectiveness of our approach and show the benefits of emotion understanding from cause to effect and independent-integrated strategy modeling.

computational linguistic, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2401.17755

Country:

North America > United States > Texas (0.14)
North America > United States > California (0.14)
Europe > Austria > Vienna (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Add feedback

Argue with Me Tersely: Towards Sentence-Level Counter-Argument Generation

Lin, Jiayu, Ye, Rong, Han, Meng, Zhang, Qi, Lai, Ruofei, Zhang, Xinyu, Cao, Zhao, Huang, Xuanjing, Wei, Zhongyu

arXiv.org Artificial IntelligenceDec-21-2023

Counter-argument generation -- a captivating area in computational linguistics -- seeks to craft statements that offer opposing views. While most research has ventured into paragraph-level generation, sentence-level counter-argument generation beckons with its unique constraints and brevity-focused challenges. Furthermore, the diverse nature of counter-arguments poses challenges for evaluating model performance solely based on n-gram-based metrics. In this paper, we present the ArgTersely benchmark for sentence-level counter-argument generation, drawing from a manually annotated dataset from the ChangeMyView debate forum. We also propose Arg-LlaMA for generating high-quality counter-argument. For better evaluation, we trained a BERT-based evaluator Arg-Judge with human preference data. We conducted comparative experiments involving various baselines such as LlaMA, Alpaca, GPT-3, and others. The results show the competitiveness of our proposed framework and evaluator in counter-argument generation tasks. Code and data are available at https://github.com/amazingljy1206/ArgTersely.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2312.13608

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.88)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

K-ESConv: Knowledge Injection for Emotional Support Dialogue Systems via Prompt Learning

Chen, Wei, Zhao, Gang, Zhang, Xiaojin, Bai, Xiang, Huang, Xuanjing, Wei, Zhongyu

arXiv.org Artificial IntelligenceDec-16-2023

Automatic psychological counseling requires mass of professional knowledge that can be found in online counseling forums. Motivated by this, we propose K-ESConv, a novel prompt learning based knowledge injection method for emotional support dialogue system, transferring forum knowledge to response generation. We evaluate our model on an emotional support dataset ESConv, where the model retrieves and incorporates knowledge from external professional emotional Q\&A forum. Experiment results show that the proposed method outperforms existing baselines on both automatic evaluation and human evaluation, which shows that our approach significantly improves the correlation and diversity of responses and provides more comfort and better suggestion for the seeker.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2312.10371

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.96)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.47)

Add feedback

AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation

Wu, Tong, Fan, Zhihao, Liu, Xiao, Gong, Yeyun, Shen, Yelong, Jiao, Jian, Zheng, Hai-Tao, Li, Juntao, Wei, Zhongyu, Guo, Jian, Duan, Nan, Chen, Weizhu

arXiv.org Artificial IntelligenceDec-13-2023

Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance. Their success has been recently expanded to text generation via generating all tokens within a sequence concurrently. However, natural language exhibits a far more pronounced sequential dependency in comparison to images, and the majority of existing language models are trained with a left-to-right auto-regressive approach. To account for the inherent sequential characteristic of natural language, we introduce Auto-Regressive Diffusion (AR-Diffusion). AR-Diffusion ensures that the generation of tokens on the right depends on the generated ones on the left, a mechanism achieved through employing a dynamic number of denoising steps that vary based on token position. This results in tokens on the left undergoing fewer denoising steps than those on the right, thereby enabling them to generate earlier and subsequently influence the generation of tokens on the right. In a series of experiments on various text generation tasks, including text summarization, machine translation, and common sense generation, AR-Diffusion clearly demonstrated its superiority over existing diffusion language models and that it can be $100\times\sim600\times$ faster when achieving comparable results. Our code is available at https://github.com/microsoft/ProphetNet/tree/master/AR-diffusion.

ar-d iffusion, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.09515

Country:

Europe > Belgium (0.14)
Asia > Middle East > UAE (0.14)
Asia > China (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Add feedback

Hi-ArG: Exploring the Integration of Hierarchical Argumentation Graphs in Language Pretraining

Liang, Jingcong, Ye, Rong, Han, Meng, Zhang, Qi, Lai, Ruofei, Zhang, Xinyu, Cao, Zhao, Huang, Xuanjing, Wei, Zhongyu

arXiv.org Artificial IntelligenceDec-1-2023

The knowledge graph is a structure to store and represent knowledge, and recent studies have discussed its capability to assist language models for various applications. Some variations of knowledge graphs aim to record arguments and their relations for computational argumentation tasks. However, many must simplify semantic types to fit specific schemas, thus losing flexibility and expression ability. In this paper, we propose the Hierarchical Argumentation Graph (Hi-ArG), a new structure to organize arguments. We also introduce two approaches to exploit Hi-ArG, including a text-graph multi-modal model GreaseArG and a new pre-training framework augmented with graph information. Experiments on two argumentation tasks have shown that after further pre-training and fine-tuning, GreaseArG supersedes same-scale language models on these tasks, while incorporating graph information during further pre-training can also improve the performance of vanilla language models. Code for this paper is available at https://github.com/ljcleo/Hi-ArG .

computational linguistic, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2312.00874

Country:

Europe (1.00)
North America > United States > California (0.14)
Asia > Middle East > Qatar (0.14)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

GPT-4V(ision) as A Social Media Analysis Engine

Lyu, Hanjia, Huang, Jinfa, Zhang, Daoan, Yu, Yongsheng, Mou, Xinyi, Pan, Jinsheng, Yang, Zhengyuan, Wei, Zhongyu, Luo, Jiebo

arXiv.org Artificial IntelligenceNov-13-2023

Recent research has offered insights into the extraordinary capabilities of Large Multimodal Models (LMMs) in various general vision and language tasks. There is growing interest in how LMMs perform in more specialized domains. Social media content, inherently multimodal, blends text, images, videos, and sometimes audio. Understanding social multimedia content remains a challenging problem for contemporary machine learning frameworks. In this paper, we explore GPT-4V(ision)'s capabilities for social multimedia analysis. We select five representative tasks, including sentiment analysis, hate speech detection, fake news identification, demographic inference, and political ideology detection, to evaluate GPT-4V. Our investigation begins with a preliminary quantitative analysis for each task using existing benchmark datasets, followed by a careful review of the results and a selection of qualitative samples that illustrate GPT-4V's potential in understanding multimodal social media content. GPT-4V demonstrates remarkable efficacy in these tasks, showcasing strengths such as joint understanding of image-text pairs, contextual and cultural awareness, and extensive commonsense knowledge. Despite the overall impressive capacity of GPT-4V in the social media domain, there remain notable challenges. GPT-4V struggles with tasks involving multilingual social multimedia comprehension and has difficulties in generalizing to the latest trends in social media. Additionally, it exhibits a tendency to generate erroneous information in the context of evolving celebrity and politician knowledge, reflecting the known hallucination problem. The insights gleaned from our findings underscore a promising future for LMMs in enhancing our comprehension of social media content and its users through the analysis of multimodal information.

large language model, machine learning, natural language, (6 more...)

arXiv.org Artificial Intelligence

2311.07547

Genre: Research Report > New Finding (0.53)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)

Add feedback