AITopics | Yang, Cheng

Yang, Cheng

Experiential Co-Learning of Software-Developing Agents

Qian, Chen, Dang, Yufan, Li, Jiahao, Liu, Wei, Chen, Weize, Yang, Cheng, Liu, Zhiyuan, Sun, Maosong

arXiv.org Artificial Intelligence, Dec-29-2023

Large language models (LLMs) have marked a transformative shift across numerous domains (Vaswani et al., 2017; Brown et al., 2020; Bubeck et al., 2023). Despite their impressive abilities, when dealing with complex situations that extend beyond mere chatting, these models show certain limitations inherent in their standalone capabilities (Richards, 2023). Recent research in autonomous agents has significantly advanced LLMs by integrating sophisticated features like context-sensitive memory (Park et al., 2023), multi-step planning (Wei et al., 2022b), and strategic use of external …

Through engaging in interactive dialogues, each agent participates in instructive and responsive conversations, collaboratively contributing to the achievement of a cohesive and automated solution for task completion. The development of a more adaptive and proactive approach to problem-solving by these agents marks a significant leap in autonomy, going beyond the typical prompt-guided dynamic in human-computer interactions (Yang et al., 2023a) and substantially reducing dependence on human involvement (Li et al., 2023a; Qian et al., 2023; Wu et al., 2023).

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2312.17025

Country:

North America > Canada (0.14)
Europe > Finland (0.14)
Asia > China (0.14)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)

Communicative Agents for Software Development

Qian, Chen, Cong, Xin, Liu, Wei, Yang, Cheng, Chen, Weize, Su, Yusheng, Dang, Yufan, Li, Jiahao, Xu, Juyuan, Li, Dahai, Liu, Zhiyuan, Sun, Maosong

arXiv.org Artificial Intelligence, Dec-19-2023

Software engineering is a domain characterized by intricate decision-making processes, often relying on nuanced intuition and consultation. Recent advancements in deep learning have started to revolutionize software engineering practices through elaborate designs implemented at various stages of software development. In this paper, we present an innovative paradigm that leverages large language models (LLMs) throughout the entire software development process, streamlining and unifying key processes through natural language communication, thereby eliminating the need for specialized models at each phase. At the core of this paradigm lies ChatDev, a virtual chat-powered software development company that mirrors the established waterfall model, meticulously dividing the development process into four distinct chronological stages: designing, coding, testing, and documenting. Each stage engages a team of "software agents", such as programmers, code reviewers, and test engineers, fostering collaborative dialogue and facilitating a seamless workflow. The chat chain acts as a facilitator, breaking down each stage into atomic subtasks. This enables dual roles, allowing for proposing and validating solutions through context-aware communication, leading to efficient resolution of specific subtasks. The instrumental analysis of ChatDev highlights its remarkable efficacy in software generation, enabling the completion of the entire software development process in under seven minutes at a cost of less than one dollar. It not only identifies and alleviates potential vulnerabilities but also rectifies potential hallucinations while maintaining commendable efficiency and cost-effectiveness. The potential of ChatDev unveils fresh possibilities for integrating LLMs into the realm of software development. Our code is available at https://github.com/OpenBMB/ChatDev.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2307.07924

Country:

Europe (1.00)
North America > United States (0.46)
Asia > China (0.46)
North America > Canada > Ontario (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.88)

Graph Invariant Learning with Subgraph Co-mixup for Out-Of-Distribution Generalization

Jia, Tianrui, Li, Haoyang, Yang, Cheng, Tao, Tao, Shi, Chuan

arXiv.org Artificial Intelligence, Dec-18-2023

Graph neural networks (GNNs) have been demonstrated to perform well in graph representation learning, but they consistently lack generalization capability when tackling out-of-distribution (OOD) data. Graph invariant learning methods, backed by the invariance principle among multiple defined environments, have shown effectiveness in dealing with this issue. However, existing methods rely heavily on well-predefined or accurately generated environment partitions, which are hard to obtain in practice, leading to sub-optimal OOD generalization performance. In this paper, we propose a novel graph invariant learning method based on a co-mixup strategy over invariant and variant patterns, which is capable of jointly generating mixed multiple environments and capturing invariant patterns from the mixed graph data. Specifically, we first adopt a subgraph extractor to identify invariant subgraphs. Subsequently, we design a novel co-mixup strategy, i.e., jointly conducting environment Mixup and invariant Mixup. For the environment Mixup, we mix the variant environment-related subgraphs so as to generate sufficiently diverse multiple environments, which is important to guarantee the quality of the graph invariant learning. For the invariant Mixup, we mix the invariant subgraphs, further encouraging the model to capture invariant patterns behind graphs while getting rid of spurious correlations for OOD generalization. We demonstrate that the proposed environment Mixup and invariant Mixup can mutually promote each other. Extensive experiments on both synthetic and real-world datasets demonstrate that our method significantly outperforms state-of-the-art methods under various distribution shifts.

artificial intelligence, machine learning, subgraph, (15 more...)

arXiv.org Artificial Intelligence

2312.10988

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)

Towards Graph Foundation Models: A Survey and Beyond

Liu, Jiawei, Yang, Cheng, Lu, Zhiyuan, Chen, Junze, Li, Yibo, Zhang, Mengmei, Bai, Ting, Fang, Yuan, Sun, Lichao, Yu, Philip S., Shi, Chuan

arXiv.org Artificial Intelligence, Dec-2-2023

Foundation models have emerged as critical components in a variety of artificial intelligence applications, and have shown significant success in natural language processing and several other domains. Meanwhile, the field of graph machine learning is witnessing a paradigm transition from shallow methods to more sophisticated deep learning approaches. The capabilities of foundation models to generalize and adapt motivate graph machine learning researchers to discuss the potential of developing a new graph learning paradigm. This paradigm envisions models that are pre-trained on extensive graph data and can be adapted for various graph tasks. Despite this burgeoning interest, there is a noticeable lack of clear definitions and systematic analyses pertaining to this new domain. To this end, this article introduces the concept of Graph Foundation Models (GFMs), and offers an exhaustive explanation of their key characteristics and underlying technologies. We proceed to classify the existing work related to GFMs into three distinct categories, based on their dependence on graph neural networks and large language models. In addition to providing a thorough review of the current state of GFMs, this article also outlines potential avenues for future research in this rapidly evolving domain.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2310.11829

Country:

Asia (0.46)
North America > United States > Illinois (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology > Services (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Specialist or Generalist? Instruction Tuning for Specific NLP Tasks

Shi, Chufan, Su, Yixuan, Yang, Cheng, Yang, Yujiu, Cai, Deng

arXiv.org Artificial Intelligence, Oct-23-2023

The potential of large language models (LLMs) to simultaneously perform a wide range of natural language processing (NLP) tasks has been the subject of extensive research. Although instruction tuning has proven to be a data-efficient method for transforming LLMs into such generalist models, their performance still lags behind specialist models trained exclusively for specific tasks. In this paper, we investigate whether incorporating broad-coverage generalist instruction tuning can contribute to building a specialist model. We hypothesize that its efficacy depends on task specificity and skill requirements. Our experiments assess four target tasks with distinct coverage levels, revealing that integrating generalist instruction tuning consistently enhances model performance when the task coverage is broad. The effect is particularly pronounced when the amount of task-specific training data is limited. Further investigation into three target tasks focusing on different capabilities demonstrates that generalist instruction tuning improves understanding and reasoning abilities. However, for tasks requiring factual knowledge, generalist data containing hallucinatory information may negatively affect the model's performance. Overall, our work provides a systematic guide for developing specialist models with general instruction tuning. Our code and other related resources can be found at https://github.com/DavidFanzz/Generalist_or_Specialist.

instruction tuning, large language model, natural language, (3 more...)

arXiv.org Artificial Intelligence

2310.15326

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)

AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

Chen, Weize, Su, Yusheng, Zuo, Jingwei, Yang, Cheng, Yuan, Chenfei, Chan, Chi-Min, Yu, Heyang, Lu, Yaxi, Hung, Yi-Hsin, Qian, Chen, Qin, Yujia, Cong, Xin, Xie, Ruobing, Liu, Zhiyuan, Sun, Maosong, Zhou, Jie

arXiv.org Artificial Intelligence, Oct-23-2023

Autonomous agents empowered by Large Language Models (LLMs) have undergone significant improvements, enabling them to generalize across a broad spectrum of tasks. However, in real-world scenarios, cooperation among individuals is often required to enhance the efficiency and effectiveness of task accomplishment. Hence, inspired by human group dynamics, we propose a multi-agent framework, AgentVerse, that can collaboratively and dynamically adjust its composition as a greater-than-the-sum-of-its-parts system. Our experiments demonstrate that AgentVerse can effectively deploy multi-agent groups that outperform a single agent. Furthermore, we delve into the emergence of social behaviors among individual agents within a group during collaborative task accomplishment. In view of these behaviors, we discuss some possible strategies to leverage positive ones and mitigate negative ones for improving the collaborative potential of multi-agent groups. Our code for AgentVerse will soon be released at https://github.com/OpenBMB/AgentVerse.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2308.10848

Country:

North America > United States (0.68)
Asia > Japan > Honshū > Kansai (0.14)

Genre: Research Report > New Finding (0.92)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Leisure & Entertainment (1.00)
(8 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)

Question Answering as Programming for Solving Time-Sensitive Questions

Zhu, Xinyu, Yang, Cheng, Chen, Bei, Li, Siheng, Lou, Jian-Guang, Yang, Yujiu

arXiv.org Artificial Intelligence, Oct-20-2023

Question answering plays a pivotal role in human daily life because it involves our acquisition of knowledge about the world. However, due to the dynamic and ever-changing nature of real-world facts, the answer can be completely different when the time constraint in the question changes. Recently, Large Language Models (LLMs) have shown remarkable intelligence in question answering, while our experiments reveal that the aforementioned problems still pose a significant challenge to existing LLMs. This can be attributed to the LLMs' inability to perform rigorous reasoning based on surface-level text semantics. To overcome this limitation, rather than requiring LLMs to directly answer the question, we propose a novel approach where we reframe the Question Answering task as Programming (QAaP). Concretely, by leveraging modern LLMs' superior capability in understanding both natural language and programming language, we endeavor to harness LLMs to represent diversely expressed text as well-structured code and select the best matching answer from multiple candidates through programming. We evaluate our QAaP framework on several time-sensitive question answering datasets and achieve improvements of up to 14.5% over strong baselines. Our code and data are available at https://github.com/TianHongZXY/qaap.
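The general idea of answering a time-sensitive question by checking structured candidates in code, rather than by free-text reasoning, can be illustrated with a minimal Python sketch. This is not the QAaP implementation: the `Fact` and `Query` records, the `answer` function, and the sample data (Alice, Bob, Club A, Club B) are all hypothetical, and the step where an LLM would turn text into these records is assumed to have already happened.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Fact:
    """Hypothetical structured form of a statement extracted from text."""
    subject: str
    relation: str
    obj: str
    start: int  # first year in which the fact holds (inclusive)
    end: int    # last year in which the fact holds (inclusive)


@dataclass(frozen=True)
class Query:
    """Hypothetical structured form of a time-constrained question."""
    subject: str
    relation: str
    year: int


def matches(query: Query, fact: Fact) -> bool:
    """Check subject, relation, and the time constraint explicitly."""
    return (
        fact.subject == query.subject
        and fact.relation == query.relation
        and fact.start <= query.year <= fact.end
    )


def answer(query: Query, candidates: Sequence[Fact]) -> Optional[str]:
    """Return the object of the best matching candidate, or None.

    Among matching candidates, the one with the narrowest time span is
    preferred (an assumption made for this sketch).
    """
    hits = [f for f in candidates if matches(query, f)]
    if not hits:
        return None
    return min(hits, key=lambda f: f.end - f.start).obj


# Invented sample data, for illustration only.
FACTS = [
    Fact("Alice", "plays_for", "Club A", 2008, 2012),
    Fact("Alice", "plays_for", "Club B", 2013, 2017),
    Fact("Bob", "plays_for", "Club B", 2010, 2020),
]

if __name__ == "__main__":
    print(answer(Query("Alice", "plays_for", 2015), FACTS))
    print(answer(Query("Alice", "plays_for", 2005), FACTS))
```

Changing only the year in the query changes which candidate passes the check, which is the behavior the abstract describes for time-sensitive questions.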

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.14221

Country:

North America > United States > Oregon > Klamath County (0.15)
North America > United States > Florida > Pinellas County (0.14)

Genre:

Personal (0.93)
Research Report > New Finding (0.46)

Industry:

Education (1.00)
Government > Regional Government > North America Government > United States Government (0.93)
Leisure & Entertainment > Sports > Soccer (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Graph Mining for Cybersecurity: A Survey

Yan, Bo, Yang, Cheng, Shi, Chuan, Fang, Yong, Li, Qi, Ye, Yanfang, Du, Junping

arXiv.org Artificial Intelligence, Oct-16-2023

The explosive growth of cyber attacks, such as malware, spam, and intrusions, has caused severe consequences for society. Securing cyberspace has become an utmost concern for organizations and governments. Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities. In recent years, with the proliferation of graph mining techniques, many researchers have investigated these techniques for capturing correlations between cyber entities and achieving high performance. It is imperative to summarize existing graph-based cybersecurity solutions to provide a guide for future studies. Therefore, as a key contribution of this paper, we provide a comprehensive review of graph mining for cybersecurity, including an overview of cybersecurity tasks, the typical graph mining techniques, and the general process of applying them to cybersecurity, as well as various solutions for different cybersecurity tasks. For each task, we probe into relevant methods and highlight the graph types, graph approaches, and task levels in their modeling. Furthermore, we collect open datasets and toolkits for graph-based cybersecurity. Finally, we outline potential directions of this field for future research.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3610228

2304.00485

Country:

Asia (1.00)
North America > United States > Indiana (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.45)

A Stochastic Online Forecast-and-Optimize Framework for Real-Time Energy Dispatch in Virtual Power Plants under Uncertainty

Jiang, Wei, Yi, Zhongkai, Wang, Li, Zhang, Hanwei, Zhang, Jihai, Lin, Fangquan, Yang, Cheng

arXiv.org Artificial Intelligence, Sep-14-2023

Aggregating distributed energy resources in power systems significantly increases uncertainties, in particular those caused by the fluctuation of renewable energy generation. This issue has driven the need to widely exploit advanced predictive control techniques under uncertainty to ensure long-term economics and decarbonization. In this paper, we propose a real-time uncertainty-aware energy dispatch framework, which is composed of two key elements: (i) a hybrid forecast-and-optimize sequential task, integrating deep learning-based forecasting and stochastic optimization, where these two stages are connected by the uncertainty estimation at multiple temporal resolutions; (ii) an efficient online data augmentation scheme, jointly involving model pre-training and online fine-tuning stages. In this way, the proposed framework is capable of rapidly adapting to the real-time data distribution, of targeting uncertainties caused by data drift, model discrepancy and environment perturbations in the control process, and finally of realizing an optimal and robust dispatch solution. The proposed framework won the championship in CityLearn Challenge 2022, which provided an influential opportunity to investigate the potential of AI application in the energy domain. In addition, comprehensive experiments are conducted to interpret its effectiveness in the real-life scenario of smart building energy management.

artificial intelligence, machine learning, predictive control, (15 more...)

arXiv.org Artificial Intelligence

2309.08642

Country:

North America > United States > California (0.28)
Asia (0.28)

Genre: Research Report (0.64)

Industry:

Energy > Renewable (1.00)
Energy > Power Industry (1.00)
Energy > Oil & Gas > Upstream (0.37)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)

Does Correction Remain A Problem For Large Language Models?

Zhang, Xiaowu, Zhang, Xiaotian, Yang, Cheng, Yan, Hang, Qiu, Xipeng

arXiv.org Artificial Intelligence, Aug-14-2023

As large language models, such as GPT, continue to advance the capabilities of natural language processing (NLP), the question arises: does the problem of correction still persist? This paper investigates the role of correction in the context of large language models by conducting two experiments. The first experiment focuses on correction as a standalone task, employing few-shot learning techniques with GPT-like models for error correction. The second experiment explores the notion of correction as a preparatory task for other NLP tasks, examining whether large language models can tolerate and perform adequately on texts containing certain levels of noise or errors. By addressing these experiments, we aim to shed light on the …

Figure 1: The illustration shows the feedback results of LLM, humans, and other models (such as BERT) when encountering the wrong text. LLM can ignore the wrong text very well. Human beings may be confused when encountering the wrong text, and if the model encounters the wrong text, there is a high probability of error.

artificial intelligence, chatbot, natural language, (19 more...)

arXiv.org Artificial Intelligence

2308.01776

Country:

Europe > Italy (0.14)
Europe > Belgium (0.14)

Genre: Research Report (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)
