Overview
Talking with Machines: A Comprehensive Survey of Emergent Dialogue Systems
From the earliest experiments in the 20th century to the utilization of large language models and transformers, dialogue systems research has continued to evolve, playing crucial roles in numerous fields. This paper offers a comprehensive review of these systems, tracing their historical development and examining their fundamental operations. We analyze popular and emerging datasets for training and survey key contributions in dialogue systems research, including traditional systems and advanced machine learning methods. Finally, we consider conventional and transformer-based evaluation metrics, followed by a short discussion of prevailing challenges and future prospects in the field.
A Glimpse in ChatGPT Capabilities and its impact for AI research
Joublin, Frank, Ceravola, Antonello, Deigmoeller, Joerg, Gienger, Michael, Franzius, Mathias, Eggert, Julian
Large language models (LLMs) have recently become a popular topic in the field of Artificial Intelligence (AI) research, with companies such as Google, Amazon, Facebook, Amazon, Tesla, and Apple (GAFA) investing heavily in their development. These models are trained on massive amounts of data and can be used for a wide range of tasks, including language translation, text generation, and question answering. However, the computational resources required to train and run these models are substantial, and the cost of hardware and electricity can be prohibitive for research labs that do not have the funding and resources of the GAFA. In this paper, we will examine the impact of LLMs on AI research. The pace at which such models are generated as well as the range of domains covered is an indication of the trend which not only the public but also the scientific community is currently experiencing. We give some examples on how to use such models in research by focusing on GPT3.5/ChatGPT3.4 and ChatGPT4 at the current state and show that such a range of capabilities in a single system is a strong sign of approaching general intelligence. Innovations integrating such models will also expand along the maturation of such AI systems and exhibit unforeseeable applications that will have important impacts on several aspects of our societies.
Fast Teammate Adaptation in the Presence of Sudden Policy Change
Zhang, Ziqian, Yuan, Lei, Li, Lihe, Xue, Ke, Jia, Chengxing, Guan, Cong, Qian, Chao, Yu, Yang
In cooperative multi-agent reinforcement learning (MARL), where an agent coordinates with teammate(s) for a shared goal, it may sustain non-stationary caused by the policy change of teammates. Prior works mainly concentrate on the policy change during the training phase or teammates altering cross episodes, ignoring the fact that teammates may suffer from policy change suddenly within an episode, which might lead to miscoordination and poor performance as a result. We formulate the problem as an open Dec-POMDP, where we control some agents to coordinate with uncontrolled teammates, whose policies could be changed within one episode. Then we develop a new framework, fast teammates adaptation (Fastap), to address the problem. Concretely, we first train versatile teammates' policies and assign them to different clusters via the Chinese Restaurant Process (CRP). Then, we train the controlled agent(s) to coordinate with the sampled uncontrolled teammates by capturing their identifications as context for fast adaptation. Finally, each agent applies its local information to anticipate the teammates' context for decision-making accordingly. This process proceeds alternately, leading to a robust policy that can adapt to any teammates during the decentralized execution phase. We show in multiple multi-agent benchmarks that Fastap can achieve superior performance than multiple baselines in stationary and non-stationary scenarios.
Motion Planning for Autonomous Driving: The State of the Art and Future Perspectives
Teng, Siyu, Hu, Xuemin, Deng, Peng, Li, Bai, Li, Yuchen, Yang, Dongsheng, Ai, Yunfeng, Li, Lingxi, Xuanyuan, Zhe, Zhu, Fenghua, Chen, Long
Intelligent vehicles (IVs) have gained worldwide attention due to their increased convenience, safety advantages, and potential commercial value. Despite predictions of commercial deployment by 2025, implementation remains limited to small-scale validation, with precise tracking controllers and motion planners being essential prerequisites for IVs. This paper reviews state-of-the-art motion planning methods for IVs, including pipeline planning and end-to-end planning methods. The study examines the selection, expansion, and optimization operations in a pipeline method, while it investigates training approaches and validation scenarios for driving tasks in end-to-end methods. Experimental platforms are reviewed to assist readers in choosing suitable training and validation strategies. A side-by-side comparison of the methods is provided to highlight their strengths and limitations, aiding system-level design choices. Current challenges and future perspectives are also discussed in this survey.
Visual Tuning
Yu, Bruce X. B., Chang, Jianlong, Wang, Haixin, Liu, Lingbo, Wang, Shijie, Wang, Zhiyu, Lin, Junfan, Xie, Lingxi, Li, Haojie, Lin, Zhouchen, Tian, Qi, Chen, Chang Wen
Fine-tuning visual models has been widely shown promising performance on many downstream visual tasks. With the surprising development of pre-trained visual foundation models, visual tuning jumped out of the standard modus operandi that fine-tunes the whole pre-trained model or just the fully connected layer. Instead, recent advances can achieve superior performance than full-tuning the whole pre-trained parameters by updating far fewer parameters, enabling edge devices and downstream applications to reuse the increasingly large foundation models deployed on the cloud. With the aim of helping researchers get the full picture and future directions of visual tuning, this survey characterizes a large and thoughtful selection of recent works, providing a systematic and comprehensive overview of existing work and models. Specifically, it provides a detailed background of visual tuning and categorizes recent visual tuning techniques into five groups: prompt tuning, adapter tuning, parameter tuning, and remapping tuning. Meanwhile, it offers some exciting research directions for prospective pre-training and various interactions in visual tuning.
Achieving Diversity in Counterfactual Explanations: a Review and Discussion
Laugel, Thibault, Jeyasothy, Adulam, Lesot, Marie-Jeanne, Marsala, Christophe, Detyniecki, Marcin
In the field of Explainable Artificial Intelligence (XAI), counterfactual examples explain to a user the predictions of a trained decision model by indicating the modifications to be made to the instance so as to change its associated prediction. These counterfactual examples are generally defined as solutions to an optimization problem whose cost function combines several criteria that quantify desiderata for a good explanation meeting user needs. A large variety of such appropriate properties can be considered, as the user needs are generally unknown and differ from one user to another; their selection and formalization is difficult. To circumvent this issue, several approaches propose to generate, rather than a single one, a set of diverse counterfactual examples to explain a prediction. This paper proposes a review of the numerous, sometimes conflicting, definitions that have been proposed for this notion of diversity. It discusses their underlying principles as well as the hypotheses on the user needs they rely on and proposes to categorize them along several dimensions (explicit vs implicit, universe in which they are defined, level at which they apply), leading to the identification of further research challenges on this topic.
Towards the Characterization of Representations Learned via Capsule-based Network Architectures
AL-Tawalbeh, Saja, Oramas, José
Capsule Networks (CapsNets) have been re-introduced as a more compact and interpretable alternative to standard deep neural networks. While recent efforts have proved their compression capabilities, to date, their interpretability properties have not been fully assessed. Here, we conduct a systematic and principled study towards assessing the interpretability of these types of networks. Moreover, we pay special attention towards analyzing the level to which part-whole relationships are indeed encoded within the learned representation. Our analysis in the MNIST, SVHN, PASCAL-part and CelebA datasets suggest that the representations encoded in CapsNets might not be as disentangled nor strictly related to parts-whole relationships as is commonly stated in the literature.
Vision-Language Models in Remote Sensing: Current Progress and Future Trends
Wen, Congcong, Hu, Yuan, Li, Xiang, Yuan, Zhenghang, Zhu, Xiao Xiang
The remarkable achievements of ChatGPT and GPT-4 have sparked a wave of interest and research in the field of large language models for Artificial General Intelligence (AGI). These models provide us with intelligent solutions that are more similar to human thinking, enabling us to use general artificial intelligence to solve problems in various applications. However, in the field of remote sensing, the scientific literature on the implementation of AGI remains relatively scant. Existing AI-related research primarily focuses on visual understanding tasks while neglecting the semantic understanding of the objects and their relationships. This is where vision-language models excel, as they enable reasoning about images and their associated textual descriptions, allowing for a deeper understanding of the underlying semantics. Vision-language models can go beyond recognizing the objects in an image and can infer the relationships between them, as well as generate natural language descriptions of the image. This makes them better suited for tasks that require both visual and textual understanding, such as image captioning, text-based image retrieval, and visual question answering. This paper provides a comprehensive review of the research on vision-language models in remote sensing, summarizing the latest progress, highlighting the current challenges, and identifying potential research opportunities. Specifically, we review the application of vision-language models in several mainstream remote sensing tasks, including image captioning, text-based image generation, text-based image retrieval, visual question answering, scene classification, semantic segmentation, and object detection. For each task, we briefly describe the task background and review some representative works. Finally, we summarize the limitations of existing work and provide some possible directions for future development.
Multimodal Learning with Transformers: A Survey
Xu, Peng, Zhu, Xiatian, Clifton, David A.
Transformer is a promising neural network learner, and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and big data, Transformer-based multimodal learning has become a hot topic in AI research. This paper presents a comprehensive survey of Transformer techniques oriented at multimodal data. The main contents of this survey include: (1) a background of multimodal learning, Transformer ecosystem, and the multimodal big data era, (2) a theoretical review of Vanilla Transformer, Vision Transformer, and multimodal Transformers, from a geometrically topological perspective, (3) a review of multimodal Transformer applications, via two important paradigms, i.e., for multimodal pretraining and for specific multimodal tasks, (4) a summary of the common challenges and designs shared by the multimodal Transformer models and applications, and (5) a discussion of open problems and potential research directions for the community.
A Systematic Literature Review on Hardware Reliability Assessment Methods for Deep Neural Networks
Ahmadilivani, Mohammad Hasan, Taheri, Mahdi, Raik, Jaan, Daneshtalab, Masoud, Jenihhin, Maksim
Artificial Intelligence (AI) and, in particular, Machine Learning (ML) have emerged to be utilized in various applications due to their capability to learn how to solve complex problems. Over the last decade, rapid advances in ML have presented Deep Neural Networks (DNNs) consisting of a large number of neurons and layers. DNN Hardware Accelerators (DHAs) are leveraged to deploy DNNs in the target applications. Safety-critical applications, where hardware faults/errors would result in catastrophic consequences, also benefit from DHAs. Therefore, the reliability of DNNs is an essential subject of research. In recent years, several studies have been published accordingly to assess the reliability of DNNs. In this regard, various reliability assessment methods have been proposed on a variety of platforms and applications. Hence, there is a need to summarize the state of the art to identify the gaps in the study of the reliability of DNNs. In this work, we conduct a Systematic Literature Review (SLR) on the reliability assessment methods of DNNs to collect relevant research works as much as possible, present a categorization of them, and address the open challenges. Through this SLR, three kinds of methods for reliability assessment of DNNs are identified including Fault Injection (FI), Analytical, and Hybrid methods. Since the majority of works assess the DNN reliability by FI, we characterize different approaches and platforms of the FI method comprehensively. Moreover, Analytical and Hybrid methods are propounded. Thus, different reliability assessment methods for DNNs have been elaborated on their conducted DNN platforms and reliability evaluation metrics. Finally, we highlight the advantages and disadvantages of the identified methods and address the open challenges in the research area.