Overview
CogNarr Ecosystem: Facilitating Group Cognition at Scale
Human groups of all sizes and kinds engage in deliberation, problem solving, strategizing, decision making, and more generally, cognition. Some groups are large, and that setting presents unique challenges. The small-group setting often involves face-to-face dialogue, but group cognition in the large-group setting typically requires some form of online interaction. New approaches are needed to facilitate the kind of rich communication and information processing that are required for effective, functional cognition in the online setting, especially for groups characterized by thousands to millions of participants who wish to share potentially complex, nuanced, and dynamic perspectives. This concept paper proposes the CogNarr (Cognitive Narrative) ecosystem, which is designed to facilitate functional cognition in the large-group setting. The paper's contribution is a novel vision as to how recent developments in cognitive science, artificial intelligence, natural language processing, and related fields might be scaled and applied to large-group cognition, using an approach that itself promotes further scientific advancement. A key perspective is to view a group as an organism that uses some form of cognitive architecture to sense the world, process information, remember, learn, predict, make decisions, and adapt to changing conditions. The CogNarr ecosystem is designed to serve as a component within that architecture.
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Wang, Mengru, Yao, Yunzhi, Xu, Ziwen, Qiao, Shuofei, Deng, Shumin, Wang, Peng, Chen, Xiang, Gu, Jia-Chen, Jiang, Yong, Xie, Pengjun, Huang, Fei, Chen, Huajun, Zhang, Ningyu
Understanding knowledge mechanisms in Large Language Models (LLMs) is crucial for advancing towards trustworthy AGI. This paper reviews knowledge mechanism analysis from a novel taxonomy including knowledge utilization and evolution. Knowledge utilization delves into the mechanism of memorization, comprehension and application, and creation. Knowledge evolution focuses on the dynamic progression of knowledge within individual and group LLMs. Moreover, we discuss what knowledge LLMs have learned, the reasons for the fragility of parametric knowledge, and the potential dark knowledge (hypothesis) that will be challenging to address. We hope this work can help understand knowledge in LLMs and provide insights for future research.
A State-of-the-Art Review of Computational Models for Analyzing Longitudinal Wearable Sensor Data in Healthcare
Wearable devices are increasingly used as tools for biomedical research, as the continuous stream of behavioral and physiological data they collect can provide insights about our health in everyday contexts. Long-term tracking, defined in the timescale of months of year, can provide insights of patterns and changes as indicators of health changes. These insights can make medicine and healthcare more predictive, preventive, personalized, and participative (The 4P's). However, the challenges in modeling, understanding and processing longitudinal data are a significant barrier to their adoption in research studies and clinical settings. In this paper, we review and discuss three models used to make sense of longitudinal data: routines, rhythms and stability metrics. We present the challenges associated with the processing and analysis of longitudinal wearable sensor data, with a special focus on how to handle the different temporal dynamics at various granularities. We then discuss current limitations and identify directions for future work. This review is essential to the advancement of computational modeling and analysis of longitudinal sensor data for pervasive healthcare.
Segment Anything for Videos: A Systematic Survey
Zhang, Chunhui, Cui, Yawen, Lin, Weilin, Huang, Guanjie, Rong, Yan, Liu, Li, Shan, Shiguang
The recent wave of foundation models has witnessed tremendous success in computer vision (CV) and beyond, with the segment anything model (SAM) having sparked a passion for exploring task-agnostic visual foundation models. Empowered by its remarkable zero-shot generalization, SAM is currently challenging numerous traditional paradigms in CV, delivering extraordinary performance not only in various image segmentation and multi-modal segmentation (\eg, text-to-mask) tasks, but also in the video domain. Additionally, the latest released SAM 2 is once again sparking research enthusiasm in the realm of promptable visual segmentation for both images and videos. However, existing surveys mainly focus on SAM in various image processing tasks, a comprehensive and in-depth review in the video domain is notably absent. To address this gap, this work conducts a systematic review on SAM for videos in the era of foundation models. As the first to review the progress of SAM for videos, this work focuses on its applications to various tasks by discussing its recent advances, and innovation opportunities of developing foundation models on broad applications. We begin with a brief introduction to the background of SAM and video-related research domains. Subsequently, we present a systematic taxonomy that categorizes existing methods into three key areas: video understanding, video generation, and video editing, analyzing and summarizing their advantages and limitations. Furthermore, comparative results of SAM-based and current state-of-the-art methods on representative benchmarks, as well as insightful analysis are offered. Finally, we discuss the challenges faced by current research and envision several future research directions in the field of SAM for video and beyond.
SSRFlow: Semantic-aware Fusion with Spatial Temporal Re-embedding for Real-world Scene Flow
Lu, Zhiyang, Chen, Qinghan, Yuan, Zhimin, Cheng, Ming
Scene flow, which provides the 3D motion field of the first frame from two consecutive point clouds, is vital for dynamic scene perception. However, contemporary scene flow methods face three major challenges. Firstly, they lack global flow embedding or only consider the context of individual point clouds before embedding, leading to embedded points struggling to perceive the consistent semantic relationship of another frame. To address this issue, we propose a novel approach called Dual Cross Attentive (DCA) for the latent fusion and alignment between two frames based on semantic contexts. This is then integrated into Global Fusion Flow Embedding (GF) to initialize flow embedding based on global correlations in both contextual and Euclidean spaces. Secondly, deformations exist in non-rigid objects after the warping layer, which distorts the spatiotemporal relation between the consecutive frames. For a more precise estimation of residual flow at next-level, the Spatial Temporal Re-embedding (STR) module is devised to update the point sequence features at current-level. Lastly, poor generalization is often observed due to the significant domain gap between synthetic and LiDAR-scanned datasets. We leverage novel domain adaptive losses to effectively bridge the gap of motion inference from synthetic to real-world. Experiments demonstrate that our approach achieves state-of-the-art (SOTA) performance across various datasets, with particularly outstanding results in real-world LiDAR-scanned situations. Our code will be released upon publication.
Affective Computing in the Era of Large Language Models: A Survey from the NLP Perspective
Zhang, Yiqun, Yang, Xiaocui, Xu, Xingle, Gao, Zeran, Huang, Yijie, Mu, Shiyi, Feng, Shi, Wang, Daling, Zhang, Yifei, Song, Kaisong, Yu, Ge
Affective Computing (AC), integrating computer science, psychology, and cognitive science knowledge, aims to enable machines to recognize, interpret, and simulate human emotions.To create more value, AC can be applied to diverse scenarios, including social media, finance, healthcare, education, etc. Affective Computing (AC) includes two mainstream tasks, i.e., Affective Understanding (AU) and Affective Generation (AG). Fine-tuning Pre-trained Language Models (PLMs) for AU tasks has succeeded considerably. However, these models lack generalization ability, requiring specialized models for specific tasks. Additionally, traditional PLMs face challenges in AG, particularly in generating diverse and emotionally rich responses. The emergence of Large Language Models (LLMs), such as the ChatGPT series and LLaMA models, brings new opportunities and challenges, catalyzing a paradigm shift in AC. LLMs possess capabilities of in-context learning, common sense reasoning, and advanced sequence generation, which present unprecedented opportunities for AU. To provide a comprehensive overview of AC in the LLMs era from an NLP perspective, we summarize the development of LLMs research in this field, aiming to offer new insights. Specifically, we first summarize the traditional tasks related to AC and introduce the preliminary study based on LLMs. Subsequently, we outline the relevant techniques of popular LLMs to improve AC tasks, including Instruction Tuning and Prompt Engineering. For Instruction Tuning, we discuss full parameter fine-tuning and parameter-efficient methods such as LoRA, P-Tuning, and Prompt Tuning. In Prompt Engineering, we examine Zero-shot, Few-shot, Chain of Thought (CoT), and Agent-based methods for AU and AG. To clearly understand the performance of LLMs on different Affective Computing tasks, we further summarize the existing benchmarks and evaluation methods.
An evidence-based methodology for human rights impact assessment (HRIA) in the development of AI data-intensive systems
Mantelero, Alessandro, Esposito, Maria Samantha
Different approaches have been adopted in addressing the challenges of Artificial Intelligence (AI), some centred on personal data and others on ethics, respectively narrowing and broadening the scope of AI regulation. This contribution aims to demonstrate that a third way is possible, starting from the acknowledgement of the role that human rights can play in regulating the impact of data-intensive systems. The focus on human rights is neither a paradigm shift nor a mere theoretical exercise. Through the analysis of more than 700 decisions and documents of the data protection authorities of six countries, we show that human rights already underpin the decisions in the field of data use. Based on empirical analysis of this evidence, this work presents a methodology and a model for a Human Rights Impact Assessment (HRIA). The methodology and related assessment model are focused on AI applications, whose nature and scale require a proper contextualisation of HRIA methodology. Moreover, the proposed models provide a more measurable approach to risk assessment which is consistent with the regulatory proposals centred on risk thresholds. The proposed methodology is tested in concrete case-studies to prove its feasibility and effectiveness. The overall goal is to respond to the growing interest in HRIA, moving from a mere theoretical debate to a concrete and context-specific implementation in the field of data-intensive applications based on AI.
Survey of Design Paradigms for Social Robots
Frieske, Rita, Mo, Xiaoyu, Fang, Yini, Nieles, Jay, Shi, Bertram E.
The demand for social robots in fields like healthcare, education, and entertainment increases due to their emotional adaptation features. These robots leverage multimodal communication, incorporating speech, facial expressions, and gestures to enhance user engagement and emotional support. The understanding of design paradigms of social robots is obstructed by the complexity of the system and the necessity to tune it to a specific task. This article provides a structured review of social robot design paradigms, categorizing them into cognitive architectures, role design models, linguistic models, communication flow, activity system models, and integrated design models. By breaking down the articles on social robot design and application based on these paradigms, we highlight the strengths and areas for improvement in current approaches. We further propose our original integrated design model that combines the most important aspects of the design of social robots. Our approach shows the importance of integrating operational, communicational, and emotional dimensions to create more adaptive and empathetic interactions between robots and humans.
Learn by Selling: Equipping Large Language Models with Product Knowledge for Context-Driven Recommendations
Anand, Sarthak, Jiang, Yutong, Kokaia, Giorgi
In contrast, LLMs, with their ability to understand nuances of language and context, offer a promising solution The rapid evolution of large language models (LLMs) has opened to overcome these limitations. However, a crucial prerequisite up new possibilities for applications such as context-driven product for LLMs to excel in product recommendation is their possession of recommendations. However, the effectiveness of these models in comprehensive knowledge about the entire inventory of products this context is heavily reliant on their comprehensive understanding available for sale. of the product inventory. This paper presents a novel approach In this paper, we propose a novel approach to equip LLMs with to equipping LLMs with product knowledge by training them to respond product knowledge by training them to generate contextual responses contextually to synthetic search queries that include product to synthetic search queries containing product IDs.
Robust Load Prediction of Power Network Clusters Based on Cloud-Model-Improved Transformer
Jiang, Cheng, Lu, Gang, Ma, Xue, Wu, Di
Load data from power network clusters indicates economic development in each area, crucial for predicting regional trends and guiding power enterprise decisions. The Transformer model, a leading method for load prediction, faces challenges modeling historical data due to variables like weather, events, festivals, and data volatility. To tackle this, the cloud model's fuzzy feature is utilized to manage uncertainties effectively. Presenting an innovative approach, the Cloud Model Improved Transformer (CMIT) method integrates the Transformer model with the cloud model utilizing the particle swarm optimization algorithm, with the aim of achieving robust and precise power load predictions. Through comparative experiments conducted on 31 real datasets within a power network cluster, it is demonstrated that CMIT significantly surpasses the Transformer model in terms of prediction accuracy, thereby highlighting its effectiveness in enhancing forecasting capabilities within the power network cluster sector.