Overview
Towards Edge General Intelligence via Large Language Models: Opportunities and Challenges
Chen, Handi, Deng, Weipeng, Yang, Shuo, Xu, Jinfeng, Jiang, Zhihan, Ngai, Edith C. H., Liu, Jiangchuan, Liu, Xue
Edge Intelligence (EI) has been instrumental in delivering real-time, localized services by leveraging the computational capabilities of edge networks. The integration of Large Language Models (LLMs) empowers EI to evolve into the next stage: Edge General Intelligence (EGI), enabling more adaptive and versatile applications that require advanced understanding and reasoning capabilities. However, systematic exploration in this area remains insufficient. This survey delineates the distinctions between EGI and traditional EI, categorizing LLM-empowered EGI into three conceptual systems: centralized, hybrid, and decentralized. For each system, we detail the framework designs and review existing implementations. Furthermore, we evaluate the performance and throughput of various Small Language Models (SLMs) that are more suitable for deployment on edge devices. This survey provides researchers with a comprehensive vision of EGI, offering insights into its vast potential and establishing a foundation for future advancements in this rapidly evolving field.
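The abstract reports throughput for several SLMs but does not say how it is measured. As a minimal sketch, assuming throughput means generated tokens per second of wall-clock decoding time, the measurement could look like this. `generate` is a hypothetical stand-in for any SLM's decoding call, and the injectable `clock` exists only to make the example testable.

```python
import time
from typing import Callable, List


def measure_throughput(
    generate: Callable[[str], List[str]],
    prompt: str,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return tokens generated per second for a single prompt.

    `generate` is a hypothetical callable that runs the model and
    returns the list of generated tokens.
    """
    start = clock()
    tokens = generate(prompt)
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return len(tokens) / elapsed


if __name__ == "__main__":
    # Fake model and fake clock: 4 tokens in 2 simulated seconds.
    ticks = iter([0.0, 2.0])
    tps = measure_throughput(lambda p: p.split(), "edge devices run SLMs", clock=lambda: next(ticks))
    print(tps)  # 2.0
```

In practice one would average over many prompts and often report prefill latency separately from decoding speed; that split is a common convention, not something the abstract specifies.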
Multi-view Fuzzy Graph Attention Networks for Enhanced Graph Learning
Xing, Jinming, Luo, Dongwen, Cheng, Qisen, Xue, Chang, Xing, Ruilin
Fuzzy Graph Attention Network (FGAT), which combines Fuzzy Rough Sets and Graph Attention Networks, has shown promise in tasks requiring robust graph-based learning. However, existing models struggle to effectively capture dependencies from multiple perspectives, limiting their ability to model complex data. To address this gap, we propose the Multi-view Fuzzy Graph Attention Network (MFGAT), a novel framework that constructs and aggregates multi-view information using a specially designed Transformation Block. This block dynamically transforms data from multiple aspects and aggregates the resulting representations via a weighted sum mechanism, enabling comprehensive multi-view modeling. The aggregated information is fed into FGAT to enhance fuzzy graph convolutions. Additionally, we introduce a simple yet effective learnable global pooling mechanism for improved graph-level understanding. Extensive experiments on graph classification tasks demonstrate that MFGAT outperforms state-of-the-art baselines, underscoring its effectiveness and versatility.
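The abstract describes two mechanisms without giving their internals: a Transformation Block that transforms input under several views and combines them by a weighted sum, and a learnable global pooling for graph-level readout. A minimal sketch, assuming each view is an arbitrary function of a feature vector, the view weights are softmax-normalized logits, and the pooling is attention-style (softmax over dot products with a query vector); these choices are illustrative assumptions, not MFGAT's published design, and the fuzzy graph convolution itself is omitted.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_views(
    x: Vector,
    views: Sequence[Callable[[Vector], Vector]],
    view_logits: Sequence[float],
) -> Vector:
    """Transform x under every view, then combine the outputs by a weighted sum.

    The weights are softmax-normalized `view_logits`, which a trained model
    would learn; here they are plain numbers.
    """
    if len(views) != len(view_logits):
        raise ValueError("one logit per view is required")
    weights = softmax(view_logits)
    outputs = [view(x) for view in views]
    dim = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(dim)]


def attention_pool(node_embs: Sequence[Vector], query: Vector) -> Vector:
    """Graph-level readout: a softmax over node scores (dot product with a
    learnable query vector) weights the node embeddings."""
    scores = [sum(q * h for q, h in zip(query, emb)) for emb in node_embs]
    alphas = softmax(scores)
    dim = len(node_embs[0])
    return [sum(a * emb[i] for a, emb in zip(alphas, node_embs)) for i in range(dim)]


if __name__ == "__main__":
    def identity(v: Vector) -> Vector:
        return list(v)

    def doubled(v: Vector) -> Vector:
        return [2 * t for t in v]

    print(aggregate_views([1.0, 2.0], [identity, doubled], [0.0, 0.0]))  # [1.5, 3.0]
    print(attention_pool([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0]))  # [2.0, 1.0]
```

With equal logits the weighted sum reduces to a plain average of the views, and a zero query reduces the pooling to mean pooling; training would move both away from these defaults.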
LegalAgentBench: Evaluating LLM Agents in Legal Domain
Li, Haitao, Chen, Junjie, Yang, Jingli, Ai, Qingyao, Jia, Wei, Liu, Youfeng, Lin, Kai, Wu, Yueyue, Yuan, Guozhi, Hu, Yiran, Wang, Wuyue, Liu, Yiqun, Huang, Minlie
With the increasing intelligence and autonomy of LLM agents, their potential applications in the legal domain are becoming increasingly apparent. However, existing general-domain benchmarks cannot fully capture the complexity and subtle nuances of real-world judicial cognition and decision-making. Therefore, we propose LegalAgentBench, a comprehensive benchmark specifically designed to evaluate LLM Agents in the Chinese legal domain. LegalAgentBench includes 17 corpora from real-world legal scenarios and provides 37 tools for interacting with external knowledge. We designed a scalable task construction framework and carefully annotated 300 tasks. These tasks span various types, including multi-hop reasoning and writing, and range across different difficulty levels, effectively reflecting the complexity of real-world legal scenarios. Moreover, beyond evaluating final success, LegalAgentBench incorporates keyword analysis during intermediate processes to calculate progress rates, enabling more fine-grained evaluation. We evaluated eight popular LLMs, highlighting the strengths, limitations, and potential areas for improvement of existing models and methods. LegalAgentBench sets a new benchmark for the practical application of LLMs in the legal domain, with its code and data available at https://github.com/CSHaitao/LegalAgentBench.
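The abstract says progress rates are computed by keyword analysis of intermediate steps but does not give the formula. As a minimal sketch, assuming each task is annotated with key keywords and the progress rate is the fraction of them that appear (by plain substring match) anywhere in the agent's intermediate outputs; the keywords and trajectory below are hypothetical.

```python
from typing import Iterable, Sequence


def progress_rate(key_keywords: Sequence[str], trajectory: Iterable[str]) -> float:
    """Fraction of annotated key keywords found in the agent's intermediate outputs."""
    if not key_keywords:
        raise ValueError("need at least one keyword")
    text = "\n".join(trajectory)
    hits = sum(1 for kw in key_keywords if kw in text)
    return hits / len(key_keywords)


if __name__ == "__main__":
    # Hypothetical task: the agent must surface four facts along the way.
    keywords = ["Acme Ltd", "legal representative", "registered capital", "case number"]
    steps = ["Queried company info for Acme Ltd", "Found its legal representative"]
    print(progress_rate(keywords, steps))  # 0.5
```

Unlike a binary success flag, this score gives partial credit to an agent that gathers some required facts before failing, which is the fine-grained signal the abstract describes.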
Survey on Abstractive Text Summarization: Dataset, Models, and Metrics
Nnadi, Gospel Ozioma, Bertini, Flavio
Readers and scholars often desire a concise summary (Too Long; Didn't Read, or TL;DR) of texts to prioritize information effectively. However, creating document summaries is mentally taxing and time-consuming, especially given the overwhelming volume of documents produced annually, as depicted in Figure 1 by [2]. As depicted in Figure 2, [3] reported over 100,000 scientific articles on the coronavirus pandemic in 2020; although these articles contain brief abstracts, their sheer volume makes it difficult for researchers and medical professionals to quickly extract relevant knowledge on a specific topic. An automatically generated multi-document summary could be valuable, providing readers with essential information and reducing the need to access the original files unless refinement is necessary. Text summarization has garnered significant research attention, proving useful in search engines, news clustering, timeline generation, and various other applications. The objective of text summarization is to create a brief, coherent, factually consistent, and readable document that retains the essential information from the source, whether it consists of a single document or multiple documents. In Single Document Summarization (SDS), only one input document is used, eliminating the need for additional processing to assess relationships between inputs. This method is suitable for summarizing standalone documents such as emails, legal contracts, and financial reports. The primary goal of Multi Document Summarization (MDS) is to gather information from several texts addressing the same topic, often composed at different times or representing diverse perspectives. The overarching objective is to produce information reports that are both succinct and comprehensive, consolidating varied opinions from documents that explore a topic through multiple viewpoints.
Semantic Web: Past, Present, and Future
Scherp, Ansgar, Groener, Gerd, Škoda, Petr, Hose, Katja, Vidal, Maria-Esther
Ever since the vision was formulated, the Semantic Web has inspired many generations of innovations. Semantic technologies have been used to share vast amounts of information on the Web, enhance them with semantics to give them meaning, and enable inference and reasoning on them. Throughout the years, semantic technologies, and in particular knowledge graphs, have been used in search engines, data integration, enterprise settings, and machine learning. In this paper, we recap the classical concepts and foundations of the Semantic Web as well as modern and recent concepts and applications, building upon these foundations. The classical topics we cover include knowledge representation, creating and validating knowledge on the Web, reasoning and linking, and distributed querying. We enhance this classical view of the so-called "Semantic Web Layer Cake" with an update of recent concepts that include provenance, security and trust, as well as a discussion of practical impacts from industry-led contributions. We conclude with an outlook on the future directions of the Semantic Web.
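Reasoning over Web knowledge is one of the classical topics listed above. To illustrate what inference means concretely, here is a minimal sketch of naive forward chaining over RDF-style triples, stored as plain Python tuples, applying just two RDFS entailment rules: transitivity of rdfs:subClassOf (rdfs11) and type inheritance through subclasses (rdfs9). It is not a full RDFS reasoner, and the `ex:` resource names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply rdfs11 and rdfs9 repeatedly until no new triples appear."""
    closed = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closed if p == SUBCLASS}
        new: Set[Triple] = set()
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))
        # rdfs9: x type A, A subClassOf B  =>  x type B
        for s, p, o in closed:
            if p == TYPE:
                for a, b in sub:
                    if o == a:
                        new.add((s, TYPE, b))
        if new <= closed:  # fixpoint reached
            return closed
        closed |= new


if __name__ == "__main__":
    facts = {
        ("ex:Cat", SUBCLASS, "ex:Mammal"),
        ("ex:Mammal", SUBCLASS, "ex:Animal"),
        ("ex:tom", TYPE, "ex:Cat"),
    }
    print(("ex:tom", TYPE, "ex:Animal") in rdfs_closure(facts))  # True
```

The fact that ex:tom is an ex:Animal was never stated; it is entailed by the class hierarchy, which is the kind of inference the Semantic Web stack standardizes. Production reasoners use indexed, incremental rule evaluation rather than this quadratic loop.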
DragonVerseQA: Open-Domain Long-Form Context-Aware Question-Answering
Lahiri, Aritra Kumar, Hu, Qinmin Vivian
This paper proposes a novel approach to developing an open-domain, long-form Over-The-Top (OTT) Question-Answering (QA) dataset, DragonVerseQA, oriented specifically to the fantasy universe of the "House of the Dragon" and "Game of Thrones" TV series. Most existing QA datasets focus on short, fact-based answers sourced almost solely from Wikipedia articles and lack the depth and contextual richness needed for sophisticated narrative understanding. We curate a dataset that combines full episode summaries sourced from HBO and fandom wiki websites, user reviews from sources such as IMDb and Rotten Tomatoes, other high-quality, open-domain, legally admissible sources, and structured data from repositories such as WikiData. The dataset provides multi-dimensional context, reflecting complex character dynamics and plot developments drawn from these varied sources. Heavy data preprocessing and filtering ensure that only meaningful, non-spam, unbiased reviews are included in this enriched dataset. Long-form answers generated from this enriched context provide comprehensive insights, making the dataset valuable for improving conversational AI, narrative analysis, sentiment analysis, summarization techniques, and relation extraction. A comparative analysis with state-of-the-art QA datasets such as SQuAD 2.0, TriviaQA, and Natural Questions highlights the unique advantages of our dataset in terms of contextual complexity and answer length. Detailed reviews add layers of audience sentiment and narrative interpretation, raising the bar for domain-specific QA with a new quality benchmark. Our work also enables a deeper understanding of entertainment-industry content and opens the door to more knowledgeable and creative AI-driven interactions within digital media environments.
A Method for the Runtime Validation of AI-based Environment Perception in Automated Driving System
Aslam, Iqra, Buragohain, Abhishek, Bamal, Daniel, Aniculaesei, Adina, Zhang, Meng, Rausch, Andreas
Environment perception is a fundamental part of the dynamic driving task executed by Autonomous Driving Systems (ADS). Artificial Intelligence (AI)-based approaches have prevailed over classical techniques for realizing environment perception. Current safety-relevant standards for automotive systems, International Organization for Standardization (ISO) 26262 and ISO 21448, assume the existence of comprehensive requirements specifications. These specifications serve as the basis on which the functionality of an automotive system can be rigorously tested and checked for compliance with safety regulations. However, AI-based perception systems lack complete requirements specifications; instead, they are trained on large datasets. This paper presents a function monitor for the functional runtime monitoring of a two-fold AI-based environment perception for ADS, based on camera and LiDAR sensors respectively. To evaluate the applicability of the function monitor, we conduct a qualitative scenario-based evaluation in a controlled laboratory environment using a model car. The evaluation results are then discussed to provide insights into the monitor's performance and its suitability for real-world applications.
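The abstract does not describe which checks the function monitor performs. One plausible runtime check for a two-fold perception is cross-validating the two channels against each other. The sketch below assumes exactly that: each object reported by the camera channel must have a LiDAR detection nearby and vice versa, otherwise the monitor reports a degraded state. The `Detection` format, the 1 m matching threshold, and the "OK"/"DEGRADED" labels are all hypothetical, not the paper's design.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Detection:
    """An object position in the vehicle frame, in metres (hypothetical format)."""
    x: float
    y: float


def unmatched(primary: Sequence[Detection], other: Sequence[Detection], max_dist: float) -> List[Detection]:
    """Detections in `primary` with no detection in `other` within `max_dist`."""
    return [
        d for d in primary
        if not any(math.hypot(d.x - o.x, d.y - o.y) <= max_dist for o in other)
    ]


def monitor(camera: Sequence[Detection], lidar: Sequence[Detection], max_dist: float = 1.0) -> str:
    """Return "OK" if both channels agree, otherwise "DEGRADED"."""
    disagreements = unmatched(camera, lidar, max_dist) + unmatched(lidar, camera, max_dist)
    return "OK" if not disagreements else "DEGRADED"


if __name__ == "__main__":
    print(monitor([Detection(10.0, 0.0)], [Detection(10.5, 0.2)]))  # OK
    print(monitor([Detection(10.0, 0.0)], []))  # DEGRADED: LiDAR misses the object
```

Checking in both directions catches both a missed object and a phantom detection; a real monitor would also need time alignment between sensors and a policy for what the vehicle does in the degraded state.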
From Creation to Curriculum: Examining the role of generative AI in Arts Universities
The age of Artificial Intelligence (AI) is marked by its transformative "generative" capabilities, distinguishing it from prior iterations. This burgeoning characteristic of AI has enabled it to produce new and original content, inherently showcasing its creative prowess. This shift challenges arts education and calls for its recalibration, urging a departure from established pedagogies centered on human-driven image creation. The paper meticulously addresses the integration of AI tools, with a spotlight on Stable Diffusion (SD), into university arts curricula. Drawing from practical insights gathered from workshops conducted in July 2023, which culminated in an exhibition of AI-driven artworks, the paper aims to provide a roadmap for seamlessly infusing these tools into academic settings. Given their recent emergence, the paper provides a comprehensive overview of such tools, emphasizing the intricate dance between artists, developers, and researchers in the open-source AI art world. This discourse extends to the challenges and imperatives faced by educational institutions. It presents a compelling case for the swift adoption of these avant-garde tools, underscoring the paramount importance of equipping students with the competencies required to thrive in an AI-augmented artistic landscape.
A Similarity-Based Oversampling Method for Multi-label Imbalanced Text Data
Karaman, Ismail Hakki, Koksal, Gulser, Eriskin, Levent, Salihoglu, Salih
In real-world applications, even as data availability increases, obtaining labeled data for machine learning (ML) projects remains challenging due to the high costs and intensive efforts required for data annotation. Many ML projects, particularly those focused on multi-label classification, also grapple with data imbalance issues, where certain classes may lack sufficient data to train effective classifiers. This study introduces and examines a novel oversampling method for multi-label text classification, designed to address performance challenges associated with data imbalance. The proposed method identifies potential new samples from unlabeled data by leveraging similarity measures between instances. By iteratively searching the unlabeled dataset, the method locates instances similar to those in underrepresented classes and evaluates their contribution to classifier performance enhancement. Instances that demonstrate performance improvement are then added to the labeled dataset. Experimental results indicate that the proposed approach effectively enhances classifier performance post-oversampling.
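The loop described above (find unlabeled instances similar to minority-class examples, keep those that improve the classifier) can be sketched as follows. Several details are assumptions rather than the authors' method: similarity is bag-of-words cosine, a candidate inherits the label set of its most similar minority-class seed, the search is a single greedy pass, and `evaluate` is a hypothetical callback that would, in practice, retrain a classifier and return a validation score such as macro-F1.

```python
import math
from collections import Counter
from typing import Callable, FrozenSet, List, Tuple

Sample = Tuple[str, FrozenSet[str]]  # (text, set of labels)


def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[tok] for tok, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def oversample(
    labeled: List[Sample],
    unlabeled: List[str],
    minority: str,
    evaluate: Callable[[List[Sample]], float],
    threshold: float = 0.5,
) -> List[Sample]:
    """Greedily add unlabeled texts that resemble a minority-class example
    and raise the score returned by `evaluate`."""
    current = list(labeled)
    best_score = evaluate(current)
    seeds = [s for s in labeled if minority in s[1]]
    for text in unlabeled:
        if not seeds:
            break
        # Most similar minority-class example; the candidate borrows its labels.
        seed_text, seed_labels = max(seeds, key=lambda s: cosine(text, s[0]))
        if cosine(text, seed_text) < threshold:
            continue
        trial = current + [(text, seed_labels)]
        trial_score = evaluate(trial)
        if trial_score > best_score:  # keep only instances that help
            current, best_score = trial, trial_score
    return current


if __name__ == "__main__":
    labeled = [("wire fraud scheme", frozenset({"fraud"})),
               ("contract dispute", frozenset({"civil"}))]
    # Toy stand-in for classifier performance: number of minority samples.
    toy_eval = lambda data: sum(1 for _, labels in data if "fraud" in labels)
    out = oversample(labeled, ["bank fraud scheme", "weather report today"], "fraud", toy_eval)
    print(len(out))  # 3
```

Because each accepted instance requires a fresh evaluation, the real cost is dominated by retraining; the similarity threshold serves to prune candidates before that expensive step.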
HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios
Wang, Jun, Zhou, Jiamu, Wen, Muning, Mo, Xiaoyun, Zhang, Haoyu, Lin, Qiqiang, Jin, Cheng, Wang, Xihuai, Zhang, Weinan, Peng, Qiuying, Wang, Jun
Evaluating the capabilities of large language models (LLMs) in human-LLM interactions remains challenging due to the inherent complexity and openness of dialogue processes. This paper introduces HammerBench, a novel benchmarking framework designed to assess the function-calling ability of LLMs more effectively in such interactions. We model a wide range of real-world user scenarios on mobile devices, encompassing imperfect instructions, diverse question-answer trajectories, intent/argument shifts, and the use of external individual information through pronouns. To construct the corresponding datasets, we propose a comprehensive pipeline that involves LLM-generated data and multiple rounds of human validation, ensuring high data quality. Additionally, we decompose the conversations into function-calling snapshots, enabling a fine-grained evaluation of each turn. We evaluate several popular LLMs using HammerBench and highlight different performance aspects. Our empirical findings reveal that errors in parameter naming constitute the primary factor behind conversation failures across different data types.
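Decomposing a conversation into function-calling snapshots means each turn is scored on its own. A minimal sketch, assuming a snapshot pairs a predicted call with a gold call and is checked separately for function name, parameter names, and parameter values; separating the last two mirrors the finding that parameter naming errors dominate. The `Call` schema, the criterion names, and the example functions (`set_alarm`, `send_message`) are hypothetical, not HammerBench's actual metrics or data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Call:
    """One function call: a name plus named string arguments (hypothetical schema)."""
    name: str
    args: Dict[str, str] = field(default_factory=dict)


def score_snapshot(pred: Call, gold: Call) -> Dict[str, bool]:
    """Compare a predicted call with the gold call for a single turn."""
    return {
        "name": pred.name == gold.name,
        "param_names": set(pred.args) == set(gold.args),
        "param_values": all(pred.args.get(k) == v for k, v in gold.args.items()),
    }


def summarize(preds: List[Call], golds: List[Call]) -> Dict[str, float]:
    """Per-criterion accuracy over all snapshots, plus exact-match accuracy."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("need one prediction per gold snapshot")
    results = [score_snapshot(p, g) for p, g in zip(preds, golds)]
    summary = {key: sum(r[key] for r in results) / len(results) for key in results[0]}
    summary["exact"] = sum(all(r.values()) for r in results) / len(results)
    return summary


if __name__ == "__main__":
    golds = [Call("set_alarm", {"time": "07:00"}),
             Call("send_message", {"to": "Alice", "text": "hi"})]
    # Second turn uses the wrong parameter name ("recipient" instead of "to").
    preds = [Call("set_alarm", {"time": "07:00"}),
             Call("send_message", {"recipient": "Alice", "text": "hi"})]
    print(summarize(preds, golds))
    # {'name': 1.0, 'param_names': 0.5, 'param_values': 0.5, 'exact': 0.5}
```

Scoring per snapshot localizes the failure to the second turn and to a naming error, whereas a whole-conversation success flag would only report that the dialogue failed.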