Expert Systems
RelBERT: Embedding Relations with Language Models
Ushio, Asahi, Camacho-Collados, Jose, Schockaert, Steven
Many applications need access to background knowledge about how different concepts and entities are related. Although Knowledge Graphs (KG) and Large Language Models (LLM) can address this need to some extent, KGs are inevitably incomplete and their relational schema is often too coarse-grained, while LLMs are inefficient and difficult to control. As an alternative, we propose to extract relation embeddings from relatively small language models. In particular, we show that masked language models such as RoBERTa can be straightforwardly fine-tuned for this purpose, using only a small amount of training data. The resulting model, which we call RelBERT, captures relational similarity in a surprisingly fine-grained way, allowing us to set a new state-of-the-art in analogy benchmarks. Crucially, RelBERT is capable of modelling relations that go well beyond what the model has seen during training. For instance, we obtained strong results on relations between named entities with a model that was only trained on lexical relations between concepts, and we observed that RelBERT can recognise morphological analogies despite not being trained on such examples. Overall, we find that RelBERT significantly outperforms strategies based on prompting language models that are several orders of magnitude larger, including recent GPT-based models and open source models.
A Survey on Image-text Multimodal Models
Guo, Ruifeng, Wei, Jingxuan, Sun, Linzhuang, Yu, Bihui, Chang, Guiyong, Liu, Dawei, Zhang, Sibo, Yao, Zhengbing, Xu, Mingjun, Bu, Liping
Amidst the evolving landscape of artificial intelligence, the convergence of visual and textual information has surfaced as a crucial frontier, leading to the advent of image-text multimodal models. This paper provides a comprehensive review of the evolution and current state of image-text multimodal models, exploring their application value, challenges, and potential research trajectories. Initially, we revisit the basic concepts and developmental milestones of these models, introducing a novel classification that segments their evolution into three distinct phases, based on their time of introduction and subsequent impact on the discipline. Furthermore, based on the tasks' significance and prevalence in the academic landscape, we propose a categorization of the tasks associated with image-text multimodal models into five major types, elucidating the recent progress and key technologies within each category. Despite the remarkable accomplishments of these models, numerous challenges and issues persist. This paper delves into the inherent challenges and limitations of image-text multimodal models, fostering the exploration of prospective research directions. Our objective is to offer an exhaustive overview of the present research landscape of image-text multimodal models and to serve as a valuable reference for future scholarly endeavors. We extend an invitation to the broader community to collaborate in enhancing the image-text multimodal model community, accessible at: \href{https://github.com/i2vec/A-survey-on-image-text-multimodal-models}{https://github.com/i2vec/A-survey-on-image-text-multimodal-models}.
EdgeFD: An Edge-Friendly Drift-Aware Fault Diagnosis System for Industrial IoT
Jiao, Chen, Fengjian, Mao, Zuohong, Lv, Jianhua, Tang
Recent transfer learning (TL) approaches in industrial intelligent fault diagnosis (FD) mostly follow the "pre-train and fine-tuning" paradigm to address data drift, which emerges from variable working conditions. However, we find that this approach is prone to the phenomenon known as catastrophic forgetting. Furthermore, performing frequent models fine-tuning on the resource-constrained edge nodes can be computationally expensive and unnecessary, given the excellent transferability demonstrated by existing models. In this work, we propose the Drift-Aware Weight Consolidation (DAWC), a method optimized for edge deployments, mitigating the challenges posed by frequent data drift in the industrial Internet of Things (IIoT). DAWC efficiently manages multiple data drift scenarios, minimizing the need for constant model fine-tuning on edge devices, thereby conserving computational resources. By detecting drift using classifier confidence and estimating parameter importance with the Fisher Information Matrix, a tool that measures parameter sensitivity in probabilistic models, we introduce a drift detection module and a continual learning module to gradually equip the FD model with powerful generalization capabilities. Experimental results demonstrate that our proposed DAWC achieves superior performance compared to existing techniques while also ensuring compatibility with edge computing constraints. Additionally, we have developed a comprehensive diagnosis and visualization platform.
Counterfactual Explainer Framework for Deep Reinforcement Learning Models Using Policy Distillation
Samadi, Amir, Koufos, Konstantinos, Debattista, Kurt, Dianati, Mehrdad
Deep Reinforcement Learning (DRL) has demonstrated promising capability in solving complex control problems. However, DRL applications in safety-critical systems are hindered by the inherent lack of robust verification techniques to assure their performance in such applications. One of the key requirements of the verification process is the development of effective techniques to explain the system functionality, i.e., why the system produces specific results in given circumstances. Recently, interpretation methods based on the Counterfactual (CF) explanation approach have been proposed to address the problem of explanation in DRLs. This paper proposes a novel CF explanation framework to explain the decisions made by a black-box DRL. To evaluate the efficacy of the proposed explanation framework, we carried out several experiments in the domains of automated driving systems and Atari Pong game. Our analysis demonstrates that the proposed framework generates plausible and meaningful explanations for various decisions made by deep underlying DRLs. Source codes are available at: \url{https://github.com/Amir-Samadi/Counterfactual-Explanation}
Artificial Intelligence Index Report 2023
Maslej, Nestor, Fattorini, Loredana, Brynjolfsson, Erik, Etchemendy, John, Ligett, Katrina, Lyons, Terah, Manyika, James, Ngo, Helen, Niebles, Juan Carlos, Parli, Vanessa, Shoham, Yoav, Wald, Russell, Clark, Jack, Perrault, Raymond
Welcome to the sixth edition of the AI Index Report! This year, the report introduces more original data than any previous edition, including a new chapter on AI public opinion, a more thorough technical performance chapter, original analysis about large language and multimodal models, detailed trends in global AI legislation records, a study of the environmental impact of AI systems, and more. The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The report aims to be the world's most credible and authoritative source for data and insights about AI.
Automatic Quality Assessment of Wikipedia Articles -- A Systematic Literature Review
Moás, Pedro Miguel, Lopes, Carla Teixeira
Wikipedia is the world's largest online encyclopedia, but maintaining article quality through collaboration is challenging. Wikipedia designed a quality scale, but with such a manual assessment process, many articles remain unassessed. We review existing methods for automatically measuring the quality of Wikipedia articles, identifying and comparing machine learning algorithms, article features, quality metrics, and used datasets, examining 149 distinct studies, and exploring commonalities and gaps in them. The literature is extensive, and the approaches follow past technological trends. However, machine learning is still not widely used by Wikipedia, and we hope that our analysis helps future researchers change that reality.
HUST bearing: a practical dataset for ball bearing fault diagnosis
Thuan, Nguyen Duc, Hong, Hoang Si
In this work, we introduce a practical dataset named HUST bearing, that provides a large set of vibration data on different ball bearings. This dataset contains 90 raw vibration data of 6 types of defects (inner crack, outer crack, ball crack, and their 2-combinations) on 5 types of bearing at 3 working conditions with the sample rate of 51,200 samples per second. We established the envelope analysis and order tracking analysis on the introduced dataset to allow an initial evaluation of the data. A number of classical machine learning classification methods are used to identify bearing faults of the dataset using features in different domains. The typical advanced unsupervised transfer learning algorithms also perform to observe the transferability of knowledge among parts of the dataset. The experimental results of examined methods on the dataset gain divergent accuracy up to 100% on classification task and 60-80% on unsupervised transfer learning task.
Knowledge Engineering for Wind Energy
Marykovskiy, Yuriy, Clark, Thomas, Day, Justin, Wiens, Marcus, Henderson, Charles, Quick, Julian, Abdallah, Imad, Sempreviva, Anna Maria, Calbimonte, Jean-Paul, Chatzi, Eleni, Barber, Sarah
To this end, vast amounts of data generated by various sources, including sensors and other monitoring systems, need to be effectively structured and represented in a way that can be easily understood and processed by both Artificial Intelligence (AI) systems and humans. The digitalisation of the wind energy sector is one of the key drivers for reducing costs and risks over the whole wind energy project life cycle [2]. The digitalisation process encompasses solutions such as digital twins, decision support systems and AI systems, some of which need to still be developed, in order to contribute to reducing operation and maintenance costs, for increasing the amount of energy delivered, as well as for maximising the efficiency of wind energy systems. In this context, the term Knowledge-Based Systems (KBS) refers to AI systems that formalize knowledge as rules, logical expressions, and conceptualisations [3, 4]. Such systems can be realised as AI-enabled digital twins or decision support systems that rely on databases of knowledge (also referred to as knowledge bases or knowledge graphs), which contain machine-readable facts, rules, and logics about a domain of interest, to assist with problem-solving and decision-making [5].
Knowledge Engineering using Large Language Models
Allen, Bradley P., Stork, Lise, Groth, Paul
Knowledge engineering is a discipline that focuses on the creation and maintenance of processes that generate and apply knowledge. Traditionally, knowledge engineering approaches have focused on knowledge expressed in formal languages. The emergence of large language models and their capabilities to effectively work with natural language, in its broadest sense, raises questions about the foundations and practice of knowledge engineering. Here, we outline the potential role of LLMs in knowledge engineering, identifying two central directions: 1) creating hybrid neuro-symbolic knowledge systems; and 2) enabling knowledge engineering in natural language. Additionally, we formulate key open research questions to tackle these directions.
Holistic Evaluation of Language Models
Liang, Percy, Bommasani, Rishi, Lee, Tony, Tsipras, Dimitris, Soylu, Dilara, Yasunaga, Michihiro, Zhang, Yian, Narayanan, Deepak, Wu, Yuhuai, Kumar, Ananya, Newman, Benjamin, Yuan, Binhang, Yan, Bobby, Zhang, Ce, Cosgrove, Christian, Manning, Christopher D., Ré, Christopher, Acosta-Navas, Diana, Hudson, Drew A., Zelikman, Eric, Durmus, Esin, Ladhak, Faisal, Rong, Frieda, Ren, Hongyu, Yao, Huaxiu, Wang, Jue, Santhanam, Keshav, Orr, Laurel, Zheng, Lucia, Yuksekgonul, Mert, Suzgun, Mirac, Kim, Nathan, Guha, Neel, Chatterji, Niladri, Khattab, Omar, Henderson, Peter, Huang, Qian, Chi, Ryan, Xie, Sang Michael, Santurkar, Shibani, Ganguli, Surya, Hashimoto, Tatsunori, Icard, Thomas, Zhang, Tianyi, Chaudhary, Vishrav, Wang, William, Li, Xuechen, Mai, Yifan, Zhang, Yuhui, Koreeda, Yuta
Language models (LMs) are becoming the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models. First, we taxonomize the vast space of potential scenarios (i.e. use cases) and metrics (i.e. desiderata) that are of interest for LMs. Then we select a broad subset based on coverage and feasibility, noting what's missing or underrepresented (e.g. question answering for neglected English dialects, metrics for trustworthiness). Second, we adopt a multi-metric approach: We measure 7 metrics (accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency) for each of 16 core scenarios when possible (87.5% of the time). This ensures metrics beyond accuracy don't fall to the wayside, and that trade-offs are clearly exposed. We also perform 7 targeted evaluations, based on 26 targeted scenarios, to analyze specific aspects (e.g. reasoning, disinformation). Third, we conduct a large-scale evaluation of 30 prominent language models (spanning open, limited-access, and closed models) on all 42 scenarios, 21 of which were not previously used in mainstream LM evaluation. Prior to HELM, models on average were evaluated on just 17.9% of the core HELM scenarios, with some prominent models not sharing a single scenario in common. We improve this to 96.0%: now all 30 models have been densely benchmarked on the same core scenarios and metrics under standardized conditions. Our evaluation surfaces 25 top-level findings. For full transparency, we release all raw model prompts and completions publicly for further analysis, as well as a general modular toolkit. We intend for HELM to be a living benchmark for the community, continuously updated with new scenarios, metrics, and models.