Memory-Based Learning
Does Learning Require Memorization? A Short Tale about a Long Tail
State-of-the-art results on image recognition tasks are achieved using over-parameterized learning algorithms that (nearly) perfectly fit the training set. This phenomenon is referred to as data interpolation or, informally, as memorization of the training data. The question of why such algorithms generalize well to unseen data is not adequately addressed by the standard theoretical frameworks and, as a result, significant theoretical and experimental effort has been devoted to understanding the properties of such algorithms. We provide a simple and generic model for prediction problems in which interpolating the dataset is necessary for achieving close-to-optimal generalization error. The model is motivated and supported by the results of several recent empirical works. In our model, data is sampled from a mixture of subpopulations and the frequencies of these subpopulations are chosen from some prior. The model makes it possible to quantify the effect of not fitting the training data on the generalization performance of the learned classifier and demonstrates that memorization is necessary whenever frequencies are long-tailed. Image and text data are known to follow such distributions and therefore our results establish a formal link between these empirical phenomena. To the best of our knowledge, this is the first general framework that demonstrates statistical benefits of plain memorization for learning. Our results also have concrete implications for the cost of ensuring differential privacy in learning.
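The long-tail argument in this abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the paper's model: the Zipf-like frequencies, the function names `zipf_frequencies` and `singleton_mass`, and the use of "subpopulations seen exactly once in training" as a proxy for what a non-memorizing learner forgoes are all illustrative choices that do not appear in the abstract.

```python
import random
from collections import Counter


def zipf_frequencies(num_subpops, exponent=1.0):
    """Hypothetical long-tailed prior: frequency of rank r is proportional to 1 / r**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_subpops + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def singleton_mass(freqs, n_train, seed=0):
    """Total probability mass of subpopulations that occur exactly once in a training sample.

    Assumption for illustration: a learner that refuses to fit a lone training
    example has no other information about that example's subpopulation, so
    this mass is a rough proxy for the test mass it gives up.
    """
    rng = random.Random(seed)
    sample = rng.choices(range(len(freqs)), weights=freqs, k=n_train)
    counts = Counter(sample)
    return sum(freqs[s] for s, c in counts.items() if c == 1)


if __name__ == "__main__":
    long_tailed = zipf_frequencies(5000)   # many rare subpopulations
    head_only = [0.1] * 10                 # ten equally common subpopulations
    print("long-tailed:", singleton_mass(long_tailed, n_train=1000))
    print("head-only:  ", singleton_mass(head_only, n_train=1000))
```

Under these assumptions, the long-tailed mixture leaves a noticeable share of the test distribution represented by single training examples, while the head-only mixture leaves essentially none.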
Approaching Adaptation Guided Retrieval in Case-Based Reasoning through Inference in Undirected Graphical Models
In Case-Based Reasoning, when the similarity assumption does not hold, retrieving a set of cases structurally similar to the query does not guarantee a reusable or revisable solution. Knowledge about the adaptability of solutions must be exploited to define a method for adaptation-guided retrieval. We propose a novel approach to address this problem, where knowledge about the adaptability of the solutions is captured inside a metric Markov Random Field (MRF). Nodes of the MRF represent cases and edges connect nodes whose solutions are close in the solution space. States of the nodes represent different adaptation levels with respect to the potential query. Metric-based potentials encourage connected nodes to share the same state, since cases having similar solutions should have the same adaptability level with respect to the query. The main goal is to enlarge the set of potentially adaptable cases that are retrieved without significantly sacrificing the precision and accuracy of retrieval. We will report on some experiments concerning a retrieval architecture where a simple kNN retrieval (on the problem description) is followed by a further retrieval step based on MRF inference.
Similarity Measure Development for Case-Based Reasoning - A Data-driven Approach
Verma, Deepika, Bach, Kerstin, Mork, Paul Jarle
In this paper, we demonstrate a data-driven methodology for modelling the local similarity measures of various attributes in a dataset. We analyse the spread in the numerical attributes and estimate their distribution using a polynomial function to showcase an approach for deriving strong initial value ranges of numerical attributes, and we use a non-overlapping distribution for categorical attributes such that the entire similarity range [0,1] is utilized. We use an open source dataset to demonstrate the modelling and development of the similarity measures and present a case-based reasoning (CBR) system that can be used to search for the most relevant similar cases.
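The idea of local similarity measures feeding a case search can be sketched as follows. This is a minimal illustration, not the authors' method: it swaps in a simple linear distance-over-range function where the abstract describes a polynomial fit, and the attribute names, similarity table, weights, and all function names are hypothetical.

```python
def attribute_ranges(case_base, numeric_attrs):
    """Derive a value range (max - min) per numeric attribute from the data."""
    return {
        attr: max(c[attr] for c in case_base) - min(c[attr] for c in case_base)
        for attr in numeric_attrs
    }


def numeric_similarity(a, b, value_range):
    """Linear local similarity in [0, 1]; a stand-in for a fitted function."""
    if value_range <= 0:
        return 1.0 if a == b else 0.0
    return max(0.0, 1.0 - abs(a - b) / value_range)


def categorical_similarity(a, b, table):
    """Look up a symmetric similarity table; unknown pairs score 0."""
    if a == b:
        return 1.0
    return table.get((a, b), table.get((b, a), 0.0))


def global_similarity(query, case, ranges, tables, weights):
    """Weighted average of the local similarities."""
    total = 0.0
    for attr, weight in weights.items():
        if attr in ranges:
            local = numeric_similarity(query[attr], case[attr], ranges[attr])
        else:
            local = categorical_similarity(query[attr], case[attr], tables[attr])
        total += weight * local
    return total / sum(weights.values())


def retrieve(query, case_base, k, ranges, tables, weights):
    """Return the k most similar cases as (similarity, case) pairs."""
    scored = [(global_similarity(query, c, ranges, tables, weights), c) for c in case_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    cases = [
        {"age": 30, "activity": "low"},
        {"age": 50, "activity": "high"},
        {"age": 32, "activity": "medium"},
    ]
    ranges = attribute_ranges(cases, ["age"])
    tables = {"activity": {("low", "medium"): 0.5, ("medium", "high"): 0.5}}
    weights = {"age": 1.0, "activity": 1.0}
    query = {"age": 31, "activity": "low"}
    for sim, case in retrieve(query, cases, 2, ranges, tables, weights):
        print(round(sim, 3), case)
```

Spreading the categorical table values across [0, 1] and normalizing numeric differences by a data-derived range keeps every local measure on the same scale, which is what makes the weighted average meaningful.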
The Twin-System Approach as One Generic Solution for XAI: An Overview of ANN-CBR Twins for Explaining Deep Learning
Keane, Mark T., Kenny, Eoin M.
The notion of twin systems is proposed to address the eXplainable AI (XAI) problem, where an uninterpretable black-box system is mapped to a white-box 'twin' that is more interpretable. In this short paper, we overview very recent work that advances a generic solution to the XAI problem, the so-called twin-system approach. The most popular twinning in the literature is that between an Artificial Neural Network (ANN) as a black box and a Case-Based Reasoning (CBR) system as a white box, where the latter acts as an interpretable proxy for the former. We outline how recent work reviving this idea has applied it to deep learning methods. Furthermore, we detail the many fruitful directions in which this work may be taken, such as determining the most (i) accurate feature-weighting methods to be used, (ii) appropriate deployments for explanatory cases, (iii) useful cases of explanatory value to users.
Prediction of Construction Cost for Field Canals Improvement Projects in Egypt
Field canals improvement projects (FCIPs) are among the ambitious projects constructed to save fresh water. To finance these projects, conceptual cost models are important for accurately predicting preliminary costs at the early stages of a project. The first step in developing a conceptual cost model is to identify the key cost drivers affecting the project. Input variable selection therefore remains an important part of model development, as poor variable selection can decrease model precision. The study identified the most important cost drivers of FCIPs using both a qualitative and a quantitative approach. Subsequently, the study developed a parametric cost model based on machine learning methods such as regression methods, artificial neural networks, a fuzzy model, and case-based reasoning.
How Case Based Reasoning Explained Neural Networks: An XAI Survey of Post-Hoc Explanation-by-Example in ANN-CBR Twins
This paper proposes a theoretical analysis of one approach to the eXplainable AI (XAI) problem, using post-hoc explanation-by-example, that relies on the twinning of artificial neural networks (ANNs) with case-based reasoning (CBR) systems; so-called ANN-CBR twins. It surveys these systems to advance a new theoretical interpretation of previous work and define a road map for CBR's further role in XAI. A systematic survey of 1102 papers was conducted to identify a fragmented literature on this topic and trace its influence to more recent work involving deep neural networks (DNNs). The twin-system approach is advanced as one possible coherent, generic solution to the XAI problem. The paper concludes by road-mapping future directions for this XAI solution, considering (i) further tests of feature-weighting techniques, (ii) how explanatory cases might be deployed (e.g., in counterfactuals, a fortiori cases), and (iii) the unwelcome, much-ignored issue of user evaluation.
Case Representation and Similarity Modeling for Non-Specific Musculoskeletal Disorders - a Case-Based Reasoning Approach
Jaiswal, Amar (Norwegian University of Science and Technology) | Bach, Kerstin (Norwegian University of Science and Technology) | Meisingset, Ingebrigt (Norwegian University of Science and Technology) | Vasseljen, Ottar (Norwegian University of Science and Technology)
This paper presents a case-based reasoning (CBR) application for discovering similar patients with non-specific musculoskeletal disorders (MSDs) and recommending treatment plans using previous experiences. From a medical perspective, MSD is a complex disorder, as its cause is often bound to a combination of physiological and psychological factors. Likewise, the features describing the condition and outcome measures vary across studies. However, healthcare professionals in the field work in an experience-based way; we therefore chose CBR as the core methodology for developing a decision support system for physiotherapists that would assist them in co-decision making and treatment planning. In this paper, we focus on case representation and similarity modeling for the non-specific MSD patient data and report initial experiments on comparing patient profiles.
Exploiting Markov Random Fields to Enhance Retrieval in Case-Based Reasoning
Portinale, Luigi (Università del Piemonte Orientale)
The similarity assumption in Case-Based Reasoning (similar problems have similar solutions) has been questioned by several researchers. If knowledge about the adaptability of solutions is available, it can be exploited in order to guide retrieval. Several approaches have been proposed in this context, often assuming a similarity or cost measure defined over the solution space. In this paper, we propose a novel approach where the adaptability of the solutions is captured inside a metric Markov Random Field (MRF). Each case is represented as a node in the MRF, and edges connect cases whose solutions are close in the solution space. States of the nodes represent the adaptability effort with respect to the query. Potentials are defined to encourage connected nodes to share the same state; this models the fact that cases having similar solutions should have the same adaptability effort with respect to the query. The main goal is to enlarge the set of potentially adaptable cases that are retrieved (the recall) without significantly sacrificing the precision of retrieval. We will report on some experiments concerning a retrieval architecture where a simple kNN retrieval is followed by a further retrieval step based on MRF inference.
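The two-stage architecture described here (kNN on the problem description, then MRF inference over a graph of cases with nearby solutions) might be sketched as below. This is a simplified illustration under loud assumptions, not the paper's implementation: states are reduced to binary (0 = not adaptable, 1 = adaptable), the pairwise potential is a plain absolute difference of states, inference uses iterated conditional modes (ICM) as a stand-in for whatever inference the paper uses, and all function names and cost values are hypothetical.

```python
import math


def knn_indices(query, problems, k):
    """Stage 1: indices of the k cases whose problem descriptions are closest to the query."""
    order = sorted(range(len(problems)), key=lambda i: math.dist(query, problems[i]))
    return order[:k]


def solution_graph(solutions, eps):
    """Connect cases whose solutions lie within eps of each other in solution space."""
    n = len(solutions)
    neighbors = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(solutions[i], solutions[j]) <= eps:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def icm_adaptable(n, neighbors, seeds, evidence_cost=2.0, prior_cost=0.5,
                  coupling=1.0, max_sweeps=20):
    """Stage 2: approximate minimum-energy labelling via ICM.

    Unary costs (assumed values): kNN seeds pay evidence_cost for state 0,
    all other cases pay prior_cost for state 1. Connected cases pay
    coupling * |s_i - s_j| when their states differ.
    """
    def unary(i, s):
        if i in seeds:
            return evidence_cost if s == 0 else 0.0
        return prior_cost if s == 1 else 0.0

    states = [1 if i in seeds else 0 for i in range(n)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            def energy(s):
                return unary(i, s) + coupling * sum(abs(s - states[j]) for j in neighbors[i])
            best = min((0, 1), key=energy)
            if best != states[i]:
                states[i] = best
                changed = True
        if not changed:
            break
    return [i for i in range(n) if states[i] == 1]


def two_stage_retrieval(query, problems, solutions, k, eps):
    """kNN retrieval followed by the MRF-style expansion step."""
    seeds = set(knn_indices(query, problems, k))
    return icm_adaptable(len(problems), solution_graph(solutions, eps), seeds)


if __name__ == "__main__":
    problems = [(0.0,), (0.1,), (5.0,), (6.0,), (7.0,)]
    solutions = [(1.0,), (1.1,), (1.05,), (9.0,), (9.1,)]
    print(two_stage_retrieval((0.05,), problems, solutions, k=2, eps=0.2))
```

In this toy data, case 2 has a problem description far from the query but a solution close to those of the two kNN hits, so the coupling term pulls it into the retrieved set, which is the recall gain the abstract describes; cases 3 and 4 are connected only to each other and stay out.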
What Is the Next Step? Supporting Architectural Room Configuration Process with Case-Based Reasoning and Recurrent Neural Networks
Eisenstadt, Viktor (University of Hildesheim) | Althoff, Klaus-Dieter (University of Hildesheim)
This paper presents the first results of the research into AI-based support of the room configuration process during the early design phases in architecture. Room configuration (also: room layout or space layout) is an essential stage of the initial design phase: its results are crucial for user-friendliness and success of the planned utilization of the architectural object. Our approach takes into account different possible actions of the configuration process, such as adding, removing, or (re)assigning of the room type. Its mode of operation is based on specific process chain clusters, where each cluster represents a contextual subset of previous configuration steps and provides a recurrent neural network trained on this cluster data only to suggest the next step, and a case base that is used to determine if the current process chain belongs to this cluster. The most similar cluster then tries to suggest the next step of the process. The approach is implemented in a distributed CBR framework for support of early conceptual design in architecture and was evaluated with a high number of process chain queries to prove its general suitability.