Memory-Based Learning
Applying the Case Difference Heuristic to Learn Adaptations from Deep Network Features
Ye, Xiaomeng, Zhao, Ziwei, Leake, David, Wang, Xizi, Crandall, David
The case difference heuristic (CDH) approach is a knowledge-light method for learning case adaptation knowledge from the case base of a case-based reasoning system. Given a pair of cases, the CDH approach attributes the difference in their solutions to the difference in the problems they solve, and generates adaptation rules to adjust solutions accordingly when a retrieved case and new query have similar problem differences. As an alternative to learning adaptation rules, several researchers have applied neural networks to learn to predict solution differences from problem differences. Previous work on such approaches has assumed that the feature set describing problems is predefined. This paper investigates a two-phase process combining deep learning for feature extraction and neural-network-based adaptation learning from extracted features. Its performance is demonstrated in a regression task on image data: predicting age given the image of a face. Results show that the combined process can successfully learn adaptation knowledge applicable to nonsymbolic differences in cases. The CBR system achieves slightly lower performance overall than a baseline deep network regressor, but better performance than the baseline on novel queries.
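The pairing step the abstract describes can be sketched in a few lines. This is only an illustration of the classic rule-style CDH, not the paper's method: the paper learns adaptations with a neural network over deep features, whereas the sketch below uses a nearest-neighbour lookup over stored difference pairs as a stand-in. The function names (`build_rules`, `retrieve`, `adapt`) and the toy case base are hypothetical.

```python
from itertools import permutations
from math import dist


def build_rules(cases):
    """Form one (problem difference, solution difference) pair per ordered pair of cases.

    `cases` is a list of (problem_vector, numeric_solution) tuples.
    """
    rules = []
    for (p1, s1), (p2, s2) in permutations(cases, 2):
        problem_diff = tuple(a - b for a, b in zip(p1, p2))
        rules.append((problem_diff, s1 - s2))
    return rules


def retrieve(cases, query):
    """Return the stored case whose problem is closest to the query."""
    return min(cases, key=lambda case: dist(case[0], query))


def adapt(cases, rules, query):
    """Retrieve a case, then apply the rule whose problem difference best
    matches the gap between the query and the retrieved problem."""
    problem, solution = retrieve(cases, query)
    gap = tuple(q - p for q, p in zip(query, problem))
    _, solution_delta = min(rules, key=lambda rule: dist(rule[0], gap))
    return solution + solution_delta


# Toy case base (assumed for illustration): solution = 10 + 2*x - y.
cases = [((0.0, 0.0), 10.0), ((1.0, 0.0), 12.0), ((0.0, 1.0), 9.0)]
rules = build_rules(cases)
print(adapt(cases, rules, (2.0, 0.0)))  # 14.0
```

In the toy run, the query lies outside the stored cases; the nearest case has solution 12.0, and the stored difference pair for a gap of (1, 0) supplies the remaining +2.0.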
Monotonicity and Noise-Tolerance in Case-Based Reasoning with Abstract Argumentation (with Appendix)
Paulino-Passos, Guilherme, Toni, Francesca
Recently, abstract argumentation-based models of case-based reasoning ($AA{\text -} CBR$ in short) have been proposed, originally inspired by the legal domain, but also applicable as classifiers in different scenarios. However, the formal properties of $AA{\text -} CBR$ as a reasoning system remain largely unexplored. In this paper, we focus on analysing the non-monotonicity properties of a regular version of $AA{\text -} CBR$ (that we call $AA{\text -} CBR_{\succeq}$). Specifically, we prove that $AA{\text -} CBR_{\succeq}$ is not cautiously monotonic, a property frequently considered desirable in the literature. We then define a variation of $AA{\text -} CBR_{\succeq}$ which is cautiously monotonic. Further, we prove that such variation is equivalent to using $AA{\text -} CBR_{\succeq}$ with a restricted casebase consisting of all "surprising" and "sufficient" cases in the original casebase. As a by-product, we prove that this variation of $AA{\text -} CBR_{\succeq}$ is cumulative, rationally monotonic, and empowers a principled treatment of noise in "incoherent" casebases. Finally, we illustrate $AA{\text -} CBR$ and cautious monotonicity questions on a case study on the U.S. Trade Secrets domain, a legal casebase.
More Play and Less Prep: Flamel.AI Automates Role-Playing Games with IBM Watson
Alex Migitko started playing tabletop role-playing games (RPGs) 15 years ago. But as life got more demanding, he couldn't commit to the time needed for preparation and play, both as a game facilitator and player. Though passionate about gaming, he ultimately stopped. These "aging out" stories are all too common. Players fall in love with gaming because it provides such depth and breadth of creativity and escape.
Informed Machine Learning for Improved Similarity Assessment in Process-Oriented Case-Based Reasoning
Hoffmann, Maximilian, Bergmann, Ralph
Currently, Deep Learning (DL) components within a Case-Based Reasoning (CBR) application often lack the comprehensive integration of available domain knowledge. The trend within machine learning towards so-called Informed machine learning can help to overcome this limitation. In this paper, we therefore investigate the potential of integrating domain knowledge into Graph Neural Networks (GNNs) that are used for similarity assessment between semantic graphs within process-oriented CBR applications. We integrate knowledge in two ways: First, a special data representation and processing method is used that encodes structural knowledge about the semantic annotations of each graph node and edge. Second, the message-passing component of the GNNs is constrained by knowledge on legal node mappings. The evaluation examines the quality and training time of the extended GNNs, compared to the stock models. The results show that both extensions are capable of providing better quality, shorter training times, or in some configurations both advantages at once.
Quran Memorization Course. A Proven System To Do It Easy NOW
In this course you will learn and gain 6 new habits. Each habit will make a big change in your memorization ability. Many people who have taken this course were able to memorize the whole Holy Quran in a short time. This course helped me, and when I noticed the amazing results, I decided to offer it publicly to help millions of Muslims around the world.
Zillow Group uses machine learning to improve Zestimate algorithm for changing market trends
Seattle real estate giant Zillow Group announced new tweaks to its Zestimate tool that provides home value data on more than 104 million properties. The company now uses machine learning-based neural networks and additional data that improve how quickly the algorithm reacts to market trends. Zillow said Zestimate's national median error rate for off-market homes is now 6.9%, an improvement of nearly a full percentage point. The median error rate for on-market homes is 1.9%. Neural networks were also the technique used in 2019 by the winners of the Zillow Prize, a competition to improve the Zestimate.
An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks
Rajput, Shashank, Sreenivasan, Kartik, Papailiopoulos, Dimitris, Karbasi, Amin
It is well known that modern deep neural networks are powerful enough to memorize datasets even when the labels have been randomized. Recently, Vershynin (2020) settled a long-standing question by Baum (1988), proving that \emph{deep threshold} networks can memorize $n$ points in $d$ dimensions using $\widetilde{\mathcal{O}}(e^{1/\delta^2}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(e^{1/\delta^2}(d+\sqrt{n})+n)$ weights, where $\delta$ is the minimum distance between the points. In this work, we improve the dependence on $\delta$ from exponential to almost linear, proving that $\widetilde{\mathcal{O}}(\frac{1}{\delta}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(\frac{d}{\delta}+n)$ weights are sufficient. Our construction uses Gaussian random weights only in the first layer, while all the subsequent layers use binary or integer weights. We also prove new lower bounds by connecting memorization in neural networks to the purely geometric problem of separating $n$ points on a sphere using hyperplanes.
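To get a feel for the gap between the two neuron bounds, the leading terms can be evaluated directly. This is a rough sketch only: the $\widetilde{\mathcal{O}}$ notation hides constants and polylogarithmic factors, so the numbers below are not neuron counts, and the function names and the chosen values of $n$ and $\delta$ are assumptions made for illustration.

```python
import math


def prior_neuron_term(n, delta):
    """Leading term of the earlier bound: e^(1/delta^2) + sqrt(n)."""
    return math.exp(1.0 / delta**2) + math.sqrt(n)


def improved_neuron_term(n, delta):
    """Leading term of the improved bound: 1/delta + sqrt(n)."""
    return 1.0 / delta + math.sqrt(n)


n, delta = 10_000, 0.2  # assumed example values
print(f"prior:    {prior_neuron_term(n, delta):.3e}")
print(f"improved: {improved_neuron_term(n, delta):.3e}")
```

For these values the exponential term dominates the earlier expression by many orders of magnitude, while in the improved expression the $\sqrt{n}$ term is the larger of the two.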
Rethink the Connections among Generalization, Memorization and the Spectral Bias of DNNs
Zhang, Xiao, Xiong, Haoyi, Wu, Dongrui
Over-parameterized deep neural networks (DNNs) with sufficient capacity to memorize random noise can achieve excellent generalization performance, challenging the bias-variance trade-off in classical learning theory. Recent studies claimed that DNNs first learn simple patterns and then memorize noise; other works showed that DNNs have a spectral bias, learning target functions from low to high frequencies during training. However, we show that the monotonicity of the learning bias does not always hold: under the experimental setup of deep double descent, the high-frequency components of DNNs diminish in the late stage of training, leading to the second descent of the test error. Besides, we find that the spectrum of DNNs can be used to indicate the second descent of the test error, even though it is calculated from the training set only.
Fundamental tradeoffs between memorization and robustness in random features and neural tangent regimes
This work studies the (non)robustness of two-layer neural networks in various high-dimensional linearized regimes. We establish fundamental trade-offs between memorization and robustness, as measured by the Sobolev-seminorm of the model w.r.t. the data distribution, i.e., the square root of the average squared $L_2$-norm of the gradients of the model w.r.t. its input. More precisely, if $n$ is the number of training examples, $d$ is the input dimension, and $k$ is the number of hidden neurons in a two-layer neural network, we prove for a large class of activation functions that, if the model memorizes even a fraction of the training set, then its Sobolev-seminorm is lower-bounded by (i) $\sqrt{n}$ in case of infinite-width random features (RF) or neural tangent kernel (NTK) with $d \gtrsim n$; (ii) $\sqrt{n}$ in case of finite-width RF with proportionate scaling of $d$ and $k$; and (iii) $\sqrt{n/k}$ in case of finite-width NTK with proportionate scaling of $d$ and $k$. Moreover, all of these lower bounds are tight: they are attained by the min-norm / least-squares interpolator (when $n$, $d$, and $k$ are in the appropriate interpolating regime). All our results hold as soon as the data is log-concave isotropic and there is label noise, i.e., the target variable is not a deterministic function of the data / features. We empirically validate our theoretical results with experiments. Incidentally, these experiments also reveal, for the first time, (iv) a multiple-descent phenomenon in the robustness of the min-norm interpolator.
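Written out, the robustness measure described in words above takes the following form. The symbols are assumptions introduced here for readability: $f$ denotes the model, $P$ the data distribution, and the subscript on the norm is just one possible notation for the Sobolev-seminorm.

```latex
\[
\|f\|_{\dot{W}^{1,2}(P)}
  \;=\;
  \Bigl( \mathbb{E}_{x \sim P}\, \bigl\| \nabla_x f(x) \bigr\|_2^2 \Bigr)^{1/2}
\]
```

A large value means the model's output changes sharply under small input perturbations on average over the data, which is why a lower bound on this quantity reads as a limit on robustness.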
Exploring Memorization in Adversarial Training
Dong, Yinpeng, Xu, Ke, Yang, Xiao, Pang, Tianyu, Deng, Zhijie, Su, Hang, Zhu, Jun
It is well known that deep learning models have a propensity for fitting the entire training set even with random labels, which requires memorization of every training sample. In this paper, we investigate the memorization effect in adversarial training (AT) to promote a deeper understanding of capacity, convergence, generalization, and especially robust overfitting of adversarially trained classifiers. We first demonstrate that deep networks have sufficient capacity to memorize adversarial examples of training data with completely random labels, but not all AT algorithms can converge under this extreme circumstance. Our study of AT with random labels motivates further analyses on the convergence and generalization of AT. We find that some AT methods suffer from a gradient instability issue, and the recently suggested complexity measures cannot explain robust generalization by considering models trained on random labels. Furthermore, we identify a significant drawback of memorization in AT: it can result in robust overfitting. We then propose a new mitigation algorithm motivated by detailed memorization analyses. Extensive experiments on various datasets validate the effectiveness of the proposed method.