AI deals with the area of developing computing systems which are capable of performing tasks that humans are very good at, for example recognising objects, recognising and making sense of speech, and decision making in a constrained environment. Narrow AI: the field of AI where the machine is designed to perform a single task and the machine gets very good at performing that particular task. However, once the machine is trained, it does not generalise to unseen domains. This is the form of AI that we have today, for example Google Translate. Artificial General Intelligence (AGI): a form of AI that can accomplish any intellectual task that a human being can do.
Continual Learning (CL) algorithms incrementally learn a predictor or representation across multiple sequentially observed tasks. Designing CL algorithms that perform reliably and avoid so-called catastrophic forgetting has proven a persistent challenge. The current paper develops a theoretical approach that explains why. In particular, we derive the computational properties which CL algorithms would have to possess in order to avoid catastrophic forgetting. Our main finding is that such optimal CL algorithms generally solve an NP-hard problem and will require perfect memory to do so. The findings are of theoretical interest, but also explain the excellent performance of CL algorithms using experience replay, episodic memory and core sets relative to regularization-based approaches.
The problem of reinforcement learning in a non-Markov environment is explored using a dynamic Bayesian network, where conditional independence assumptionsbetween random variables are compactly represented by network parameters. The parameters are learned online, and approximations areused to perform inference and to compute the optimal value function. The relative effects of inference and value function approximations onthe quality of the final policy are investigated, by learning to solve a moderately difficult driving task. The two value function approximations, linearand quadratic, were found to perform similarly, but the quadratic model was more sensitive to initialization. Both performed below thelevel of human performance on the task.
We study online combinatorial decision problems, where one must make sequential decisions in some combinatorial space without knowing in advance the cost of decisions on each trial; the goal is to minimize the total regret over some sequence of trials relative to the best fixed decision in hindsight. Such problems have been studied mostly in settings where decisions are represented by Boolean vectors and costs are linear in this representation. Here we study a general setting where costs may be linear in any suitable low-dimensional vector representation of elements of the decision space. We give a general algorithm for such problems that we call low-dimensional online mirror descent (LDOMD); the algorithm generalizes both the Component Hedge algorithm of Koolen et al. (2010), and a recent algorithm of Suehiro et al. (2012). Our study offers a unification and generalization of previous work, and emphasizes the role of the convex polytope arising from the vector representation of the decision space; while Boolean representations lead to 0-1 polytopes, more general vector representations lead to more general polytopes.
A BSTRACT We show how neural models can be used to realize piece-wise constant functions such as decision trees. Our approach builds on ReLU networks that are piece-wise linear and hence their associated gradients with respect to the inputs are locally constant. We formally establish the equivalence between the classes of locally constant networks and decision trees. Moreover, we highlight several advantageous properties of locally constant networks, including how they realize decision trees with parameter sharing across branching / leaves. Indeed, only M neurons suffice to implicitly model an oblique decision tree with 2 M leaf nodes. The neural representation also enables us to adopt many tools developed for deep networks (e.g., DropConnect (Wan et al., 2013)) while implicitly training decision trees. We demonstrate that our method outperforms alternative techniques for training oblique decision trees in the context of molecular property classification and regression tasks. 1 I NTRODUCTION Decision trees (Breiman et al., 1984) employ a series of simple decision nodes, arranged in a tree, to transparently capture how the predicted outcome is reached. Functionally, such tree-based models, including random forest (Breiman, 2001), realize piece-wise constant functions. Beyond their status as de facto interpretable models, they have also persisted as the state of the art models in some tabular (Sandulescu & Chiru, 2016) and chemical datasets (Wu et al., 2018). Deep neural models, in contrast, are highly flexible and continuous, demonstrably effective in practice, though lack transparency. We merge these two contrasting views by introducing a new family of neural models that implicitly learn and represent oblique decision trees. Prior work has attempted to generalize classic decision trees by extending coordinate-wise cuts to be weighted, linear classifications.