Information-Theoretic Objective Functions for Lifelong Learning
Zhang, Byoung-Tak (Seoul National University)
Conventional paradigms of machine learning assume that all the training data are available when learning starts. In lifelong learning, however, examples are observed sequentially as learning unfolds, and the learner must continually explore the world and reorganize and refine its internal model, or knowledge, of the world. This leads to a fundamental challenge: how to balance long-term and short-term goals, and how to trade off information gain against model complexity? These questions boil down to "what objective functions can best guide a lifelong learning agent?" Here we develop a sequential Bayesian framework for lifelong learning, build a taxonomy of lifelong-learning paradigms, and examine information-theoretic objective functions for each paradigm, with an emphasis on predictive and active learning. These objective functions can provide theoretical criteria for designing algorithms and for determining effective strategies for selective sampling, representation discovery, knowledge transfer, and continual updating over a lifetime of experience.
Design of Experiments via Information Theory
We discuss an idea for collecting data in a relatively efficient manner. Our point of view is Bayesian and information-theoretic: on any given trial, we want to adaptively choose the input in such a way that the mutual information between the (unknown) state of the system and the (stochastic) output is maximal, given any prior information (including data collected on any previous trials). We prove a theorem that quantifies the effectiveness of this strategy and give a few illustrative examples comparing the performance of this adaptive technique to that of the more usual nonadaptive experimental design. For example, we are able to explicitly calculate the asymptotic relative efficiency of the "staircase method" widely employed in psychophysics research, and to demonstrate the dependence of this efficiency on the form of the "psychometric function" underlying the output responses.
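The adaptive strategy described above (pick the input that maximizes the mutual information between the unknown state and the output, then update the posterior) can be illustrated with a minimal sketch. It assumes things the abstract does not specify: a binary (yes/no) response, a logistic psychometric function with a known slope of 1, an unknown threshold `theta` discretized on a grid, and a fixed set of candidate stimuli. The function names (`psychometric`, `mutual_information`, `choose_input`, `update_posterior`, `run_adaptive`) are hypothetical and not taken from the paper; this is an illustration of the general idea, not the authors' method or their theorem.

```python
import numpy as np

def psychometric(x, theta, slope=1.0):
    # Assumed logistic psychometric function: P(y = 1 | x, theta).
    return 1.0 / (1.0 + np.exp(-slope * (x - theta)))

def binary_entropy(p):
    # Entropy (in nats) of a Bernoulli(p) output; clipping avoids log(0).
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def mutual_information(posterior, thetas, x, slope=1.0):
    # I(theta; y | x) = H(y | x) - E_theta[H(y | x, theta)] under the current posterior.
    p_theta = psychometric(x, thetas, slope)   # P(y = 1 | x, theta) for each grid point
    p_y = np.dot(posterior, p_theta)           # predictive P(y = 1 | x)
    return binary_entropy(p_y) - np.dot(posterior, binary_entropy(p_theta))

def choose_input(posterior, thetas, candidates, slope=1.0):
    # Adaptive step: the candidate input with maximal expected information gain.
    scores = [mutual_information(posterior, thetas, x, slope) for x in candidates]
    return candidates[int(np.argmax(scores))]

def update_posterior(posterior, thetas, x, y, slope=1.0):
    # Bayes' rule on the grid after observing output y at input x.
    p1 = psychometric(x, thetas, slope)
    likelihood = p1 if y == 1 else 1.0 - p1
    unnormalized = posterior * likelihood
    return unnormalized / unnormalized.sum()

def run_adaptive(true_theta=0.7, n_trials=200, seed=0):
    # Simulate an observer with a hypothetical true threshold and run the adaptive design.
    rng = np.random.default_rng(seed)
    thetas = np.linspace(-3.0, 3.0, 121)
    candidates = np.linspace(-3.0, 3.0, 61)
    posterior = np.full(thetas.size, 1.0 / thetas.size)  # uniform prior
    for _ in range(n_trials):
        x = choose_input(posterior, thetas, candidates)
        y = int(rng.random() < psychometric(x, true_theta))
        posterior = update_posterior(posterior, thetas, x, y)
    return thetas, posterior

thetas, posterior = run_adaptive()
print("posterior mean of theta:", float(np.dot(posterior, thetas)))
```

A nonadaptive design would instead fix the sequence of inputs in advance (for example, cycling through `candidates`); swapping `choose_input` for such a schedule is one simple way to compare the two approaches empirically under these same assumptions.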