L2T-DLN: Learning to Teach with Dynamic Loss Network
Hai, Zhoyang, Pan, Liyuan, Liu, Xiabi, Liu, Zhengzheng, Yunita, Mirna
With the concept of teaching introduced to the machine learning community, teacher models have started using dynamic loss functions to guide the training of a student model. The dynamic loss is intended to provide adaptive loss functions for different phases of student model learning. In existing works, the teacher model 1) determines the loss function based only on the present states of the student model, i.e., disregards the experience of the teacher; and 2) utilizes only the states of the student model, e.g., training iteration number and loss/accuracy from training/validation sets, while ignoring the states of the loss function. In this paper, we first formulate loss adjustment as a temporal task by designing a teacher model with memory units, which enables student learning to be guided by the experience of the teacher model. Then, with a dynamic loss network, we additionally use the states of the loss to assist teacher learning in enhancing the interactions between the teacher and the student model. Extensive experiments demonstrate that our approach can enhance student learning and improve the performance of various deep models on real-world tasks, including classification, object detection, and semantic segmentation.
Compositional Law Parsing with Latent Random Functions
Shi, Fan, Li, Bin, Xue, Xiangyang
Human cognition is compositional. We understand a scene by decomposing it into different concepts (e.g., shape and position of an object) and learning the respective laws of these concepts, which may be either natural (e.g., laws of motion) or man-made (e.g., laws of a game). The automatic parsing of these laws indicates the model's ability to understand the scene, which makes law parsing play a central role in many visual tasks. This paper proposes a deep latent variable model for Compositional LAw Parsing (CLAP), which achieves human-like compositionality through an encoding-decoding architecture that represents concepts in the scene as latent variables. CLAP employs concept-specific latent random functions instantiated with Neural Processes to capture the law of concepts. Our experimental results demonstrate that CLAP outperforms the baseline methods in multiple visual tasks such as intuitive physics, abstract visual reasoning, and scene representation. The law manipulation experiments illustrate CLAP's interpretability by modifying specific latent random functions on samples. For example, CLAP learns the laws of position-changing and appearance constancy from the moving balls in a scene, making it possible to exchange laws between samples or compose existing laws into novel laws.
Completion Reasoning Emulation for the Description Logic EL+
Eberhart, Aaron, Ebrahimi, Monireh, Zhou, Lu, Shimizu, Cogan, Hitzler, Pascal
We present a new approach to integrating deep learning with knowledge-based systems that we believe shows promise. Our approach seeks to emulate reasoning structure, which can be inspected part-way through, rather than simply learning reasoner answers, which is typical in many of the black-box systems currently in use. We demonstrate that this idea is feasible by training a long short-term memory (LSTM) artificial neural network to learn EL+ reasoning patterns with two different data sets. We also show that this trained system is resistant to noise by corrupting a percentage of the test data and comparing the reasoner's and LSTM's predictions on corrupt data with correct answers.
Visual explanation for video recognition – twentybn – Medium
This post describes how temporally-sensitive saliency maps can be obtained for deep neural networks designed for video recognition. Previous works [2, 3, 4] show that saliency maps help visualize why a model produced a given prediction, can uncover artifacts in the data, and can point towards better model architectures. Task: Recognizing human actions in videos from our recently released dataset requires a fine-grained understanding of concepts like three-dimensional geometry, material properties, object permanence, affordance and gravity [1]. The dataset, dubbed "Something-Something", consists of 100,000 videos across 174 categories containing concepts such as dropping, picking, pushing, etc. Grad-CAM, or Gradient-weighted Class Activation Mapping, proposed by [4], allows us to obtain a localization map for any target class. Please refer to [4] for more details.
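As a rough illustration of the Grad-CAM idea mentioned above, the sketch below computes a localization map from feature maps and their gradients: each feature map is weighted by the spatial average of its gradient with respect to the target class score, the weighted maps are summed, and negative values are clipped to zero. This is a minimal sketch, not the post's implementation. The function name `grad_cam` and the nested-list inputs are assumptions made for illustration, the activations and gradients are assumed to have already been extracted from a network, and only a single 2D spatial map is handled (no time dimension).

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM style localization map.

    activations: list of K feature maps, each an H x W nested list.
    gradients:   gradients of the target class score with respect to
                 each feature map, same shape as `activations`.
    Both inputs are assumed to come from a forward/backward pass that
    is not shown here.
    """
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average pooling of each gradient map.
    weights = [
        sum(sum(row) for row in grad_map) / (height * width)
        for grad_map in gradients
    ]

    # Weighted sum of the feature maps.
    cam = [[0.0] * width for _ in range(height)]
    for weight, feature_map in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * feature_map[i][j]

    # ReLU: keep only locations with a positive influence on the class.
    return [[max(0.0, value) for value in row] for row in cam]


# Toy example with two 2x2 feature maps (hypothetical values).
toy_activations = [[[1, 2], [3, 4]], [[1, 1], [1, 1]]]
toy_gradients = [[[1, 1], [1, 1]], [[-2, -2], [-2, -2]]]
toy_map = grad_cam(toy_activations, toy_gradients)
```

In practice the resulting map is usually upsampled to the input resolution and overlaid on the frame. For video models, the same weighting could presumably be applied over an additional time axis, though that extension is not shown here.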